Class Word2Vec.Builder

java.lang.Object
  org.deeplearning4j.models.sequencevectors.SequenceVectors.Builder<VocabWord>
    org.deeplearning4j.models.word2vec.Word2Vec.Builder

Direct Known Subclasses:
ParagraphVectors.Builder

Enclosing class:
Word2Vec

public static class Word2Vec.Builder extends SequenceVectors.Builder<VocabWord>
-
-
Field Summary

protected boolean allowParallelTokenization
protected LabelAwareIterator labelAwareIterator
protected SentenceIterator sentenceIterator
protected TokenizerFactory tokenizerFactory

Fields inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors.Builder:
batchSize, configuration, elementsLearningAlgorithm, enableScavenger, existingVectors, hugeModelExpected, intersectVectors, iterations, iterator, layerSize, learningRate, learningRateDecayWords, lockFactor, lookupTable, minLearningRate, minWordFrequency, modelUtils, negative, numEpochs, preciseMode, preciseWeightInit, resetModel, sampling, seed, sequenceLearningAlgorithm, STOP, stopWords, trainElementsVectors, trainSequenceVectors, UNK, unknownElement, useAdaGrad, useHierarchicSoftmax, useUnknown, variableWindows, vectorsListeners, vocabCache, vocabLimit, window, workers
-
-
Constructor Summary

Builder()
Builder(@NonNull VectorsConfiguration configuration)
-
Method Summary

All methods are instance methods. Unless noted otherwise, each returns Word2Vec.Builder so calls can be chained.

allowParallelTokenization(boolean allow)
Enables/disables parallel tokenization. Default: TRUE.

batchSize(int batchSize)
Defines the mini-batch size.

build() (returns Word2Vec)
Builds a Word2Vec instance with the defined settings/options.

elementsLearningAlgorithm(@NonNull String algorithm)
Sets a specific LearningAlgorithm, by fully qualified class name, as the elements learning algorithm.

elementsLearningAlgorithm(@NonNull ElementsLearningAlgorithm<VocabWord> algorithm)
Sets a specific LearningAlgorithm as the elements learning algorithm.

enableScavenger(boolean reallyEnable)
Enables/disables periodic vocabulary truncation during construction. Default: disabled.

epochs(int numEpochs)
Defines the number of epochs (iterations over the whole training corpus) for training.

intersectModel(@NonNull SequenceVectors vectors, boolean isLocked)

iterate(@NonNull SequenceIterator<VocabWord> iterator)
Feeds a SequenceIterator, containing the training corpus, into Word2Vec.

iterate(@NonNull DocumentIterator iterator)

iterate(@NonNull LabelAwareIterator iterator)
Feeds a LabelAwareIterator into Word2Vec.

iterate(@NonNull SentenceIterator iterator)
Feeds a SentenceIterator, containing the training corpus, into Word2Vec.

iterations(int iterations)
Defines the number of iterations done for each mini-batch during training.

layerSize(int layerSize)
Defines the number of dimensions for output vectors.

learningRate(double learningRate)
Defines the initial learning rate for model training.

limitVocabularySize(int limit)
Sets the vocabulary limit during construction. Default: 0 (no limit).

lookupTable(@NonNull WeightLookupTable<VocabWord> lookupTable)
Defines an external WeightLookupTable to be used.

minLearningRate(double minLearningRate)
Defines the minimal learning rate value for training.

minWordFrequency(int minWordFrequency)
Defines the minimal word frequency in the training corpus; words below this threshold are removed before training.

modelUtils(@NonNull ModelUtils<VocabWord> modelUtils)
Sets the ModelUtils used as the provider for utility methods: similarity(), wordsNearest(), accuracy(), etc.

negativeSample(double negative)
Defines whether negative sampling should be used. Default: 0 (disabled). PLEASE NOTE: if you are going to use negative sampling, you might want to disable hierarchic softmax, which is enabled by default.

resetModel(boolean reallyReset)
Defines whether the model should be wiped completely before building.

sampling(double sampling)
Defines whether subsampling should be used.

seed(long randomSeed)
Defines the seed for the random number generator.

setVectorsListeners(@NonNull Collection<VectorsListener<VocabWord>> vectorsListeners)
Sets VectorsListeners for this SequenceVectors model.

stopWords(@NonNull Collection<VocabWord> stopList)
Defines stop words that should be ignored during training.

stopWords(@NonNull List<String> stopList)
Defines stop words that should be ignored during training.

tokenizerFactory(@NonNull TokenizerFactory tokenizerFactory)
Defines the TokenizerFactory used for string tokenization during training. PLEASE NOTE: if an external VocabCache is used, the same TokenizerFactory should be used to keep derived tokens equal.

trainElementsRepresentation(boolean trainElements)
Hardcoded to TRUE, since that is the whole point of Word2Vec.

trainSequencesRepresentation(boolean trainSequences)
Hardcoded to FALSE, since that is the whole point of Word2Vec.

unknownElement(VocabWord element)
Specifies the SequenceElement to be used as the UNK element, if UNK is used.

useAdaGrad(boolean reallyUse)
Defines whether adaptive gradients should be used.

useExistingWordVectors(@NonNull WordVectors vec) (protected)
Has no effect for Word2Vec.

useHierarchicSoftmax(boolean reallyUse)
Enables/disables hierarchic softmax. Default: enabled.

usePreciseMode(boolean reallyUse)

usePreciseWeightInit(boolean reallyUse)
If set to true, initial weights for elements/sequences are derived from the elements themselves.

useUnknown(boolean reallyUse)
Specifies whether the UNK word should be used internally.

useVariableWindow(int... windows)
Enables variable window size; every batch is processed using one of the predefined window sizes.

vocabCache(@NonNull VocabCache<VocabWord> vocabCache)
Defines an external VocabCache to be used.

windowSize(int windowSize)
Defines the context window size.

workers(int numWorkers)
Defines the maximum number of concurrent threads available for training.
Methods inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors.Builder
presetTables, sequenceLearningAlgorithm, sequenceLearningAlgorithm
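As a quick orientation for the builder methods above, here is a minimal configuration sketch. The iterator, tokenizer, and preprocessor classes are real DL4J types (package locations follow recent releases and may differ across versions); the corpus path is a placeholder:

```java
import java.io.File;
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

public class Word2VecQuickStart {
    public static void main(String[] args) throws Exception {
        // One sentence per line; "corpus.txt" is a placeholder path
        SentenceIterator iter = new BasicLineIterator(new File("corpus.txt"));

        TokenizerFactory t = new DefaultTokenizerFactory();
        t.setTokenPreProcessor(new CommonPreprocessor());

        Word2Vec vec = new Word2Vec.Builder()
                .minWordFrequency(5)   // drop rare words before training
                .layerSize(100)        // vector dimensionality
                .windowSize(5)         // context window size
                .seed(42)              // reproducible runs
                .workers(4)            // training threads
                .iterate(iter)
                .tokenizerFactory(t)
                .build();

        vec.fit();                     // run training
    }
}
```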
-
-
-
-
Field Detail
-
sentenceIterator
protected SentenceIterator sentenceIterator
-
labelAwareIterator
protected LabelAwareIterator labelAwareIterator
-
tokenizerFactory
protected TokenizerFactory tokenizerFactory
-
allowParallelTokenization
protected boolean allowParallelTokenization
-
-
Constructor Detail
-
Builder
public Builder()
-
Builder
public Builder(@NonNull VectorsConfiguration configuration)
-
-
Method Detail
-
useExistingWordVectors
protected Word2Vec.Builder useExistingWordVectors(@NonNull WordVectors vec)
This method has no effect for Word2Vec.
Overrides: useExistingWordVectors in class SequenceVectors.Builder<VocabWord>
Parameters:
vec - existing WordVectors model
Returns: this builder, for chaining
-
iterate
public Word2Vec.Builder iterate(@NonNull DocumentIterator iterator)
-
iterate
public Word2Vec.Builder iterate(@NonNull SentenceIterator iterator)
This method is used to feed a SentenceIterator, containing the training corpus, into Word2Vec.
Parameters:
iterator
Returns: this builder, for chaining
-
tokenizerFactory
public Word2Vec.Builder tokenizerFactory(@NonNull TokenizerFactory tokenizerFactory)
This method defines the TokenizerFactory used for string tokenization during training. PLEASE NOTE: if an external VocabCache is used, the same TokenizerFactory should be used to keep derived tokens equal.
Parameters:
tokenizerFactory
Returns: this builder, for chaining
-
iterate
public Word2Vec.Builder iterate(@NonNull SequenceIterator<VocabWord> iterator)
This method is used to feed a SequenceIterator, containing the training corpus, into Word2Vec.
Overrides: iterate in class SequenceVectors.Builder<VocabWord>
Parameters:
iterator
Returns: this builder, for chaining
-
iterate
public Word2Vec.Builder iterate(@NonNull LabelAwareIterator iterator)
This method is used to feed a LabelAwareIterator into Word2Vec.
Parameters:
iterator
Returns: this builder, for chaining
-
batchSize
public Word2Vec.Builder batchSize(int batchSize)
This method defines the mini-batch size.
Overrides: batchSize in class SequenceVectors.Builder<VocabWord>
Parameters:
batchSize
Returns: this builder, for chaining
-
iterations
public Word2Vec.Builder iterations(int iterations)
This method defines the number of iterations done for each mini-batch during training.
Overrides: iterations in class SequenceVectors.Builder<VocabWord>
Parameters:
iterations
Returns: this builder, for chaining
-
epochs
public Word2Vec.Builder epochs(int numEpochs)
This method defines the number of epochs (iterations over the whole training corpus) for training.
Overrides: epochs in class SequenceVectors.Builder<VocabWord>
Parameters:
numEpochs
Returns: this builder, for chaining
-
layerSize
public Word2Vec.Builder layerSize(int layerSize)
This method defines the number of dimensions for output vectors.
Overrides: layerSize in class SequenceVectors.Builder<VocabWord>
Parameters:
layerSize
Returns: this builder, for chaining
-
learningRate
public Word2Vec.Builder learningRate(double learningRate)
This method defines the initial learning rate for model training.
Overrides: learningRate in class SequenceVectors.Builder<VocabWord>
Parameters:
learningRate
Returns: this builder, for chaining
-
minWordFrequency
public Word2Vec.Builder minWordFrequency(int minWordFrequency)
This method defines the minimal word frequency in the training corpus. All words below this threshold are removed before model training.
Overrides: minWordFrequency in class SequenceVectors.Builder<VocabWord>
Parameters:
minWordFrequency
Returns: this builder, for chaining
-
minLearningRate
public Word2Vec.Builder minLearningRate(double minLearningRate)
This method defines the minimal learning rate value for training.
Overrides: minLearningRate in class SequenceVectors.Builder<VocabWord>
Parameters:
minLearningRate
Returns: this builder, for chaining
-
resetModel
public Word2Vec.Builder resetModel(boolean reallyReset)
This method defines whether the model should be wiped completely before building.
Overrides: resetModel in class SequenceVectors.Builder<VocabWord>
Parameters:
reallyReset
Returns: this builder, for chaining
-
limitVocabularySize
public Word2Vec.Builder limitVocabularySize(int limit)
This method sets the vocabulary limit during construction. Default: 0 (no limit).
Overrides: limitVocabularySize in class SequenceVectors.Builder<VocabWord>
Parameters:
limit
Returns: this builder, for chaining
-
vocabCache
public Word2Vec.Builder vocabCache(@NonNull VocabCache<VocabWord> vocabCache)
This method allows an external VocabCache to be used.
Overrides: vocabCache in class SequenceVectors.Builder<VocabWord>
Parameters:
vocabCache
Returns: this builder, for chaining
-
lookupTable
public Word2Vec.Builder lookupTable(@NonNull WeightLookupTable<VocabWord> lookupTable)
This method allows an external WeightLookupTable to be used.
Overrides: lookupTable in class SequenceVectors.Builder<VocabWord>
Parameters:
lookupTable
Returns: this builder, for chaining
-
sampling
public Word2Vec.Builder sampling(double sampling)
This method defines whether subsampling should be used.
Overrides: sampling in class SequenceVectors.Builder<VocabWord>
Parameters:
sampling - set > 0 as the subsampling argument, or 0 to disable
Returns: this builder, for chaining
-
useAdaGrad
public Word2Vec.Builder useAdaGrad(boolean reallyUse)
This method defines whether adaptive gradients should be used.
Overrides: useAdaGrad in class SequenceVectors.Builder<VocabWord>
Parameters:
reallyUse
Returns: this builder, for chaining
-
negativeSample
public Word2Vec.Builder negativeSample(double negative)
This method defines whether negative sampling should be used. Default: 0 (disabled). PLEASE NOTE: if you are going to use negative sampling, you might want to disable hierarchic softmax, which is enabled by default.
Overrides: negativeSample in class SequenceVectors.Builder<VocabWord>
Parameters:
negative - set > 0 as the negative sampling argument, or 0 to disable
Returns: this builder, for chaining
-
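The note above (negative sampling vs. hierarchic softmax) translates into a configuration like the following fragment. This is a sketch, not a complete program: `iter` and `t` stand for a SentenceIterator and TokenizerFactory set up elsewhere, and 10 is just an illustrative sample count:

```java
// Sketch: use negative sampling instead of hierarchic softmax.
Word2Vec vec = new Word2Vec.Builder()
        .negativeSample(10)           // 10 negative samples per positive example
        .useHierarchicSoftmax(false)  // disable HS, as the note above suggests
        .iterate(iter)                // assumed: SentenceIterator set up elsewhere
        .tokenizerFactory(t)          // assumed: TokenizerFactory set up elsewhere
        .build();
```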
stopWords
public Word2Vec.Builder stopWords(@NonNull List<String> stopList)
This method defines stop words that should be ignored during training.
Overrides: stopWords in class SequenceVectors.Builder<VocabWord>
Parameters:
stopList
Returns: this builder, for chaining
-
trainElementsRepresentation
public Word2Vec.Builder trainElementsRepresentation(boolean trainElements)
This method is hardcoded to TRUE, since that is the whole point of Word2Vec.
Overrides: trainElementsRepresentation in class SequenceVectors.Builder<VocabWord>
Parameters:
trainElements
Returns: this builder, for chaining
-
trainSequencesRepresentation
public Word2Vec.Builder trainSequencesRepresentation(boolean trainSequences)
This method is hardcoded to FALSE, since that is the whole point of Word2Vec.
Overrides: trainSequencesRepresentation in class SequenceVectors.Builder<VocabWord>
Parameters:
trainSequences
Returns: this builder, for chaining
-
stopWords
public Word2Vec.Builder stopWords(@NonNull Collection<VocabWord> stopList)
This method defines stop words that should be ignored during training.
Overrides: stopWords in class SequenceVectors.Builder<VocabWord>
Parameters:
stopList
Returns: this builder, for chaining
-
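For the List<String> overload, a fragment like this would work (the word list is purely illustrative, and `builder` stands for a Word2Vec.Builder already in scope; requires java.util.Arrays):

```java
// Sketch: ignore common function words during training (illustrative list)
builder.stopWords(Arrays.asList("the", "a", "an", "of", "to"));
```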
windowSize
public Word2Vec.Builder windowSize(int windowSize)
This method defines the context window size.
Overrides: windowSize in class SequenceVectors.Builder<VocabWord>
Parameters:
windowSize
Returns: this builder, for chaining
-
seed
public Word2Vec.Builder seed(long randomSeed)
This method defines the seed for the random number generator.
Overrides: seed in class SequenceVectors.Builder<VocabWord>
Parameters:
randomSeed
Returns: this builder, for chaining
-
workers
public Word2Vec.Builder workers(int numWorkers)
This method defines the maximum number of concurrent threads available for training.
Overrides: workers in class SequenceVectors.Builder<VocabWord>
Parameters:
numWorkers
Returns: this builder, for chaining
-
modelUtils
public Word2Vec.Builder modelUtils(@NonNull ModelUtils<VocabWord> modelUtils)
Sets the ModelUtils that will be used as the provider for utility methods: similarity(), wordsNearest(), accuracy(), etc.
Overrides: modelUtils in class SequenceVectors.Builder<VocabWord>
Parameters:
modelUtils - model utils to be used
Returns: this builder, for chaining
-
useVariableWindow
public Word2Vec.Builder useVariableWindow(int... windows)
This method enables variable window size. In this case, every batch is processed using one of the predefined window sizes.
Overrides: useVariableWindow in class SequenceVectors.Builder<VocabWord>
Parameters:
windows
Returns: this builder, for chaining
-
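Since useVariableWindow takes varargs, a call like this fragment selects one of the listed window sizes per batch (`builder` stands for a Word2Vec.Builder in scope; the sizes are illustrative):

```java
// Sketch: each training batch randomly uses one of these context window sizes
builder.useVariableWindow(3, 5, 7);
```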
unknownElement
public Word2Vec.Builder unknownElement(VocabWord element)
This method allows you to specify the SequenceElement that will be used as the UNK element, if UNK is used.
Overrides: unknownElement in class SequenceVectors.Builder<VocabWord>
Parameters:
element
Returns: this builder, for chaining
-
useUnknown
public Word2Vec.Builder useUnknown(boolean reallyUse)
This method allows you to specify whether the UNK word should be used internally.
Overrides: useUnknown in class SequenceVectors.Builder<VocabWord>
Parameters:
reallyUse
Returns: this builder, for chaining
-
setVectorsListeners
public Word2Vec.Builder setVectorsListeners(@NonNull Collection<VectorsListener<VocabWord>> vectorsListeners)
This method sets VectorsListeners for this SequenceVectors model.
Overrides: setVectorsListeners in class SequenceVectors.Builder<VocabWord>
Parameters:
vectorsListeners
Returns: this builder, for chaining
-
elementsLearningAlgorithm
public Word2Vec.Builder elementsLearningAlgorithm(@NonNull String algorithm)
Description copied from class: SequenceVectors.Builder
Sets a specific LearningAlgorithm as the elements learning algorithm.
Overrides: elementsLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
Parameters:
algorithm - fully qualified class name
Returns: this builder, for chaining
-
elementsLearningAlgorithm
public Word2Vec.Builder elementsLearningAlgorithm(@NonNull ElementsLearningAlgorithm<VocabWord> algorithm)
Description copied from class: SequenceVectors.Builder
Sets a specific LearningAlgorithm as the elements learning algorithm.
Overrides: elementsLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
Parameters:
algorithm - ElementsLearningAlgorithm implementation
Returns: this builder, for chaining
-
allowParallelTokenization
public Word2Vec.Builder allowParallelTokenization(boolean allow)
This method enables/disables parallel tokenization. Default: TRUE.
Parameters:
allow
Returns: this builder, for chaining
-
enableScavenger
public Word2Vec.Builder enableScavenger(boolean reallyEnable)
This method enables/disables periodic vocabulary truncation during construction. Default: disabled.
Overrides: enableScavenger in class SequenceVectors.Builder<VocabWord>
Parameters:
reallyEnable
Returns: this builder, for chaining
-
useHierarchicSoftmax
public Word2Vec.Builder useHierarchicSoftmax(boolean reallyUse)
This method enables/disables hierarchic softmax. Default: enabled.
Overrides: useHierarchicSoftmax in class SequenceVectors.Builder<VocabWord>
Parameters:
reallyUse
Returns: this builder, for chaining
-
usePreciseWeightInit
public Word2Vec.Builder usePreciseWeightInit(boolean reallyUse)
Description copied from class: SequenceVectors.Builder
If set to true, initial weights for elements/sequences will be derived from the elements themselves. However, this implies an additional pass through the input iterator. Default: FALSE.
Overrides: usePreciseWeightInit in class SequenceVectors.Builder<VocabWord>
Returns: this builder, for chaining
-
usePreciseMode
public Word2Vec.Builder usePreciseMode(boolean reallyUse)
Overrides: usePreciseMode in class SequenceVectors.Builder<VocabWord>
-
intersectModel
public Word2Vec.Builder intersectModel(@NonNull SequenceVectors vectors, boolean isLocked)
Overrides: intersectModel in class SequenceVectors.Builder<VocabWord>
-
build
public Word2Vec build()
Description copied from class: SequenceVectors.Builder
Builds a SequenceVectors instance (here, Word2Vec) with the defined settings/options.
Overrides: build in class SequenceVectors.Builder<VocabWord>
Returns: the configured Word2Vec model
-
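Putting the builder together: once build() returns and fit() has been called, the model can be queried through utility methods such as wordsNearest() and similarity(), which are backed by the configured ModelUtils. A sketch (package locations follow recent DL4J releases and may differ across versions; the corpus path and query words are placeholders):

```java
import java.io.File;
import java.util.Collection;
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

public class Word2VecTrainAndQuery {
    public static void main(String[] args) throws Exception {
        SentenceIterator iter = new BasicLineIterator(new File("corpus.txt")); // placeholder path
        TokenizerFactory t = new DefaultTokenizerFactory();

        Word2Vec vec = new Word2Vec.Builder()
                .minWordFrequency(5)
                .layerSize(100)
                .epochs(1)
                .iterate(iter)
                .tokenizerFactory(t)
                .build();

        vec.fit(); // train on the corpus

        // Query the trained model; "day"/"night" are placeholder words
        Collection<String> nearest = vec.wordsNearest("day", 10);
        double sim = vec.similarity("day", "night");
        System.out.println(nearest + " similarity=" + sim);
    }
}
```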