Class Word2Vec.Builder
- java.lang.Object
-
- org.deeplearning4j.models.sequencevectors.SequenceVectors.Builder<VocabWord>
-
- org.deeplearning4j.models.word2vec.Word2Vec.Builder
-
- Direct Known Subclasses:
ParagraphVectors.Builder
- Enclosing class:
- Word2Vec
public static class Word2Vec.Builder extends SequenceVectors.Builder<VocabWord>
-
-
Field Summary
Fields
protected boolean allowParallelTokenization
protected LabelAwareIterator labelAwareIterator
protected SentenceIterator sentenceIterator
protected TokenizerFactory tokenizerFactory
Fields inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors.Builder
batchSize, configuration, elementsLearningAlgorithm, enableScavenger, existingVectors, hugeModelExpected, intersectVectors, iterations, iterator, layerSize, learningRate, learningRateDecayWords, lockFactor, lookupTable, minLearningRate, minWordFrequency, modelUtils, negative, numEpochs, preciseMode, preciseWeightInit, resetModel, sampling, seed, sequenceLearningAlgorithm, STOP, stopWords, trainElementsVectors, trainSequenceVectors, UNK, unknownElement, useAdaGrad, useHierarchicSoftmax, useUnknown, variableWindows, vectorsListeners, vocabCache, vocabLimit, window, workers
-
-
Constructor Summary
Constructors
Builder()
Builder(@NonNull VectorsConfiguration configuration)
-
Method Summary
All Methods · Instance Methods · Concrete Methods
Word2Vec.Builder allowParallelTokenization(boolean allow) - Enables/disables parallel tokenization.
Word2Vec.Builder batchSize(int batchSize) - Defines the mini-batch size.
Word2Vec build() - Builds a Word2Vec instance with the defined settings/options.
Word2Vec.Builder elementsLearningAlgorithm(@NonNull String algorithm) - Sets a specific LearningAlgorithm as the Elements Learning Algorithm.
Word2Vec.Builder elementsLearningAlgorithm(@NonNull ElementsLearningAlgorithm<VocabWord> algorithm) - Sets a specific LearningAlgorithm as the Elements Learning Algorithm.
Word2Vec.Builder enableScavenger(boolean reallyEnable) - Enables/disables periodical vocab truncation during construction. Default value: disabled.
Word2Vec.Builder epochs(int numEpochs) - Defines the number of epochs (iterations over the whole training corpus) for training.
Word2Vec.Builder intersectModel(@NonNull SequenceVectors vectors, boolean isLocked)
Word2Vec.Builder iterate(@NonNull SequenceIterator<VocabWord> iterator) - Feeds a SequenceIterator containing the training corpus into the model.
Word2Vec.Builder iterate(@NonNull DocumentIterator iterator)
Word2Vec.Builder iterate(@NonNull LabelAwareIterator iterator) - Feeds a LabelAwareIterator into the model.
Word2Vec.Builder iterate(@NonNull SentenceIterator iterator) - Feeds a SentenceIterator containing the training corpus into the model.
Word2Vec.Builder iterations(int iterations) - Defines the number of iterations done for each mini-batch during training.
Word2Vec.Builder layerSize(int layerSize) - Defines the number of dimensions for output vectors.
Word2Vec.Builder learningRate(double learningRate) - Defines the initial learning rate for model training.
Word2Vec.Builder limitVocabularySize(int limit) - Sets the vocabulary limit during construction.
Word2Vec.Builder lookupTable(@NonNull WeightLookupTable<VocabWord> lookupTable) - Allows an external WeightLookupTable to be used.
Word2Vec.Builder minLearningRate(double minLearningRate) - Defines the minimal learning rate value for training.
Word2Vec.Builder minWordFrequency(int minWordFrequency) - Defines the minimal word frequency in the training corpus.
Word2Vec.Builder modelUtils(@NonNull ModelUtils<VocabWord> modelUtils) - Sets the ModelUtils used as the provider for utility methods: similarity(), wordsNearest(), accuracy(), etc.
Word2Vec.Builder negativeSample(double negative) - Defines whether negative sampling should be used. Default value: 0.
Word2Vec.Builder resetModel(boolean reallyReset) - Defines whether the model should be totally wiped out prior to building.
Word2Vec.Builder sampling(double sampling) - Defines whether subsampling should be used.
Word2Vec.Builder seed(long randomSeed) - Defines the seed for the random number generator.
Word2Vec.Builder setVectorsListeners(@NonNull Collection<VectorsListener<VocabWord>> vectorsListeners) - Sets VectorsListeners for this SequenceVectors model.
Word2Vec.Builder stopWords(@NonNull Collection<VocabWord> stopList) - Defines stop words that should be ignored during training.
Word2Vec.Builder stopWords(@NonNull List<String> stopList) - Defines stop words that should be ignored during training.
Word2Vec.Builder tokenizerFactory(@NonNull TokenizerFactory tokenizerFactory) - Defines the TokenizerFactory used for string tokenization during training.
Word2Vec.Builder trainElementsRepresentation(boolean trainElements) - Hardcoded to TRUE, since that's the whole point of Word2Vec.
Word2Vec.Builder trainSequencesRepresentation(boolean trainSequences) - Hardcoded to FALSE, since that's the whole point of Word2Vec.
Word2Vec.Builder unknownElement(VocabWord element) - Specifies the SequenceElement used as the UNK element, if UNK is used.
Word2Vec.Builder useAdaGrad(boolean reallyUse) - Defines whether adaptive gradients should be used.
protected Word2Vec.Builder useExistingWordVectors(@NonNull WordVectors vec) - Has no effect for Word2Vec.
Word2Vec.Builder useHierarchicSoftmax(boolean reallyUse) - Enables/disables hierarchic softmax. Default value: enabled.
Word2Vec.Builder usePreciseMode(boolean reallyUse)
Word2Vec.Builder usePreciseWeightInit(boolean reallyUse) - If set to true, initial weights for elements/sequences will be derived from the elements themselves.
Word2Vec.Builder useUnknown(boolean reallyUse) - Specifies whether the UNK word should be used internally.
Word2Vec.Builder useVariableWindow(int... windows) - Allows the use of variable window sizes.
Word2Vec.Builder vocabCache(@NonNull VocabCache<VocabWord> vocabCache) - Allows an external VocabCache to be used.
Word2Vec.Builder windowSize(int windowSize) - Defines the context window size.
Word2Vec.Builder workers(int numWorkers) - Defines the maximum number of concurrent threads available for training.
Methods inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors.Builder
presetTables, sequenceLearningAlgorithm, sequenceLearningAlgorithm
-
-
-
-
Field Detail
-
sentenceIterator
protected SentenceIterator sentenceIterator
-
labelAwareIterator
protected LabelAwareIterator labelAwareIterator
-
tokenizerFactory
protected TokenizerFactory tokenizerFactory
-
allowParallelTokenization
protected boolean allowParallelTokenization
-
-
Constructor Detail
-
Builder
public Builder()
-
Builder
public Builder(@NonNull VectorsConfiguration configuration)
-
-
Method Detail
-
useExistingWordVectors
protected Word2Vec.Builder useExistingWordVectors(@NonNull WordVectors vec)
This method has no effect for Word2Vec.
Overrides: useExistingWordVectors in class SequenceVectors.Builder<VocabWord>
Parameters: vec - existing WordVectors model
-
iterate
public Word2Vec.Builder iterate(@NonNull DocumentIterator iterator)
-
iterate
public Word2Vec.Builder iterate(@NonNull SentenceIterator iterator)
This method is used to feed a SentenceIterator containing the training corpus into Word2Vec.
Parameters: iterator - SentenceIterator over the training corpus
-
tokenizerFactory
public Word2Vec.Builder tokenizerFactory(@NonNull TokenizerFactory tokenizerFactory)
This method defines the TokenizerFactory to be used for string tokenization during training. PLEASE NOTE: If an external VocabCache is used, the same TokenizerFactory should be used to keep derived tokens equal.
Parameters: tokenizerFactory - TokenizerFactory to use for tokenization
-
iterate
public Word2Vec.Builder iterate(@NonNull SequenceIterator<VocabWord> iterator)
This method is used to feed a SequenceIterator containing the training corpus into Word2Vec.
Overrides: iterate in class SequenceVectors.Builder<VocabWord>
Parameters: iterator - SequenceIterator over the training corpus
-
iterate
public Word2Vec.Builder iterate(@NonNull LabelAwareIterator iterator)
This method is used to feed a LabelAwareIterator into the model.
Parameters: iterator - LabelAwareIterator over the training corpus
-
batchSize
public Word2Vec.Builder batchSize(int batchSize)
This method defines the mini-batch size.
Overrides: batchSize in class SequenceVectors.Builder<VocabWord>
Parameters: batchSize - mini-batch size
-
iterations
public Word2Vec.Builder iterations(int iterations)
This method defines the number of iterations done for each mini-batch during training.
Overrides: iterations in class SequenceVectors.Builder<VocabWord>
Parameters: iterations - number of iterations per mini-batch
-
epochs
public Word2Vec.Builder epochs(int numEpochs)
This method defines the number of epochs (iterations over the whole training corpus) for training.
Overrides: epochs in class SequenceVectors.Builder<VocabWord>
Parameters: numEpochs - number of epochs
-
layerSize
public Word2Vec.Builder layerSize(int layerSize)
This method defines the number of dimensions for output vectors.
Overrides: layerSize in class SequenceVectors.Builder<VocabWord>
Parameters: layerSize - dimensionality of the word vectors
-
learningRate
public Word2Vec.Builder learningRate(double learningRate)
This method defines the initial learning rate for model training.
Overrides: learningRate in class SequenceVectors.Builder<VocabWord>
Parameters: learningRate - initial learning rate
-
minWordFrequency
public Word2Vec.Builder minWordFrequency(int minWordFrequency)
This method defines the minimal word frequency in the training corpus. All words below this threshold will be removed prior to model training.
Overrides: minWordFrequency in class SequenceVectors.Builder<VocabWord>
Parameters: minWordFrequency - minimal word frequency threshold
-
minLearningRate
public Word2Vec.Builder minLearningRate(double minLearningRate)
This method defines the minimal learning rate value for training.
Overrides: minLearningRate in class SequenceVectors.Builder<VocabWord>
Parameters: minLearningRate - lower bound for the learning rate
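Together, learningRate and minLearningRate bound a decaying schedule: training starts at the initial rate and decays toward the floor. The sketch below is a simplified linear decay to illustrate the interaction of the two values; DL4J's actual schedule is driven by the number of words processed and may differ.

```java
// Simplified illustration of an initial learning rate decaying toward a
// floor value over the course of training. This is NOT DL4J's exact
// schedule, only a sketch of how learningRate and minLearningRate interact.
public class LearningRateDecayDemo {
    // Linearly interpolate from initialRate down as progress goes 0 -> 1,
    // never dropping below minRate.
    static double currentRate(double initialRate, double minRate, double progress) {
        double rate = initialRate * (1.0 - progress);
        return Math.max(rate, minRate);
    }

    public static void main(String[] args) {
        System.out.println(currentRate(0.025, 1e-4, 0.0)); // start of training
        System.out.println(currentRate(0.025, 1e-4, 0.5)); // halfway
        System.out.println(currentRate(0.025, 1e-4, 1.0)); // clamped at the floor
    }
}
```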
-
resetModel
public Word2Vec.Builder resetModel(boolean reallyReset)
This method defines whether the model should be totally wiped out prior to building, or not.
Overrides: resetModel in class SequenceVectors.Builder<VocabWord>
Parameters: reallyReset - true to reset the model before building
-
limitVocabularySize
public Word2Vec.Builder limitVocabularySize(int limit)
This method sets the vocabulary limit during construction. Default value: 0 (no limit).
Overrides: limitVocabularySize in class SequenceVectors.Builder<VocabWord>
Parameters: limit - maximum vocabulary size, or 0 for no limit
-
vocabCache
public Word2Vec.Builder vocabCache(@NonNull VocabCache<VocabWord> vocabCache)
This method allows you to define an external VocabCache to be used.
Overrides: vocabCache in class SequenceVectors.Builder<VocabWord>
Parameters: vocabCache - external VocabCache instance
-
lookupTable
public Word2Vec.Builder lookupTable(@NonNull WeightLookupTable<VocabWord> lookupTable)
This method allows you to define an external WeightLookupTable to be used.
Overrides: lookupTable in class SequenceVectors.Builder<VocabWord>
Parameters: lookupTable - external WeightLookupTable instance
-
sampling
public Word2Vec.Builder sampling(double sampling)
This method defines whether subsampling should be used or not.
Overrides: sampling in class SequenceVectors.Builder<VocabWord>
Parameters: sampling - set > 0 as the subsampling argument, or 0 to disable
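The subsampling argument acts as a frequency threshold: very frequent words are randomly discarded during training. The sketch below shows the keep probability from Mikolov et al. (2013), from which the scheme originates; DL4J's exact variant may differ in details.

```java
// Sketch of word2vec-style subsampling: frequent words are randomly dropped
// with a probability derived from their relative corpus frequency and the
// sampling threshold t. Based on Mikolov et al. (2013); DL4J's internal
// formula may differ slightly.
public class SubsamplingDemo {
    // Probability of KEEPING a word with relative frequency freq,
    // given subsampling threshold t: p = sqrt(t / freq), capped at 1.
    static double keepProbability(double freq, double t) {
        if (t <= 0 || freq <= 0) return 1.0;  // subsampling disabled
        return Math.min(1.0, Math.sqrt(t / freq));
    }

    public static void main(String[] args) {
        // A very frequent word (~5% of all tokens) is kept rarely...
        System.out.println(keepProbability(0.05, 1e-5));
        // ...while a rare word is always kept.
        System.out.println(keepProbability(1e-6, 1e-5));
    }
}
```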
-
useAdaGrad
public Word2Vec.Builder useAdaGrad(boolean reallyUse)
This method defines whether adaptive gradients should be used or not.
Overrides: useAdaGrad in class SequenceVectors.Builder<VocabWord>
Parameters: reallyUse - true to use AdaGrad
-
negativeSample
public Word2Vec.Builder negativeSample(double negative)
This method defines whether negative sampling should be used or not. PLEASE NOTE: If you're going to use negative sampling, you might want to disable hierarchic softmax, which is enabled by default. Default value: 0.
Overrides: negativeSample in class SequenceVectors.Builder<VocabWord>
Parameters: negative - set > 0 as the negative sampling argument, or 0 to disable
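Negative sampling trains against "noise" words drawn from a smoothed unigram distribution; in the original word2vec this is the unigram count raised to the 3/4 power. The sketch below is illustrative only: DL4J builds an equivalent sampling table internally, and the class here is not part of its API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the smoothed unigram noise distribution used by negative
// sampling: P(w) proportional to count(w)^0.75, as in Mikolov et al. (2013).
// Illustrative only; not part of the DL4J API.
public class NegativeSamplingDemo {
    static Map<String, Double> noiseDistribution(Map<String, Long> counts) {
        double norm = 0.0;
        for (long c : counts.values()) norm += Math.pow(c, 0.75);
        Map<String, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet())
            probs.put(e.getKey(), Math.pow(e.getValue(), 0.75) / norm);
        return probs;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("the", 1000L);
        counts.put("cat", 10L);
        // The exponent 0.75 flattens the distribution: "the" is sampled
        // often, but far less than its raw 100x count advantage suggests.
        System.out.println(noiseDistribution(counts));
    }
}
```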
-
stopWords
public Word2Vec.Builder stopWords(@NonNull List<String> stopList)
This method defines stop words that should be ignored during training.
Overrides: stopWords in class SequenceVectors.Builder<VocabWord>
Parameters: stopList - list of stop words
-
trainElementsRepresentation
public Word2Vec.Builder trainElementsRepresentation(boolean trainElements)
This method is hardcoded to TRUE, since that's the whole point of Word2Vec.
Overrides: trainElementsRepresentation in class SequenceVectors.Builder<VocabWord>
Parameters: trainElements - ignored (always TRUE for Word2Vec)
-
trainSequencesRepresentation
public Word2Vec.Builder trainSequencesRepresentation(boolean trainSequences)
This method is hardcoded to FALSE, since that's the whole point of Word2Vec.
Overrides: trainSequencesRepresentation in class SequenceVectors.Builder<VocabWord>
Parameters: trainSequences - ignored (always FALSE for Word2Vec)
-
stopWords
public Word2Vec.Builder stopWords(@NonNull Collection<VocabWord> stopList)
This method defines stop words that should be ignored during training.
Overrides: stopWords in class SequenceVectors.Builder<VocabWord>
Parameters: stopList - collection of stop words as VocabWord elements
-
windowSize
public Word2Vec.Builder windowSize(int windowSize)
This method defines the context window size.
Overrides: windowSize in class SequenceVectors.Builder<VocabWord>
Parameters: windowSize - context window size
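What the window size means in practice: for each target token, up to windowSize neighbours on each side form its training context. A minimal sketch (DL4J does this internally during fitting; this class is not part of its API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of context-window extraction: for a target token, collect up to
// `window` neighbours on each side, clipped at sentence boundaries.
// Illustrative only; not part of the DL4J API.
public class ContextWindowDemo {
    static List<String> context(String[] tokens, int targetIndex, int window) {
        List<String> ctx = new ArrayList<>();
        int start = Math.max(0, targetIndex - window);
        int end = Math.min(tokens.length - 1, targetIndex + window);
        for (int i = start; i <= end; i++)
            if (i != targetIndex) ctx.add(tokens[i]);
        return ctx;
    }

    public static void main(String[] args) {
        String[] sentence = {"the", "quick", "brown", "fox", "jumps"};
        // window = 2 around "brown" -> [the, quick, fox, jumps]
        System.out.println(context(sentence, 2, 2));
    }
}
```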
-
seed
public Word2Vec.Builder seed(long randomSeed)
This method defines the seed for the random number generator.
Overrides: seed in class SequenceVectors.Builder<VocabWord>
Parameters: randomSeed - random seed
-
workers
public Word2Vec.Builder workers(int numWorkers)
This method defines the maximum number of concurrent threads available for training.
Overrides: workers in class SequenceVectors.Builder<VocabWord>
Parameters: numWorkers - maximum number of concurrent training threads
-
modelUtils
public Word2Vec.Builder modelUtils(@NonNull ModelUtils<VocabWord> modelUtils)
Sets the ModelUtils that will be used as the provider for utility methods: similarity(), wordsNearest(), accuracy(), etc.
Overrides: modelUtils in class SequenceVectors.Builder<VocabWord>
Parameters: modelUtils - model utils to be used
-
useVariableWindow
public Word2Vec.Builder useVariableWindow(int... windows)
This method allows the use of variable window sizes. In this case, every batch gets processed using one of the predefined window sizes.
Overrides: useVariableWindow in class SequenceVectors.Builder<VocabWord>
Parameters: windows - set of window sizes to choose from
-
unknownElement
public Word2Vec.Builder unknownElement(VocabWord element)
This method allows you to specify the SequenceElement that will be used as the UNK element, if UNK is used.
Overrides: unknownElement in class SequenceVectors.Builder<VocabWord>
Parameters: element - VocabWord to use as the UNK element
-
useUnknown
public Word2Vec.Builder useUnknown(boolean reallyUse)
This method allows you to specify whether the UNK word should be used internally.
Overrides: useUnknown in class SequenceVectors.Builder<VocabWord>
Parameters: reallyUse - true to use the UNK word
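The practical effect of enabling UNK is at lookup time: out-of-vocabulary words can fall back to a shared UNK representation instead of failing. A minimal sketch of that fallback idea (illustrative only; DL4J handles this inside its vocab and lookup table, and this class is not part of its API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of UNK fallback: unknown words map to a shared UNK vector
// rather than having no representation at all. Illustrative only.
public class UnknownWordDemo {
    static final String UNK = "UNK";

    // Return the vector for `word`, or the shared UNK vector if unseen.
    static double[] lookup(Map<String, double[]> table, String word) {
        return table.getOrDefault(word, table.get(UNK));
    }

    public static void main(String[] args) {
        Map<String, double[]> table = new HashMap<>();
        table.put("cat", new double[]{1.0, 2.0});
        table.put(UNK, new double[]{0.0, 0.0});
        System.out.println(lookup(table, "dog").length); // falls back to UNK
    }
}
```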
-
setVectorsListeners
public Word2Vec.Builder setVectorsListeners(@NonNull Collection<VectorsListener<VocabWord>> vectorsListeners)
This method sets the VectorsListeners for this SequenceVectors model.
Overrides: setVectorsListeners in class SequenceVectors.Builder<VocabWord>
Parameters: vectorsListeners - collection of VectorsListener instances
-
elementsLearningAlgorithm
public Word2Vec.Builder elementsLearningAlgorithm(@NonNull String algorithm)
Description copied from class: SequenceVectors.Builder
Sets a specific LearningAlgorithm as the Elements Learning Algorithm.
Overrides: elementsLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
Parameters: algorithm - fully qualified class name
-
elementsLearningAlgorithm
public Word2Vec.Builder elementsLearningAlgorithm(@NonNull ElementsLearningAlgorithm<VocabWord> algorithm)
Description copied from class: SequenceVectors.Builder
Sets a specific LearningAlgorithm as the Elements Learning Algorithm.
Overrides: elementsLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
Parameters: algorithm - ElementsLearningAlgorithm implementation
-
allowParallelTokenization
public Word2Vec.Builder allowParallelTokenization(boolean allow)
This method enables/disables parallel tokenization. Default value: TRUE.
Parameters: allow - true to allow parallel tokenization
-
enableScavenger
public Word2Vec.Builder enableScavenger(boolean reallyEnable)
This method enables/disables periodical vocab truncation during construction. Default value: disabled.
Overrides: enableScavenger in class SequenceVectors.Builder<VocabWord>
Parameters: reallyEnable - true to enable periodical vocab truncation
-
useHierarchicSoftmax
public Word2Vec.Builder useHierarchicSoftmax(boolean reallyUse)
This method enables/disables hierarchic softmax. Default value: enabled.
Overrides: useHierarchicSoftmax in class SequenceVectors.Builder<VocabWord>
Parameters: reallyUse - true to use hierarchic softmax
-
usePreciseWeightInit
public Word2Vec.Builder usePreciseWeightInit(boolean reallyUse)
Description copied from class: SequenceVectors.Builder
If set to true, initial weights for elements/sequences will be derived from the elements themselves. However, this implies an additional cycle through the input iterator. Default value: FALSE.
Overrides: usePreciseWeightInit in class SequenceVectors.Builder<VocabWord>
-
usePreciseMode
public Word2Vec.Builder usePreciseMode(boolean reallyUse)
Overrides: usePreciseMode in class SequenceVectors.Builder<VocabWord>
-
intersectModel
public Word2Vec.Builder intersectModel(@NonNull SequenceVectors vectors, boolean isLocked)
Overrides: intersectModel in class SequenceVectors.Builder<VocabWord>
-
build
public Word2Vec build()
Description copied from class: SequenceVectors.Builder
Build a SequenceVectors instance with the defined settings/options.
Overrides: build in class SequenceVectors.Builder<VocabWord>
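A typical way to wire these builder options together is sketched below. This assumes the deeplearning4j NLP artifacts are on the classpath; the corpus file name is a placeholder, and the chosen parameter values are illustrative, not recommendations.

```java
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

public class Word2VecBuilderExample {
    public static void main(String[] args) throws Exception {
        // "corpus.txt" is a placeholder: a plain-text file, one sentence per line.
        SentenceIterator iter = new BasicLineIterator("corpus.txt");
        TokenizerFactory tokenizer = new DefaultTokenizerFactory();

        Word2Vec vec = new Word2Vec.Builder()
                .minWordFrequency(5)   // drop words seen fewer than 5 times
                .layerSize(100)        // 100-dimensional word vectors
                .windowSize(5)         // context window of 5 tokens each side
                .seed(42)              // reproducible runs
                .iterate(iter)
                .tokenizerFactory(tokenizer)
                .build();

        vec.fit();                     // run training
        System.out.println(vec.wordsNearest("day", 10));
    }
}
```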
-
-