Class ParagraphVectors.Builder
- java.lang.Object
-
- org.deeplearning4j.models.sequencevectors.SequenceVectors.Builder<VocabWord>
-
- org.deeplearning4j.models.word2vec.Word2Vec.Builder
-
- org.deeplearning4j.models.paragraphvectors.ParagraphVectors.Builder
-
- Enclosing class:
- ParagraphVectors
public static class ParagraphVectors.Builder extends Word2Vec.Builder
-
-
Field Summary
Fields (modifier and type, field):
protected DocumentIterator docIter
protected LabelAwareIterator labelAwareIterator
protected LabelsSource labelsSource
-
Fields inherited from class org.deeplearning4j.models.word2vec.Word2Vec.Builder
allowParallelTokenization, sentenceIterator, tokenizerFactory
-
Fields inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors.Builder
batchSize, configuration, elementsLearningAlgorithm, enableScavenger, existingVectors, hugeModelExpected, intersectVectors, iterations, iterator, layerSize, learningRate, learningRateDecayWords, lockFactor, lookupTable, minLearningRate, minWordFrequency, modelUtils, negative, numEpochs, preciseMode, preciseWeightInit, resetModel, sampling, seed, sequenceLearningAlgorithm, STOP, stopWords, trainElementsVectors, trainSequenceVectors, UNK, unknownElement, useAdaGrad, useHierarchicSoftmax, useUnknown, variableWindows, vectorsListeners, vocabCache, vocabLimit, window, workers
-
-
Constructor Summary
Constructors:
Builder()
Builder(@NonNull VectorsConfiguration configuration)
-
Method Summary
Methods (modifier and type, method, description):
ParagraphVectors.Builder allowParallelTokenization(boolean allow)
Enables or disables parallel tokenization.
ParagraphVectors.Builder batchSize(int batchSize)
Defines the mini-batch size.
ParagraphVectors build()
Builds a SequenceVectors instance with the defined settings/options.
ParagraphVectors.Builder elementsLearningAlgorithm(String algorithm)
Sets the specified LearningAlgorithm as the elements learning algorithm.
ParagraphVectors.Builder elementsLearningAlgorithm(ElementsLearningAlgorithm<VocabWord> algorithm)
Sets the specified LearningAlgorithm as the elements learning algorithm.
ParagraphVectors.Builder enableScavenger(boolean reallyEnable)
Enables or disables periodic vocabulary truncation during construction. Default value: disabled.
ParagraphVectors.Builder epochs(int numEpochs)
Defines the number of epochs (iterations over the whole training corpus) for training.
ParagraphVectors.Builder iterate(@NonNull SequenceIterator<VocabWord> iterator)
Feeds a SequenceIterator containing the training corpus into ParagraphVectors.
ParagraphVectors.Builder iterate(@NonNull DocumentIterator iterator)
Feeds a DocumentIterator containing the training corpus into ParagraphVectors.
ParagraphVectors.Builder iterate(@NonNull LabelAwareDocumentIterator iterator)
Feeds a LabelAwareDocumentIterator containing the training corpus into ParagraphVectors.
ParagraphVectors.Builder iterate(@NonNull LabelAwareIterator iterator)
Feeds a LabelAwareIterator containing the training corpus into ParagraphVectors.
ParagraphVectors.Builder iterate(@NonNull LabelAwareSentenceIterator iterator)
Feeds a LabelAwareSentenceIterator containing the training corpus into ParagraphVectors.
ParagraphVectors.Builder iterate(@NonNull SentenceIterator iterator)
Feeds a SentenceIterator containing the training corpus into ParagraphVectors.
ParagraphVectors.Builder iterations(int iterations)
Defines the number of iterations done for each mini-batch during training.
ParagraphVectors.Builder labels(@NonNull List<String> labels)
Deprecated.
ParagraphVectors.Builder labelsSource(@NonNull LabelsSource source)
Attaches a pre-defined labels source to ParagraphVectors.
ParagraphVectors.Builder layerSize(int layerSize)
Defines the number of dimensions for output vectors.
ParagraphVectors.Builder learningRate(double learningRate)
Defines the initial learning rate for model training.
ParagraphVectors.Builder limitVocabularySize(int limit)
Sets the vocabulary limit during construction.
ParagraphVectors.Builder lookupTable(@NonNull WeightLookupTable<VocabWord> lookupTable)
Defines an external WeightLookupTable to be used.
ParagraphVectors.Builder minLearningRate(double minLearningRate)
Defines the minimal learning rate value for training.
ParagraphVectors.Builder minWordFrequency(int minWordFrequency)
Defines the minimal word frequency in the training corpus.
ParagraphVectors.Builder modelUtils(@NonNull ModelUtils<VocabWord> modelUtils)
Sets the ModelUtils that will be used as the provider for utility methods: similarity(), wordsNearest(), accuracy(), etc.
ParagraphVectors.Builder negativeSample(double negative)
Defines whether negative sampling should be used. PLEASE NOTE: If you are going to use negative sampling, you might want to disable HierarchicSoftmax, which is enabled by default. Default value: 0
ParagraphVectors.Builder resetModel(boolean reallyReset)
Defines whether the model should be completely wiped out prior to building.
ParagraphVectors.Builder sampling(double sampling)
Defines whether subsampling should be used.
ParagraphVectors.Builder seed(long randomSeed)
Defines the random seed for the random number generator.
ParagraphVectors.Builder sequenceLearningAlgorithm(String algorithm)
Sets the specified LearningAlgorithm as the sequence learning algorithm.
ParagraphVectors.Builder sequenceLearningAlgorithm(SequenceLearningAlgorithm<VocabWord> algorithm)
Sets the specified LearningAlgorithm as the sequence learning algorithm.
ParagraphVectors.Builder setVectorsListeners(@NonNull Collection<VectorsListener<VocabWord>> vectorsListeners)
Sets VectorsListeners for this SequenceVectors model.
ParagraphVectors.Builder stopWords(@NonNull Collection<VocabWord> stopList)
Defines stop words that should be ignored during training.
ParagraphVectors.Builder stopWords(@NonNull List<String> stopList)
Defines stop words that should be ignored during training.
ParagraphVectors.Builder tokenizerFactory(@NonNull TokenizerFactory tokenizerFactory)
Defines the TokenizerFactory to be used for string tokenization during training. PLEASE NOTE: If an external VocabCache is used, the same TokenizerFactory should be used to keep derived tokens equal.
ParagraphVectors.Builder trainElementsRepresentation(boolean trainElements)
Defines whether word representations should be built together with document representations.
ParagraphVectors.Builder trainSequencesRepresentation(boolean trainSequences)
This method is hardcoded to TRUE, since that is the whole point of ParagraphVectors.
ParagraphVectors.Builder trainWordVectors(boolean trainElements)
Defines whether word representations should be built together with document representations.
ParagraphVectors.Builder unknownElement(VocabWord element)
Specifies the SequenceElement that will be used as the UNK element, if UNK is used.
ParagraphVectors.Builder useAdaGrad(boolean reallyUse)
Defines whether adaptive gradients should be used.
ParagraphVectors.Builder useExistingWordVectors(@NonNull WordVectors vec)
Uses a pre-built WordVectors model (e.g. Word2Vec) for ParagraphVectors.
ParagraphVectors.Builder useHierarchicSoftmax(boolean reallyUse)
Enables or disables hierarchic softmax. Default value: enabled
ParagraphVectors.Builder usePreciseWeightInit(boolean reallyUse)
If set to true, initial weights for elements/sequences are derived from the elements themselves.
ParagraphVectors.Builder useUnknown(boolean reallyUse)
Specifies whether the UNK word should be used internally.
ParagraphVectors.Builder useVariableWindow(int... windows)
This method has no effect for ParagraphVectors.
ParagraphVectors.Builder vocabCache(@NonNull VocabCache<VocabWord> vocabCache)
Defines an external VocabCache to be used.
ParagraphVectors.Builder windowSize(int windowSize)
Defines the context window size.
ParagraphVectors.Builder workers(int numWorkers)
Defines the maximum number of concurrent threads available for training.
-
Methods inherited from class org.deeplearning4j.models.word2vec.Word2Vec.Builder
intersectModel, usePreciseMode
-
Methods inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors.Builder
presetTables
-
-
-
-
Field Detail
-
labelAwareIterator
protected LabelAwareIterator labelAwareIterator
-
labelsSource
protected LabelsSource labelsSource
-
docIter
protected DocumentIterator docIter
-
-
Constructor Detail
-
Builder
public Builder()
-
Builder
public Builder(@NonNull VectorsConfiguration configuration)
-
-
Method Detail
-
useExistingWordVectors
public ParagraphVectors.Builder useExistingWordVectors(@NonNull WordVectors vec)
Uses a pre-built WordVectors model (e.g. Word2Vec) for ParagraphVectors. The existing model is transferred into the new model before training starts. PLEASE NOTE: A non-normalized model is recommended here.
- Overrides:
useExistingWordVectors in class Word2Vec.Builder
- Parameters:
vec - existing WordVectors model
- Returns:
this builder, for method chaining
-
trainWordVectors
public ParagraphVectors.Builder trainWordVectors(boolean trainElements)
Defines whether word representations should be built together with document representations.
- Parameters:
trainElements - true to train word representations as well
- Returns:
this builder, for method chaining
-
labelsSource
public ParagraphVectors.Builder labelsSource(@NonNull LabelsSource source)
Attaches a pre-defined labels source to ParagraphVectors.
- Parameters:
source - labels source to attach
- Returns:
this builder, for method chaining
-
labels
@Deprecated public ParagraphVectors.Builder labels(@NonNull List<String> labels)
Deprecated. Builds a new LabelsSource instance from the given labels. PLEASE NOTE: Keeping the order of labels synchronized with the input documents is left to the end user. PLEASE NOTE: Because of these ordering issues, label-aware iterators are recommended instead.
- Parameters:
labels - labels, in the same order as the input documents
- Returns:
this builder, for method chaining
-
iterate
public ParagraphVectors.Builder iterate(@NonNull LabelAwareDocumentIterator iterator)
Feeds a LabelAwareDocumentIterator containing the training corpus into ParagraphVectors.
- Parameters:
iterator - iterator over the training corpus
- Returns:
this builder, for method chaining
-
iterate
public ParagraphVectors.Builder iterate(@NonNull LabelAwareSentenceIterator iterator)
Feeds a LabelAwareSentenceIterator containing the training corpus into ParagraphVectors.
- Parameters:
iterator - iterator over the training corpus
- Returns:
this builder, for method chaining
-
iterate
public ParagraphVectors.Builder iterate(@NonNull LabelAwareIterator iterator)
Feeds a LabelAwareIterator containing the training corpus into ParagraphVectors.
- Overrides:
iterate in class Word2Vec.Builder
- Parameters:
iterator - iterator over the training corpus
- Returns:
this builder, for method chaining
-
iterate
public ParagraphVectors.Builder iterate(@NonNull DocumentIterator iterator)
Feeds a DocumentIterator containing the training corpus into ParagraphVectors.
- Overrides:
iterate in class Word2Vec.Builder
- Parameters:
iterator - iterator over the training corpus
- Returns:
this builder, for method chaining
-
iterate
public ParagraphVectors.Builder iterate(@NonNull SentenceIterator iterator)
Feeds a SentenceIterator containing the training corpus into ParagraphVectors.
- Overrides:
iterate in class Word2Vec.Builder
- Parameters:
iterator - iterator over the training corpus
- Returns:
this builder, for method chaining
-
modelUtils
public ParagraphVectors.Builder modelUtils(@NonNull ModelUtils<VocabWord> modelUtils)
Sets the ModelUtils that will be used as the provider for utility methods: similarity(), wordsNearest(), accuracy(), etc.
- Overrides:
modelUtils in class Word2Vec.Builder
- Parameters:
modelUtils - model utils to be used
- Returns:
this builder, for method chaining
-
limitVocabularySize
public ParagraphVectors.Builder limitVocabularySize(int limit)
Sets the vocabulary limit during construction. Default value: 0, which means no limit.
- Overrides:
limitVocabularySize in class Word2Vec.Builder
- Parameters:
limit - vocabulary limit, or 0 for no limit
- Returns:
this builder, for method chaining
-
unknownElement
public ParagraphVectors.Builder unknownElement(VocabWord element)
Specifies the SequenceElement that will be used as the UNK element, if UNK is used.
- Overrides:
unknownElement in class Word2Vec.Builder
- Parameters:
element - element to use as UNK
- Returns:
this builder, for method chaining
-
allowParallelTokenization
public ParagraphVectors.Builder allowParallelTokenization(boolean allow)
Enables or disables parallel tokenization. Default value: TRUE
- Overrides:
allowParallelTokenization in class Word2Vec.Builder
- Parameters:
allow - true to enable parallel tokenization
- Returns:
this builder, for method chaining
-
useUnknown
public ParagraphVectors.Builder useUnknown(boolean reallyUse)
Specifies whether the UNK word should be used internally.
- Overrides:
useUnknown in class Word2Vec.Builder
- Parameters:
reallyUse - true to use the UNK word
- Returns:
this builder, for method chaining
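As a rough illustration of what an UNK element is for, the plain-Java sketch below replaces out-of-vocabulary tokens with a placeholder. UnknownTokenMapper, its map method, the "UNK" label and the choice to drop unknown tokens when the flag is off are all assumptions made for this sketch; they are not Deeplearning4j classes or confirmed library behavior.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Deeplearning4j.
final class UnknownTokenMapper {

    // Placeholder label chosen for this sketch.
    static final String UNK = "UNK";

    // With useUnknown = true, tokens missing from the vocabulary become UNK.
    // With useUnknown = false, this sketch simply drops them (an assumption).
    static List<String> map(List<String> tokens, Set<String> vocabulary, boolean useUnknown) {
        return tokens.stream()
                .filter(token -> useUnknown || vocabulary.contains(token))
                .map(token -> vocabulary.contains(token) ? token : UNK)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> vocabulary = Set.of("cat", "sat");
        List<String> tokens = List.of("cat", "zzz", "sat");
        System.out.println(map(tokens, vocabulary, true));
        System.out.println(map(tokens, vocabulary, false));
    }
}
```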
-
enableScavenger
public ParagraphVectors.Builder enableScavenger(boolean reallyEnable)
Enables or disables periodic vocabulary truncation during construction. Default value: disabled
- Overrides:
enableScavenger in class Word2Vec.Builder
- Parameters:
reallyEnable - true to enable periodic truncation
- Returns:
this builder, for method chaining
-
build
public ParagraphVectors build()
Description copied from class: SequenceVectors.Builder
Builds a SequenceVectors instance with the defined settings/options.
- Overrides:
build in class Word2Vec.Builder
- Returns:
the configured ParagraphVectors instance
-
tokenizerFactory
public ParagraphVectors.Builder tokenizerFactory(@NonNull TokenizerFactory tokenizerFactory)
Defines the TokenizerFactory to be used for string tokenization during training. PLEASE NOTE: If an external VocabCache is used, the same TokenizerFactory should be used to keep derived tokens equal.
- Overrides:
tokenizerFactory in class Word2Vec.Builder
- Parameters:
tokenizerFactory - tokenizer factory to be used
- Returns:
this builder, for method chaining
-
iterate
public ParagraphVectors.Builder iterate(@NonNull SequenceIterator<VocabWord> iterator)
Feeds a SequenceIterator containing the training corpus into ParagraphVectors.
- Overrides:
iterate in class Word2Vec.Builder
- Parameters:
iterator - iterator over the training corpus
- Returns:
this builder, for method chaining
-
batchSize
public ParagraphVectors.Builder batchSize(int batchSize)
Defines the mini-batch size.
- Overrides:
batchSize in class Word2Vec.Builder
- Parameters:
batchSize - mini-batch size
- Returns:
this builder, for method chaining
-
iterations
public ParagraphVectors.Builder iterations(int iterations)
Defines the number of iterations done for each mini-batch during training.
- Overrides:
iterations in class Word2Vec.Builder
- Parameters:
iterations - iterations per mini-batch
- Returns:
this builder, for method chaining
-
epochs
public ParagraphVectors.Builder epochs(int numEpochs)
Defines the number of epochs (iterations over the whole training corpus) for training.
- Overrides:
epochs in class Word2Vec.Builder
- Parameters:
numEpochs - number of passes over the training corpus
- Returns:
this builder, for method chaining
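One way to read how epochs, iterations and batchSize relate is to count batch-level passes. The sketch below is plain arithmetic under the assumption that mini-batches are formed from whole sequences; TrainingPasses and batchPasses are hypothetical names, and the library's internal batching may be organized differently.

```java
// Hypothetical helper for illustration only; not part of Deeplearning4j.
final class TrainingPasses {

    // Number of mini-batch passes for a whole training run, assuming
    // each epoch covers every sequence once and each mini-batch is
    // processed `iterations` times.
    static long batchPasses(long sequences, int batchSize, int iterations, int epochs) {
        long batchesPerEpoch = (sequences + batchSize - 1) / batchSize; // ceiling division
        return batchesPerEpoch * iterations * epochs;
    }

    public static void main(String[] args) {
        // 1000 sequences, batches of 100, 2 iterations per batch, 3 epochs
        System.out.println(batchPasses(1000, 100, 2, 3));
    }
}
```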
-
layerSize
public ParagraphVectors.Builder layerSize(int layerSize)
Defines the number of dimensions for output vectors.
- Overrides:
layerSize in class Word2Vec.Builder
- Parameters:
layerSize - vector dimensionality
- Returns:
this builder, for method chaining
-
setVectorsListeners
public ParagraphVectors.Builder setVectorsListeners(@NonNull Collection<VectorsListener<VocabWord>> vectorsListeners)
Sets VectorsListeners for this SequenceVectors model.
- Overrides:
setVectorsListeners in class Word2Vec.Builder
- Parameters:
vectorsListeners - listeners to attach
- Returns:
this builder, for method chaining
-
learningRate
public ParagraphVectors.Builder learningRate(double learningRate)
Defines the initial learning rate for model training.
- Overrides:
learningRate in class Word2Vec.Builder
- Parameters:
learningRate - initial learning rate
- Returns:
this builder, for method chaining
-
minWordFrequency
public ParagraphVectors.Builder minWordFrequency(int minWordFrequency)
Defines the minimal word frequency in the training corpus. All words below this threshold are removed prior to model training.
- Overrides:
minWordFrequency in class Word2Vec.Builder
- Parameters:
minWordFrequency - minimal number of occurrences for a word to be kept
- Returns:
this builder, for method chaining
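To illustrate the threshold described above, here is a minimal, library-independent sketch in plain Java. MinFrequencyFilter and its filter method are hypothetical names introduced for illustration; they are not part of Deeplearning4j, and the builder's actual pruning logic may differ in detail.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Deeplearning4j.
final class MinFrequencyFilter {

    // Counts tokens, then keeps only those seen at least minWordFrequency times.
    static Map<String, Integer> filter(List<String> tokens, int minWordFrequency) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        // Words below the threshold are dropped before any training would start.
        counts.values().removeIf(count -> count < minWordFrequency);
        return counts;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of("the", "cat", "sat", "the", "cat", "the");
        System.out.println(filter(corpus, 2)); // prints {the=3, cat=2}
    }
}
```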
-
minLearningRate
public ParagraphVectors.Builder minLearningRate(double minLearningRate)
Defines the minimal learning rate value for training.
- Overrides:
minLearningRate in class Word2Vec.Builder
- Parameters:
minLearningRate - lower bound for the learning rate
- Returns:
this builder, for method chaining
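The interplay between the initial and minimal learning rate can be sketched as a decaying schedule with a floor. The linear decay below follows the classic word2vec approach and is an assumption here, not a statement about this library's exact schedule; LearningRateSchedule and current are hypothetical names.

```java
// Hypothetical helper for illustration only; not part of Deeplearning4j.
final class LearningRateSchedule {

    // Linear decay from learningRate toward 0 as training progresses,
    // never dropping below minLearningRate (assumed schedule).
    static double current(double learningRate, double minLearningRate,
                          long wordsSeen, long totalWords) {
        double progress = (double) wordsSeen / totalWords;
        return Math.max(minLearningRate, learningRate * (1.0 - progress));
    }

    public static void main(String[] args) {
        for (long seen = 0; seen <= 1000; seen += 250) {
            System.out.printf("%d words -> rate %.5f%n",
                    seen, current(0.025, 0.0001, seen, 1000));
        }
    }
}
```

In this sketch the floor only matters near the end of training, when the decayed value would otherwise approach zero.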
-
resetModel
public ParagraphVectors.Builder resetModel(boolean reallyReset)
Defines whether the model should be completely wiped out prior to building.
- Overrides:
resetModel in class Word2Vec.Builder
- Parameters:
reallyReset - true to reset the model before building
- Returns:
this builder, for method chaining
-
vocabCache
public ParagraphVectors.Builder vocabCache(@NonNull VocabCache<VocabWord> vocabCache)
Defines an external VocabCache to be used.
- Overrides:
vocabCache in class Word2Vec.Builder
- Parameters:
vocabCache - external vocabulary cache
- Returns:
this builder, for method chaining
-
lookupTable
public ParagraphVectors.Builder lookupTable(@NonNull WeightLookupTable<VocabWord> lookupTable)
Defines an external WeightLookupTable to be used.
- Overrides:
lookupTable in class Word2Vec.Builder
- Parameters:
lookupTable - external weight lookup table
- Returns:
this builder, for method chaining
-
sampling
public ParagraphVectors.Builder sampling(double sampling)
Defines whether subsampling should be used.
- Overrides:
sampling in class Word2Vec.Builder
- Parameters:
sampling - set > 0 as the subsampling argument, or 0 to disable
- Returns:
this builder, for method chaining
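For intuition about what a subsampling argument does, the sketch below computes the probability of keeping one occurrence of a word using the heuristic from the original word2vec C implementation. That this library applies the same formula is an assumption; Subsampling and keepProbability are hypothetical names.

```java
// Hypothetical helper for illustration only; not part of Deeplearning4j.
final class Subsampling {

    // Probability of keeping one occurrence of a word. Frequent words get
    // a probability below 1, rare words are always kept.
    static double keepProbability(long wordCount, long totalWords, double sampling) {
        if (sampling <= 0) {
            return 1.0; // 0 disables subsampling: every occurrence is kept
        }
        double threshold = sampling * totalWords;
        double p = (Math.sqrt(wordCount / threshold) + 1) * threshold / wordCount;
        return Math.min(1.0, p);
    }

    public static void main(String[] args) {
        long total = 1_000_000;
        System.out.println(keepProbability(10_000, total, 0.001)); // frequent word
        System.out.println(keepProbability(100, total, 0.001));    // rare word
        System.out.println(keepProbability(10_000, total, 0.0));   // disabled
    }
}
```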
-
useAdaGrad
public ParagraphVectors.Builder useAdaGrad(boolean reallyUse)
Defines whether adaptive gradients should be used.
- Overrides:
useAdaGrad in class Word2Vec.Builder
- Parameters:
reallyUse - true to use adaptive gradients
- Returns:
this builder, for method chaining
-
negativeSample
public ParagraphVectors.Builder negativeSample(double negative)
Defines whether negative sampling should be used. PLEASE NOTE: If you are going to use negative sampling, you might want to disable HierarchicSoftmax, which is enabled by default. Default value: 0
- Overrides:
negativeSample in class Word2Vec.Builder
- Parameters:
negative - set > 0 as the negative sampling argument, or 0 to disable
- Returns:
this builder, for method chaining
-
stopWords
public ParagraphVectors.Builder stopWords(@NonNull List<String> stopList)
Defines stop words that should be ignored during training.
- Overrides:
stopWords in class Word2Vec.Builder
- Parameters:
stopList - words to ignore
- Returns:
this builder, for method chaining
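The effect of a stop list can be pictured as a filter over the token stream. The following plain-Java sketch shows the idea only; StopWordFilter and filter are hypothetical names, and where exactly the library applies its stop list is not specified here.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Deeplearning4j.
final class StopWordFilter {

    // Returns the tokens in their original order, without any stop words.
    static List<String> filter(List<String> tokens, Set<String> stopWords) {
        return tokens.stream()
                .filter(token -> !stopWords.contains(token))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "cat", "sat", "on", "the", "mat");
        System.out.println(filter(tokens, Set.of("the", "on"))); // prints [cat, sat, mat]
    }
}
```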
-
trainElementsRepresentation
public ParagraphVectors.Builder trainElementsRepresentation(boolean trainElements)
Defines whether word representations should be built together with document representations.
- Overrides:
trainElementsRepresentation in class Word2Vec.Builder
- Parameters:
trainElements - true to train word representations as well
- Returns:
this builder, for method chaining
-
trainSequencesRepresentation
public ParagraphVectors.Builder trainSequencesRepresentation(boolean trainSequences)
This method is hardcoded to TRUE, since that is the whole point of ParagraphVectors.
- Overrides:
trainSequencesRepresentation in class Word2Vec.Builder
- Parameters:
trainSequences - whether sequence representations should be trained
- Returns:
this builder, for method chaining
-
stopWords
public ParagraphVectors.Builder stopWords(@NonNull Collection<VocabWord> stopList)
Defines stop words that should be ignored during training.
- Overrides:
stopWords in class Word2Vec.Builder
- Parameters:
stopList - words to ignore
- Returns:
this builder, for method chaining
-
windowSize
public ParagraphVectors.Builder windowSize(int windowSize)
Defines the context window size.
- Overrides:
windowSize in class Word2Vec.Builder
- Parameters:
windowSize - context window size
- Returns:
this builder, for method chaining
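A context window can be pictured as the tokens within a fixed distance of a target position. The sketch below assumes a symmetric, fixed-size window; real training code may shrink the window randomly per position. ContextWindow and context are hypothetical names, not library API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Deeplearning4j.
final class ContextWindow {

    // Returns the tokens within windowSize positions of the target,
    // excluding the target itself and clipping at the sequence bounds.
    static List<String> context(List<String> tokens, int target, int windowSize) {
        List<String> result = new ArrayList<>();
        int from = Math.max(0, target - windowSize);
        int to = Math.min(tokens.size() - 1, target + windowSize);
        for (int i = from; i <= to; i++) {
            if (i != target) {
                result.add(tokens.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a", "b", "c", "d", "e");
        System.out.println(context(tokens, 2, 1)); // prints [b, d]
        System.out.println(context(tokens, 0, 2)); // prints [b, c]
    }
}
```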
-
workers
public ParagraphVectors.Builder workers(int numWorkers)
Defines the maximum number of concurrent threads available for training.
- Overrides:
workers in class Word2Vec.Builder
- Parameters:
numWorkers - maximum number of training threads
- Returns:
this builder, for method chaining
-
sequenceLearningAlgorithm
public ParagraphVectors.Builder sequenceLearningAlgorithm(SequenceLearningAlgorithm<VocabWord> algorithm)
Description copied from class: SequenceVectors.Builder
Sets the specified LearningAlgorithm as the sequence learning algorithm.
- Overrides:
sequenceLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
- Parameters:
algorithm - SequenceLearningAlgorithm implementation
- Returns:
this builder, for method chaining
-
sequenceLearningAlgorithm
public ParagraphVectors.Builder sequenceLearningAlgorithm(String algorithm)
Description copied from class: SequenceVectors.Builder
Sets the specified LearningAlgorithm as the sequence learning algorithm.
- Overrides:
sequenceLearningAlgorithm in class SequenceVectors.Builder<VocabWord>
- Parameters:
algorithm - fully qualified class name
- Returns:
this builder, for method chaining
-
useHierarchicSoftmax
public ParagraphVectors.Builder useHierarchicSoftmax(boolean reallyUse)
Enables or disables hierarchic softmax. Default value: enabled
- Overrides:
useHierarchicSoftmax in class Word2Vec.Builder
- Parameters:
reallyUse - true to enable hierarchic softmax
- Returns:
this builder, for method chaining
-
useVariableWindow
public ParagraphVectors.Builder useVariableWindow(int... windows)
This method has no effect for ParagraphVectors.
- Overrides:
useVariableWindow in class Word2Vec.Builder
- Parameters:
windows - window sizes
- Returns:
this builder, for method chaining
-
elementsLearningAlgorithm
public ParagraphVectors.Builder elementsLearningAlgorithm(ElementsLearningAlgorithm<VocabWord> algorithm)
Description copied from class: SequenceVectors.Builder
Sets the specified LearningAlgorithm as the elements learning algorithm.
- Overrides:
elementsLearningAlgorithm in class Word2Vec.Builder
- Parameters:
algorithm - ElementsLearningAlgorithm implementation
- Returns:
this builder, for method chaining
-
elementsLearningAlgorithm
public ParagraphVectors.Builder elementsLearningAlgorithm(String algorithm)
Description copied from class: SequenceVectors.Builder
Sets the specified LearningAlgorithm as the elements learning algorithm.
- Overrides:
elementsLearningAlgorithm in class Word2Vec.Builder
- Parameters:
algorithm - fully qualified class name
- Returns:
this builder, for method chaining
-
usePreciseWeightInit
public ParagraphVectors.Builder usePreciseWeightInit(boolean reallyUse)
Description copied from class: SequenceVectors.Builder
If set to true, initial weights for elements/sequences are derived from the elements themselves. However, this requires an additional pass through the input iterator. Default value: FALSE
- Overrides:
usePreciseWeightInit in class Word2Vec.Builder
- Returns:
this builder, for method chaining
-
seed
public ParagraphVectors.Builder seed(long randomSeed)
Defines the random seed for the random number generator.
- Overrides:
seed in class Word2Vec.Builder
- Parameters:
randomSeed - seed value
- Returns:
this builder, for method chaining
-
-