public static class ParagraphVectors.Builder extends Word2Vec.Builder
Modifier and Type | Field and Description |
---|---|
protected DocumentIterator |
docIter |
protected LabelAwareIterator |
labelAwareIterator |
protected LabelsSource |
labelsSource |
sentenceIterator, tokenizerFactory
batchSize, configuration, elementsLearningAlgorithm, hugeModelExpected, iterations, iterator, layerSize, learningRate, learningRateDecayWords, lookupTable, minLearningRate, minWordFrequency, negative, numEpochs, resetModel, sampling, seed, sequenceLearningAlgorithm, stopWords, trainElementsVectors, trainSequenceVectors, useAdaGrad, vocabCache, window, workers
Constructor and Description |
---|
Builder() |
Builder(VectorsConfiguration configuration) |
Modifier and Type | Method and Description |
---|---|
ParagraphVectors.Builder |
batchSize(int batchSize)
This method defines mini-batch size
|
ParagraphVectors |
build()
Build SequenceVectors instance with defined settings/options
|
ParagraphVectors.Builder |
epochs(int numEpochs)
This method defines the number of epochs (iterations over the whole training corpus) for training
|
ParagraphVectors.Builder |
index(InvertedIndex<VocabWord> index) |
ParagraphVectors.Builder |
iterate(DocumentIterator iterator)
This method is used to feed a DocumentIterator containing the training corpus into ParagraphVectors
|
ParagraphVectors.Builder |
iterate(LabelAwareDocumentIterator iterator)
This method is used to feed a LabelAwareDocumentIterator containing the training corpus into ParagraphVectors
|
ParagraphVectors.Builder |
iterate(LabelAwareIterator iterator)
This method is used to feed a LabelAwareIterator containing the training corpus into ParagraphVectors
|
ParagraphVectors.Builder |
iterate(LabelAwareSentenceIterator iterator)
This method is used to feed a LabelAwareSentenceIterator containing the training corpus into ParagraphVectors
|
ParagraphVectors.Builder |
iterate(SentenceIterator iterator)
This method is used to feed a SentenceIterator containing the training corpus into ParagraphVectors
|
ParagraphVectors.Builder |
iterate(SequenceIterator<VocabWord> iterator)
This method is used to feed a SequenceIterator containing the training corpus into ParagraphVectors
|
ParagraphVectors.Builder |
iterations(int iterations)
This method defines the number of iterations done for each mini-batch during training
|
ParagraphVectors.Builder |
labels(List<String> labels)
Deprecated.
|
ParagraphVectors.Builder |
labelsSource(LabelsSource source)
This method attaches a predefined labels source to ParagraphVectors
|
ParagraphVectors.Builder |
layerSize(int layerSize)
This method defines the number of dimensions of the output vectors
|
ParagraphVectors.Builder |
learningRate(double learningRate)
This method defines initial learning rate for model training
|
ParagraphVectors.Builder |
lookupTable(WeightLookupTable<VocabWord> lookupTable)
This method allows an external WeightLookupTable to be used
|
ParagraphVectors.Builder |
minLearningRate(double minLearningRate)
This method defines the minimal learning rate value for training
|
ParagraphVectors.Builder |
minWordFrequency(int minWordFrequency)
This method defines the minimal word frequency in the training corpus.
|
ParagraphVectors.Builder |
negativeSample(double negative)
This method defines whether negative sampling should be used: set a value > 0 as the negative sampling argument, or 0 to disable it
|
ParagraphVectors.Builder |
resetModel(boolean reallyReset)
This method defines whether the model should be totally wiped out prior to building or not
|
ParagraphVectors.Builder |
sampling(double sampling)
This method defines whether subsampling should be used: set a value > 0 as the subsampling argument, or 0 to disable it
|
ParagraphVectors.Builder |
seed(long randomSeed)
This method defines the random seed for the random number generator
|
ParagraphVectors.Builder |
stopWords(Collection<VocabWord> stopList)
This method defines stop words that should be ignored during training
|
ParagraphVectors.Builder |
stopWords(List<String> stopList)
This method defines stop words that should be ignored during training
|
ParagraphVectors.Builder |
tokenizerFactory(TokenizerFactory tokenizerFactory)
This method defines the TokenizerFactory to be used for string tokenization during training
PLEASE NOTE: If an external VocabCache is used, the same TokenizerFactory should be used to keep the derived tokens equal.
|
ParagraphVectors.Builder |
trainElementsRepresentation(boolean trainElements)
This method defines whether word representations should be built together with document representations.
|
ParagraphVectors.Builder |
trainSequencesRepresentation(boolean trainSequences)
This method is hardcoded to TRUE, since that's the whole point of ParagraphVectors
|
ParagraphVectors.Builder |
trainWordVectors(boolean trainElements)
This method defines whether word representations should be built together with document representations.
|
ParagraphVectors.Builder |
useAdaGrad(boolean reallyUse)
This method defines whether adaptive gradients should be used or not
|
ParagraphVectors.Builder |
vocabCache(VocabCache<VocabWord> vocabCache)
This method allows an external VocabCache to be used
|
ParagraphVectors.Builder |
windowSize(int windowSize)
This method defines context window size
|
ParagraphVectors.Builder |
workers(int numWorkers)
This method defines the maximum number of concurrent threads available for training
|
elementsLearningAlgorithm, elementsLearningAlgorithm, presetTables, sequenceLearningAlgorithm, sequenceLearningAlgorithm
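Every configuration method in the summary is declared with the return type ParagraphVectors.Builder, even where it overrides a method of Word2Vec.Builder. A likely reason is that covariant return types keep a fluent call chain typed as the subclass, so subclass-only options such as trainWordVectors remain reachable after an inherited setter. The sketch below illustrates that pattern only; `BaseBuilder` and `DocBuilder` are hypothetical stand-ins, not the DL4J classes.

```java
class BuilderChainSketch {
    // Hypothetical stand-in for a parent builder such as Word2Vec.Builder.
    static class BaseBuilder {
        protected int layerSize = 100;
        protected int window = 5;

        BaseBuilder layerSize(int layerSize) {
            this.layerSize = layerSize;
            return this;
        }

        BaseBuilder windowSize(int window) {
            this.window = window;
            return this;
        }
    }

    // Hypothetical stand-in for a subclass builder such as ParagraphVectors.Builder.
    static class DocBuilder extends BaseBuilder {
        protected boolean trainWords = false;

        // Covariant overrides: same behaviour, narrower return type.
        @Override
        DocBuilder layerSize(int layerSize) {
            super.layerSize(layerSize);
            return this;
        }

        @Override
        DocBuilder windowSize(int window) {
            super.windowSize(window);
            return this;
        }

        // Subclass-only option, reachable mid-chain thanks to the overrides.
        DocBuilder trainWordVectors(boolean trainWords) {
            this.trainWords = trainWords;
            return this;
        }

        String describe() {
            return "layerSize=" + layerSize + ", window=" + window + ", trainWords=" + trainWords;
        }
    }

    public static void main(String[] args) {
        String summary = new DocBuilder()
                .layerSize(50)
                .windowSize(3)
                .trainWordVectors(true) // would not compile if layerSize returned BaseBuilder
                .describe();
        System.out.println(summary); // prints layerSize=50, window=3, trainWords=true
    }
}
```

Without the overrides, the chain would degrade to the parent type after the first inherited call and the subclass-only setter would need a cast.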
protected LabelAwareIterator labelAwareIterator
protected LabelsSource labelsSource
protected DocumentIterator docIter
public Builder()
public Builder(@NonNull VectorsConfiguration configuration)
public ParagraphVectors.Builder trainWordVectors(boolean trainElements)
trainElements
- public ParagraphVectors.Builder labelsSource(@NonNull LabelsSource source)
source
- @Deprecated public ParagraphVectors.Builder labels(@NonNull List<String> labels)
labels
- public ParagraphVectors.Builder iterate(@NonNull LabelAwareDocumentIterator iterator)
iterator
- public ParagraphVectors.Builder iterate(@NonNull LabelAwareSentenceIterator iterator)
iterator
- public ParagraphVectors.Builder iterate(@NonNull LabelAwareIterator iterator)
iterator
- public ParagraphVectors.Builder iterate(@NonNull DocumentIterator iterator)
iterate
in class Word2Vec.Builder
iterator
- public ParagraphVectors.Builder iterate(@NonNull SentenceIterator iterator)
iterate
in class Word2Vec.Builder
iterator
- public ParagraphVectors build()
SequenceVectors.Builder
build
in class Word2Vec.Builder
public ParagraphVectors.Builder tokenizerFactory(@NonNull TokenizerFactory tokenizerFactory)
tokenizerFactory
in class Word2Vec.Builder
tokenizerFactory
- public ParagraphVectors.Builder index(@NonNull InvertedIndex<VocabWord> index)
index
in class Word2Vec.Builder
public ParagraphVectors.Builder iterate(@NonNull SequenceIterator<VocabWord> iterator)
iterate
in class Word2Vec.Builder
iterator
- public ParagraphVectors.Builder batchSize(int batchSize)
batchSize
in class Word2Vec.Builder
batchSize
- public ParagraphVectors.Builder iterations(int iterations)
iterations
in class Word2Vec.Builder
iterations
- public ParagraphVectors.Builder epochs(int numEpochs)
epochs
in class Word2Vec.Builder
numEpochs
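Read literally, the summary describes epochs as iterations over the whole training corpus, iterations as passes done for each mini-batch, and batchSize as the mini-batch size. The sketch below shows how those three settings would nest under that reading. It is an assumption about the loop structure, not DL4J's actual training code, and `countBatchPasses` is a hypothetical helper.

```java
class TrainingLoopSketch {
    // Counts how many mini-batch passes the nested loops perform.
    // Assumes the corpus is split into ceil(corpusSize / batchSize) batches.
    static long countBatchPasses(int corpusSize, int batchSize, int numEpochs, int iterations) {
        long passes = 0;
        for (int epoch = 0; epoch < numEpochs; epoch++) {                 // full sweeps over the corpus
            for (int start = 0; start < corpusSize; start += batchSize) { // one mini-batch at a time
                for (int it = 0; it < iterations; it++) {                 // repeats on the same mini-batch
                    passes++;
                }
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        // 1000 sequences in batches of 256 give 4 batches; 3 epochs x 4 batches x 5 iterations.
        System.out.println(countBatchPasses(1000, 256, 3, 5)); // prints 60
    }
}
```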
- public ParagraphVectors.Builder layerSize(int layerSize)
layerSize
in class Word2Vec.Builder
layerSize
- public ParagraphVectors.Builder learningRate(double learningRate)
learningRate
in class Word2Vec.Builder
learningRate
- public ParagraphVectors.Builder minWordFrequency(int minWordFrequency)
minWordFrequency
in class Word2Vec.Builder
minWordFrequency
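To make the effect of a minimal word frequency concrete, here is a small self-contained sketch that counts tokens and keeps only those at or above the threshold. Whether the library treats the threshold as inclusive is an assumption here, and `buildVocab` is a hypothetical helper rather than part of this API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MinFrequencySketch {
    // Keeps only tokens that occur at least minWordFrequency times (inclusive, by assumption).
    static Set<String> buildVocab(List<String> tokens, int minWordFrequency) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        Set<String> vocab = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minWordFrequency) {
                vocab.add(entry.getKey());
            }
        }
        return vocab;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "sat", "on", "the", "mat", "the", "cat");
        System.out.println(buildVocab(tokens, 2)); // prints [cat, the]
    }
}
```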
- public ParagraphVectors.Builder minLearningRate(double minLearningRate)
minLearningRate
in class Word2Vec.Builder
minLearningRate
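The page only states that learningRate is the initial rate and minLearningRate is the minimal value. One common way the two interact in word2vec-style training is a linear decay that is clamped at the floor. The sketch below assumes that schedule for illustration; the schedule, the `currentRate` helper and its parameters are assumptions, not confirmed behaviour of this builder.

```java
class LearningRateDecaySketch {
    // Linear decay from learningRate toward zero, never dropping below minLearningRate.
    static double currentRate(double learningRate, double minLearningRate,
                              long wordsProcessed, long totalWords) {
        double progress = (double) wordsProcessed / totalWords; // 0.0 at start, 1.0 at end
        return Math.max(minLearningRate, learningRate * (1.0 - progress));
    }

    public static void main(String[] args) {
        double initial = 0.025;
        double floor = 0.0001;
        System.out.println(currentRate(initial, floor, 0, 1000));    // start of training
        System.out.println(currentRate(initial, floor, 500, 1000));  // halfway through
        System.out.println(currentRate(initial, floor, 1000, 1000)); // end: clamped to the floor
    }
}
```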
- public ParagraphVectors.Builder resetModel(boolean reallyReset)
resetModel
in class Word2Vec.Builder
reallyReset
- public ParagraphVectors.Builder vocabCache(@NonNull VocabCache<VocabWord> vocabCache)
vocabCache
in class Word2Vec.Builder
vocabCache
- public ParagraphVectors.Builder lookupTable(@NonNull WeightLookupTable<VocabWord> lookupTable)
lookupTable
in class Word2Vec.Builder
lookupTable
- public ParagraphVectors.Builder sampling(double sampling)
sampling
in class Word2Vec.Builder
sampling
- set > 0 as the subsampling argument, or 0 to disable
public ParagraphVectors.Builder useAdaGrad(boolean reallyUse)
useAdaGrad
in class Word2Vec.Builder
reallyUse
- public ParagraphVectors.Builder negativeSample(double negative)
negativeSample
in class Word2Vec.Builder
negative
- set > 0 as negative sampling argument, or 0 to disable
public ParagraphVectors.Builder stopWords(@NonNull List<String> stopList)
stopWords
in class Word2Vec.Builder
stopList
- public ParagraphVectors.Builder trainElementsRepresentation(boolean trainElements)
trainElementsRepresentation
in class Word2Vec.Builder
trainElements
- public ParagraphVectors.Builder trainSequencesRepresentation(boolean trainSequences)
trainSequencesRepresentation
in class Word2Vec.Builder
trainSequences
- public ParagraphVectors.Builder stopWords(@NonNull Collection<VocabWord> stopList)
stopWords
in class Word2Vec.Builder
stopList
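Both stopWords overloads define words that should be ignored during training. As a rough picture of what "ignored" could mean, the sketch below drops stop words from a token list before any further processing. Exact, case-sensitive matching is an assumption, and `dropStopWords` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopWordsSketch {
    // Returns the tokens that are not in the stop list, preserving their order.
    static List<String> dropStopWords(List<String> tokens, Collection<String> stopList) {
        Set<String> stop = new HashSet<>(stopList); // set lookup instead of scanning the list
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stop.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "fox", "and", "the", "dog");
        System.out.println(dropStopWords(tokens, Arrays.asList("the", "and"))); // prints [quick, fox, dog]
    }
}
```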
- public ParagraphVectors.Builder windowSize(int windowSize)
windowSize
in class Word2Vec.Builder
windowSize
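The context window size bounds how many neighbouring tokens are considered around each position. The sketch below assumes a fixed, symmetric window for illustration (implementations in the word2vec family may shrink the window randomly per position), and `context` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ContextWindowSketch {
    // Returns the tokens within windowSize positions on either side of center,
    // excluding the center token itself and clipping at the sequence boundaries.
    static List<String> context(List<String> tokens, int center, int windowSize) {
        List<String> ctx = new ArrayList<>();
        int from = Math.max(0, center - windowSize);
        int to = Math.min(tokens.size() - 1, center + windowSize);
        for (int i = from; i <= to; i++) {
            if (i != center) {
                ctx.add(tokens.get(i));
            }
        }
        return ctx;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "c", "d", "e", "f");
        System.out.println(context(tokens, 2, 2)); // prints [a, b, d, e]
        System.out.println(context(tokens, 0, 2)); // prints [b, c]
    }
}
```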
- public ParagraphVectors.Builder workers(int numWorkers)
workers
in class Word2Vec.Builder
numWorkers
- public ParagraphVectors.Builder seed(long randomSeed)
seed
in class Word2Vec.Builder
randomSeed
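A fixed seed makes a random number generator produce the same sequence on every run, which is what allows random initial weights to be reproduced. The sketch below shows this with `java.util.Random`; the initialization formula and the `initVector` helper are illustrative assumptions, not the library's actual initializer.

```java
import java.util.Arrays;
import java.util.Random;

class SeedSketch {
    // Fills a vector with small pseudo-random values derived from the given seed.
    static double[] initVector(long seed, int layerSize) {
        Random rng = new Random(seed);
        double[] vector = new double[layerSize];
        for (int i = 0; i < layerSize; i++) {
            vector[i] = (rng.nextDouble() - 0.5) / layerSize; // assumed word2vec-style scaling
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] first = initVector(42L, 8);
        double[] second = initVector(42L, 8);
        System.out.println(Arrays.equals(first, second)); // prints true
    }
}
```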
Copyright © 2016. All Rights Reserved.