Class ParagraphVectors
- java.lang.Object
  - org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl<T>
    - org.deeplearning4j.models.sequencevectors.SequenceVectors<VocabWord>
      - org.deeplearning4j.models.word2vec.Word2Vec
        - org.deeplearning4j.models.paragraphvectors.ParagraphVectors
All Implemented Interfaces:
Serializable, WordVectors, org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer

public class ParagraphVectors extends Word2Vec
- See Also:
  Serialized Form
Nested Class Summary
Nested Classes (Modifier and Type, Class):
- class ParagraphVectors.BlindInferenceCallable
- static class ParagraphVectors.Builder
- class ParagraphVectors.InferenceCallable

Nested classes/interfaces inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors:
SequenceVectors.AsyncSequencer
Field Summary
Fields (Modifier and Type, Field):
- protected AtomicLong countFinished
- protected AtomicLong countSubmitted
- protected org.threadly.concurrent.PriorityScheduler inferenceExecutor
- protected Object inferenceLocker
- protected LabelAwareIterator labelAwareIterator
- protected List<VocabWord> labelsList
- protected org.nd4j.linalg.api.ndarray.INDArray labelsMatrix
- protected LabelsSource labelsSource
- protected boolean normalizedLabels
Fields inherited from class org.deeplearning4j.models.word2vec.Word2Vec:
sentenceIter, tokenizerFactory

Fields inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors:
configuration, configured, elementsLearningAlgorithm, enableScavenger, eventListeners, existingModel, intersectModel, iterator, lockFactor, log, scoreElements, scoreSequences, sequenceLearningAlgorithm, unknownElement, vocabLimit

Fields inherited from class org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl:
batchSize, DEFAULT_UNK, layerSize, learningRate, learningRateDecayWords, lookupTable, minLearningRate, minWordFrequency, modelUtils, negative, numEpochs, numIterations, resetModel, sampling, seed, stopWords, trainElementsVectors, trainSequenceVectors, useAdeGrad, useUnknown, variableWindows, vocab, window, workers
Constructor Summary
Constructors (Modifier, Constructor):
- protected ParagraphVectors()
Method Summary
Methods (Modifier and Type, Method, Description):
- void extractLabels()
- void fit()
  Starts training over.
- static ParagraphVectors fromJson(String jsonString)
- org.nd4j.linalg.api.ndarray.INDArray inferVector(@NonNull List<VocabWord> document)
  Calculates the inferred vector for the given list of words, using the default learning rate and iteration parameters.
- org.nd4j.linalg.api.ndarray.INDArray inferVector(@NonNull List<VocabWord> document, double learningRate, double minLearningRate, int iterations)
  Calculates the inferred vector for the given document.
- org.nd4j.linalg.api.ndarray.INDArray inferVector(String text)
  Calculates the inferred vector for the given text, using the default learning rate and iteration parameters.
- org.nd4j.linalg.api.ndarray.INDArray inferVector(String text, double learningRate, double minLearningRate, int iterations)
  Calculates the inferred vector for the given text.
- org.nd4j.linalg.api.ndarray.INDArray inferVector(LabelledDocument document)
  Calculates the inferred vector for the given document, using the default learning rate and iteration parameters.
- org.nd4j.linalg.api.ndarray.INDArray inferVector(LabelledDocument document, double learningRate, double minLearningRate, int iterations)
  Calculates the inferred vector for the given document.
- Future<org.nd4j.linalg.api.ndarray.INDArray> inferVectorBatched(@NonNull String document)
  Implements batched inference based on the Java Future parallelism model.
- List<org.nd4j.linalg.api.ndarray.INDArray> inferVectorBatched(@NonNull List<String> documents)
  Runs inference on the given List<String>.
- Future<org.nd4j.common.primitives.Pair<String,org.nd4j.linalg.api.ndarray.INDArray>> inferVectorBatched(@NonNull LabelledDocument document)
  Implements batched inference based on the Java Future parallelism model.
- protected void initInference()
- Collection<String> nearestLabels(@NonNull String rawText, int topN)
  Returns the top N labels nearest to the specified text.
- Collection<String> nearestLabels(@NonNull Collection<VocabWord> document, int topN)
  Returns the top N labels nearest to the specified set of vocab words.
- Collection<String> nearestLabels(LabelledDocument document, int topN)
  Returns the top N labels nearest to the specified document.
- Collection<String> nearestLabels(org.nd4j.linalg.api.ndarray.INDArray labelVector, int topN)
  Returns the top N labels nearest to the specified feature vector.
- String predict(String rawText)
  Deprecated.
- String predict(List<VocabWord> document)
  Predicts the label of the document.
- String predict(LabelledDocument document)
  Predicts the label of the document.
- Collection<String> predictSeveral(@NonNull LabelledDocument document, int limit)
  Predicts several labels for the document.
- Collection<String> predictSeveral(String rawText, int limit)
  Predicts several labels for the document.
- Collection<String> predictSeveral(List<VocabWord> document, int limit)
  Predicts several labels for the document.
- protected void reassignExistingModel()
- void setSequenceIterator(@NonNull SequenceIterator<VocabWord> iterator)
  Defines the SequenceIterator instance that will be used as the training corpus source.
- double similarityToLabel(String rawText, String label)
  Deprecated.
- double similarityToLabel(List<VocabWord> document, String label)
  Returns the similarity of the document to a specific label, based on the mean value.
- double similarityToLabel(LabelledDocument document, String label)
  Returns the similarity of the document to a specific label, based on the mean value.
- String toJson()
Methods inherited from class org.deeplearning4j.models.word2vec.Word2Vec:
setSentenceIterator, setTokenizerFactory

Methods inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors:
buildVocab, getElementsScore, getSequencesScore, getUNK, getWordVectorMatrix, initLearners, setUNK, trainSequence

Methods inherited from class org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl:
accuracy, getLayerSize, getWordVector, getWordVectorMatrixNormalized, getWordVectors, getWordVectorsMean, hasWord, indexOf, jsonSerializable, loadWeightsInto, lookupTable, outOfVocabularySupported, setLookupTable, setModelUtils, setVocab, similarity, similarWordsInVocabTo, update, update, vectorSize, vocab, vocabSize, wordsNearest, wordsNearest, wordsNearest, wordsNearestSum, wordsNearestSum, wordsNearestSum

Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer:
jsonSerializable, loadWeightsInto, vectorSize, vocabSize

Methods inherited from interface org.deeplearning4j.models.embeddings.wordvectors.WordVectors:
accuracy, getWordVector, getWordVectorMatrixNormalized, getWordVectors, getWordVectorsMean, hasWord, indexOf, lookupTable, outOfVocabularySupported, setModelUtils, similarity, similarWordsInVocabTo, vocab, wordsNearest, wordsNearest, wordsNearest, wordsNearestSum, wordsNearestSum, wordsNearestSum

Field Detail

labelsSource
protected LabelsSource labelsSource

labelAwareIterator
protected transient LabelAwareIterator labelAwareIterator

labelsMatrix
protected org.nd4j.linalg.api.ndarray.INDArray labelsMatrix

normalizedLabels
protected boolean normalizedLabels

inferenceLocker
protected final transient Object inferenceLocker

inferenceExecutor
protected transient org.threadly.concurrent.PriorityScheduler inferenceExecutor

countSubmitted
protected transient AtomicLong countSubmitted

countFinished
protected transient AtomicLong countFinished

Method Detail

initInference
protected void initInference()

predict
@Deprecated public String predict(String rawText)
Deprecated. This method takes raw text, applies the tokenizer, and returns the most probable label.
- Parameters:
  rawText - the raw text of the document
- Returns:
  the most probable label

setSequenceIterator
public void setSequenceIterator(@NonNull SequenceIterator<VocabWord> iterator)
This method defines the SequenceIterator instance that will be used as the training corpus source. The main difference from the other iterators is that it lets you pass already-tokenized Sequences for training.
- Overrides:
  setSequenceIterator in class Word2Vec
- Parameters:
  iterator - the iterator that supplies the training sequences

predict
public String predict(LabelledDocument document)
This method predicts the label of the document. It computes a similarity with respect to the mean of the representations of the words in the document.
- Parameters:
  document - the document
- Returns:
  the label most similar to the document

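The mean-vector prediction described above can be sketched in plain Java. This is a minimal sketch, not the library's implementation: it assumes the document's word vectors and the label vectors are already available as double[] arrays (the real class works on INDArrays from its lookup table) and uses cosine similarity as the similarity measure. MeanVectorPredictor and its methods are hypothetical names.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper: predicts a label from the mean of a document's word vectors. */
final class MeanVectorPredictor {

    /** Element-wise mean of the document's word vectors. */
    static double[] mean(List<double[]> wordVectors) {
        if (wordVectors.isEmpty()) {
            throw new IllegalArgumentException("document has no known words");
        }
        double[] mean = new double[wordVectors.get(0).length];
        for (double[] v : wordVectors) {
            for (int i = 0; i < mean.length; i++) {
                mean[i] += v[i];
            }
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= wordVectors.size();
        }
        return mean;
    }

    /** Cosine similarity of two equal-length vectors; NaN if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the label most similar to the document mean, or null if there are no labels. */
    static String predict(List<double[]> wordVectors, Map<String, double[]> labelVectors) {
        double[] docMean = mean(wordVectors);
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : labelVectors.entrySet()) {
            double sim = cosine(docMean, e.getValue());
            if (sim > bestSim) { // NaN scores never compare greater, so they never win
                bestSim = sim;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

A document with no known words has no mean vector, so the sketch rejects it instead of returning an arbitrary label.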
extractLabels
public void extractLabels()

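inferVector
public org.nd4j.linalg.api.ndarray.INDArray inferVector(String text, double learningRate, double minLearningRate, int iterations)
This method calculates the inferred vector for the given text.
- Parameters:
  text - the text to infer a vector for
  learningRate - the learning rate used for inference
  minLearningRate - the minimum learning rate used for inference
  iterations - the number of inference iterations
- Returns:
  the inferred vector
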
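The learningRate, minLearningRate and iterations parameters together describe a learning-rate schedule for inference. The sketch below assumes a linear decay from learningRate on the first iteration to minLearningRate on the last; the exact schedule ParagraphVectors applies internally may differ, and InferenceSchedule is a hypothetical name.

```java
/** Hypothetical helper: an assumed linear learning-rate schedule for inference. */
final class InferenceSchedule {

    /**
     * Rate for a 0-based iteration: learningRate at the first iteration,
     * decaying linearly to minLearningRate at the last one.
     */
    static double rateAt(int iteration, int iterations, double learningRate, double minLearningRate) {
        if (iterations <= 1) {
            return learningRate;
        }
        double progress = (double) iteration / (iterations - 1);
        // Clamp so floating-point rounding never pushes the rate below the floor.
        return Math.max(minLearningRate, learningRate - (learningRate - minLearningRate) * progress);
    }

    /** Rates for every iteration, in order. */
    static double[] schedule(int iterations, double learningRate, double minLearningRate) {
        double[] rates = new double[Math.max(iterations, 0)];
        for (int i = 0; i < rates.length; i++) {
            rates[i] = rateAt(i, iterations, learningRate, minLearningRate);
        }
        return rates;
    }
}
```

Each iteration would update only the new document vector with the rate for that step; the schedule itself is independent of how the gradient is computed.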
reassignExistingModel
protected void reassignExistingModel()

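inferVector
public org.nd4j.linalg.api.ndarray.INDArray inferVector(LabelledDocument document, double learningRate, double minLearningRate, int iterations)
This method calculates the inferred vector for the given document.
- Parameters:
  document - the document to infer a vector for
  learningRate - the learning rate used for inference
  minLearningRate - the minimum learning rate used for inference
  iterations - the number of inference iterations
- Returns:
  the inferred vector
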
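inferVector
public org.nd4j.linalg.api.ndarray.INDArray inferVector(@NonNull List<VocabWord> document, double learningRate, double minLearningRate, int iterations)
This method calculates the inferred vector for the given document.
- Parameters:
  document - the words of the document to infer a vector for
  learningRate - the learning rate used for inference
  minLearningRate - the minimum learning rate used for inference
  iterations - the number of inference iterations
- Returns:
  the inferred vector
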
inferVector
public org.nd4j.linalg.api.ndarray.INDArray inferVector(String text)
This method calculates the inferred vector for the given text, using the default learning rate and iteration parameters.
- Parameters:
  text - the text to infer a vector for
- Returns:
  the inferred vector

inferVector
public org.nd4j.linalg.api.ndarray.INDArray inferVector(LabelledDocument document)
This method calculates the inferred vector for the given document, using the default learning rate and iteration parameters.
- Parameters:
  document - the document to infer a vector for
- Returns:
  the inferred vector

inferVector
public org.nd4j.linalg.api.ndarray.INDArray inferVector(@NonNull List<VocabWord> document)
This method calculates the inferred vector for the given list of words, using the default learning rate and iteration parameters.
- Parameters:
  document - the words of the document to infer a vector for
- Returns:
  the inferred vector

inferVectorBatched
public Future<org.nd4j.common.primitives.Pair<String,org.nd4j.linalg.api.ndarray.INDArray>> inferVectorBatched(@NonNull LabelledDocument document)
This method implements batched inference based on the Java Future parallelism model. Note: the LabelledDocument passed in must have its Id field defined.
- Parameters:
  document - the document to infer a vector for
- Returns:
  a Future that yields a Pair of the document Id and its inferred vector

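Since this overload requires an Id, the String half of the returned Pair is presumably that Id, which lets results be matched to inputs no matter which Future completes first. The following sketch mirrors that contract with JDK types only: Map.Entry stands in for Pair, double[] for INDArray, and the Doc class and the infer function are hypothetical stand-ins for LabelledDocument and the model's inference call. Rejecting a missing Id up front is this sketch's own choice.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper: Id-tagged asynchronous inference. */
final class IdTaggedInference {

    /** Hypothetical stand-in for LabelledDocument with only the fields used here. */
    static final class Doc {
        final String id;
        final String content;

        Doc(String id, String content) {
            this.id = id;
            this.content = content;
        }
    }

    /**
     * Submits one document and returns a Future of (id, vector).
     * Fails fast if the id is missing, since the result could not be matched later.
     */
    static Future<Map.Entry<String, double[]>> submit(
            ExecutorService pool, Doc doc, Function<String, double[]> infer) {
        if (doc.id == null) {
            throw new IllegalArgumentException("document Id must be defined");
        }
        Callable<Map.Entry<String, double[]>> task =
                () -> new AbstractMap.SimpleImmutableEntry<>(doc.id, infer.apply(doc.content));
        return pool.submit(task);
    }
}
```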
inferVectorBatched
public Future<org.nd4j.linalg.api.ndarray.INDArray> inferVectorBatched(@NonNull String document)
This method implements batched inference based on the Java Future parallelism model. Note: this method returns a Future<INDArray>, so tracking which document each INDArray belongs to is the caller's responsibility.
- Parameters:
  document - the text to infer a vector for
- Returns:
  a Future that yields the inferred vector

inferVectorBatched
public List<org.nd4j.linalg.api.ndarray.INDArray> inferVectorBatched(@NonNull List<String> documents)
This method runs inference on the given List<String>.
- Parameters:
  documents - the texts to infer vectors for
- Returns:
  INDArrays in the same order as the input texts

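One way to return results in the same order as the input texts is to submit every document first and then call get() on the futures in submission order. This sketch assumes an ExecutorService and a caller-supplied inference function, both hypothetical stand-ins for the model's internal executor and inference call; it is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper: batched inference whose output order matches the input order. */
final class OrderedBatchInference {

    static List<double[]> inferAll(ExecutorService pool, List<String> documents,
                                   Function<String, double[]> infer)
            throws InterruptedException, ExecutionException {
        // Submit every document first so they can be processed concurrently.
        List<Future<double[]>> futures = new ArrayList<>(documents.size());
        for (String doc : documents) {
            Callable<double[]> task = () -> infer.apply(doc);
            futures.add(pool.submit(task));
        }
        // Wait in submission order: results.get(i) always belongs to documents.get(i),
        // regardless of which task finished first.
        List<double[]> results = new ArrayList<>(futures.size());
        for (Future<double[]> future : futures) {
            results.add(future.get());
        }
        return results;
    }
}
```

Blocking on the futures in order costs nothing extra in total wall time: the last result is only available once every task has finished anyway.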
predict
public String predict(List<VocabWord> document)
This method predicts the label of the document. It computes a similarity with respect to the mean of the representations of the words in the document.
- Parameters:
  document - the document
- Returns:
  the label most similar to the document

predictSeveral
public Collection<String> predictSeveral(@NonNull LabelledDocument document, int limit)
Predicts several labels for the document. It computes a similarity with respect to the mean of the representations of the words in the document.
- Parameters:
  document - the labelled document
  limit - the maximum number of labels to return
- Returns:
  possible labels in descending order

predictSeveral
public Collection<String> predictSeveral(String rawText, int limit)
Predicts several labels for the document. It computes a similarity with respect to the mean of the representations of the words in the document.
- Parameters:
  rawText - the raw text of the document
  limit - the maximum number of labels to return
- Returns:
  possible labels in descending order

predictSeveral
public Collection<String> predictSeveral(List<VocabWord> document, int limit)
Predicts several labels for the document. It computes a similarity with respect to the mean of the representations of the words in the document.
- Parameters:
  document - the document
  limit - the maximum number of labels to return
- Returns:
  possible labels in descending order

nearestLabels
public Collection<String> nearestLabels(LabelledDocument document, int topN)
This method returns the top N labels nearest to the specified document.
- Parameters:
  document - the document
  topN - the number of labels to return
- Returns:
  the topN labels nearest to the document

nearestLabels
public Collection<String> nearestLabels(@NonNull String rawText, int topN)
This method returns the top N labels nearest to the specified text.
- Parameters:
  rawText - the raw text of the document
  topN - the number of labels to return
- Returns:
  the topN labels nearest to the text

nearestLabels
public Collection<String> nearestLabels(@NonNull Collection<VocabWord> document, int topN)
This method returns the top N labels nearest to the specified set of vocab words.
- Parameters:
  document - the vocab words of the document
  topN - the number of labels to return
- Returns:
  the topN labels nearest to the given words

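nearestLabels
public Collection<String> nearestLabels(org.nd4j.linalg.api.ndarray.INDArray labelVector, int topN)
This method returns the top N labels nearest to the specified feature vector.
- Parameters:
  labelVector - the feature vector to compare against the label vectors
  topN - the number of labels to return
- Returns:
  the topN labels nearest to labelVector
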
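Selecting the top N labels for a feature vector is a nearest-neighbour query over the label vectors. The sketch below assumes cosine similarity and keeps a bounded min-heap of size topN, so each label is examined once; the library itself may score labels differently (for example, with a single matrix product over labelsMatrix), and NearestLabels is a hypothetical name.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper: top-N labels by cosine similarity. */
final class NearestLabels {

    /** Cosine similarity of two equal-length vectors; NaN if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to topN label names, most similar first. */
    static List<String> nearest(double[] query, Map<String, double[]> labelVectors, int topN) {
        // Min-heap ordered by similarity: the root is the weakest of the current candidates.
        Comparator<Map.Entry<String, Double>> bySimilarity =
                (x, y) -> Double.compare(x.getValue(), y.getValue());
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(bySimilarity);
        for (Map.Entry<String, double[]> e : labelVectors.entrySet()) {
            heap.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), cosine(query, e.getValue())));
            if (heap.size() > topN) {
                heap.poll(); // drop the least similar candidate
            }
        }
        // Polling yields ascending similarity, so reverse to get descending order.
        List<String> result = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }
}
```

The same descending ranking, cut off at limit, matches the contract described for predictSeveral.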
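similarityToLabel
@Deprecated public double similarityToLabel(String rawText, String label)
Deprecated. This method returns the similarity of the document to a specific label, based on the mean value.
- Parameters:
  rawText - the raw text of the document
  label - the label to compare against
- Returns:
  the similarity of the document to the label
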
fit
public void fit()
Description copied from class: SequenceVectors
Starts training over.
- Overrides:
  fit in class SequenceVectors<VocabWord>

similarityToLabel
public double similarityToLabel(LabelledDocument document, String label)
This method returns the similarity of the document to a specific label, based on the mean value.
- Parameters:
  document - the document
  label - the label to compare against
- Returns:
  the similarity of the document to the label

similarityToLabel
public double similarityToLabel(List<VocabWord> document, String label)
This method returns the similarity of the document to a specific label, based on the mean value.
- Parameters:
  document - the document
  label - the label to compare against
- Returns:
  the similarity of the document to the label

toJson
public String toJson() throws org.nd4j.shade.jackson.core.JsonProcessingException

fromJson
public static ParagraphVectors fromJson(String jsonString) throws IOException
- Throws:
  IOException