ParagraphVectors (deeplearning4j-nlp 1.0.0-M2 API)

java.lang.Object
- org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl<T>
  - org.deeplearning4j.models.sequencevectors.SequenceVectors<VocabWord>
    - org.deeplearning4j.models.word2vec.Word2Vec
      - org.deeplearning4j.models.paragraphvectors.ParagraphVectors

All Implemented Interfaces:

Serializable, WordVectors, org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
```
public class ParagraphVectors
extends Word2Vec
```
See Also:

Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`class`	`ParagraphVectors.BlindInferenceCallable`
`static class`	`ParagraphVectors.Builder`
`class`	`ParagraphVectors.InferenceCallable`

Nested classes/interfaces inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors
SequenceVectors.AsyncSequencer

Field Summary

Fields
Modifier and Type	Field and Description
`protected AtomicLong`	`countFinished`
`protected AtomicLong`	`countSubmitted`
`protected org.threadly.concurrent.PriorityScheduler`	`inferenceExecutor`
`protected Object`	`inferenceLocker`
`protected LabelAwareIterator`	`labelAwareIterator`
`protected List<VocabWord>`	`labelsList`
`protected org.nd4j.linalg.api.ndarray.INDArray`	`labelsMatrix`
`protected LabelsSource`	`labelsSource`
`protected boolean`	`normalizedLabels`

Fields inherited from class org.deeplearning4j.models.word2vec.Word2Vec
sentenceIter, tokenizerFactory

Fields inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors
configuration, configured, elementsLearningAlgorithm, enableScavenger, eventListeners, existingModel, intersectModel, iterator, lockFactor, log, scoreElements, scoreSequences, sequenceLearningAlgorithm, unknownElement, vocabLimit

Fields inherited from class org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl
batchSize, DEFAULT_UNK, layerSize, learningRate, learningRateDecayWords, lookupTable, minLearningRate, minWordFrequency, modelUtils, negative, numEpochs, numIterations, resetModel, sampling, seed, stopWords, trainElementsVectors, trainSequenceVectors, useAdeGrad, useUnknown, variableWindows, vocab, window, workers

Constructor Summary

Constructors
Modifier	Constructor and Description
`protected`	`ParagraphVectors()`

Method Summary

Modifier and Type	Method and Description
`void`	`extractLabels()`
`void`	`fit()` Starts training.
`static ParagraphVectors`	`fromJson(String jsonString)`
`org.nd4j.linalg.api.ndarray.INDArray`	`inferVector(LabelledDocument document)` This method calculates the inferred vector for the given document, using default parameters for the learning rate and iterations
`org.nd4j.linalg.api.ndarray.INDArray`	`inferVector(LabelledDocument document, double learningRate, double minLearningRate, int iterations)` This method calculates the inferred vector for the given document
`org.nd4j.linalg.api.ndarray.INDArray`	`inferVector(@NonNull List<VocabWord> document)` This method calculates the inferred vector for the given list of words, using default parameters for the learning rate and iterations
`org.nd4j.linalg.api.ndarray.INDArray`	`inferVector(@NonNull List<VocabWord> document, double learningRate, double minLearningRate, int iterations)` This method calculates the inferred vector for the given document
`org.nd4j.linalg.api.ndarray.INDArray`	`inferVector(String text)` This method calculates the inferred vector for the given text, using default parameters for the learning rate and iterations
`org.nd4j.linalg.api.ndarray.INDArray`	`inferVector(String text, double learningRate, double minLearningRate, int iterations)` This method calculates the inferred vector for the given text
`Future<org.nd4j.common.primitives.Pair<String,org.nd4j.linalg.api.ndarray.INDArray>>`	`inferVectorBatched(@NonNull LabelledDocument document)` This method implements batched inference, based on the Java Future parallelism model.
`List<org.nd4j.linalg.api.ndarray.INDArray>`	`inferVectorBatched(@NonNull List<String> documents)` This method runs inference on the given List<String>
`Future<org.nd4j.linalg.api.ndarray.INDArray>`	`inferVectorBatched(@NonNull String document)` This method implements batched inference, based on the Java Future parallelism model.
`protected void`	`initInference()`
`Collection<String>`	`nearestLabels(@NonNull Collection<VocabWord> document, int topN)` This method returns the top N labels nearest to the specified set of vocab words
`Collection<String>`	`nearestLabels(org.nd4j.linalg.api.ndarray.INDArray labelVector, int topN)` This method returns the top N labels nearest to the specified features vector
`Collection<String>`	`nearestLabels(LabelledDocument document, int topN)` This method returns the top N labels nearest to the specified document
`Collection<String>`	`nearestLabels(@NonNull String rawText, int topN)` This method returns the top N labels nearest to the specified text
`String`	`predict(LabelledDocument document)` This method predicts the label of the document.
`String`	`predict(List<VocabWord> document)` This method predicts the label of the document.
`String`	`predict(String rawText)` Deprecated.
`Collection<String>`	`predictSeveral(@NonNull LabelledDocument document, int limit)` Predict several labels based on the document.
`Collection<String>`	`predictSeveral(List<VocabWord> document, int limit)` Predict several labels based on the document.
`Collection<String>`	`predictSeveral(String rawText, int limit)` Predict several labels based on the document.
`protected void`	`reassignExistingModel()`
`void`	`setSequenceIterator(@NonNull SequenceIterator<VocabWord> iterator)` This method defines the SequenceIterator instance that will be used as the training corpus source.
`double`	`similarityToLabel(LabelledDocument document, String label)` This method returns the similarity of the document to a specific label, based on the mean value
`double`	`similarityToLabel(List<VocabWord> document, String label)` This method returns the similarity of the document to a specific label, based on the mean value
`double`	`similarityToLabel(String rawText, String label)` Deprecated.
`String`	`toJson()`

Methods inherited from class org.deeplearning4j.models.word2vec.Word2Vec
setSentenceIterator, setTokenizerFactory

Methods inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors
buildVocab, getElementsScore, getSequencesScore, getUNK, getWordVectorMatrix, initLearners, setUNK, trainSequence

Methods inherited from class org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl
accuracy, getLayerSize, getWordVector, getWordVectorMatrixNormalized, getWordVectors, getWordVectorsMean, hasWord, indexOf, jsonSerializable, loadWeightsInto, lookupTable, outOfVocabularySupported, setLookupTable, setModelUtils, setVocab, similarity, similarWordsInVocabTo, update, update, vectorSize, vocab, vocabSize, wordsNearest, wordsNearest, wordsNearest, wordsNearestSum, wordsNearestSum, wordsNearestSum

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.deeplearning4j.models.embeddings.wordvectors.WordVectors
accuracy, getWordVector, getWordVectorMatrixNormalized, getWordVectors, getWordVectorsMean, hasWord, indexOf, lookupTable, outOfVocabularySupported, setModelUtils, similarity, similarWordsInVocabTo, vocab, wordsNearest, wordsNearest, wordsNearest, wordsNearestSum, wordsNearestSum, wordsNearestSum

Methods inherited from interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
jsonSerializable, loadWeightsInto, vectorSize, vocabSize

Field Detail

labelsSource
```
protected LabelsSource labelsSource
```

labelAwareIterator
```
protected transient LabelAwareIterator labelAwareIterator
```

labelsMatrix
```
protected org.nd4j.linalg.api.ndarray.INDArray labelsMatrix
```

labelsList
```
protected List<VocabWord> labelsList
```

normalizedLabels
```
protected boolean normalizedLabels
```

inferenceLocker
```
protected final transient Object inferenceLocker
```

inferenceExecutor
```
protected transient org.threadly.concurrent.PriorityScheduler inferenceExecutor
```

countSubmitted
```
protected transient AtomicLong countSubmitted
```

countFinished
```
protected transient AtomicLong countFinished
```

Constructor Detail
ParagraphVectors
```
protected ParagraphVectors()
```

Method Detail

initInference
```
protected void initInference()
```

predict
```
@Deprecated
public String predict(String rawText)
```
Deprecated.

This method takes raw text, applies the tokenizer, and returns the most probable label.

Parameters:

rawText - raw text of the document

Returns:

the most probable label

setSequenceIterator
```
public void setSequenceIterator(@NonNull SequenceIterator<VocabWord> iterator)
```
This method defines the SequenceIterator instance that will be used as the training corpus source. The main difference from the other iterators here is that it lets you pass already-tokenized sequences for training.

Overrides:

setSequenceIterator in class Word2Vec

Parameters:

iterator - the SequenceIterator supplying the training sequences

predict
```
public String predict(LabelledDocument document)
```
This method predicts the label of the document. It computes the similarity with respect to the mean of the word representations in the document.

Parameters:

document - the document

Returns:

the most probable label
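The description above says prediction compares each label against the mean of the document's word representations. The plain-Java sketch below illustrates that idea; it is not the DL4J implementation. It assumes cosine similarity as the similarity measure, skips out-of-vocabulary words, uses `double[]` arrays in place of `INDArray`, and the names `MeanVectorLabelPredictor`, `meanVector` and `cosine` are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain double[] vectors stand in for DL4J's INDArray and lookup table.
final class MeanVectorLabelPredictor {

    // Element-wise mean of the vectors of known words; unknown words are skipped (assumption).
    static double[] meanVector(List<String> words, Map<String, double[]> wordVectors, int dim) {
        double[] mean = new double[dim];
        int count = 0;
        for (String w : words) {
            double[] v = wordVectors.get(w);
            if (v == null) {
                continue;
            }
            for (int i = 0; i < dim; i++) {
                mean[i] += v[i];
            }
            count++;
        }
        if (count > 0) {
            for (int i = 0; i < dim; i++) {
                mean[i] /= count;
            }
        }
        return mean;
    }

    // Cosine similarity; returns 0 if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns the label whose vector is most similar to the document's mean word vector.
    static String predict(List<String> words, Map<String, double[]> wordVectors,
                          Map<String, double[]> labelVectors, int dim) {
        double[] mean = meanVector(words, wordVectors, dim);
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : labelVectors.entrySet()) {
            double sim = cosine(mean, e.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, double[]> words = Map.of(
                "goal", new double[]{1.0, 0.1},
                "match", new double[]{0.9, 0.2},
                "vote", new double[]{0.1, 1.0});
        Map<String, double[]> labels = Map.of(
                "SPORTS", new double[]{1.0, 0.0},
                "POLITICS", new double[]{0.0, 1.0});
        System.out.println(predict(List.of("goal", "match", "unknownword"), words, labels, 2));
    }
}
```

Because the label is picked by an arg-max over similarities, sorting the same scores instead would yield several ranked labels, which is what `predictSeveral` describes.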

extractLabels
```
public void extractLabels()
```

inferVector
```
public org.nd4j.linalg.api.ndarray.INDArray inferVector(String text,
                                                        double learningRate,
                                                        double minLearningRate,
                                                        int iterations)
```
This method calculates the inferred vector for the given text.

Parameters:

text - the text to infer a vector for

Returns:

the inferred vector
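The `learningRate`, `minLearningRate` and `iterations` parameters suggest that the learning rate shrinks from a starting value toward a floor over the inference iterations. A minimal sketch, assuming a linear decay; this page does not say which schedule DL4J actually uses, and `InferenceLearningRateSchedule` and `linearSchedule` are hypothetical names.

```java
// Hypothetical sketch of a linear learning-rate decay for an inference loop.
final class InferenceLearningRateSchedule {

    // Rate per iteration: starts at learningRate, ends at minLearningRate, evenly spaced in between.
    static double[] linearSchedule(double learningRate, double minLearningRate, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        double[] rates = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            // progress runs from 0.0 on the first iteration to 1.0 on the last one
            double progress = (iterations == 1) ? 0.0 : (double) i / (iterations - 1);
            rates[i] = learningRate - progress * (learningRate - minLearningRate);
        }
        return rates;
    }

    public static void main(String[] args) {
        for (double lr : linearSchedule(0.025, 0.001, 5)) {
            System.out.printf("%.3f%n", lr);
        }
    }
}
```

With `linearSchedule(0.025, 0.001, 5)` the rates are 0.025, 0.019, 0.013, 0.007 and 0.001, up to floating-point rounding.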

reassignExistingModel
```
protected void reassignExistingModel()
```

inferVector
```
public org.nd4j.linalg.api.ndarray.INDArray inferVector(LabelledDocument document,
                                                        double learningRate,
                                                        double minLearningRate,
                                                        int iterations)
```
This method calculates the inferred vector for the given document.

Parameters:

document - the document to infer a vector for

Returns:

the inferred vector

inferVector
```
public org.nd4j.linalg.api.ndarray.INDArray inferVector(@NonNull List<VocabWord> document,
                                                        double learningRate,
                                                        double minLearningRate,
                                                        int iterations)
```
This method calculates the inferred vector for the given document.

Parameters:

document - the words of the document to infer a vector for

Returns:

the inferred vector

inferVector
```
public org.nd4j.linalg.api.ndarray.INDArray inferVector(String text)
```
This method calculates the inferred vector for the given text, using default parameters for the learning rate and iterations.

Parameters:

text - the text to infer a vector for

Returns:

the inferred vector

inferVector
```
public org.nd4j.linalg.api.ndarray.INDArray inferVector(LabelledDocument document)
```
This method calculates the inferred vector for the given document, using default parameters for the learning rate and iterations.

Parameters:

document - the document to infer a vector for

Returns:

the inferred vector

inferVector
```
public org.nd4j.linalg.api.ndarray.INDArray inferVector(@NonNull List<VocabWord> document)
```
This method calculates the inferred vector for the given list of words, using default parameters for the learning rate and iterations.

Parameters:

document - the words to infer a vector for

Returns:

the inferred vector

inferVectorBatched
```
public Future<org.nd4j.common.primitives.Pair<String,org.nd4j.linalg.api.ndarray.INDArray>> inferVectorBatched(@NonNull LabelledDocument document)
```
This method implements batched inference, based on the Java Future parallelism model. PLEASE NOTE: to use this method, the LabelledDocument passed in must have its Id field defined.

Parameters:

document - the document to infer a vector for

Returns:

a Future holding a Pair of the document Id and its inferred vector
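The Id requirement lets each asynchronous result be matched back to its document. A plain-Java sketch of that id-tagging pattern with `java.util.concurrent.ExecutorService`; it does not reproduce DL4J's internal classes. The `Doc` class stands in for `LabelledDocument`, `Map.Entry` for `Pair`, `double[]` for `INDArray`, and the toy inference function and the class name `IdTaggedBatchInference` are hypothetical.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: Doc stands in for LabelledDocument, double[] for INDArray.
final class IdTaggedBatchInference {

    static final class Doc {
        final String id;
        final String text;

        Doc(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Submits one task per document; each result carries the document id alongside its vector.
    static List<Future<Map.Entry<String, double[]>>> submitAll(
            ExecutorService pool, List<Doc> docs, Function<String, double[]> infer) {
        List<Future<Map.Entry<String, double[]>>> futures = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.id == null) {
                throw new IllegalArgumentException("document id must be set");
            }
            final String id = doc.id;
            final String text = doc.text;
            Callable<Map.Entry<String, double[]>> task =
                    () -> new SimpleImmutableEntry<>(id, infer.apply(text));
            futures.add(pool.submit(task));
        }
        return futures;
    }

    // Waits for every task and indexes the vectors by document id.
    static Map<String, double[]> collect(List<Future<Map.Entry<String, double[]>>> futures) {
        Map<String, double[]> byId = new LinkedHashMap<>();
        for (Future<Map.Entry<String, double[]>> f : futures) {
            try {
                Map.Entry<String, double[]> e = f.get();
                byId.put(e.getKey(), e.getValue());
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            } catch (ExecutionException ex) {
                throw new IllegalStateException(ex.getCause());
            }
        }
        return byId;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Toy "inference": a one-element vector holding the text length.
            Function<String, double[]> toyInfer = t -> new double[]{t.length()};
            List<Doc> docs = List.of(new Doc("doc-1", "short"), new Doc("doc-2", "a longer text"));
            collect(submitAll(pool, docs, toyInfer))
                    .forEach((id, vec) -> System.out.println(id + " -> " + vec[0]));
        } finally {
            pool.shutdown();
        }
    }
}
```

Since the id travels with the vector, results can be consumed in any completion order without losing track of which document they belong to.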

inferVectorBatched
```
public Future<org.nd4j.linalg.api.ndarray.INDArray> inferVectorBatched(@NonNull String document)
```
This method implements batched inference, based on the Java Future parallelism model. PLEASE NOTE: this method returns a Future<INDArray>, so keeping track of which INDArray belongs to which document is your responsibility.

Parameters:

document - the text to infer a vector for

Returns:

a Future holding the inferred vector

inferVectorBatched
```
public List<org.nd4j.linalg.api.ndarray.INDArray> inferVectorBatched(@NonNull List<String> documents)
```
This method runs inference on the given List<String>.

Parameters:

documents - the texts to infer vectors for

Returns:

INDArrays in the same order as the input texts
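Returning results in input order can be achieved by submitting every task first and then waiting on the futures in submission order. A plain-Java sketch under stated assumptions: `double[]` stands in for `INDArray`, the inference function is a hypothetical toy, and `OrderedBatchInference` is a hypothetical name, not a DL4J class. The same positional pairing is one way to handle the bookkeeping that the single-`String` `inferVectorBatched` overload leaves to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: parallel inference whose output list is aligned with the input list.
final class OrderedBatchInference {

    static List<double[]> inferAll(List<String> texts, Function<String, double[]> infer, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<double[]>> futures = new ArrayList<>(texts.size());
            for (String text : texts) {
                Callable<double[]> task = () -> infer.apply(text);
                futures.add(pool.submit(task));   // futures.get(i) belongs to texts.get(i)
            }
            List<double[]> results = new ArrayList<>(texts.size());
            for (Future<double[]> f : futures) {
                results.add(f.get());             // waiting in submission order keeps the output aligned
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Toy "inference": a one-element vector holding the text length.
        List<double[]> vectors = inferAll(List.of("one", "three", "five!"),
                text -> new double[]{text.length()}, 3);
        for (double[] v : vectors) {
            System.out.println(v[0]);
        }
    }
}
```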

predict
```
public String predict(List<VocabWord> document)
```
This method predicts the label of the document. It computes the similarity with respect to the mean of the word representations in the document.

Parameters:

document - the document

Returns:

the most probable label

predictSeveral
```
public Collection<String> predictSeveral(@NonNull LabelledDocument document,
                                         int limit)
```
Predict several labels based on the document. Computes the similarity with respect to the mean of the word representations in the document.

Parameters:

document - the document

Returns:

possible labels in descending order

predictSeveral
```
public Collection<String> predictSeveral(String rawText,
                                         int limit)
```
Predict several labels based on the document. Computes the similarity with respect to the mean of the word representations in the document.

Parameters:

rawText - raw text of the document

Returns:

possible labels in descending order

predictSeveral
```
public Collection<String> predictSeveral(List<VocabWord> document,
                                         int limit)
```
Predict several labels based on the document. Computes the similarity with respect to the mean of the word representations in the document.

Parameters:

document - the document

Returns:

possible labels in descending order

nearestLabels
```
public Collection<String> nearestLabels(LabelledDocument document,
                                        int topN)
```
This method returns the top N labels nearest to the specified document.

Parameters:

document - the document

topN - number of labels to return

Returns:

the nearest labels

nearestLabels
```
public Collection<String> nearestLabels(@NonNull String rawText,
                                        int topN)
```
This method returns the top N labels nearest to the specified text.

Parameters:

rawText - raw text of the document

topN - number of labels to return

Returns:

the nearest labels

nearestLabels
```
public Collection<String> nearestLabels(@NonNull Collection<VocabWord> document,
                                        int topN)
```
This method returns the top N labels nearest to the specified set of vocab words.

Parameters:

document - the vocab words of the document

topN - number of labels to return

Returns:

the nearest labels

nearestLabels
```
public Collection<String> nearestLabels(org.nd4j.linalg.api.ndarray.INDArray labelVector,
                                        int topN)
```
This method returns the top N labels nearest to the specified features vector.

Parameters:

labelVector - the features vector

topN - number of labels to return

Returns:

the nearest labels
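`nearestLabels` and `predictSeveral` both return labels ranked by closeness. A plain-Java sketch of that ranking step, assuming cosine similarity between a query vector and each label vector; the map of label vectors stands in for `labelsMatrix`/`labelsList`, and `NearestLabelsSketch` is a hypothetical name rather than DL4J code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: label vectors held in a Map<String, double[]> instead of DL4J's labelsMatrix.
final class NearestLabelsSketch {

    // Cosine similarity; returns 0 if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns at most topN labels, most similar to the query vector first.
    static List<String> nearestLabels(double[] query, Map<String, double[]> labelVectors, int topN) {
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, double[]> e : labelVectors.entrySet()) {
            score.put(e.getKey(), cosine(query, e.getValue()));
        }
        List<String> labels = new ArrayList<>(score.keySet());
        labels.sort((x, y) -> Double.compare(score.get(y), score.get(x)));  // descending similarity
        int n = Math.max(0, Math.min(topN, labels.size()));
        return new ArrayList<>(labels.subList(0, n));
    }

    public static void main(String[] args) {
        Map<String, double[]> labelVectors = Map.of(
                "SPORTS", new double[]{1.0, 0.0},
                "POLITICS", new double[]{0.0, 1.0},
                "FINANCE", new double[]{0.7, 0.7});
        System.out.println(nearestLabels(new double[]{1.0, 0.2}, labelVectors, 2));
    }
}
```

For the query `{1.0, 0.2}` in `main`, the ranking is SPORTS, then FINANCE, then POLITICS, so `topN = 2` returns `[SPORTS, FINANCE]`.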

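similarityToLabel
```
@Deprecated
public double similarityToLabel(String rawText,
                                String label)
```
Deprecated.

This method returns the similarity of the document to a specific label, based on the mean value.

Parameters:

rawText - raw text of the document

label - the label to compare against

Returns:

the similarity of the document to the label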

fit
```
public void fit()
```
Description copied from class: SequenceVectors

Starts training.

Overrides:

fit in class SequenceVectors<VocabWord>

similarityToLabel
```
public double similarityToLabel(LabelledDocument document,
                                String label)
```
This method returns the similarity of the document to a specific label, based on the mean value.

Parameters:

document - the document

label - the label to compare against

Returns:

the similarity of the document to the label

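similarityToLabel
```
public double similarityToLabel(List<VocabWord> document,
                                String label)
```
This method returns the similarity of the document to a specific label, based on the mean value.

Parameters:

document - the words of the document

label - the label to compare against

Returns:

the similarity of the document to the label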

toJson
```
public String toJson()
              throws org.nd4j.shade.jackson.core.JsonProcessingException
```
Overrides:

toJson in class Word2Vec

Throws:

org.nd4j.shade.jackson.core.JsonProcessingException

fromJson
```
public static ParagraphVectors fromJson(String jsonString)
                                 throws IOException
```
Throws:

IOException
