ParagraphVectors (deeplearning4j-nlp 0.9.0 API)

java.lang.Object
- org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl<T>
- - org.deeplearning4j.models.sequencevectors.SequenceVectors<VocabWord>
  - - org.deeplearning4j.models.word2vec.Word2Vec
    - - org.deeplearning4j.models.paragraphvectors.ParagraphVectors

All Implemented Interfaces:

Serializable, WordVectors
```
public class ParagraphVectors
extends Word2Vec
```
Basic ParagraphVectors (aka Doc2Vec) implementation for DL4j, built as a wrapper over SequenceVectors.

Author:

[email protected]

See Also:

Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`class`	`ParagraphVectors.BlindInferenceCallable`
`static class`	`ParagraphVectors.Builder`
`class`	`ParagraphVectors.InferenceCallable`

Nested classes/interfaces inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors
SequenceVectors.AsyncSequencer

Field Summary

Fields
Modifier and Type	Field and Description
`protected AtomicLong`	`countFinished`
`protected AtomicLong`	`countSubmitted`
`protected ExecutorService`	`inferenceExecutor`
`protected Object`	`inferenceLocker`
`protected LabelAwareIterator`	`labelAwareIterator`
`protected List<VocabWord>`	`labelsList`
`protected org.nd4j.linalg.api.ndarray.INDArray`	`labelsMatrix`
`protected LabelsSource`	`labelsSource`
`protected boolean`	`normalizedLabels`

Fields inherited from class org.deeplearning4j.models.word2vec.Word2Vec
sentenceIter, tokenizerFactory

Fields inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors
configuration, configured, elementsLearningAlgorithm, enableScavenger, eventListeners, existingModel, iterator, log, scoreElements, scoreSequences, sequenceLearningAlgorithm, unknownElement

Fields inherited from class org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl
batchSize, DEFAULT_UNK, layerSize, learningRate, learningRateDecayWords, lookupTable, minLearningRate, minWordFrequency, modelUtils, negative, numEpochs, numIterations, resetModel, sampling, seed, stopWords, trainElementsVectors, trainSequenceVectors, useAdeGrad, useUnknown, variableWindows, vocab, window, workers

Constructor Summary

Constructors
Modifier	Constructor and Description
`protected`	`ParagraphVectors()`

Method Summary

Modifier and Type	Method and Description
`void`	`extractLabels()`
`void`	`fit()` Starts training on the configured training corpus.
`org.nd4j.linalg.api.ndarray.INDArray`	`inferVector(LabelledDocument document)` Calculates the inferred vector for the given document, using default values for learning rate and iterations.
`org.nd4j.linalg.api.ndarray.INDArray`	`inferVector(LabelledDocument document, double learningRate, double minLearningRate, int iterations)` Calculates the inferred vector for the given document, using the specified learning rates and iteration count.
`org.nd4j.linalg.api.ndarray.INDArray`	`inferVector(List<VocabWord> document)` Calculates the inferred vector for the given list of words, using default values for learning rate and iterations.
`org.nd4j.linalg.api.ndarray.INDArray`	`inferVector(List<VocabWord> document, double learningRate, double minLearningRate, int iterations)` Calculates the inferred vector for the given list of words, using the specified learning rates and iteration count.
`org.nd4j.linalg.api.ndarray.INDArray`	`inferVector(String text)` Calculates the inferred vector for the given text, using default values for learning rate and iterations.
`org.nd4j.linalg.api.ndarray.INDArray`	`inferVector(String text, double learningRate, double minLearningRate, int iterations)` Calculates the inferred vector for the given text, using the specified learning rates and iteration count.
`Future<org.deeplearning4j.berkeley.Pair<String,org.nd4j.linalg.api.ndarray.INDArray>>`	`inferVectorBatched(LabelledDocument document)` Performs batched inference using the Java Future parallelism model.
`List<org.nd4j.linalg.api.ndarray.INDArray>`	`inferVectorBatched(List<String> documents)` Performs inference on the given List<String>.
`Future<org.nd4j.linalg.api.ndarray.INDArray>`	`inferVectorBatched(String document)` Performs batched inference using the Java Future parallelism model.
`protected void`	`initInference()`
`Collection<String>`	`nearestLabels(Collection<VocabWord> document, int topN)` Returns the top N labels nearest to the specified set of vocabulary words.
`Collection<String>`	`nearestLabels(org.nd4j.linalg.api.ndarray.INDArray labelVector, int topN)` Returns the top N labels nearest to the specified feature vector.
`Collection<String>`	`nearestLabels(LabelledDocument document, int topN)` Returns the top N labels nearest to the specified document.
`Collection<String>`	`nearestLabels(String rawText, int topN)` Returns the top N labels nearest to the specified text.
`String`	`predict(LabelledDocument document)` Deprecated.
`String`	`predict(List<VocabWord> document)` Deprecated.
`String`	`predict(String rawText)` Deprecated.
`Collection<String>`	`predictSeveral(LabelledDocument document, int limit)` Deprecated.
`Collection<String>`	`predictSeveral(List<VocabWord> document, int limit)` Deprecated.
`Collection<String>`	`predictSeveral(String rawText, int limit)` Deprecated.
`protected void`	`reassignExistingModel()`
`void`	`setSequenceIterator(SequenceIterator<VocabWord> iterator)` Defines the SequenceIterator instance that will be used as the training corpus source.
`double`	`similarityToLabel(LabelledDocument document, String label)` Deprecated.
`double`	`similarityToLabel(List<VocabWord> document, String label)` Deprecated.
`double`	`similarityToLabel(String rawText, String label)` Deprecated.

Methods inherited from class org.deeplearning4j.models.word2vec.Word2Vec
setSentenceIterator, setTokenizerFactory

Methods inherited from class org.deeplearning4j.models.sequencevectors.SequenceVectors
buildVocab, getElementsScore, getSequencesScore, getUNK, getWordVectorMatrix, initLearners, trainSequence

Methods inherited from class org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl
accuracy, getLayerSize, getWordVector, getWordVectorMatrixNormalized, getWordVectors, getWordVectorsMean, hasWord, indexOf, lookupTable, setLookupTable, setModelUtils, setVocab, similarity, similarWordsInVocabTo, update, update, vocab, wordsNearest, wordsNearest, wordsNearest, wordsNearestSum, wordsNearestSum, wordsNearestSum

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.deeplearning4j.models.embeddings.wordvectors.WordVectors
accuracy, getWordVector, getWordVectorMatrixNormalized, getWordVectors, getWordVectorsMean, hasWord, indexOf, lookupTable, setModelUtils, setUNK, similarity, similarWordsInVocabTo, vocab, wordsNearest, wordsNearest, wordsNearest, wordsNearestSum, wordsNearestSum, wordsNearestSum

Field Detail

labelsSource
```
protected LabelsSource labelsSource
```

labelAwareIterator
```
protected transient LabelAwareIterator labelAwareIterator
```

labelsMatrix
```
protected org.nd4j.linalg.api.ndarray.INDArray labelsMatrix
```

labelsList
```
protected List<VocabWord> labelsList
```

normalizedLabels
```
protected boolean normalizedLabels
```

inferenceLocker
```
protected final transient Object inferenceLocker
```

inferenceExecutor
```
protected transient ExecutorService inferenceExecutor
```

countSubmitted
```
protected transient AtomicLong countSubmitted
```

countFinished
```
protected transient AtomicLong countFinished
```

Constructor Detail
ParagraphVectors
```
protected ParagraphVectors()
```

Method Detail

initInference
```
protected void initInference()
```

predict
```
@Deprecated
public String predict(String rawText)
```
Deprecated.

Takes raw text, applies the tokenizer, and returns the most probable label.

Parameters:

rawText - the raw text to classify

Returns:

the most probable label

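The tokenization step described for predict(String rawText) can be sketched in plain Java as follows. This is a simplified illustration, not DL4J code: the lowercase/whitespace split is an assumption standing in for the configured TokenizerFactory, dropping out-of-vocabulary words is assumed behavior, and RawTextPreprocessing and tokenizeAndFilter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of DL4J: illustrates tokenizing raw text and
// keeping only tokens that have an entry in the vocabulary.
final class RawTextPreprocessing {
    private RawTextPreprocessing() {}

    /** Splits raw text into lowercase tokens and keeps only those found in the vocabulary. */
    static List<String> tokenizeAndFilter(String rawText, Set<String> vocab) {
        List<String> kept = new ArrayList<>();
        // Assumed tokenizer: lowercase, then split on runs of whitespace.
        for (String token : rawText.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            // Skip empty tokens and words that have no vector in the vocabulary.
            if (!token.isEmpty() && vocab.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

For example, tokenizeAndFilter("The cat SAT on the mat", Set.of("cat", "sat", "mat")) keeps cat, sat and mat, in that order.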
setSequenceIterator
```
public void setSequenceIterator(@NonNull
                                SequenceIterator<VocabWord> iterator)
```
Defines the SequenceIterator instance that will be used as the training corpus source. Unlike the other iterators, it lets you pass already-tokenized Sequences for training.

Overrides:

setSequenceIterator in class Word2Vec

Parameters:

iterator - the iterator supplying the training sequences

predict
```
@Deprecated
public String predict(LabelledDocument document)
```
Deprecated.

Predicts the label of the document by computing similarity with respect to the mean of the word representations in the document.

Parameters:

document - the document

Returns:

the most probable label

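A minimal sketch of the approach described for the deprecated predict methods: average the vectors of the document's known words, then pick the label whose vector is most similar to that mean. Assumptions: plain double[] arrays stand in for INDArray, Maps stand in for the lookup table and labels matrix, cosine similarity is the similarity measure, and MeanVectorPredict is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not DL4J code: mean-of-word-vectors label prediction.
final class MeanVectorPredict {
    private MeanVectorPredict() {}

    /** Element-wise mean of equally sized vectors. */
    static double[] mean(List<double[]> vectors) {
        double[] sum = new double[vectors.get(0).length];
        for (double[] v : vectors) {
            for (int i = 0; i < sum.length; i++) {
                sum[i] += v[i];
            }
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= vectors.size();
        }
        return sum;
    }

    /** Cosine similarity; NaN if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the label most similar to the mean vector of the document's known words. */
    static String predict(List<String> words, Map<String, double[]> wordVectors,
                          Map<String, double[]> labelVectors) {
        List<double[]> known = new ArrayList<>();
        for (String w : words) {
            double[] v = wordVectors.get(w);
            if (v != null) { // words without a vector are ignored
                known.add(v);
            }
        }
        if (known.isEmpty()) {
            throw new IllegalArgumentException("document has no words with known vectors");
        }
        double[] docMean = mean(known);
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> label : labelVectors.entrySet()) {
            double sim = cosine(docMean, label.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = label.getKey();
            }
        }
        return best;
    }
}
```

Because only the mean is compared, word order in the document has no effect on the prediction, which is one reason this approach is weaker than using an inferred document vector.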
extractLabels
```
public void extractLabels()
```

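inferVector
```
public org.nd4j.linalg.api.ndarray.INDArray inferVector(String text,
                                                        double learningRate,
                                                        double minLearningRate,
                                                        int iterations)
```
Calculates the inferred vector for the given text.

Parameters:

text - the text to infer a vector for

learningRate - the initial learning rate used during inference

minLearningRate - the minimum learning rate used during inference

iterations - the number of inference iterations

Returns:

the inferred vector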

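To illustrate how learningRate, minLearningRate and iterations relate, the sketch below computes a per-iteration learning rate using a word2vec-style linear decay with a floor at minLearningRate. This schedule is an assumption for illustration only; the exact schedule DL4J applies during inference may differ, and InferenceLearningRate is a hypothetical name.

```java
// Hypothetical illustration, not DL4J code: an ASSUMED linear decay schedule
// for the learning rate used across inference iterations.
final class InferenceLearningRate {
    private InferenceLearningRate() {}

    /**
     * Per-iteration learning rates: starts at learningRate, decays linearly with
     * progress, and is clamped so it never falls below minLearningRate.
     */
    static double[] linearSchedule(double learningRate, double minLearningRate, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        double[] rates = new double[iterations];
        for (int iter = 0; iter < iterations; iter++) {
            // Fraction of inference completed before this iteration: 0, 1/n, 2/n, ...
            double progress = (double) iter / iterations;
            rates[iter] = Math.max(minLearningRate, learningRate * (1.0 - progress));
        }
        return rates;
    }
}
```

With learningRate = 0.1, minLearningRate = 0.01 and iterations = 4, this schedule yields 0.1, 0.075, 0.05 and 0.025.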
reassignExistingModel
```
protected void reassignExistingModel()
```

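inferVector
```
public org.nd4j.linalg.api.ndarray.INDArray inferVector(LabelledDocument document,
                                                        double learningRate,
                                                        double minLearningRate,
                                                        int iterations)
```
Calculates the inferred vector for the given document.

Parameters:

document - the document to infer a vector for

learningRate - the initial learning rate used during inference

minLearningRate - the minimum learning rate used during inference

iterations - the number of inference iterations

Returns:

the inferred vector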

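inferVector
```
public org.nd4j.linalg.api.ndarray.INDArray inferVector(@NonNull
                                                        List<VocabWord> document,
                                                        double learningRate,
                                                        double minLearningRate,
                                                        int iterations)
```
Calculates the inferred vector for the given list of words.

Parameters:

document - the document, as a list of vocabulary words

learningRate - the initial learning rate used during inference

minLearningRate - the minimum learning rate used during inference

iterations - the number of inference iterations

Returns:

the inferred vector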

inferVector
```
public org.nd4j.linalg.api.ndarray.INDArray inferVector(String text)
```
Calculates the inferred vector for the given text, using default values for learning rate and iterations.

Parameters:

text - the text to infer a vector for

Returns:

the inferred vector

inferVector
```
public org.nd4j.linalg.api.ndarray.INDArray inferVector(LabelledDocument document)
```
Calculates the inferred vector for the given document, using default values for learning rate and iterations.

Parameters:

document - the document to infer a vector for

Returns:

the inferred vector

inferVector
```
public org.nd4j.linalg.api.ndarray.INDArray inferVector(@NonNull
                                                        List<VocabWord> document)
```
Calculates the inferred vector for the given list of words, using default values for learning rate and iterations.

Parameters:

document - the document, as a list of vocabulary words

Returns:

the inferred vector

inferVectorBatched
```
public Future<org.deeplearning4j.berkeley.Pair<String,org.nd4j.linalg.api.ndarray.INDArray>> inferVectorBatched(@NonNull
                                                                                                                LabelledDocument document)
```
Performs batched inference using the Java Future parallelism model. Note: to use this method, the LabelledDocument passed in must have its Id field defined.

Parameters:

document - the document to infer a vector for; its Id field must be set

Returns:

a Future holding a Pair of the document Id and its inferred vector

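The reason the Id field is required can be seen in a small stand-alone sketch of the Future-based pattern: each submitted task returns its result already paired with the document id, so the caller can match results to documents regardless of completion order. Assumptions: Map.Entry stands in for the Pair type, double[] for INDArray, and the infer function is a hypothetical placeholder for the real inference; IdTaggedInference is not part of DL4J.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration, not DL4J code: id-tagged asynchronous inference.
final class IdTaggedInference {
    private IdTaggedInference() {}

    /** Submits one inference task whose result carries the document id with the vector. */
    static Future<Map.Entry<String, double[]>> submit(ExecutorService executor, String id, String text,
                                                      Function<String, double[]> infer) {
        if (id == null) {
            // Without an id the caller could not tell which document a result belongs to.
            throw new IllegalArgumentException("document id must be defined");
        }
        Callable<Map.Entry<String, double[]>> task =
                () -> new AbstractMap.SimpleImmutableEntry<>(id, infer.apply(text));
        return executor.submit(task);
    }
}
```

Calling get() on the returned Future blocks until that document's vector is ready and yields the id together with the vector.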
inferVectorBatched
```
public Future<org.nd4j.linalg.api.ndarray.INDArray> inferVectorBatched(@NonNull
                                                                       String document)
```
Performs batched inference using the Java Future parallelism model. Note: this method returns a Future<INDArray>, so tracking the relation between each document and its INDArray is the caller's responsibility.

Parameters:

document - the text to infer a vector for

Returns:

a Future holding the inferred vector

inferVectorBatched
```
public List<org.nd4j.linalg.api.ndarray.INDArray> inferVectorBatched(@NonNull
                                                                     List<String> documents)
```
Performs inference on the given List<String>.

Parameters:

documents - the texts to infer vectors for

Returns:

INDArrays in the same order as the input texts

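The ordering guarantee of inferVectorBatched(List<String>) can be obtained with plain java.util.concurrent, as in the sketch below: submit every task first, then call get() on the futures in submission order, so results line up with the inputs even if tasks finish out of order. Assumptions: double[] stands in for INDArray, the infer function is a hypothetical placeholder, and OrderedBatchInference is not a DL4J class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration, not DL4J code: order-preserving batched inference.
final class OrderedBatchInference {
    private OrderedBatchInference() {}

    /** Runs inference on all documents concurrently and returns results in input order. */
    static List<double[]> inferAll(ExecutorService executor, List<String> documents,
                                   Function<String, double[]> infer)
            throws InterruptedException, ExecutionException {
        // Submit everything first so the documents can be processed in parallel.
        List<Future<double[]>> futures = new ArrayList<>(documents.size());
        for (String doc : documents) {
            Callable<double[]> task = () -> infer.apply(doc);
            futures.add(executor.submit(task));
        }
        // Collect in submission order; get() blocks until that particular task finishes.
        List<double[]> results = new ArrayList<>(futures.size());
        for (Future<double[]> future : futures) {
            results.add(future.get());
        }
        return results;
    }
}
```

Waiting on the futures in order costs nothing in total latency, since the whole batch must finish before the list is returned anyway.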
predict
```
@Deprecated
public String predict(List<VocabWord> document)
```
Deprecated.

Predicts the label of the document by computing similarity with respect to the mean of the word representations in the document.

Parameters:

document - the document

Returns:

the most probable label

predictSeveral
```
@Deprecated
public Collection<String> predictSeveral(@NonNull
                                                     LabelledDocument document,
                                                     int limit)
```
Deprecated.

Predicts several labels for the document by computing similarity with respect to the mean of the word representations in the document.

Parameters:

document - the document

limit - the maximum number of labels to return

Returns:

possible labels, in descending order of similarity

predictSeveral
```
@Deprecated
public Collection<String> predictSeveral(String rawText,
                                                     int limit)
```
Deprecated.

Predicts several labels for the document by computing similarity with respect to the mean of the word representations in the document.

Parameters:

rawText - the raw text of the document

limit - the maximum number of labels to return

Returns:

possible labels, in descending order of similarity

predictSeveral
```
@Deprecated
public Collection<String> predictSeveral(List<VocabWord> document,
                                                     int limit)
```
Deprecated.

Predicts several labels for the document by computing similarity with respect to the mean of the word representations in the document.

Parameters:

document - the document

limit - the maximum number of labels to return

Returns:

possible labels, in descending order of similarity

nearestLabels
```
public Collection<String> nearestLabels(LabelledDocument document,
                                        int topN)
```
Returns the top N labels nearest to the specified document.

Parameters:

document - the document

topN - the number of labels to return

Returns:

the top N nearest labels

nearestLabels
```
public Collection<String> nearestLabels(@NonNull
                                        String rawText,
                                        int topN)
```
Returns the top N labels nearest to the specified text.

Parameters:

rawText - the raw text

topN - the number of labels to return

Returns:

the top N nearest labels

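nearestLabels
```
public Collection<String> nearestLabels(@NonNull
                                        Collection<VocabWord> document,
                                        int topN)
```
Returns the top N labels nearest to the specified set of vocabulary words.

Parameters:

document - the document, as a collection of vocabulary words

topN - the number of labels to return

Returns:

the top N nearest labels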

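nearestLabels
```
public Collection<String> nearestLabels(org.nd4j.linalg.api.ndarray.INDArray labelVector,
                                        int topN)
```
Returns the top N labels nearest to the specified feature vector.

Parameters:

labelVector - the feature vector to compare against the label vectors

topN - the number of labels to return

Returns:

the top N nearest labels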

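The lookup performed by nearestLabels can be pictured as ranking every label vector by its similarity to the query vector and keeping the best topN. The sketch below assumes cosine similarity as the score and uses double[] in place of INDArray and a Map in place of the labels matrix; NearestLabelsSketch is a hypothetical name, not a DL4J class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not DL4J code: top-N labels by (assumed) cosine similarity.
final class NearestLabelsSketch {
    private NearestLabelsSketch() {}

    /** Returns up to topN label names, most similar to the query vector first. */
    static List<String> nearestLabels(double[] query, Map<String, double[]> labelVectors, int topN) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, double[]> label : labelVectors.entrySet()) {
            scored.add(Map.entry(label.getKey(), cosine(query, label.getValue())));
        }
        // Highest similarity first.
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, scored.size()); i++) {
            result.add(scored.get(i).getKey());
        }
        return result;
    }

    /** Cosine similarity; NaN if either vector is all zeros. */
    private static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Sorting all labels is O(L log L) for L labels, which is fine for typical label counts; a bounded priority queue would avoid the full sort when L is large and topN is small.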
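similarityToLabel
```
@Deprecated
public double similarityToLabel(String rawText,
                                            String label)
```
Deprecated.

Returns the similarity of the document to the specified label, based on the mean of its word vectors.

Parameters:

rawText - the raw text of the document

label - the label to compare against

Returns:

the similarity between the document and the label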

fit
```
public void fit()
```
Description copied from class: SequenceVectors

Starts training on the configured training corpus.

Overrides:

fit in class SequenceVectors<VocabWord>

similarityToLabel
```
@Deprecated
public double similarityToLabel(LabelledDocument document,
                                            String label)
```
Deprecated.

Returns the similarity of the document to the specified label, based on the mean of its word vectors.

Parameters:

document - the document

label - the label to compare against

Returns:

the similarity between the document and the label

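similarityToLabel
```
@Deprecated
public double similarityToLabel(List<VocabWord> document,
                                            String label)
```
Deprecated.

Returns the similarity of the document to the specified label, based on the mean of its word vectors.

Parameters:

document - the document, as a list of vocabulary words

label - the label to compare against

Returns:

the similarity between the document and the label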
