Class WordVectorsImpl<T extends SequenceElement>
- java.lang.Object
-
- org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl<T>
-
- All Implemented Interfaces:
Serializable, WordVectors, org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
- Direct Known Subclasses:
SequenceVectors
public class WordVectorsImpl<T extends SequenceElement> extends Object implements WordVectors
- See Also:
- Serialized Form
-
-
Field Summary
Fields (modifier and type, followed by the field name):
protected int batchSize
static String DEFAULT_UNK
protected int layerSize
protected org.nd4j.shade.guava.util.concurrent.AtomicDouble learningRate
protected int learningRateDecayWords
protected WeightLookupTable<T> lookupTable
protected double minLearningRate
protected int minWordFrequency
protected ModelUtils<T> modelUtils
protected double negative
protected int numEpochs
protected int numIterations
protected boolean resetModel
protected double sampling
protected long seed
protected Collection<String> stopWords
protected boolean trainElementsVectors
protected boolean trainSequenceVectors
protected boolean useAdeGrad
protected boolean useUnknown
protected int[] variableWindows
protected VocabCache<T> vocab
protected int window
protected int workers
-
Constructor Summary
Constructors:
WordVectorsImpl()
-
Method Summary
Methods (modifier and type, then method, with the description on the following line where one exists):
-
Map<String,Double> accuracy(List<String> questions)
Accuracy based on questions, each a space-separated list of words in which the first word is the query word, the next two words are negative, and the last word is the word predicted to be nearest.
-
int getLayerSize()
Returns the word vector size.
-
double[] getWordVector(String word)
Get the word vector for a given word.
-
org.nd4j.linalg.api.ndarray.INDArray getWordVectorMatrix(String word)
Get the word vector for a given word.
-
org.nd4j.linalg.api.ndarray.INDArray getWordVectorMatrixNormalized(String word)
Returns the word vector divided by the norm2 of the array.
-
org.nd4j.linalg.api.ndarray.INDArray getWordVectors(@NonNull Collection<String> labels)
Returns a 2D array in which each row is the vector for the corresponding label.
-
org.nd4j.linalg.api.ndarray.INDArray getWordVectorsMean(Collection<String> labels)
Returns the mean vector built from the words/labels passed in.
-
boolean hasWord(String word)
Returns true if the model has this word in the vocab.
-
int indexOf(String word)
-
boolean jsonSerializable()
-
void loadWeightsInto(org.nd4j.linalg.api.ndarray.INDArray array)
-
WeightLookupTable lookupTable()
Lookup table for the vectors.
-
boolean outOfVocabularySupported()
Indicates whether the implementation vectorizes words absent from the vocabulary.
-
void setLookupTable(@NonNull WeightLookupTable lookupTable)
-
void setModelUtils(@NonNull ModelUtils modelUtils)
Specifies the ModelUtils to be used to access the model.
-
void setVocab(VocabCache vocab)
-
double similarity(String word, String word2)
Returns the similarity of two elements, provided by ModelUtils.
-
List<String> similarWordsInVocabTo(String word, double accuracy)
Find all words with similar characters in the vocab.
-
protected void update()
-
protected void update(org.nd4j.linalg.heartbeat.reports.Environment env, org.nd4j.linalg.heartbeat.reports.Event event)
-
int vectorSize()
-
VocabCache<T> vocab()
Vocab for the vectors.
-
long vocabSize()
-
Collection<String> wordsNearest(String word, int n)
Get the top n words most similar to the given word.
-
Collection<String> wordsNearest(Collection<String> positive, Collection<String> negative, int top)
Words nearest based on positive and negative words.
-
Collection<String> wordsNearest(org.nd4j.linalg.api.ndarray.INDArray words, int top)
Words nearest based on positive and negative words; top is the top n words.
-
Collection<String> wordsNearestSum(String word, int n)
Get the top n words most similar to the given word.
-
Collection<String> wordsNearestSum(Collection<String> positive, Collection<String> negative, int top)
Words nearest based on positive and negative words.
-
Collection<String> wordsNearestSum(org.nd4j.linalg.api.ndarray.INDArray words, int top)
Words nearest based on positive and negative words; top is the top n words.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.deeplearning4j.models.embeddings.wordvectors.WordVectors
getUNK, setUNK
-
-
-
-
Field Detail
-
minWordFrequency
protected int minWordFrequency
-
lookupTable
protected WeightLookupTable<T> lookupTable
-
vocab
protected VocabCache<T> vocab
-
layerSize
protected int layerSize
-
modelUtils
protected transient ModelUtils<T> modelUtils
-
numIterations
protected int numIterations
-
numEpochs
protected int numEpochs
-
negative
protected double negative
-
sampling
protected double sampling
-
learningRate
protected org.nd4j.shade.guava.util.concurrent.AtomicDouble learningRate
-
minLearningRate
protected double minLearningRate
-
window
protected int window
-
batchSize
protected int batchSize
-
learningRateDecayWords
protected int learningRateDecayWords
-
resetModel
protected boolean resetModel
-
useAdeGrad
protected boolean useAdeGrad
-
workers
protected int workers
-
trainSequenceVectors
protected boolean trainSequenceVectors
-
trainElementsVectors
protected boolean trainElementsVectors
-
seed
protected long seed
-
useUnknown
protected boolean useUnknown
-
variableWindows
protected int[] variableWindows
-
DEFAULT_UNK
public static final String DEFAULT_UNK
- See Also:
- Constant Field Values
-
stopWords
protected Collection<String> stopWords
-
-
Method Detail
-
getLayerSize
public int getLayerSize()
Returns the word vector size.
-
hasWord
public boolean hasWord(String word)
Returns true if the model has this word in the vocab.
- Specified by:
hasWord in interface WordVectors
- Parameters:
word - the word to test for
- Returns:
- true if the model has the word in the vocab
-
wordsNearestSum
public Collection<String> wordsNearestSum(Collection<String> positive, Collection<String> negative, int top)
Words nearest based on positive and negative words.
- Specified by:
wordsNearestSum in interface WordVectors
- Parameters:
positive - the positive words
negative - the negative words
top - the top n words
- Returns:
- the words nearest the mean of the words
-
wordsNearestSum
public Collection<String> wordsNearestSum(org.nd4j.linalg.api.ndarray.INDArray words, int top)
Words nearest based on positive and negative words.
- Specified by:
wordsNearestSum in interface WordVectors
- Parameters:
top - the top n words
- Returns:
- the words nearest the mean of the words
-
wordsNearest
public Collection<String> wordsNearest(org.nd4j.linalg.api.ndarray.INDArray words, int top)
Words nearest based on positive and negative words.
- Specified by:
wordsNearest in interface WordVectors
- Parameters:
top - the top n words
- Returns:
- the words nearest the mean of the words
-
wordsNearestSum
public Collection<String> wordsNearestSum(String word, int n)
Get the top n words most similar to the given word.
- Specified by:
wordsNearestSum in interface WordVectors
- Parameters:
word - the word to compare
n - the number of words to get
- Returns:
- the top n words
-
accuracy
public Map<String,Double> accuracy(List<String> questions)
Accuracy based on questions, each a space-separated list of words in which the first word is the query word, the next two words are negative, and the last word is the word predicted to be nearest.
- Specified by:
accuracy in interface WordVectors
- Parameters:
questions - the questions to ask
- Returns:
- the accuracy based on these questions
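The question format described above can be illustrated with a small parser. This is a plain-Java sketch only: AnalogyQuestionSketch and parse are hypothetical names that are not part of DL4J, and rejecting lines that do not contain exactly four words is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not DL4J API: splits one question line into its roles.
final class AnalogyQuestionSketch {
    final String query;           // first word: the query word
    final List<String> negative;  // next two words: the negative words
    final String predicted;       // last word: the word predicted to be nearest

    private AnalogyQuestionSketch(String query, List<String> negative, String predicted) {
        this.query = query;
        this.negative = negative;
        this.predicted = predicted;
    }

    // Parses "query neg1 neg2 predicted".
    // Assumption: a line with any other number of words is rejected.
    static AnalogyQuestionSketch parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 words, got " + parts.length);
        }
        return new AnalogyQuestionSketch(parts[0], Arrays.asList(parts[1], parts[2]), parts[3]);
    }
}
```

A question counts as answered correctly when the nearest word found for the query and negative words equals the predicted word.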
-
indexOf
public int indexOf(String word)
- Specified by:
indexOf in interface WordVectors
-
similarWordsInVocabTo
public List<String> similarWordsInVocabTo(String word, double accuracy)
Find all words with similar characters in the vocab.
- Specified by:
similarWordsInVocabTo in interface WordVectors
- Parameters:
word - the word to compare
accuracy - the accuracy: 0 to 1
- Returns:
- the list of words that are similar in the vocab
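One way to picture a character-level match with a 0 to 1 threshold is sketched below. The similarity measure here is a normalized Levenshtein distance, chosen purely for illustration; the page does not say which string measure DL4J uses, so this should not be read as its implementation. SimilarWordsSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical illustration, not DL4J code.
final class SimilarWordsSketch {
    // Levenshtein edit distance computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1 means the strings are identical.
    static double charSimilarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / longest;
    }

    // Keeps every vocabulary entry whose similarity to word reaches the threshold.
    static List<String> similarWords(Collection<String> vocab, String word, double threshold) {
        List<String> out = new ArrayList<>();
        for (String candidate : vocab) {
            if (charSimilarity(word, candidate) >= threshold) {
                out.add(candidate);
            }
        }
        return out;
    }
}
```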
-
getWordVector
public double[] getWordVector(String word)
Get the word vector for a given word.
- Specified by:
getWordVector in interface WordVectors
- Parameters:
word - the word to get the vector for
- Returns:
- the vector for this word
-
getWordVectorMatrixNormalized
public org.nd4j.linalg.api.ndarray.INDArray getWordVectorMatrixNormalized(String word)
Returns the word vector divided by the norm2 of the array.
- Specified by:
getWordVectorMatrixNormalized in interface WordVectors
- Parameters:
word - the word to get the matrix for
- Returns:
- the looked up matrix
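Dividing a vector by its norm2 (Euclidean length) yields a unit-length vector. The sketch below shows the arithmetic on a plain double[] rather than an INDArray; Norm2Sketch is a hypothetical name, and returning an all-zero vector unchanged is an assumption made here to avoid division by zero.

```java
// Hypothetical illustration, not DL4J code.
final class Norm2Sketch {
    // Returns v / ||v||_2 as a new array.
    static double[] normalize(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        double norm = Math.sqrt(sumSq);
        if (norm == 0.0) {
            return v.clone(); // assumption: an all-zero vector is returned unchanged
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }
}
```

For example, the vector (3, 4) has norm2 equal to 5, so it normalizes to (0.6, 0.8).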
-
getWordVectorMatrix
public org.nd4j.linalg.api.ndarray.INDArray getWordVectorMatrix(String word)
Description copied from interface: WordVectors
Get the word vector for a given word.
- Specified by:
getWordVectorMatrix in interface WordVectors
- Parameters:
word - the word to get the matrix for
- Returns:
- the ndarray for this word
-
wordsNearest
public Collection<String> wordsNearest(Collection<String> positive, Collection<String> negative, int top)
Words nearest based on positive and negative words.
- Specified by:
wordsNearest in interface WordVectors
- Parameters:
positive - the positive words
negative - the negative words
top - the top n words
- Returns:
- the words nearest the mean of the words
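The idea of ranking words by their closeness to the mean of a set of positive and negative words can be sketched on a toy in-memory table. This is not DL4J's implementation: NearestWordsSketch is a hypothetical name, the Map of double[] stands in for the lookup table, and three choices are assumptions made for the sketch (negative words are subtracted before averaging, closeness is cosine similarity, and the input words themselves are excluded from the result).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not DL4J code.
final class NearestWordsSketch {
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Adds sign * v into acc; returns 1 if v was present, 0 if the word was unknown.
    private static int accumulate(double[] acc, double[] v, double sign) {
        if (v == null) {
            return 0;
        }
        for (int i = 0; i < acc.length; i++) {
            acc[i] += sign * v[i];
        }
        return 1;
    }

    static List<String> wordsNearest(Map<String, double[]> vectors,
                                     Collection<String> positive,
                                     Collection<String> negative,
                                     int top) {
        if (vectors.isEmpty()) {
            return new ArrayList<>();
        }
        int dim = vectors.values().iterator().next().length;
        double[] query = new double[dim];
        int count = 0;
        for (String w : positive) {
            count += accumulate(query, vectors.get(w), 1.0);
        }
        for (String w : negative) {
            count += accumulate(query, vectors.get(w), -1.0);
        }
        if (count == 0) {
            return new ArrayList<>();
        }
        for (int i = 0; i < dim; i++) {
            query[i] /= count; // mean of the signed vectors
        }
        List<String> candidates = new ArrayList<>();
        for (String w : vectors.keySet()) {
            if (!positive.contains(w) && !negative.contains(w)) {
                candidates.add(w);
            }
        }
        // Sort by descending cosine similarity to the query vector.
        candidates.sort(Comparator.comparingDouble((String w) -> -cosine(query, vectors.get(w))));
        int n = Math.max(0, Math.min(top, candidates.size()));
        return new ArrayList<>(candidates.subList(0, n));
    }
}
```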
-
getWordVectors
public org.nd4j.linalg.api.ndarray.INDArray getWordVectors(@NonNull Collection<String> labels)
Returns a 2D array in which each row is the vector for the corresponding label.
- Specified by:
getWordVectors in interface WordVectors
- Parameters:
labels - the labels to look up
- Returns:
- a 2D array with one row per label
-
getWordVectorsMean
public org.nd4j.linalg.api.ndarray.INDArray getWordVectorsMean(Collection<String> labels)
Returns the mean vector built from the words/labels passed in.
- Specified by:
getWordVectorsMean in interface WordVectors
- Parameters:
labels - the words/labels to average
- Returns:
- the mean vector
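A mean vector is the element-wise average of the individual word vectors. The sketch below shows that computation on plain double[] rows; MeanVectorSketch is a hypothetical name, and rejecting an empty input is an assumption of the sketch, not documented DL4J behavior.

```java
import java.util.List;

// Hypothetical illustration, not DL4J code.
final class MeanVectorSketch {
    // Element-wise mean of equally sized rows.
    static double[] mean(List<double[]> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("at least one vector is required");
        }
        double[] out = new double[rows.get(0).length];
        for (double[] row : rows) {
            for (int i = 0; i < out.length; i++) {
                out[i] += row[i];
            }
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= rows.size();
        }
        return out;
    }
}
```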
-
wordsNearest
public Collection<String> wordsNearest(String word, int n)
Get the top n words most similar to the given word.
- Specified by:
wordsNearest in interface WordVectors
- Parameters:
word - the word to compare
n - the number of words to get
- Returns:
- the top n words
-
similarity
public double similarity(String word, String word2)
Returns the similarity of two elements, provided by ModelUtils.
- Specified by:
similarity in interface WordVectors
- Parameters:
word - the first word
word2 - the second word
- Returns:
- a normalized similarity (cosine similarity)
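Cosine similarity is the dot product of two vectors divided by the product of their norm2 values, which gives a value between -1 and 1. The sketch below computes it on plain arrays; CosineSketch is a hypothetical name and the code is an illustration of the formula, not the ModelUtils implementation.

```java
// Hypothetical illustration, not DL4J code.
final class CosineSketch {
    // cos(a, b) = (a . b) / (||a||_2 * ||b||_2); both vectors must be non-zero.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Vectors pointing in the same direction score 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of their lengths.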
-
vocab
public VocabCache<T> vocab()
Description copied from interface: WordVectors
Vocab for the vectors.
- Specified by:
vocab in interface WordVectors
-
lookupTable
public WeightLookupTable lookupTable()
Description copied from interface: WordVectors
Lookup table for the vectors.
- Specified by:
lookupTable in interface WordVectors
-
setModelUtils
public void setModelUtils(@NonNull ModelUtils modelUtils)
Description copied from interface: WordVectors
Specifies the ModelUtils to be used to access the model.
- Specified by:
setModelUtils in interface WordVectors
-
setLookupTable
public void setLookupTable(@NonNull WeightLookupTable lookupTable)
-
setVocab
public void setVocab(VocabCache vocab)
-
update
protected void update()
-
update
protected void update(org.nd4j.linalg.heartbeat.reports.Environment env, org.nd4j.linalg.heartbeat.reports.Event event)
-
loadWeightsInto
public void loadWeightsInto(org.nd4j.linalg.api.ndarray.INDArray array)
- Specified by:
loadWeightsInto in interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
-
vocabSize
public long vocabSize()
- Specified by:
vocabSize in interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
-
vectorSize
public int vectorSize()
- Specified by:
vectorSize in interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
-
jsonSerializable
public boolean jsonSerializable()
- Specified by:
jsonSerializable in interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
-
outOfVocabularySupported
public boolean outOfVocabularySupported()
Description copied from interface: WordVectors
Indicates whether the implementation vectorizes words absent from the vocabulary.
- Specified by:
outOfVocabularySupported in interface WordVectors
- Returns:
- boolean
-
-