Class FastText
- java.lang.Object
-
- org.deeplearning4j.models.fasttext.FastText
-
- All Implemented Interfaces:
Serializable, WordVectors, org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
public class FastText extends Object implements WordVectors, Serializable
- See Also:
- Serialized Form
-
-
Method Summary
Modifier and Type | Method | Description
Map<String,Double> | accuracy(List<String> questions) | Accuracy based on questions which are a space separated list of strings where the first word is the query word, the next 2 words are negative, and the last word is the predicted word to be nearest
void | fit()
int | getContextWindowSize()
int | getDimension()
int | getEpoch()
String | getLabelPrefix()
double | getLearningRate()
String | getLossName()
String | getModelName()
int | getNegativesNumber()
int | getNumberOfBuckets()
String | getUNK()
int | getWordNgrams()
double[] | getWordVector(String word) | Get the word vector for a given word
org.nd4j.linalg.api.ndarray.INDArray | getWordVectorMatrix(String word) | Get the word vector for a given word
org.nd4j.linalg.api.ndarray.INDArray | getWordVectorMatrixNormalized(String word) | Returns the word vector divided by the norm2 of the array
org.nd4j.linalg.api.ndarray.INDArray | getWordVectors(Collection<String> labels) | Returns a 2D array in which each row represents the corresponding word/label
org.nd4j.linalg.api.ndarray.INDArray | getWordVectorsMean(Collection<String> labels) | Returns the mean vector built from the words/labels passed in
boolean | hasWord(String word) | Returns true if the model has this word in the vocab
int | indexOf(String word)
boolean | jsonSerializable()
void | loadBinaryModel(String modelPath)
void | loadIterator()
void | loadPretrainedVectors(File vectorsFile)
void | loadWeightsInto(org.nd4j.linalg.api.ndarray.INDArray array)
WeightLookupTable | lookupTable() | Lookup table for the vectors
boolean | outOfVocabularySupported() | Does implementation vectorize words absent in vocabulary
String | predict(String text)
org.nd4j.common.primitives.Pair<String,Float> | predictProbability(String text)
void | setModelUtils(ModelUtils utils) | Specifies ModelUtils to be used to access model
void | setUNK(String input)
double | similarity(String word, String word2) | Returns the similarity of 2 words
List<String> | similarWordsInVocabTo(String word, double accuracy) | Find all words with similar characters in the vocab
void | test(File testFile)
void | unloadBinaryModel()
int | vectorSize()
VocabCache | vocab() | Vocab for the vectors
long | vocabSize()
Collection<String> | wordsNearest(String word, int n) | Get the top n words most similar to the given word
Collection<String> | wordsNearest(Collection<String> positive, Collection<String> negative, int top) | Words nearest based on positive and negative words
Collection<String> | wordsNearest(org.nd4j.linalg.api.ndarray.INDArray words, int top)
Collection<String> | wordsNearestSum(String word, int n) | Get the top n words most similar to the given word
Collection<String> | wordsNearestSum(Collection<String> positive, Collection<String> negative, int top) | Words nearest based on positive and negative words
Collection<String> | wordsNearestSum(org.nd4j.linalg.api.ndarray.INDArray words, int top)
-
-
-
Constructor Detail
-
FastText
public FastText(File modelPath)
-
FastText
public FastText()
-
-
Method Detail
-
fit
public void fit()
-
loadIterator
public void loadIterator()
-
loadPretrainedVectors
public void loadPretrainedVectors(File vectorsFile)
-
loadBinaryModel
public void loadBinaryModel(String modelPath)
-
unloadBinaryModel
public void unloadBinaryModel()
-
test
public void test(File testFile)
-
predictProbability
public org.nd4j.common.primitives.Pair<String,Float> predictProbability(String text)
-
vocab
public VocabCache vocab()
Description copied from interface: WordVectors
Vocab for the vectors
- Specified by:
vocab in interface WordVectors
- Returns:
-
vocabSize
public long vocabSize()
- Specified by:
vocabSize in interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
-
getUNK
public String getUNK()
- Specified by:
getUNK in interface WordVectors
-
setUNK
public void setUNK(String input)
- Specified by:
setUNK in interface WordVectors
-
getWordVector
public double[] getWordVector(String word)
Description copied from interface: WordVectors
Get the word vector for a given word
- Specified by:
getWordVector in interface WordVectors
- Parameters:
word - the word to get the vector for
- Returns:
- the vector for this word, as a double array
-
getWordVectorMatrixNormalized
public org.nd4j.linalg.api.ndarray.INDArray getWordVectorMatrixNormalized(String word)
Description copied from interface: WordVectors
Returns the word vector divided by the norm2 of the array
- Specified by:
getWordVectorMatrixNormalized in interface WordVectors
- Parameters:
word - the word to get the matrix for
- Returns:
- the looked up matrix
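
As a rough illustration of "the word vector divided by the norm2 of the array", the following plain-Java sketch divides each component by the Euclidean (L2) norm. It works on a `double[]` rather than an `INDArray`, and the class name `Norm2Sketch` and the zero-vector handling are assumptions of this sketch, not part of the DL4J API.

```java
import java.util.Arrays;

// Plain-Java illustration only; this is not the DL4J implementation.
final class Norm2Sketch {

    /**
     * Returns v divided by its L2 norm.
     * Assumption of this sketch: a zero vector is returned unchanged
     * (as a copy) to avoid dividing by zero.
     */
    static double[] normalize(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        double norm = Math.sqrt(sumOfSquares);

        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (norm == 0.0) ? v[i] : v[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // A 3-4-5 triangle: the norm of {3, 4} is 5.
        System.out.println(Arrays.toString(normalize(new double[] {3.0, 4.0})));
    }
}
```

After normalization the vector has unit length, so a dot product between two normalized vectors is their cosine similarity.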
-
getWordVectorMatrix
public org.nd4j.linalg.api.ndarray.INDArray getWordVectorMatrix(String word)
Description copied from interface: WordVectors
Get the word vector for a given word
- Specified by:
getWordVectorMatrix in interface WordVectors
- Parameters:
word - the word to get the matrix for
- Returns:
- the ndarray for this word
-
getWordVectors
public org.nd4j.linalg.api.ndarray.INDArray getWordVectors(Collection<String> labels)
Description copied from interface: WordVectors
Returns a 2D array in which each row represents the corresponding word/label
- Specified by:
getWordVectors in interface WordVectors
- Returns:
-
getWordVectorsMean
public org.nd4j.linalg.api.ndarray.INDArray getWordVectorsMean(Collection<String> labels)
Description copied from interface: WordVectors
Returns the mean vector built from the words/labels passed in
- Specified by:
getWordVectorsMean in interface WordVectors
- Returns:
-
hasWord
public boolean hasWord(String word)
Description copied from interface: WordVectors
Returns true if the model has this word in the vocab
- Specified by:
hasWord in interface WordVectors
- Parameters:
word - the word to test for
- Returns:
- true if the model has the word in the vocab
-
wordsNearest
public Collection<String> wordsNearest(org.nd4j.linalg.api.ndarray.INDArray words, int top)
- Specified by:
wordsNearest in interface WordVectors
-
wordsNearestSum
public Collection<String> wordsNearestSum(org.nd4j.linalg.api.ndarray.INDArray words, int top)
- Specified by:
wordsNearestSum in interface WordVectors
-
wordsNearestSum
public Collection<String> wordsNearestSum(String word, int n)
Description copied from interface: WordVectors
Get the top n words most similar to the given word
- Specified by:
wordsNearestSum in interface WordVectors
- Parameters:
word - the word to compare
n - the n to get
- Returns:
- the top n words
-
wordsNearestSum
public Collection<String> wordsNearestSum(Collection<String> positive, Collection<String> negative, int top)
Description copied from interface: WordVectors
Words nearest based on positive and negative words
- Specified by:
wordsNearestSum in interface WordVectors
- Parameters:
positive - the positive words
negative - the negative words
top - the top n words
- Returns:
- the words nearest the mean of the words
-
accuracy
public Map<String,Double> accuracy(List<String> questions)
Description copied from interface: WordVectors
Accuracy based on questions which are a space separated list of strings where the first word is the query word, the next 2 words are negative, and the last word is the predicted word to be nearest
- Specified by:
accuracy in interface WordVectors
- Parameters:
questions - the questions to ask
- Returns:
- the accuracy based on these questions
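
To make the question format concrete, here is a small plain-Java sketch that splits one question into four words, reading the description above literally (query word, two negative words, expected word). The class `QuestionSketch` and its field names are hypothetical and exist only for illustration; how DL4J itself interprets each position is not confirmed by this page.

```java
// Plain-Java illustration of the documented question layout; hypothetical
// helper, not part of DL4J.
final class QuestionSketch {
    final String query;      // first word: the query word
    final String negative1;  // second word: described as negative
    final String negative2;  // third word: described as negative
    final String expected;   // last word: the word predicted to be nearest

    private QuestionSketch(String query, String negative1, String negative2, String expected) {
        this.query = query;
        this.negative1 = negative1;
        this.negative2 = negative2;
        this.expected = expected;
    }

    /** Parses one question made of exactly four whitespace-separated words. */
    static QuestionSketch parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 4) {
            throw new IllegalArgumentException(
                    "expected 4 space-separated words, got " + tokens.length);
        }
        return new QuestionSketch(tokens[0], tokens[1], tokens[2], tokens[3]);
    }

    public static void main(String[] args) {
        QuestionSketch q = parse("athens greece baghdad iraq");
        System.out.println(q.query + " -> " + q.expected);
    }
}
```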
-
indexOf
public int indexOf(String word)
- Specified by:
indexOf in interface WordVectors
-
similarWordsInVocabTo
public List<String> similarWordsInVocabTo(String word, double accuracy)
Description copied from interface: WordVectors
Find all words with similar characters in the vocab
- Specified by:
similarWordsInVocabTo in interface WordVectors
- Parameters:
word - the word to compare
accuracy - the accuracy: 0 to 1
- Returns:
- the list of words that are similar in the vocab
-
wordsNearest
public Collection<String> wordsNearest(Collection<String> positive, Collection<String> negative, int top)
Description copied from interface: WordVectors
Words nearest based on positive and negative words
- Specified by:
wordsNearest in interface WordVectors
- Parameters:
positive - the positive words
negative - the negative words
top - the top n words
- Returns:
- the words nearest the mean of the words
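
One common way to realize "the words nearest the mean of the words" is to add the positive vectors, subtract the negative ones, average, and rank the remaining vocabulary by cosine similarity to that query vector. The sketch below does this over a toy in-memory map. It is an assumption about the general technique, not the ModelUtils implementation used by this class; `NearestSketch` and the map-based vocabulary are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration only; not the DL4J implementation.
final class NearestSketch {

    /** Cosine similarity; returns 0 if either vector has zero length (sketch assumption). */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<String> wordsNearest(Map<String, double[]> vectors,
                                     Collection<String> positive,
                                     Collection<String> negative,
                                     int top) {
        if (vectors.isEmpty()) {
            return new ArrayList<>();
        }
        int dim = vectors.values().iterator().next().length;
        final double[] query = new double[dim];
        int used = 0;

        // Add positive vectors, subtract negative ones; unknown words are skipped.
        for (String w : positive) {
            double[] v = vectors.get(w);
            if (v == null) continue;
            for (int i = 0; i < dim; i++) query[i] += v[i];
            used++;
        }
        for (String w : negative) {
            double[] v = vectors.get(w);
            if (v == null) continue;
            for (int i = 0; i < dim; i++) query[i] -= v[i];
            used++;
        }
        if (used == 0) {
            return new ArrayList<>();
        }
        for (int i = 0; i < dim; i++) query[i] /= used;

        // Rank every word that was not part of the query, most similar first.
        Set<String> exclude = new HashSet<>(positive);
        exclude.addAll(negative);
        List<String> candidates = new ArrayList<>();
        for (String w : vectors.keySet()) {
            if (!exclude.contains(w)) candidates.add(w);
        }
        candidates.sort(Comparator.comparingDouble(
                (String w) -> -cosine(query, vectors.get(w))));
        return new ArrayList<>(candidates.subList(0, Math.min(top, candidates.size())));
    }
}
```

With toy vectors in which "king" minus "man" plus "woman" points toward "queen", calling `wordsNearest(vectors, List.of("king", "woman"), List.of("man"), 1)` would return a single-element list containing "queen". Excluding the query words from the candidates is a design choice of this sketch.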
-
wordsNearest
public Collection<String> wordsNearest(String word, int n)
Description copied from interface: WordVectors
Get the top n words most similar to the given word
- Specified by:
wordsNearest in interface WordVectors
- Parameters:
word - the word to compare
n - the n to get
- Returns:
- the top n words
-
similarity
public double similarity(String word, String word2)
Description copied from interface: WordVectors
Returns the similarity of 2 words
- Specified by:
similarity in interface WordVectors
- Parameters:
word - the first word
word2 - the second word
- Returns:
- a normalized similarity (cosine similarity)
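
For reference, cosine similarity between two vectors is their dot product divided by the product of their L2 norms, which gives a value between -1 and 1. The following plain-Java sketch computes it on `double[]` inputs. The class name `CosineSketch` and the zero-vector behavior are assumptions of this sketch, and it does not show how this class looks up the vectors for the two words.

```java
// Plain-Java illustration only; not the DL4J implementation.
final class CosineSketch {

    /**
     * Cosine similarity: dot(a, b) / (||a|| * ||b||).
     * Assumption of this sketch: returns 0 when either vector is all zeros.
     */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Orthogonal vectors have no similarity; parallel vectors have maximal similarity.
        System.out.println(cosine(new double[] {1, 0}, new double[] {0, 1}));
        System.out.println(cosine(new double[] {1, 2}, new double[] {2, 4}));
    }
}
```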
-
lookupTable
public WeightLookupTable lookupTable()
Description copied from interface: WordVectors
Lookup table for the vectors
- Specified by:
lookupTable in interface WordVectors
- Returns:
-
setModelUtils
public void setModelUtils(ModelUtils utils)
Description copied from interface: WordVectors
Specifies ModelUtils to be used to access model
- Specified by:
setModelUtils in interface WordVectors
-
loadWeightsInto
public void loadWeightsInto(org.nd4j.linalg.api.ndarray.INDArray array)
- Specified by:
loadWeightsInto in interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
-
vectorSize
public int vectorSize()
- Specified by:
vectorSize in interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
-
jsonSerializable
public boolean jsonSerializable()
- Specified by:
jsonSerializable in interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
-
getLearningRate
public double getLearningRate()
-
getDimension
public int getDimension()
-
getContextWindowSize
public int getContextWindowSize()
-
getEpoch
public int getEpoch()
-
getNegativesNumber
public int getNegativesNumber()
-
getWordNgrams
public int getWordNgrams()
-
getLossName
public String getLossName()
-
getModelName
public String getModelName()
-
getNumberOfBuckets
public int getNumberOfBuckets()
-
getLabelPrefix
public String getLabelPrefix()
-
outOfVocabularySupported
public boolean outOfVocabularySupported()
Description copied from interface: WordVectors
Indicates whether the implementation can vectorize words that are absent from the vocabulary
- Specified by:
outOfVocabularySupported in interface WordVectors
- Returns:
- boolean
-
-