Class FastText
- java.lang.Object
-
- org.deeplearning4j.models.fasttext.FastText
-
- All Implemented Interfaces: Serializable, WordVectors, org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
public class FastText extends Object implements WordVectors, Serializable
- See Also:
- Serialized Form
-
-
Method Summary
Modifier and Type, Method, Description
Map<String,Double> accuracy(List<String> questions)
    Accuracy based on questions, each a space-separated list of words where the first word is the query word, the next two words are negative, and the last word is the word predicted to be nearest
void fit()
int getContextWindowSize()
int getDimension()
int getEpoch()
String getLabelPrefix()
double getLearningRate()
String getLossName()
String getModelName()
int getNegativesNumber()
int getNumberOfBuckets()
String getUNK()
int getWordNgrams()
double[] getWordVector(String word)
    Get the word vector for a given word
org.nd4j.linalg.api.ndarray.INDArray getWordVectorMatrix(String word)
    Get the word vector for a given word
org.nd4j.linalg.api.ndarray.INDArray getWordVectorMatrixNormalized(String word)
    Returns the word vector divided by its L2 norm (norm2)
org.nd4j.linalg.api.ndarray.INDArray getWordVectors(Collection<String> labels)
    Returns a 2D array in which each row is the vector of the corresponding word/label
org.nd4j.linalg.api.ndarray.INDArray getWordVectorsMean(Collection<String> labels)
    Returns the mean vector built from the words/labels passed in
boolean hasWord(String word)
    Returns true if the model has this word in the vocab
int indexOf(String word)
boolean jsonSerializable()
void loadBinaryModel(String modelPath)
void loadIterator()
void loadPretrainedVectors(File vectorsFile)
void loadWeightsInto(org.nd4j.linalg.api.ndarray.INDArray array)
WeightLookupTable lookupTable()
    Lookup table for the vectors
boolean outOfVocabularySupported()
    Indicates whether the implementation vectorizes words absent from the vocabulary
String predict(String text)
org.nd4j.common.primitives.Pair<String,Float> predictProbability(String text)
void setModelUtils(ModelUtils utils)
    Specifies the ModelUtils to be used to access the model
void setUNK(String input)
double similarity(String word, String word2)
    Returns the similarity of two words
List<String> similarWordsInVocabTo(String word, double accuracy)
    Find all words in the vocab with similar characters
void test(File testFile)
void unloadBinaryModel()
int vectorSize()
VocabCache vocab()
    Vocab for the vectors
long vocabSize()
Collection<String> wordsNearest(String word, int n)
    Get the top n words most similar to the given word
Collection<String> wordsNearest(Collection<String> positive, Collection<String> negative, int top)
    Words nearest based on positive and negative words
Collection<String> wordsNearest(org.nd4j.linalg.api.ndarray.INDArray words, int top)
Collection<String> wordsNearestSum(String word, int n)
    Get the top n words most similar to the given word
Collection<String> wordsNearestSum(Collection<String> positive, Collection<String> negative, int top)
    Words nearest based on positive and negative words
Collection<String> wordsNearestSum(org.nd4j.linalg.api.ndarray.INDArray words, int top)
-
-
-
Constructor Detail
-
FastText
public FastText(File modelPath)
-
FastText
public FastText()
-
-
Method Detail
-
fit
public void fit()
-
loadIterator
public void loadIterator()
-
loadPretrainedVectors
public void loadPretrainedVectors(File vectorsFile)
-
loadBinaryModel
public void loadBinaryModel(String modelPath)
-
unloadBinaryModel
public void unloadBinaryModel()
-
test
public void test(File testFile)
-
predictProbability
public org.nd4j.common.primitives.Pair<String,Float> predictProbability(String text)
-
vocab
public VocabCache vocab()
Description copied from interface: WordVectors
Vocab for the vectors
- Specified by: vocab in interface WordVectors
-
vocabSize
public long vocabSize()
- Specified by: vocabSize in interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
-
getUNK
public String getUNK()
- Specified by: getUNK in interface WordVectors
-
setUNK
public void setUNK(String input)
- Specified by: setUNK in interface WordVectors
-
getWordVector
public double[] getWordVector(String word)
Description copied from interface: WordVectors
Get the word vector for a given word
- Specified by: getWordVector in interface WordVectors
- Parameters:
word - the word to get the vector for
- Returns: the vector for this word, as a double array
-
getWordVectorMatrixNormalized
public org.nd4j.linalg.api.ndarray.INDArray getWordVectorMatrixNormalized(String word)
Description copied from interface: WordVectors
Returns the word vector divided by its L2 norm (norm2)
- Specified by: getWordVectorMatrixNormalized in interface WordVectors
- Parameters:
word - the word to get the vector for
- Returns: the normalized vector for this word
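
To make "divided by the norm2" concrete, the following is a minimal plain-Java sketch of L2 normalization on a `double[]`. It does not call the library; the class name `UnitNormSketch` and the choice to return a zero vector unchanged are assumptions made for illustration only.

```java
import java.util.Arrays;

// Illustrative sketch only: not the library's implementation.
final class UnitNormSketch {

    /** Returns a copy of v scaled to unit L2 norm. A zero vector is returned unchanged (sketch-specific choice). */
    static double[] normalize(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        double norm = Math.sqrt(sumOfSquares);
        double[] out = v.clone();
        if (norm == 0.0) {
            return out; // avoid division by zero
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= norm;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] unit = normalize(new double[] {3.0, 4.0});
        System.out.println(Arrays.toString(unit));
    }
}
```

After normalization, the dot product of two such vectors equals their cosine similarity, which is one common reason to request the normalized form.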
-
getWordVectorMatrix
public org.nd4j.linalg.api.ndarray.INDArray getWordVectorMatrix(String word)
Description copied from interface: WordVectors
Get the word vector for a given word
- Specified by: getWordVectorMatrix in interface WordVectors
- Parameters:
word - the word to get the vector for
- Returns: the ndarray for this word
-
getWordVectors
public org.nd4j.linalg.api.ndarray.INDArray getWordVectors(Collection<String> labels)
Description copied from interface: WordVectors
Returns a 2D array in which each row is the vector of the corresponding word/label
- Specified by: getWordVectors in interface WordVectors
-
getWordVectorsMean
public org.nd4j.linalg.api.ndarray.INDArray getWordVectorsMean(Collection<String> labels)
Description copied from interface: WordVectors
Returns the mean vector built from the words/labels passed in
- Specified by: getWordVectorsMean in interface WordVectors
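
As an illustration of how a mean vector relates to the per-word rows, here is a small plain-Java sketch that averages the rows of a 2D array element by element. It uses `double[][]` in place of `INDArray`, and the class name `MeanVectorSketch` is hypothetical; how the library treats missing words is not shown here.

```java
import java.util.Arrays;

// Illustrative sketch only: plain arrays stand in for INDArray.
final class MeanVectorSketch {

    /** Returns the element-wise mean of the rows; each row is one word/label vector. */
    static double[] mean(double[][] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("need at least one vector");
        }
        double[] mean = new double[rows[0].length];
        for (double[] row : rows) {
            if (row.length != mean.length) {
                throw new IllegalArgumentException("all vectors must have the same length");
            }
            for (int i = 0; i < mean.length; i++) {
                mean[i] += row[i];
            }
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= rows.length;
        }
        return mean;
    }

    public static void main(String[] args) {
        double[][] rows = {
            {1.0, 2.0}, // vector for the first label
            {3.0, 4.0}  // vector for the second label
        };
        System.out.println(Arrays.toString(mean(rows)));
    }
}
```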
-
hasWord
public boolean hasWord(String word)
Description copied from interface: WordVectors
Returns true if the model has this word in the vocab
- Specified by: hasWord in interface WordVectors
- Parameters:
word - the word to test for
- Returns: true if the model has the word in the vocab
-
wordsNearest
public Collection<String> wordsNearest(org.nd4j.linalg.api.ndarray.INDArray words, int top)
- Specified by: wordsNearest in interface WordVectors
-
wordsNearestSum
public Collection<String> wordsNearestSum(org.nd4j.linalg.api.ndarray.INDArray words, int top)
- Specified by: wordsNearestSum in interface WordVectors
-
wordsNearestSum
public Collection<String> wordsNearestSum(String word, int n)
Description copied from interface: WordVectors
Get the top n words most similar to the given word
- Specified by: wordsNearestSum in interface WordVectors
- Parameters:
word - the word to compare
n - the number of words to return
- Returns: the top n words
-
wordsNearestSum
public Collection<String> wordsNearestSum(Collection<String> positive, Collection<String> negative, int top)
Description copied from interface: WordVectors
Words nearest based on positive and negative words
- Specified by: wordsNearestSum in interface WordVectors
- Parameters:
positive - the positive words
negative - the negative words
top - the number of words to return
- Returns: the words nearest the mean of the words
-
accuracy
public Map<String,Double> accuracy(List<String> questions)
Description copied from interface: WordVectors
Accuracy based on questions, each a space-separated list of words where the first word is the query word, the next two words are negative, and the last word is the word predicted to be nearest
- Specified by: accuracy in interface WordVectors
- Parameters:
questions - the questions to ask
- Returns: the accuracy based on these questions
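
The question format described above can be sketched in plain Java as follows. This is an assumption-laden illustration, not the library's code: the class `AccuracyQuestionSketch`, its nested `Question` type, and the `predictor` function are hypothetical, and the sketch returns a single fraction rather than the `Map<String,Double>` that `accuracy` returns.

```java
import java.util.List;
import java.util.function.Function;

// Illustrative sketch only: parses the four-word question format and scores a stand-in predictor.
final class AccuracyQuestionSketch {

    /** One parsed question: the query word, two negative words, and the expected nearest word. */
    static final class Question {
        final String query;
        final String negative1;
        final String negative2;
        final String expected;

        Question(String query, String negative1, String negative2, String expected) {
            this.query = query;
            this.negative1 = negative1;
            this.negative2 = negative2;
            this.expected = expected;
        }
    }

    /** Splits a space-separated question into its four words. */
    static Question parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 words but got " + parts.length + ": " + line);
        }
        return new Question(parts[0], parts[1], parts[2], parts[3]);
    }

    /** Fraction of questions for which the (hypothetical) predictor returns the expected word. */
    static double fractionCorrect(List<String> questions, Function<Question, String> predictor) {
        if (questions.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (String line : questions) {
            Question q = parse(line);
            if (q.expected.equals(predictor.apply(q))) {
                correct++;
            }
        }
        return (double) correct / questions.size();
    }

    public static void main(String[] args) {
        List<String> questions = List.of("a b c d", "e f g h");
        // Stand-in predictor that always answers "d"; a real one would query the model.
        System.out.println(fractionCorrect(questions, q -> "d"));
    }
}
```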
-
indexOf
public int indexOf(String word)
- Specified by: indexOf in interface WordVectors
-
similarWordsInVocabTo
public List<String> similarWordsInVocabTo(String word, double accuracy)
Description copied from interface: WordVectors
Find all words in the vocab with similar characters
- Specified by: similarWordsInVocabTo in interface WordVectors
- Parameters:
word - the word to compare
accuracy - the accuracy, from 0 to 1
- Returns: the list of words that are similar in the vocab
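
One plausible way to read "similar characters" with an accuracy between 0 and 1 is a normalized edit-distance score. The sketch below assumes that interpretation; the string measure the library actually uses is not stated on this page, and the class `CharSimilaritySketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Illustrative sketch only: assumes character similarity = 1 - (Levenshtein distance / longer length).
final class CharSimilaritySketch {

    /** Classic dynamic-programming Levenshtein distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1.0 means identical strings. */
    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / longer;
    }

    /** Keeps the vocabulary words whose similarity to word is at least the given accuracy. */
    static List<String> similarWords(Collection<String> vocab, String word, double accuracy) {
        List<String> result = new ArrayList<>();
        for (String candidate : vocab) {
            if (similarity(word, candidate) >= accuracy) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> vocab = List.of("color", "colour", "cat");
        System.out.println(similarWords(vocab, "color", 0.8));
    }
}
```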
-
wordsNearest
public Collection<String> wordsNearest(Collection<String> positive, Collection<String> negative, int top)
Description copied from interface: WordVectors
Words nearest based on positive and negative words
- Specified by: wordsNearest in interface WordVectors
- Parameters:
positive - the positive words
negative - the negative words
top - the number of words to return
- Returns: the words nearest the mean of the words
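
A common way to combine positive and negative words is to add the positive vectors, subtract the negative ones, average, and rank the remaining vocabulary by cosine similarity to the result. The sketch below assumes that scheme over a toy in-memory map; whether the library combines vectors in exactly this way is an assumption, and `NearestWordsSketch` and the toy vectors are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a toy map of word -> vector stands in for the model.
final class NearestWordsSketch {

    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // sketch-specific convention for zero vectors
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<String> wordsNearest(Map<String, double[]> vectors,
                                     Collection<String> positive,
                                     Collection<String> negative,
                                     int top) {
        int count = positive.size() + negative.size();
        if (count == 0) {
            throw new IllegalArgumentException("need at least one positive or negative word");
        }
        int dim = vectors.values().iterator().next().length;
        double[] query = new double[dim];
        accumulate(vectors, positive, query, +1.0); // positive words pull the query toward them
        accumulate(vectors, negative, query, -1.0); // negative words push it away
        for (int i = 0; i < dim; i++) {
            query[i] /= count;
        }
        List<String> candidates = new ArrayList<>();
        for (String word : vectors.keySet()) {
            if (!positive.contains(word) && !negative.contains(word)) {
                candidates.add(word); // input words are excluded from the results
            }
        }
        candidates.sort(Comparator.comparingDouble((String w) -> -cosine(query, vectors.get(w))));
        return new ArrayList<>(candidates.subList(0, Math.min(top, candidates.size())));
    }

    private static void accumulate(Map<String, double[]> vectors, Collection<String> words,
                                   double[] target, double sign) {
        for (String word : words) {
            double[] v = vectors.get(word);
            if (v == null) {
                throw new IllegalArgumentException("unknown word: " + word);
            }
            for (int i = 0; i < target.length; i++) {
                target[i] += sign * v[i];
            }
        }
    }

    public static void main(String[] args) {
        Map<String, double[]> vectors = new LinkedHashMap<>();
        vectors.put("king", new double[] {1, 1, 0});
        vectors.put("queen", new double[] {1, 0, 1});
        vectors.put("man", new double[] {0, 1, 0});
        vectors.put("woman", new double[] {0, 0, 1});
        vectors.put("apple", new double[] {0.1, 0.1, 0.1});
        System.out.println(wordsNearest(vectors, List.of("king", "woman"), List.of("man"), 1));
    }
}
```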
-
wordsNearest
public Collection<String> wordsNearest(String word, int n)
Description copied from interface: WordVectors
Get the top n words most similar to the given word
- Specified by: wordsNearest in interface WordVectors
- Parameters:
word - the word to compare
n - the number of words to return
- Returns: the top n words
-
similarity
public double similarity(String word, String word2)
Description copied from interface: WordVectors
Returns the similarity of two words
- Specified by: similarity in interface WordVectors
- Parameters:
word - the first word
word2 - the second word
- Returns: a normalized similarity (cosine similarity)
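
For reference, cosine similarity between two vectors is their dot product divided by the product of their L2 norms. The plain-Java sketch below shows that formula on `double[]` inputs; the class name `CosineSimilaritySketch` and the zero-vector convention are assumptions for illustration and do not describe the library's internals.

```java
// Illustrative sketch only: the standard cosine formula on plain arrays.
final class CosineSimilaritySketch {

    /** Returns dot(a, b) / (|a| * |b|), or 0.0 if either vector is all zeros (sketch-specific choice). */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] first = {0.5, 0.1, 0.4};   // made-up vector for the first word
        double[] second = {0.45, 0.15, 0.38}; // made-up vector for the second word
        System.out.println(cosine(first, second));
    }
}
```

The result lies in the range -1 to 1, with values near 1 indicating vectors that point in nearly the same direction.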
-
lookupTable
public WeightLookupTable lookupTable()
Description copied from interface: WordVectors
Lookup table for the vectors
- Specified by: lookupTable in interface WordVectors
-
setModelUtils
public void setModelUtils(ModelUtils utils)
Description copied from interface: WordVectors
Specifies the ModelUtils to be used to access the model
- Specified by: setModelUtils in interface WordVectors
-
loadWeightsInto
public void loadWeightsInto(org.nd4j.linalg.api.ndarray.INDArray array)
- Specified by: loadWeightsInto in interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
-
vectorSize
public int vectorSize()
- Specified by: vectorSize in interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
-
jsonSerializable
public boolean jsonSerializable()
- Specified by: jsonSerializable in interface org.deeplearning4j.nn.weights.embeddings.EmbeddingInitializer
-
getLearningRate
public double getLearningRate()
-
getDimension
public int getDimension()
-
getContextWindowSize
public int getContextWindowSize()
-
getEpoch
public int getEpoch()
-
getNegativesNumber
public int getNegativesNumber()
-
getWordNgrams
public int getWordNgrams()
-
getLossName
public String getLossName()
-
getModelName
public String getModelName()
-
getNumberOfBuckets
public int getNumberOfBuckets()
-
getLabelPrefix
public String getLabelPrefix()
-
outOfVocabularySupported
public boolean outOfVocabularySupported()
Description copied from interface: WordVectors
Indicates whether the implementation vectorizes words absent from the vocabulary
- Specified by: outOfVocabularySupported in interface WordVectors
- Returns: true if the implementation vectorizes out-of-vocabulary words, false otherwise
-
-