Interface TextVectorizer
-
- All Superinterfaces:
Serializable, Vectorizer
- All Known Implementing Classes:
BagOfWordsVectorizer, BaseTextVectorizer, TfidfVectorizer
public interface TextVectorizer extends Vectorizer
-
Method Summary
Modifier and Type  Method  Description
void  fit()  Train the model
InvertedIndex<VocabWord>  getIndex()  Inverted index
VocabCache<VocabWord>  getVocabCache()  The vocab sorted in descending order
long  numWordsEncountered()  Returns the number of words encountered so far
org.nd4j.linalg.api.ndarray.INDArray  transform(String text)  Transforms the given text into an INDArray
org.nd4j.linalg.api.ndarray.INDArray  transform(List<String> tokens)  Transforms the given tokens into an INDArray
org.nd4j.linalg.dataset.DataSet  vectorize(File input, String label)
org.nd4j.linalg.dataset.DataSet  vectorize(InputStream is, String label)  Text coming from an input stream, treated as one document
org.nd4j.linalg.dataset.DataSet  vectorize(String text, String label)  Vectorizes the passed in text, treating it as one document
Methods inherited from interface org.deeplearning4j.core.datasets.vectorizer.Vectorizer
vectorize
-
Method Detail
-
getVocabCache
VocabCache<VocabWord> getVocabCache()
The vocab sorted in descending order
- Returns:
- the vocab sorted in descending order
-
vectorize
org.nd4j.linalg.dataset.DataSet vectorize(InputStream is, String label)
Text coming from an input stream, treated as one document
- Parameters:
is - the input stream to read from
label - the label to assign
- Returns:
- a dataset with a set of weights (relative to the implementation; could be word counts or TF-IDF scores)
-
vectorize
org.nd4j.linalg.dataset.DataSet vectorize(String text, String label)
Vectorizes the passed in text, treating it as one document
- Parameters:
text - the text to vectorize
label - the label of the text
- Returns:
- a dataset with a set of weights (relative to the implementation; could be word counts or TF-IDF scores)
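The two kinds of weights mentioned above (word counts and TF-IDF scores) can be illustrated with a minimal sketch in plain Java. This is not the library's implementation: the class WeightSketch, its method names, and the formula (term count divided by document length, times log10 of total documents over document frequency) are assumptions chosen for illustration, and a concrete implementation may weight terms differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Deeplearning4j.
final class WeightSketch {

    // Word-count weights: how many times each token occurs in one document.
    static Map<String, Double> wordCounts(List<String> tokens) {
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1.0, Double::sum);
        }
        return counts;
    }

    // TF-IDF weights under an assumed textbook formula:
    //   tf  = count / document length
    //   idf = log10(totalDocs / docFreq)
    // docFreq maps each word to the number of fitted documents containing it.
    static Map<String, Double> tfidf(List<String> tokens,
                                     Map<String, Integer> docFreq,
                                     int totalDocs) {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : wordCounts(tokens).entrySet()) {
            Integer df = docFreq.get(e.getKey());
            if (df == null || df == 0) {
                continue; // word never seen during fitting: no weight assigned
            }
            double tf = e.getValue() / tokens.size();
            double idf = Math.log10((double) totalDocs / df);
            weights.put(e.getKey(), tf * idf);
        }
        return weights;
    }
}
```

Under this formula a word that appears in every fitted document gets a weight of zero, while a word that is frequent in the document but rare in the corpus gets the largest weight.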
-
fit
void fit()
Train the model
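As a rough illustration of the intended call order (fit first, then transform), the following self-contained sketch mimics the shape of this interface in plain Java. ToyCountVectorizer is a hypothetical stand-in, not a Deeplearning4j class: it assumes a lowercase whitespace tokenizer, takes its corpus in the constructor, and returns double[] where the real interface returns an INDArray.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a bag-of-words style implementation.
final class ToyCountVectorizer {
    private final List<String> corpus;
    // Maps each vocabulary word to its column in the output vector.
    private final Map<String, Integer> vocab = new LinkedHashMap<>();
    private long wordsEncountered = 0;

    ToyCountVectorizer(List<String> corpus) {
        this.corpus = new ArrayList<>(corpus);
    }

    // Assumed tokenizer: lowercase, split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Builds the vocabulary from the corpus; call before transform.
    void fit() {
        for (String doc : corpus) {
            for (String token : tokenize(doc)) {
                wordsEncountered++;
                vocab.putIfAbsent(token, vocab.size());
            }
        }
    }

    // Counts occurrences of each known word; unknown words are ignored.
    double[] transform(List<String> tokens) {
        double[] counts = new double[vocab.size()];
        for (String token : tokens) {
            Integer column = vocab.get(token);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    // Tokenizes the text, then delegates to the token-based overload.
    double[] transform(String text) {
        return transform(tokenize(text));
    }

    long numWordsEncountered() {
        return wordsEncountered;
    }
}
```

In this sketch, calling transform before fit yields an empty vector because the vocabulary has not been built yet, which is why fitting comes first.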
-
vectorize
org.nd4j.linalg.dataset.DataSet vectorize(File input, String label)
- Parameters:
input - the file containing the text to vectorize
label - the label of the text
- Returns:
DataSet with a set of weights (relative to the implementation; could be word counts or TF-IDF scores)
-
transform
org.nd4j.linalg.api.ndarray.INDArray transform(String text)
Transforms the given text into an INDArray
- Parameters:
text - text to transform
- Returns:
INDArray
-
transform
org.nd4j.linalg.api.ndarray.INDArray transform(List<String> tokens)
Transforms the given tokens into an INDArray
- Parameters:
tokens - the tokens to transform
- Returns:
INDArray
-
numWordsEncountered
long numWordsEncountered()
Returns the number of words encountered so far
- Returns:
- the number of words encountered so far
-
getIndex
InvertedIndex<VocabWord> getIndex()
Inverted index
- Returns:
- the inverted index for this vectorizer
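For readers unfamiliar with the term, an inverted index maps each word to the documents that contain it. The sketch below shows the idea in plain Java; ToyInvertedIndex and its method names are hypothetical and are not the InvertedIndex API returned by this method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the inverted-index idea.
final class ToyInvertedIndex {
    // Forward direction: document id -> tokens of that document.
    private final List<List<String>> docs = new ArrayList<>();
    // Inverted direction: word -> ids of documents containing it, in insertion order.
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Stores a document and returns the id assigned to it.
    int addDocument(List<String> tokens) {
        int docId = docs.size();
        docs.add(new ArrayList<>(tokens));
        for (String token : tokens) {
            List<Integer> ids = postings.computeIfAbsent(token, k -> new ArrayList<>());
            // Record each document at most once per word.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                ids.add(docId);
            }
        }
        return docId;
    }

    // Ids of all documents containing the word; empty if the word is unknown.
    List<Integer> documentsContaining(String word) {
        return postings.getOrDefault(word, new ArrayList<>());
    }

    // Tokens of the document with the given id.
    List<String> document(int docId) {
        return docs.get(docId);
    }
}
```

A lookup such as documentsContaining("the") then answers "which documents mention this word" without scanning every document, which is the kind of query a document-frequency count for TF-IDF relies on.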
-