Interface TextVectorizer
-
- All Superinterfaces:
Serializable, Vectorizer
- All Known Implementing Classes:
BagOfWordsVectorizer, BaseTextVectorizer, TfidfVectorizer
public interface TextVectorizer extends Vectorizer
-
-
Method Summary
Modifier and Type, Method: Description
- void fit(): Train the model
- InvertedIndex<VocabWord> getIndex(): Returns the inverted index
- VocabCache<VocabWord> getVocabCache(): Returns the vocab, sorted in descending order
- long numWordsEncountered(): Returns the number of words encountered so far
- org.nd4j.linalg.api.ndarray.INDArray transform(String text): Transforms the given text into a vector
- org.nd4j.linalg.api.ndarray.INDArray transform(List<String> tokens): Transforms the given tokens into a vector
- org.nd4j.linalg.dataset.DataSet vectorize(File input, String label)
- org.nd4j.linalg.dataset.DataSet vectorize(InputStream is, String label): Vectorizes text from an input stream, treating it as one document
- org.nd4j.linalg.dataset.DataSet vectorize(String text, String label): Vectorizes the passed-in text, treating it as one document
-
Methods inherited from interface org.deeplearning4j.core.datasets.vectorizer.Vectorizer
vectorize
-
Method Detail
-
getVocabCache
VocabCache<VocabWord> getVocabCache()
Returns the vocab, sorted in descending order.
- Returns:
- the vocab sorted in descending order
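The page does not say what key the descending order refers to. A minimal sketch, assuming it means descending word frequency, using only the standard library rather than DL4J's `VocabCache` (the class `VocabSortSketch` and its method are hypothetical names for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of DL4J: orders a toy vocabulary by frequency.
class VocabSortSketch {

    // Counts each token, then returns the distinct words ordered by
    // descending count. Ties are broken alphabetically so the result
    // is deterministic (an assumption made for this sketch).
    static List<String> sortByFrequencyDesc(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        List<String> words = new ArrayList<>(counts.keySet());
        words.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return words;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "cat", "sat", "on", "the", "mat", "the", "cat");
        // prints [the, cat, mat, on, sat]
        System.out.println(sortByFrequencyDesc(tokens));
    }
}
```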
-
vectorize
org.nd4j.linalg.dataset.DataSet vectorize(InputStream is, String label)
Vectorizes text coming from an input stream, treating it as one document.
- Parameters:
is
- the input stream to read from
label
- the label to assign
- Returns:
- a dataset with a set of weights (implementation-dependent; could be word counts or TF-IDF scores)
-
vectorize
org.nd4j.linalg.dataset.DataSet vectorize(String text, String label)
Vectorizes the passed-in text, treating it as one document.
- Parameters:
text
- the text to vectorize
label
- the label of the text
- Returns:
- a dataset with a set of weights (implementation-dependent; could be word counts or TF-IDF scores)
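To make "TF-IDF scores" concrete, here is a minimal standard-library sketch. It assumes the textbook weighting tf × ln(N / df) with raw counts as tf. The formula actually used by `TfidfVectorizer` may differ, and `TfidfSketch` is a hypothetical name, not a DL4J class.

```java
import java.util.List;

// Hypothetical illustration of TF-IDF weighting; not DL4J's implementation.
class TfidfSketch {

    // Number of times `word` occurs in a tokenized document.
    static int termFrequency(List<String> doc, String word) {
        int n = 0;
        for (String token : doc) {
            if (token.equals(word)) {
                n++;
            }
        }
        return n;
    }

    // One weight per vocabulary word: tf * ln(N / df), where N is the corpus
    // size and df is the number of documents containing the word.
    // Words that appear in no document get weight 0.
    static double[] tfidf(List<String> vocab, List<List<String>> corpus, List<String> doc) {
        double[] weights = new double[vocab.size()];
        for (int i = 0; i < vocab.size(); i++) {
            String word = vocab.get(i);
            int df = 0;
            for (List<String> other : corpus) {
                if (other.contains(word)) {
                    df++;
                }
            }
            if (df > 0) {
                weights[i] = termFrequency(doc, word) * Math.log((double) corpus.size() / df);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(List.of("cat", "sat"), List.of("cat", "ran"));
        List<String> vocab = List.of("cat", "sat", "ran");
        double[] w = tfidf(vocab, corpus, corpus.get(0));
        // "cat" occurs in every document, so its weight is 0; "sat" is rarer and scores higher.
        System.out.println(java.util.Arrays.toString(w));
    }
}
```

Under this weighting, a word present in every document contributes nothing, which is the main practical difference from plain word counts.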
-
fit
void fit()
Train the model
-
vectorize
org.nd4j.linalg.dataset.DataSet vectorize(File input, String label)
- Parameters:
input
- the file containing the text to vectorize
label
- the label of the text
- Returns:
- a DataSet with a set of weights (implementation-dependent; could be word counts or TF-IDF scores)
-
transform
org.nd4j.linalg.api.ndarray.INDArray transform(String text)
Transforms the given text into a vector.
- Parameters:
text
- text to transform
- Returns:
- the resulting INDArray
-
transform
org.nd4j.linalg.api.ndarray.INDArray transform(List<String> tokens)
Transforms the given tokens into a vector.
- Parameters:
tokens
- the tokens to transform
- Returns:
- the resulting INDArray
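As a rough illustration of turning a token list into a vector, the sketch below builds a word-count vector indexed by vocabulary position. It uses a plain `double[]` instead of `INDArray`. Ignoring out-of-vocabulary tokens is an assumption, and `TokenCountSketch` is a hypothetical name, not part of DL4J.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bag-of-words illustration; not DL4J's implementation.
class TokenCountSketch {

    // Returns a vector with one slot per vocabulary word, holding how often
    // that word occurs in `tokens`. Tokens missing from the vocabulary are
    // skipped (an assumption made for this sketch).
    static double[] toCountVector(List<String> vocab, List<String> tokens) {
        Map<String, Integer> position = new HashMap<>();
        for (int i = 0; i < vocab.size(); i++) {
            position.put(vocab.get(i), i);
        }
        double[] vector = new double[vocab.size()];
        for (String token : tokens) {
            Integer pos = position.get(token);
            if (pos != null) {
                vector[pos] += 1.0;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> vocab = List.of("cat", "sat", "mat");
        List<String> tokens = List.of("cat", "sat", "cat", "dog");
        // prints [2.0, 1.0, 0.0]
        System.out.println(Arrays.toString(toCountVector(vocab, tokens)));
    }
}
```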
-
numWordsEncountered
long numWordsEncountered()
Returns the number of words encountered so far.
- Returns:
- the number of words encountered so far
-
getIndex
InvertedIndex<VocabWord> getIndex()
Returns the inverted index.
- Returns:
- the inverted index for this vectorizer
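For readers unfamiliar with the term, an inverted index maps each word to the documents that contain it. The following is a minimal standard-library sketch of that idea only. DL4J's `InvertedIndex` interface is richer, and `InvertedIndexSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the inverted-index concept; not DL4J's InvertedIndex.
class InvertedIndexSketch {

    // Maps each word to the ascending list of document ids (positions in
    // `docs`) in which it occurs. A TreeMap keeps the words sorted.
    static Map<String, List<Integer>> build(List<List<String>> docs) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : docs.get(docId)) {
                List<Integer> postings = index.computeIfAbsent(token, k -> new ArrayList<>());
                // Record each document at most once per word.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(List.of("cat", "sat"), List.of("cat", "ran", "cat"));
        // prints {cat=[0, 1], ran=[1], sat=[0]}
        System.out.println(build(docs));
    }
}
```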
-
-