public interface TextVectorizer extends Vectorizer
Modifier and Type | Method and Description |
---|---|
int | batchSize(): For word vectors, the batch size used to partition documents into workloads |
void | fit(): Trains the model |
InvertedIndex | index(): Returns the inverted index |
long | numWordsEncountered(): Returns the number of words encountered so far |
double | sample(): Sampling for building mini batches |
org.nd4j.linalg.api.ndarray.INDArray | transform(String text): Transforms the given text into an INDArray |
org.nd4j.linalg.dataset.DataSet | vectorize(File input, String label) |
org.nd4j.linalg.dataset.DataSet | vectorize(InputStream is, String label): Vectorizes text from an input stream, treating it as one document |
org.nd4j.linalg.dataset.DataSet | vectorize(String text, String label): Vectorizes the passed-in text, treating it as one document |
VocabCache | vocab(): The vocab sorted in descending order |
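The fit-then-transform workflow implied by the summary can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the DL4J implementation: `BagOfWordsSketch`, `addDocument`, and the whitespace tokenizer are hypothetical, and a `double[]` stands in for `INDArray` so the example runs without ND4J on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT the DL4J implementation: mirrors the
// fit() / transform() / numWordsEncountered() contract with plain arrays.
class BagOfWordsSketch {
    private final List<String> documents = new ArrayList<>();
    private final Map<String, Integer> vocabIndex = new HashMap<>(); // word -> column
    private long wordsEncountered = 0;

    // Hypothetical helper: a real TextVectorizer obtains its documents elsewhere.
    void addDocument(String text) {
        documents.add(text);
    }

    // Analogue of fit(): scan every document and build the vocabulary.
    void fit() {
        for (String doc : documents) {
            for (String token : tokenize(doc)) {
                wordsEncountered++;
                // First sighting of a word assigns it the next free column.
                vocabIndex.putIfAbsent(token, vocabIndex.size());
            }
        }
    }

    // Analogue of transform(String text): word counts over the fitted vocabulary.
    double[] transform(String text) {
        double[] vector = new double[vocabIndex.size()];
        for (String token : tokenize(text)) {
            Integer column = vocabIndex.get(token);
            if (column != null) {
                vector[column] += 1.0; // words unseen during fit() are skipped
            }
        }
        return vector;
    }

    // Analogue of numWordsEncountered(): total tokens seen during fit().
    long numWordsEncountered() {
        return wordsEncountered;
    }

    // Assumed tokenizer: lower-case, split on whitespace.
    private static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        BagOfWordsSketch v = new BagOfWordsSketch();
        v.addDocument("the cat sat");
        v.addDocument("the dog sat");
        v.fit();
        System.out.println(v.numWordsEncountered());
        System.out.println(Arrays.toString(v.transform("the cat and the dog")));
    }
}
```

The sketch keeps training (`fit`) separate from inference (`transform`), which is why `transform` ignores words that were not present when the vocabulary was built.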
Methods inherited from interface Vectorizer: vectorize
double sample()
int batchSize()
VocabCache vocab()
org.nd4j.linalg.dataset.DataSet vectorize(InputStream is, String label)
is - the input stream to read from
label - the label to assign
org.nd4j.linalg.dataset.DataSet vectorize(String text, String label)
text - the text to vectorize
label - the label of the text
void fit()
org.nd4j.linalg.dataset.DataSet vectorize(File input, String label)
input - the file containing the text to vectorize
label - the label of the text
org.nd4j.linalg.api.ndarray.INDArray transform(String text)
text - the text to transform
long numWordsEncountered()
InvertedIndex index()
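The labeled `vectorize` overloads pair one document with one label. The sketch below shows one plausible shape for that pairing in plain Java. Everything in it is an assumption for illustration: `LabeledVectorizeSketch` and its nested `Example` class are hypothetical stand-ins for an implementation and for `DataSet`, the one-hot label encoding is assumed rather than taken from this page, and the vocabulary and label list are supplied up front instead of being learned by `fit()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch, NOT the DL4J implementation.
class LabeledVectorizeSketch {
    // Hypothetical stand-in for DataSet: a feature vector plus a label vector.
    static final class Example {
        final double[] features;
        final double[] labels;

        Example(double[] features, double[] labels) {
            this.features = features;
            this.labels = labels;
        }
    }

    private final List<String> vocab;  // assumed fixed vocabulary, one column per word
    private final List<String> labels; // assumed fixed set of possible labels

    LabeledVectorizeSketch(List<String> vocab, List<String> labels) {
        this.vocab = vocab;
        this.labels = labels;
    }

    // Analogue of vectorize(String text, String label): the text is one document.
    Example vectorize(String text, String label) {
        int labelIndex = labels.indexOf(label);
        if (labelIndex < 0) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        double[] features = new double[vocab.size()];
        String trimmed = text.trim().toLowerCase();
        if (!trimmed.isEmpty()) {
            for (String token : trimmed.split("\\s+")) {
                int column = vocab.indexOf(token);
                if (column >= 0) {
                    features[column] += 1.0; // count in-vocabulary words only
                }
            }
        }
        double[] oneHot = new double[labels.size()];
        oneHot[labelIndex] = 1.0; // assumed one-hot label encoding
        return new Example(features, oneHot);
    }

    // Analogue of vectorize(InputStream is, String label): the whole stream
    // is read and treated as a single document.
    Example vectorize(InputStream is, String label) {
        try {
            String text = new String(is.readAllBytes(), StandardCharsets.UTF_8);
            return vectorize(text, label);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Delegating the stream overload to the string overload keeps both paths consistent: line breaks in the stream do not split it into separate documents.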
Copyright © 2015. All rights reserved.