public class TfidfVectorizer extends BaseTextVectorizer implements Serializable
Modifier and Type | Class and Description |
---|---|
static class | TfidfVectorizer.Builder |
Fields inherited from class BaseTextVectorizer: batchSize, cache, cleanup, docIter, index, labels, labelSentenceIter, minWordFrequency, numWordsEncountered, sample, sentenceIterator, stem, stopWords, tokenizerFactory, trainingSystem
Modifier | Constructor and Description |
---|---|
public | TfidfVectorizer() |
protected | TfidfVectorizer(VocabCache cache, TokenizerFactory tokenizerFactory, List<String> stopWords, int minWordFrequency, DocumentIterator docIter, SentenceIterator sentenceIterator, List<String> labels, InvertedIndex index, int batchSize, double sample, boolean stem, boolean cleanup) |
Modifier and Type | Method and Description |
---|---|
org.nd4j.linalg.api.ndarray.INDArray | transform(String text): Transforms the given text into a vector |
org.nd4j.linalg.dataset.DataSet | vectorize() |
org.nd4j.linalg.dataset.DataSet | vectorize(File input, String label) |
org.nd4j.linalg.dataset.DataSet | vectorize(InputStream is, String label): Treats the text read from an input stream as one document |
org.nd4j.linalg.dataset.DataSet | vectorize(String text, String label): Vectorizes the passed-in text, treating it as one document |
Methods inherited from class BaseTextVectorizer: batchSize, fit, getCache, getDocIter, getMinWordFrequency, getSentenceIterator, getStopWords, getTokenizerFactory, index, numWordsEncountered, sample, setCache, setDocIter, setMinWordFrequency, setSentenceIterator, setStopWords, setTokenizerFactory, vocab
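To make the idea behind `transform(String text)` concrete, here is a minimal, stand-alone sketch of TF-IDF weighting. It is not this library's implementation: the class `TfidfSketch`, its whitespace tokenizer, and the plain `tf * log(N / df)` formula are all assumptions chosen for illustration, and the sketch returns a `double[]` rather than an `INDArray`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-alone sketch; not the library's TfidfVectorizer. */
class TfidfSketch {
    private final List<String> vocab = new ArrayList<>();          // word at each vector index
    private final Map<String, Integer> docFreq = new HashMap<>();  // number of documents containing each word
    private int numDocs;

    /** Lower-casing whitespace tokenizer, standing in for a TokenizerFactory (assumption). */
    static List<String> tokenize(String text) {
        String trimmed = text.toLowerCase().trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<String>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Builds the vocabulary and per-word document frequencies from a corpus. */
    void fit(List<String> documents) {
        numDocs = documents.size();
        for (String doc : documents) {
            // Count each word at most once per document.
            for (String word : new LinkedHashSet<>(tokenize(doc))) {
                if (!docFreq.containsKey(word)) {
                    vocab.add(word);
                }
                docFreq.merge(word, 1, Integer::sum);
            }
        }
    }

    /** Maps one document to a vector of tf * idf weights, one entry per vocabulary word. */
    double[] transform(String text) {
        List<String> tokens = tokenize(text);
        double[] vector = new double[vocab.size()];
        if (tokens.isEmpty()) {
            return vector;
        }
        for (int i = 0; i < vocab.size(); i++) {
            String word = vocab.get(i);
            double tf = (double) Collections.frequency(tokens, word) / tokens.size();
            double idf = Math.log((double) numDocs / docFreq.get(word));
            vector[i] = tf * idf;
        }
        return vector;
    }

    /** Position of a word in the output vector, or -1 if it is not in the vocabulary. */
    int indexOf(String word) {
        return vocab.indexOf(word);
    }

    public static void main(String[] args) {
        TfidfSketch sketch = new TfidfSketch();
        sketch.fit(Arrays.asList("the cat sat", "the dog ran"));
        System.out.println(Arrays.toString(sketch.transform("the cat")));
    }
}
```

With this formula, a word that appears in every fitted document gets a weight of zero, while a word confined to a few documents is weighted up. The real class may smooth or normalize differently.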
public TfidfVectorizer()
protected TfidfVectorizer(VocabCache cache, TokenizerFactory tokenizerFactory, List<String> stopWords, int minWordFrequency, DocumentIterator docIter, SentenceIterator sentenceIterator, List<String> labels, InvertedIndex index, int batchSize, double sample, boolean stem, boolean cleanup)
public org.nd4j.linalg.dataset.DataSet vectorize(InputStream is, String label)
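Description copied from interface: TextVectorizer
Treats the text read from an input stream as one document.
Specified by: vectorize in interface TextVectorizer
Parameters:
is - the input stream to read from
label - the label to assign
public org.nd4j.linalg.dataset.DataSet vectorize(String text, String label)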
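Description copied from interface: TextVectorizer
Vectorizes the passed-in text, treating it as one document.
Specified by: vectorize in interface TextVectorizer
Parameters:
text - the text to vectorize
label - the label of the text
public org.nd4j.linalg.dataset.DataSet vectorize(File input, String label)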
Specified by: vectorize in interface TextVectorizer
Parameters:
input - the text to vectorize
public org.nd4j.linalg.api.ndarray.INDArray transform(String text)
Specified by: transform in interface TextVectorizer
Parameters:
text - the text to transform
public org.nd4j.linalg.dataset.DataSet vectorize()
Specified by: vectorize in interface Vectorizer
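The `vectorize(String text, String label)` overload pairs one document with one label. As a rough illustration of what such a pairing can look like, the sketch below holds a feature vector alongside a one-hot label vector. This page does not describe how `DataSet` stores labels, so the one-hot encoding, the class `LabeledExampleSketch`, and the helper `oneHot` are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; DataSet's real layout is not described on this page. */
class LabeledExampleSketch {
    final double[] features; // e.g. TF-IDF weights for one document
    final double[] labels;   // one-hot encoding of the document's label (assumption)

    LabeledExampleSketch(double[] features, double[] labels) {
        this.features = features;
        this.labels = labels;
    }

    /** Encodes a label as a vector with 1.0 at the label's position in allLabels. */
    static double[] oneHot(String label, List<String> allLabels) {
        int index = allLabels.indexOf(label);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        double[] encoded = new double[allLabels.size()];
        encoded[index] = 1.0;
        return encoded;
    }

    public static void main(String[] args) {
        List<String> allLabels = Arrays.asList("negative", "positive");
        double[] features = {0.0, 0.35, 0.0}; // placeholder feature values
        LabeledExampleSketch example =
                new LabeledExampleSketch(features, oneHot("positive", allLabels));
        System.out.println(Arrays.toString(example.labels));
    }
}
```

In this sketch the list of known labels plays the role of the `List<String> labels` constructor argument: a label's position in that list decides which slot of the label vector is set.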
Copyright © 2015. All rights reserved.