public class TfidfVectorizer extends BaseTextVectorizer
| Modifier and Type | Class and Description |
|---|---|
| static class | TfidfVectorizer.Builder |
Fields inherited from class BaseTextVectorizer: index, isParallel, iterator, labelsSource, minWordFrequency, stopWords, tokenizerFactory, vocabCache

| Constructor and Description |
|---|
| TfidfVectorizer() |

| Modifier and Type | Method | Description |
|---|---|---|
| double | tfidfWord(String word, long wordCount, long documentLength) | |
| org.nd4j.linalg.api.ndarray.INDArray | transform(List<String> tokens) | Transforms the given tokens into an INDArray |
| org.nd4j.linalg.api.ndarray.INDArray | transform(String text) | Transforms the given text into an INDArray |
| org.nd4j.linalg.dataset.DataSet | vectorize() | Vectorizes the input source into a DataSet |
| org.nd4j.linalg.dataset.DataSet | vectorize(File input, String label) | |
| org.nd4j.linalg.dataset.DataSet | vectorize(InputStream is, String label) | Treats the text read from the input stream as one document |
| org.nd4j.linalg.dataset.DataSet | vectorize(String text, String label) | Vectorizes the given text, treating it as one document |

Methods inherited from class BaseTextVectorizer: buildVocab, fit, getLabelsSource, numWordsEncountered

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface TextVectorizer: getIndex, getVocabCache

public org.nd4j.linalg.dataset.DataSet vectorize(InputStream is, String label)

Parameters:
is - the input stream to read from
label - the label to assign

public org.nd4j.linalg.dataset.DataSet vectorize(String text, String label)

Parameters:
text - the text to vectorize
label - the label of the text

public org.nd4j.linalg.dataset.DataSet vectorize(File input, String label)

Parameters:
input - the text to vectorize
label - the label of the text

Returns:
DataSet of weights (relative to the implementation; could be word counts or TF-IDF scores)

public org.nd4j.linalg.api.ndarray.INDArray transform(String text)

Parameters:
text - text to transform

Returns:
INDArray

public org.nd4j.linalg.api.ndarray.INDArray transform(List<String> tokens)

Specified by:
transform in interface TextVectorizer

public double tfidfWord(String word, long wordCount, long documentLength)

public org.nd4j.linalg.dataset.DataSet vectorize()
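
The page lists tfidfWord(String word, long wordCount, long documentLength) without a description. As a rough illustration of the kind of score such a method produces, the sketch below computes a textbook TF-IDF weight from a word count and a document length. It is not the library's implementation: the class TfidfSketch, the helpers tf, idf, and tfidf, and the totalDocs and docsContainingWord parameters are all hypothetical, and the exact weighting used by TfidfVectorizer may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, self-contained sketch of textbook TF-IDF.
// None of these names come from TfidfVectorizer itself.
final class TfidfSketch {

    // Term frequency: the share of the document's tokens that are this word.
    static double tf(long wordCount, long documentLength) {
        if (documentLength <= 0) {
            return 0.0;
        }
        return (double) wordCount / documentLength;
    }

    // Inverse document frequency: log(totalDocs / docsContainingWord).
    // A word that occurs in every document gets log(1) = 0.
    static double idf(long totalDocs, long docsContainingWord) {
        if (totalDocs <= 0 || docsContainingWord <= 0) {
            return 0.0;
        }
        return Math.log((double) totalDocs / docsContainingWord);
    }

    // TF-IDF weight: term frequency scaled by inverse document frequency.
    static double tfidf(long wordCount, long documentLength,
                        long totalDocs, long docsContainingWord) {
        return tf(wordCount, documentLength) * idf(totalDocs, docsContainingWord);
    }

    public static void main(String[] args) {
        // Tiny made-up corpus; each inner list is one tokenized document.
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "dog", "barked"),
                Arrays.asList("the", "cat", "purred", "the", "cat", "slept"));

        List<String> document = corpus.get(2);
        for (String word : Arrays.asList("the", "cat", "slept")) {
            long wordCount = Collections.frequency(document, word);
            long docsContainingWord = corpus.stream()
                    .filter(doc -> doc.contains(word))
                    .count();
            double score = tfidf(wordCount, document.size(),
                    corpus.size(), docsContainingWord);
            System.out.println(word + " -> " + score);
        }
    }
}
```

Under this formula a word that appears in every document, such as "the" in the made-up corpus, scores zero regardless of how often it occurs, while words concentrated in fewer documents score higher.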
Copyright © 2022. All rights reserved.