Class TfidfVectorizer
- java.lang.Object
  - org.deeplearning4j.bagofwords.vectorizer.BaseTextVectorizer
    - org.deeplearning4j.bagofwords.vectorizer.TfidfVectorizer
- All Implemented Interfaces:
Serializable, TextVectorizer, Vectorizer
public class TfidfVectorizer extends BaseTextVectorizer
- See Also:
- Serialized Form
-
Nested Class Summary
Nested Classes
- static class TfidfVectorizer.Builder
-
Field Summary
-
Fields inherited from class org.deeplearning4j.bagofwords.vectorizer.BaseTextVectorizer
index, isParallel, iterator, labelsSource, minWordFrequency, stopWords, tokenizerFactory, vocabCache
-
Constructor Summary
Constructors
- TfidfVectorizer()
-
Method Summary
All Methods (Instance Methods, Concrete Methods)
- double tfidfWord(String word, long wordCount, long documentLength)
- org.nd4j.linalg.api.ndarray.INDArray transform(String text): Transforms the matrix
- org.nd4j.linalg.api.ndarray.INDArray transform(List<String> tokens): Transforms the matrix
- org.nd4j.linalg.dataset.DataSet vectorize(): Vectorizes the input source into a dataset
- org.nd4j.linalg.dataset.DataSet vectorize(File input, String label)
- org.nd4j.linalg.dataset.DataSet vectorize(InputStream is, String label): Text coming from an input stream, considered as one document
- org.nd4j.linalg.dataset.DataSet vectorize(String text, String label): Vectorizes the given text, treating it as one document
-
Methods inherited from class org.deeplearning4j.bagofwords.vectorizer.BaseTextVectorizer
buildVocab, fit, getLabelsSource, numWordsEncountered
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.deeplearning4j.bagofwords.vectorizer.TextVectorizer
getIndex, getVocabCache
-
Method Detail
-
vectorize
public org.nd4j.linalg.dataset.DataSet vectorize(InputStream is, String label)
Treats the text read from an input stream as one document.
- Parameters:
is - the input stream to read from
label - the label to assign
- Returns:
- a dataset of weights (implementation-dependent: word counts or TF-IDF scores)
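To illustrate what "considered as one document" means, here is a minimal sketch that reads a whole stream into a single string before any tokenizing happens. The class `StreamAsDocumentSketch` and its helper are hypothetical and not part of DL4J; the UTF-8 charset is an assumption, since the documentation does not state which encoding is used. `InputStream.readAllBytes` requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of DL4J.
final class StreamAsDocumentSketch {

    // Reads the entire stream and returns it as one document string.
    // UTF-8 is an assumption made for this sketch.
    static String readAsSingleDocument(InputStream is) {
        try {
            return new String(is.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream is = new ByteArrayInputStream(
                "first line\nsecond line".getBytes(StandardCharsets.UTF_8));
        String document = readAsSingleDocument(is);
        // Line breaks stay inside the single document; they do not split it.
        System.out.println(document.split("\\s+").length + " tokens in one document");
    }
}
```

The point of the sketch is only that line breaks in the stream do not create separate documents; the whole content gets one label.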
-
vectorize
public org.nd4j.linalg.dataset.DataSet vectorize(String text, String label)
Vectorizes the given text, treating it as one document.
- Parameters:
text - the text to vectorize
label - the label of the text
- Returns:
- a dataset of weights (implementation-dependent: word counts or TF-IDF scores)
-
vectorize
public org.nd4j.linalg.dataset.DataSet vectorize(File input, String label)
- Parameters:
input - the file containing the text to vectorize
label - the label of the text
- Returns:
- a DataSet of weights (implementation-dependent: word counts or TF-IDF scores)
-
transform
public org.nd4j.linalg.api.ndarray.INDArray transform(String text)
Transforms the given text into an INDArray.
- Parameters:
text - the text to transform
- Returns:
- INDArray
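As a rough illustration of what transforming text into a vector involves, the sketch below maps a string onto a fixed-length array with one slot per vocabulary word. It is not the DL4J implementation: `TransformSketch`, `buildIndex`, whitespace tokenizing, and lowercasing are all assumptions, a plain `double[]` stands in for `INDArray`, and raw counts stand in for the weights.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not part of DL4J.
final class TransformSketch {

    // Maps each vocabulary word to a column index (a stand-in for a vocabulary cache).
    static Map<String, Integer> buildIndex(List<String> vocabulary) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String word : vocabulary) {
            index.putIfAbsent(word, index.size());
        }
        return index;
    }

    // Turns text into a vector with one slot per vocabulary word.
    // Tokens outside the vocabulary are ignored.
    static double[] transform(String text, Map<String, Integer> index) {
        double[] vector = new double[index.size()];
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            Integer column = index.get(token);
            if (column != null) {
                vector[column] += 1.0; // raw count; a TF-IDF vectorizer would store a score here
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = buildIndex(List.of("cat", "dog", "fish"));
        System.out.println(Arrays.toString(transform("Dog chases cat and dog", index)));
    }
}
```

The vector length depends only on the vocabulary, so texts of different lengths produce comparable rows.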
-
transform
public org.nd4j.linalg.api.ndarray.INDArray transform(List<String> tokens)
Description copied from interface: TextVectorizer
Transforms the matrix
- Returns:
-
tfidfWord
public double tfidfWord(String word, long wordCount, long documentLength)
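This method is undocumented here, so the following is only a sketch of a common TF-IDF formulation that fits the `wordCount` and `documentLength` parameters. The class `TfidfSketch`, the base-10 logarithm, and the `totalDocs` and `docsContainingWord` inputs are assumptions; the library's exact formula may differ.

```java
// Hypothetical sketch of a common TF-IDF formulation, not the DL4J implementation.
final class TfidfSketch {

    // Term frequency: the share of the document's tokens that are this word.
    static double tf(long wordCount, long documentLength) {
        return documentLength > 0 ? (double) wordCount / documentLength : 0.0;
    }

    // Inverse document frequency: words that appear in fewer documents score higher.
    // Both inputs are assumed to come from a fitted vocabulary.
    static double idf(long totalDocs, long docsContainingWord) {
        if (totalDocs <= 0 || docsContainingWord <= 0) {
            return 0.0;
        }
        return Math.log10((double) totalDocs / docsContainingWord);
    }

    // TF-IDF score: term frequency scaled by inverse document frequency.
    static double tfidf(long wordCount, long documentLength,
                        long totalDocs, long docsContainingWord) {
        return tf(wordCount, documentLength) * idf(totalDocs, docsContainingWord);
    }

    public static void main(String[] args) {
        // A word seen 3 times in a 100-token document, present in 10 of 1000 documents.
        System.out.println(tfidf(3, 100, 1000, 10));
    }
}
```

Under this formulation a word that occurs in every document gets a score of zero, which is why common words contribute little to the resulting vector.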
-
vectorize
public org.nd4j.linalg.dataset.DataSet vectorize()
Vectorizes the input source into a dataset.
- Returns:
- the resulting DataSet
- Author:
- Adam Gibson
-