Class TfidfVectorizer
- java.lang.Object
-
- org.deeplearning4j.bagofwords.vectorizer.BaseTextVectorizer
-
- org.deeplearning4j.bagofwords.vectorizer.TfidfVectorizer
-
- All Implemented Interfaces:
Serializable, TextVectorizer, Vectorizer
public class TfidfVectorizer extends BaseTextVectorizer
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class                      Description
static class         TfidfVectorizer.Builder
-
Field Summary
-
Fields inherited from class org.deeplearning4j.bagofwords.vectorizer.BaseTextVectorizer
index, isParallel, iterator, labelsSource, minWordFrequency, stopWords, tokenizerFactory, vocabCache
-
-
Constructor Summary
Constructors
Constructor          Description
TfidfVectorizer()
-
Method Summary
Modifier and Type / Method / Description

double
tfidfWord(String word, long wordCount, long documentLength)

org.nd4j.linalg.api.ndarray.INDArray
transform(String text)
Transforms the given text into an INDArray.

org.nd4j.linalg.api.ndarray.INDArray
transform(List<String> tokens)
Transforms the given tokens into an INDArray.

org.nd4j.linalg.dataset.DataSet
vectorize()
Vectorizes the input source into a dataset.

org.nd4j.linalg.dataset.DataSet
vectorize(File input, String label)

org.nd4j.linalg.dataset.DataSet
vectorize(InputStream is, String label)
Vectorizes text read from an input stream, treating it as one document.

org.nd4j.linalg.dataset.DataSet
vectorize(String text, String label)
Vectorizes the passed-in text, treating it as one document.
-
Methods inherited from class org.deeplearning4j.bagofwords.vectorizer.BaseTextVectorizer
buildVocab, fit, getLabelsSource, numWordsEncountered
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.deeplearning4j.bagofwords.vectorizer.TextVectorizer
getIndex, getVocabCache
-
Method Detail
-
vectorize
public org.nd4j.linalg.dataset.DataSet vectorize(InputStream is, String label)
Vectorizes text read from an input stream, treating it as one document.
- Parameters:
is - the input stream to read from
label - the label to assign
- Returns:
a dataset of weights (relative to the implementation; could be word counts or TF-IDF scores)
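This page does not say how the stream is decoded before vectorization. As a minimal sketch of the "one document" idea, the whole stream can be read into a single string that is then treated as one text. The class `StreamDocumentSketch`, the method `readAsOneDocument`, and the choice of UTF-8 are assumptions for illustration, not part of the documented API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamDocumentSketch {
    /** Reads the entire stream and returns it as one document string (UTF-8 assumed). */
    public static String readAsOneDocument(InputStream is) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = is.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        InputStream is = new ByteArrayInputStream(
                "first line\nsecond line".getBytes(StandardCharsets.UTF_8));
        // Line breaks are kept; the text is still handled as a single document.
        System.out.println(readAsOneDocument(is));
    }
}
```

The resulting string could then be passed to `vectorize(String text, String label)` together with the label.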
-
vectorize
public org.nd4j.linalg.dataset.DataSet vectorize(String text, String label)
Vectorizes the passed-in text, treating it as one document.
- Parameters:
text - the text to vectorize
label - the label of the text
- Returns:
a dataset of weights (relative to the implementation; could be word counts or TF-IDF scores)
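The returned DataSet pairs the weights computed for the text with its label. How the label is encoded is not stated on this page; a minimal sketch, assuming a one-hot encoding over a fixed list of known labels, could look like this. The class `LabelSketch` and the method `oneHot` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class LabelSketch {
    /** One-hot encodes a label against a fixed, ordered list of known labels. */
    public static double[] oneHot(String label, List<String> labels) {
        int idx = labels.indexOf(label);
        if (idx < 0) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        double[] out = new double[labels.size()];
        out[idx] = 1.0; // every other position stays 0.0
        return out;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("negative", "neutral", "positive");
        System.out.println(Arrays.toString(oneHot("neutral", labels)));
    }
}
```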
-
vectorize
public org.nd4j.linalg.dataset.DataSet vectorize(File input, String label)
- Parameters:
input - the file containing the text to vectorize
label - the label of the text
- Returns:
a DataSet of weights (relative to the implementation; could be word counts or TF-IDF scores)
-
transform
public org.nd4j.linalg.api.ndarray.INDArray transform(String text)
Transforms the given text into an INDArray.
- Parameters:
text - text to transform
- Returns:
INDArray
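To illustrate what turning text into a weight vector can look like, here is a minimal sketch in plain Java using `double[]` in place of an INDArray. It assumes whitespace tokenization, lowercasing, a term frequency of count / document length, and a base-10 inverse document frequency; the library's tokenizer factory and weighting may differ. The class `TransformSketch` and its parameters `vocabIndex`, `docFreq`, and `totalDocs` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class TransformSketch {
    /**
     * Maps text onto a dense vector with one slot per vocabulary word.
     * vocabIndex: word -> column; docFreq: word -> number of documents containing it.
     */
    public static double[] transform(String text, Map<String, Integer> vocabIndex,
                                     Map<String, Integer> docFreq, int totalDocs) {
        String trimmed = text.trim().toLowerCase();
        String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");

        // Count how often each token occurs in this document.
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }

        double[] vector = new double[vocabIndex.size()];
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            Integer column = vocabIndex.get(e.getKey());
            Integer df = docFreq.get(e.getKey());
            if (column == null || df == null || df == 0) {
                continue; // out-of-vocabulary word: skipped
            }
            double tf = (double) e.getValue() / tokens.length;
            double idf = Math.log10((double) totalDocs / df);
            vector[column] = tf * idf;
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocabIndex = new HashMap<>();
        vocabIndex.put("cat", 0);
        vocabIndex.put("dog", 1);
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("cat", 1);   // rare word: high idf
        docFreq.put("dog", 10);  // appears in every document: idf of 0
        System.out.println(Arrays.toString(
                transform("cat cat bird dog", vocabIndex, docFreq, 10)));
    }
}
```

In this sketch a word that appears in every document gets a weight of zero, which is the usual motivation for the idf factor.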
-
transform
public org.nd4j.linalg.api.ndarray.INDArray transform(List<String> tokens)
Description copied from interface: TextVectorizer
Transforms the given tokens into an INDArray.
- Returns:
INDArray
-
tfidfWord
public double tfidfWord(String word, long wordCount, long documentLength)
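This page does not document how `tfidfWord` combines its arguments. As a minimal sketch, assuming the common definition tf = wordCount / documentLength and idf = log10(totalDocs / docsContainingWord), a per-word score could be computed as below. The class `TfidfSketch` and the parameters `totalDocs` and `docsContainingWord` are hypothetical and not part of the documented signature; the library's actual weighting may differ.

```java
public class TfidfSketch {
    /** Term frequency: share of the document's tokens that are this word. */
    public static double tf(long wordCount, long documentLength) {
        return documentLength == 0 ? 0.0 : (double) wordCount / documentLength;
    }

    /** Inverse document frequency with a base-10 logarithm (an assumption). */
    public static double idf(long totalDocs, long docsContainingWord) {
        if (totalDocs == 0 || docsContainingWord == 0) {
            return 0.0; // word never seen in the corpus: no weight
        }
        return Math.log10((double) totalDocs / docsContainingWord);
    }

    /** TF-IDF score: product of the two factors above. */
    public static double tfidf(long wordCount, long documentLength,
                               long totalDocs, long docsContainingWord) {
        return tf(wordCount, documentLength) * idf(totalDocs, docsContainingWord);
    }

    public static void main(String[] args) {
        // A word occurring 2 times in a 10-token document and found in
        // 10 of 100 documents: tf = 0.2, idf = 1.0.
        System.out.println(tfidf(2, 10, 100, 10));
    }
}
```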
-
vectorize
public org.nd4j.linalg.dataset.DataSet vectorize()
Vectorizes the input source into a dataset.
- Returns:
- Adam Gibson
-