Class BaseTextVectorizer
- java.lang.Object
-
- org.deeplearning4j.bagofwords.vectorizer.BaseTextVectorizer
-
- All Implemented Interfaces:
Serializable, TextVectorizer, Vectorizer
- Direct Known Subclasses:
BagOfWordsVectorizer, TfidfVectorizer
public abstract class BaseTextVectorizer extends Object implements TextVectorizer
- See Also:
- Serialized Form
-
-
Field Summary
Fields (modifier and type, followed by field name):
protected InvertedIndex<VocabWord> index
protected boolean isParallel
protected LabelAwareIterator iterator
protected LabelsSource labelsSource
protected int minWordFrequency
protected Collection<String> stopWords
protected TokenizerFactory tokenizerFactory
protected VocabCache<VocabWord> vocabCache
-
Constructor Summary
Constructors:
BaseTextVectorizer()
-
Method Summary
Methods (modifier and type, method, description):
void buildVocab()
void fit(): Train the model.
LabelsSource getLabelsSource()
long numWordsEncountered(): Returns the number of words encountered so far.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.deeplearning4j.bagofwords.vectorizer.TextVectorizer
getIndex, getVocabCache, transform, transform, vectorize, vectorize, vectorize
-
Methods inherited from interface org.deeplearning4j.core.datasets.vectorizer.Vectorizer
vectorize
-
Field Detail
-
tokenizerFactory
protected transient TokenizerFactory tokenizerFactory
-
iterator
protected transient LabelAwareIterator iterator
-
minWordFrequency
protected int minWordFrequency
-
vocabCache
protected VocabCache<VocabWord> vocabCache
-
labelsSource
protected LabelsSource labelsSource
-
stopWords
protected Collection<String> stopWords
-
index
protected transient InvertedIndex<VocabWord> index
-
isParallel
protected boolean isParallel
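
The class implements Serializable, and three of the fields above (tokenizerFactory, iterator, index) are declared transient. Under standard Java serialization, transient fields are not written out and come back as null after deserialization, so they would need to be supplied again before reuse. The sketch below illustrates that general Java behaviour only. SketchVectorizer and roundTrip are hypothetical names invented for this example, and Object stands in for a type such as TokenizerFactory; none of this is the library's own code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

public class TransientFieldSketch {

    // Hypothetical stand-in for a vectorizer; NOT the DL4J class.
    static class SketchVectorizer implements Serializable {
        private static final long serialVersionUID = 1L;

        // Mirror the non-transient fields: written out with the object.
        protected int minWordFrequency;
        protected Collection<String> stopWords;

        // Mirrors a transient field: skipped by Java serialization.
        // Object is a placeholder for a type like TokenizerFactory.
        protected transient Object tokenizerFactory;
    }

    // Serializes the object to a byte array and reads it back.
    static SketchVectorizer roundTrip(SketchVectorizer in)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream is =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SketchVectorizer) is.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SketchVectorizer original = new SketchVectorizer();
        original.minWordFrequency = 5;
        original.stopWords = new ArrayList<>(Arrays.asList("the", "a"));
        original.tokenizerFactory = new Object();

        SketchVectorizer copy = roundTrip(original);
        System.out.println(copy.minWordFrequency);          // 5
        System.out.println(copy.stopWords);                 // [the, a]
        System.out.println(copy.tokenizerFactory == null);  // true
    }
}
```

In practice this suggests re-attaching a tokenizer factory and iterator to any vectorizer restored from its serialized form, though how the subclasses handle that is not stated on this page.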
-
-
Method Detail
-
getLabelsSource
public LabelsSource getLabelsSource()
-
buildVocab
public void buildVocab()
-
fit
public void fit()
Description copied from interface: TextVectorizer
Train the model.
- Specified by:
fit in interface TextVectorizer
-
numWordsEncountered
public long numWordsEncountered()
Returns the number of words encountered so far.
- Specified by:
numWordsEncountered in interface TextVectorizer
- Returns:
the number of words encountered so far
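
This page does not describe how buildVocab() uses minWordFrequency and stopWords, or what exactly numWordsEncountered() counts. The following is a minimal sketch of one plausible reading, under stated assumptions: tokens are split on whitespace, stop words are skipped, words rarer than the threshold are dropped, and the running total counts only the tokens that passed the stop-word filter. VocabSketch and all of its members are hypothetical and do not use or reproduce the library's classes.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; NOT the DL4J implementation.
public class VocabSketch {
    private final int minWordFrequency;
    private final Collection<String> stopWords;
    // TreeMap keeps words sorted so the printed output is deterministic.
    private final Map<String, Integer> counts = new TreeMap<>();
    private long wordsEncountered = 0;

    VocabSketch(int minWordFrequency, Collection<String> stopWords) {
        this.minWordFrequency = minWordFrequency;
        this.stopWords = stopWords;
    }

    // Assumed behaviour: lowercase, split on whitespace, skip stop words,
    // count the remaining tokens, then drop words below the threshold.
    // Intended to be called once per instance.
    void buildVocab(List<String> documents) {
        for (String doc : documents) {
            for (String token : doc.toLowerCase().split("\\s+")) {
                if (token.isEmpty() || stopWords.contains(token)) {
                    continue;
                }
                counts.merge(token, 1, Integer::sum);
                wordsEncountered++;
            }
        }
        counts.values().removeIf(c -> c < minWordFrequency);
    }

    // Total tokens counted so far (stop words excluded in this sketch).
    long numWordsEncountered() {
        return wordsEncountered;
    }

    Map<String, Integer> vocab() {
        return counts;
    }

    public static void main(String[] args) {
        VocabSketch v = new VocabSketch(2, new HashSet<>(Arrays.asList("the")));
        v.buildVocab(Arrays.asList("the cat sat", "the cat ran"));
        System.out.println(v.vocab());               // {cat=2}
        System.out.println(v.numWordsEncountered()); // 4
    }
}
```

Whether the real class counts stop words or filtered words toward its total is not specified here, so treat the numbers above as properties of the sketch only.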