TextVectorizer (deeplearning4j-nlp 0.0.3.3 API)

All Superinterfaces:

Serializable, Vectorizer

All Known Implementing Classes:

BagOfWordsVectorizer, BaseTextVectorizer, TfidfVectorizer
```
public interface TextVectorizer
extends Vectorizer
```
Vectorizes text

Author:

Adam Gibson

Method Summary

Modifier and Type	Method and Description
`int`	`batchSize()` For word vectors, this is the batch size used to partition documents into workloads
`void`	`fit()` Train the model
`InvertedIndex`	`index()` Inverted index
`long`	`numWordsEncountered()` Returns the number of words encountered so far
`double`	`sample()` Sampling for building mini batches
`org.nd4j.linalg.api.ndarray.INDArray`	`transform(String text)` Transforms the passed-in text into an `INDArray`
`org.nd4j.linalg.dataset.DataSet`	`vectorize(File input, String label)` Vectorizes the text in the passed-in file
`org.nd4j.linalg.dataset.DataSet`	`vectorize(InputStream is, String label)` Text coming from an input stream considered as one document
`org.nd4j.linalg.dataset.DataSet`	`vectorize(String text, String label)` Vectorizes the passed in text treating it as one document
`VocabCache`	`vocab()` The vocab sorted in descending order

Methods inherited from interface org.deeplearning4j.datasets.vectorizer.Vectorizer
vectorize

- Method Detail
  - sample
```
double sample()
```
    Sampling for building mini batches
    
    Returns:
    
    the sampling
  - batchSize
```
int batchSize()
```
    For word vectors, this is the batch size used to partition documents into workloads
    
    Returns:
    
    the batch size for partitioning documents into workloads
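    The partitioning described above can be illustrated with a minimal, self-contained sketch. It does not use the deeplearning4j classes; `BatchPartitionSketch` and `partition` are hypothetical names, and the way an implementation actually consumes `batchSize()` may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not part of the deeplearning4j API.
public class BatchPartitionSketch {

    // Splits the documents into consecutive batches of at most batchSize items.
    static <T> List<List<T>> partition(List<T> documents, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < documents.size(); start += batchSize) {
            int end = Math.min(start + batchSize, documents.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(documents.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(partition(docs, 2)); // prints [[a, b], [c, d], [e]]
    }
}
```

    The last batch may be smaller than the others when the document count is not a multiple of the batch size.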
  - vocab
```
VocabCache vocab()
```
    The vocab sorted in descending order
    
    Returns:
    
    the vocab sorted in descending order
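    The page does not say which key the vocabulary is sorted by. The sketch below assumes descending word frequency, with ties broken alphabetically, and uses plain JDK collections rather than `VocabCache`; `VocabOrderSketch`, `countWords` and `sortedVocab` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of the deeplearning4j API.
public class VocabOrderSketch {

    // Counts lower-cased, whitespace-separated tokens.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Orders words by descending count (assumed sort key); ties are alphabetical.
    static List<String> sortedVocab(Map<String, Integer> counts) {
        List<String> words = new ArrayList<>(counts.keySet());
        words.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return words;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the cat and the dog and the bird");
        System.out.println(sortedVocab(counts)); // prints [the, and, bird, cat, dog]
    }
}
```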
  - vectorize
```
org.nd4j.linalg.dataset.DataSet vectorize(InputStream is,
                                          String label)
```
    Text coming from an input stream considered as one document
    
    Parameters:
    
    is - the input stream to read from
    
    label - the label to assign
    
    Returns:
    
    a dataset with a set of weights (relative to the implementation; could be word counts or TF-IDF scores)
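    Treating a stream as one document amounts to reading it to the end before any tokenization happens. A minimal sketch of that step, assuming UTF-8 input and using only the JDK (`StreamDocumentSketch` and `readAsOneDocument` are hypothetical names, not library methods):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not part of the deeplearning4j API.
public class StreamDocumentSketch {

    // Reads the whole stream into a single string, i.e. one document.
    // The UTF-8 encoding is an assumption of this sketch.
    static String readAsOneDocument(InputStream is) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = is.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "line one\nline two".getBytes(StandardCharsets.UTF_8);
        String document = readAsOneDocument(new ByteArrayInputStream(bytes));
        // Line breaks are kept; the text is not split into separate documents.
        System.out.println(document.length()); // prints 17
    }
}
```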
  - vectorize
```
org.nd4j.linalg.dataset.DataSet vectorize(String text,
                                          String label)
```
    Vectorizes the passed in text treating it as one document
    
    Parameters:
    
    text - the text to vectorize
    
    label - the label of the text
    
    Returns:
    
    a dataset with a set of weights (relative to the implementation; could be word counts or TF-IDF scores)
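    To make the two kinds of weights concrete, here is a minimal sketch that turns one document into a word-count vector over a fixed vocabulary and then rescales it to TF-IDF. It uses plain arrays instead of `DataSet`, and the formula `count * ln(numDocs / docFreq)` is one common TF-IDF variant chosen for illustration; the weighting used by `TfidfVectorizer` may differ. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not part of the deeplearning4j API.
public class WeightSketch {

    // One entry per vocabulary word: how often it occurs in the text.
    static double[] wordCounts(String text, List<String> vocab) {
        double[] counts = new double[vocab.size()];
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            int idx = vocab.indexOf(token);
            if (idx >= 0) {
                counts[idx] += 1.0;
            }
        }
        return counts;
    }

    // Rescales counts by ln(numDocs / docFreq); assumed variant, see lead-in.
    static double[] tfidf(double[] counts, int[] docFreq, int numDocs) {
        double[] weights = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            // A word never seen in the corpus gets weight 0 to avoid dividing by zero.
            weights[i] = docFreq[i] == 0
                    ? 0.0
                    : counts[i] * Math.log((double) numDocs / docFreq[i]);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<String> vocab = Arrays.asList("cat", "dog", "fish");
        double[] counts = wordCounts("cat dog cat", vocab);
        System.out.println(Arrays.toString(counts)); // prints [2.0, 1.0, 0.0]
        // Made-up corpus statistics: 2 documents, "cat" appears in only one of them.
        System.out.println(Arrays.toString(tfidf(counts, new int[] {1, 2, 2}, 2)));
    }
}
```

    In the second vector only "cat" keeps a non-zero weight, because a word that occurs in every document has `ln(1) = 0`.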
  - fit
```
void fit()
```
    Train the model
  - vectorize
```
org.nd4j.linalg.dataset.DataSet vectorize(File input,
                                          String label)
```
    Parameters:
    
    input - the file containing the text to vectorize
    
    label - the label of the text
    
    Returns:
    
    a dataset with a set of weights (relative to the implementation; could be word counts or TF-IDF scores)
  - transform
```
org.nd4j.linalg.api.ndarray.INDArray transform(String text)
```
    Transforms the passed-in text into an INDArray
    
    Parameters:
    
    text - the text to transform
    
    Returns:
    
    the transformed text as an INDArray
  - numWordsEncountered
```
long numWordsEncountered()
```
    Returns the number of words encountered so far
    
    Returns:
    
    the number of words encountered so far
  - index
```
InvertedIndex index()
```
    Inverted index
    
    Returns:
    
    the inverted index for this vectorizer
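    For readers unfamiliar with the term, an inverted index maps each word to the documents that contain it. The sketch below shows the idea with JDK collections only; it is not the `InvertedIndex` interface, and `InvertedIndexSketch` and `build` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only: not part of the deeplearning4j API.
public class InvertedIndexSketch {

    // Maps each lower-cased token to the sorted ids of the documents containing it.
    // A document's id is its position in the input list.
    static Map<String, SortedSet<Integer>> build(List<String> documents) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < documents.size(); docId++) {
            for (String token : documents.get(docId).toLowerCase().trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<Integer>> index = build(Arrays.asList("the cat", "the dog"));
        System.out.println(index); // prints {cat=[0], dog=[1], the=[0, 1]}
    }
}
```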
