Interface TextVectorizer

    • Method Detail

      • getVocabCache

        VocabCache<VocabWord> getVocabCache()
        Returns the vocabulary cache, sorted in descending order
        Returns:
        the vocabulary, sorted in descending order
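
        The page does not say which key the descending order uses. As a rough illustration only, the sketch below assumes ordering by word frequency and builds such a list from raw text. ToyVocab and sortedByFrequency are hypothetical names; this is not the VocabCache API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy class; it is not part of DL4J and does not use VocabCache.
final class ToyVocab {

    // Returns the distinct words of the text, most frequent first.
    // Assumption: "descending order" means descending word frequency.
    static List<String> sortedByFrequency(String corpusText) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : corpusText.trim().toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<String> words = new ArrayList<>(counts.keySet());
        // Descending by count; ties broken alphabetically for a deterministic order.
        words.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return words;
    }

    public static void main(String[] args) {
        // prints [the, cat, dog]
        System.out.println(sortedByFrequency("the cat the dog the cat"));
    }
}
```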
      • vectorize

        org.nd4j.linalg.dataset.DataSet vectorize​(InputStream is,
                                                  String label)
        Vectorizes text read from an input stream, treating it as one document
        Parameters:
        is - the input stream to read from
        label - the label to assign
        Returns:
        a dataset of weights (implementation-dependent; may be word counts or TF-IDF scores)
      • vectorize

        org.nd4j.linalg.dataset.DataSet vectorize​(String text,
                                                  String label)
        Vectorizes the passed-in text, treating it as one document
        Parameters:
        text - the text to vectorize
        label - the label of the text
        Returns:
        a dataset of weights (implementation-dependent; may be word counts or TF-IDF scores)
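
        The return value above may hold word counts or TF-IDF scores, depending on the implementation. The sketch below shows, in plain Java, how the two kinds of weights could differ for one document. ToyWeights, its whitespace tokenizer, and the tf * ln(N / df) formula are assumptions chosen for illustration; actual implementations may tokenize and weight differently and return a DataSet rather than an array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy class; not part of DL4J and not its implementation.
final class ToyWeights {

    // Assumed, simplistic tokenizer: lower-case, then split on whitespace.
    static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Word-count weights: one entry per vocabulary word, in vocabulary order.
    static double[] wordCounts(List<String> vocab, String document) {
        double[] counts = new double[vocab.size()];
        for (String token : tokenize(document)) {
            int idx = vocab.indexOf(token);
            if (idx >= 0) {
                counts[idx] += 1.0;
            }
        }
        return counts;
    }

    // TF-IDF weights using tf * ln(N / df), one common variant among several.
    static double[] tfidf(List<String> vocab, List<String> corpus, String document) {
        double[] tf = wordCounts(vocab, document);
        double[] weights = new double[vocab.size()];
        for (int i = 0; i < vocab.size(); i++) {
            int df = 0; // number of corpus documents containing the word
            for (String doc : corpus) {
                if (tokenize(doc).contains(vocab.get(i))) {
                    df++;
                }
            }
            weights[i] = df == 0 ? 0.0 : tf[i] * Math.log((double) corpus.size() / df);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<String> vocab = Arrays.asList("the", "cat", "dog");
        List<String> corpus = Arrays.asList("the cat sat", "the dog ran");
        // prints [2.0, 1.0, 0.0]
        System.out.println(Arrays.toString(wordCounts(vocab, "the cat the")));
        System.out.println(Arrays.toString(tfidf(vocab, corpus, "the cat the")));
    }
}
```

        In this sketch a word that occurs in every corpus document, such as "the", gets a TF-IDF weight of zero even though its raw count is the highest.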
      • fit

        void fit()
        Trains the model
      • vectorize

        org.nd4j.linalg.dataset.DataSet vectorize​(File input,
                                                  String label)
        Vectorizes the text contained in the passed-in file
        Parameters:
        input - the file containing the text to vectorize
        label - the label of the text
        Returns:
        a dataset of weights (implementation-dependent; may be word counts or TF-IDF scores)
      • transform

        org.nd4j.linalg.api.ndarray.INDArray transform​(String text)
        Transforms the passed-in text into an INDArray
        Parameters:
        text - the text to transform
        Returns:
        the INDArray produced from the text
      • transform

        org.nd4j.linalg.api.ndarray.INDArray transform​(List<String> tokens)
        Transforms the passed-in tokens into an INDArray
        Parameters:
        tokens - the tokens to transform
        Returns:
        the INDArray produced from the tokens
      • numWordsEncountered

        long numWordsEncountered()
        Returns the number of words encountered so far
        Returns:
        the number of words encountered so far
      • getIndex

        InvertedIndex<VocabWord> getIndex()
        Inverted index
        Returns:
        the inverted index for this vectorizer
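
        For readers unfamiliar with the term, an inverted index maps each word to the documents that contain it. The sketch below is a minimal plain-Java illustration of that idea. ToyInvertedIndex and its methods are hypothetical and do not reproduce the InvertedIndex<VocabWord> API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy class; it does not reproduce DL4J's InvertedIndex API.
final class ToyInvertedIndex {
    // word -> ids of the documents containing it, in insertion order
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Adds a document and returns the id assigned to it.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.trim().toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.computeIfAbsent(token, k -> new ArrayList<>());
            // Record each document at most once per word.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    // Returns the ids of the documents containing the word, or an empty list.
    List<Integer> documentsContaining(String word) {
        return postings.getOrDefault(word.toLowerCase(), new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("the cat sat");
        index.addDocument("the dog ran");
        // prints [0, 1]
        System.out.println(index.documentsContaining("the"));
    }
}
```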