Class TfidfVectorizer

    • Constructor Detail

      • TfidfVectorizer

        public TfidfVectorizer()
    • Method Detail

      • vectorize

        public org.nd4j.linalg.dataset.DataSet vectorize(InputStream is,
                                                         String label)
        Vectorizes text read from an input stream, treating it as one document
        Parameters:
        is - the input stream to read from
        label - the label to assign
        Returns:
        a dataset of weights (relative to the implementation; could be word counts or TF-IDF scores)
      • vectorize

        public org.nd4j.linalg.dataset.DataSet vectorize(String text,
                                                         String label)
        Vectorizes the passed-in text, treating it as one document
        Parameters:
        text - the text to vectorize
        label - the label of the text
        Returns:
        a dataset of weights (relative to the implementation; could be word counts or TF-IDF scores)
      • vectorize

        public org.nd4j.linalg.dataset.DataSet vectorize(File input,
                                                         String label)
        Vectorizes the text in the given file
        Parameters:
        input - the file containing the text to vectorize
        label - the label of the text
        Returns:
        a dataset of weights (relative to the implementation; could be word counts or TF-IDF scores)
      • transform

        public org.nd4j.linalg.api.ndarray.INDArray transform(String text)
        Transforms the given text into an array of weights
        Parameters:
        text - text to transform
        Returns:
        INDArray
      • transform

        public org.nd4j.linalg.api.ndarray.INDArray transform(List<String> tokens)
        Description copied from interface: TextVectorizer
        Transforms the given tokens into an array of weights
        Parameters:
        tokens - tokens to transform
        Returns:
        INDArray
      • tfidfWord

        public double tfidfWord(String word,
                                long wordCount,
                                long documentLength)
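        No description is given for tfidfWord. As a rough illustration of what a
        TF-IDF score for one word typically involves, here is a minimal sketch of
        the textbook formulation in plain Java. The class TfidfSketch and the
        parameters totalDocs and docsContainingWord are hypothetical and not part
        of this API; the library's actual log base, smoothing, and normalization
        may differ.

```java
// Hypothetical illustration of textbook TF-IDF; not the library's implementation.
class TfidfSketch {

    // Term frequency: the share of the document's tokens that are this word.
    static double tf(long wordCount, long documentLength) {
        return documentLength == 0 ? 0.0 : (double) wordCount / documentLength;
    }

    // Inverse document frequency: natural log of (total documents / documents containing the word).
    // The corpus statistics are passed in explicitly here (an assumption for this sketch).
    static double idf(long totalDocs, long docsContainingWord) {
        if (totalDocs == 0 || docsContainingWord == 0) {
            return 0.0;
        }
        return Math.log((double) totalDocs / docsContainingWord);
    }

    // TF-IDF is the product of the two factors above.
    static double tfidf(long wordCount, long documentLength,
                        long totalDocs, long docsContainingWord) {
        return tf(wordCount, documentLength) * idf(totalDocs, docsContainingWord);
    }

    public static void main(String[] args) {
        // A word occurring 3 times in a 100-token document,
        // and present in 10 of 1000 documents in the corpus.
        System.out.println(tfidf(3, 100, 1000, 10));
    }
}
```

        In this sketch a word that appears in every document scores 0, since
        log(1) = 0, which reflects the usual intent of down-weighting words that
        do not distinguish documents.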
      • vectorize

        public org.nd4j.linalg.dataset.DataSet vectorize()
        Vectorizes the input source into a dataset
        Returns:
        the dataset built from the input source
        Author:
        Adam Gibson