Class ParagraphVectors

    • Field Detail

      • labelsMatrix

        protected org.nd4j.linalg.api.ndarray.INDArray labelsMatrix
      • normalizedLabels

        protected boolean normalizedLabels
      • inferenceLocker

        protected final transient Object inferenceLocker
      • inferenceExecutor

        protected transient org.threadly.concurrent.PriorityScheduler inferenceExecutor
      • countSubmitted

        protected transient AtomicLong countSubmitted
      • countFinished

        protected transient AtomicLong countFinished
    • Constructor Detail

      • ParagraphVectors

        protected ParagraphVectors()
    • Method Detail

      • initInference

        protected void initInference()
      • predict

        @Deprecated
        public String predict​(String rawText)
        Deprecated.
        Tokenizes the raw text and returns the most probable label.
        Parameters:
        rawText - raw text of the document
        Returns:
        the most probable label
      • setSequenceIterator

        public void setSequenceIterator(@NonNull SequenceIterator<VocabWord> iterator)
        Sets the SequenceIterator instance used as the training corpus source. Unlike the other iterators, it lets you pass already tokenized Sequence objects for training.
        Overrides:
        setSequenceIterator in class Word2Vec
        Parameters:
        iterator - the SequenceIterator to use as the training corpus source
      • predict

        public String predict​(LabelledDocument document)
        Predicts the label of the document. Computes a similarity with respect to the mean of the representations of the words in the document.
        Parameters:
        document - the document
        Returns:
        the predicted label
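        The mean-based prediction described above can be illustrated with a minimal, JDK-only sketch. It does not use the ParagraphVectors API: plain double[] arrays stand in for INDArray, a Map stands in for the model's label vectors, and cosine similarity is assumed as the similarity measure, which this page does not state. The class and method names (MeanVectorLabelSketch, mean, cosine, predict) are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Illustrative stand-in only: plain arrays instead of INDArray,
// a Map instead of the model's label table.
final class MeanVectorLabelSketch {

    /** Element-wise mean of equally sized word vectors (assumes a non-empty list). */
    static double[] mean(List<double[]> vectors) {
        double[] out = new double[vectors.get(0).length];
        for (double[] v : vectors) {
            for (int i = 0; i < out.length; i++) {
                out[i] += v[i];
            }
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= vectors.size();
        }
        return out;
    }

    /** Cosine similarity; returns 0 when either vector has zero norm. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the label whose vector is most similar to the mean word vector. */
    static String predict(List<double[]> wordVectors, Map<String, double[]> labelVectors) {
        double[] docMean = mean(wordVectors);
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : labelVectors.entrySet()) {
            double sim = cosine(docMean, entry.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

        In this sketch the document is reduced to a single mean vector before any comparison, so word order plays no role in the result.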
      • extractLabels

        public void extractLabels()
      • inferVector

        public org.nd4j.linalg.api.ndarray.INDArray inferVector​(String text,
                                                                double learningRate,
                                                                double minLearningRate,
                                                                int iterations)
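        Calculates the inferred vector for the given text.
        Parameters:
        text - the text to infer a vector for
        learningRate - the learning rate to use during inference
        minLearningRate - the minimum learning rate to use during inference
        iterations - the number of inference iterations
        Returns:
        the inferred vector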
      • reassignExistingModel

        protected void reassignExistingModel()
      • inferVector

        public org.nd4j.linalg.api.ndarray.INDArray inferVector​(LabelledDocument document,
                                                                double learningRate,
                                                                double minLearningRate,
                                                                int iterations)
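        Calculates the inferred vector for the given document.
        Parameters:
        document - the document to infer a vector for
        learningRate - the learning rate to use during inference
        minLearningRate - the minimum learning rate to use during inference
        iterations - the number of inference iterations
        Returns:
        the inferred vector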
      • inferVector

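        public org.nd4j.linalg.api.ndarray.INDArray inferVector(@NonNull List<VocabWord> document,
                                                                double learningRate,
                                                                double minLearningRate,
                                                                int iterations)
        Calculates the inferred vector for the given document, passed as a list of words.
        Parameters:
        document - the words of the document to infer a vector for
        learningRate - the learning rate to use during inference
        minLearningRate - the minimum learning rate to use during inference
        iterations - the number of inference iterations
        Returns:
        the inferred vector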
      • inferVector

        public org.nd4j.linalg.api.ndarray.INDArray inferVector​(String text)
        Calculates the inferred vector for the given text, using default values for the learning rate and the number of iterations.
        Parameters:
        text - the text to infer a vector for
        Returns:
        the inferred vector
      • inferVector

        public org.nd4j.linalg.api.ndarray.INDArray inferVector​(LabelledDocument document)
        Calculates the inferred vector for the given document, using default values for the learning rate and the number of iterations.
        Parameters:
        document - the document to infer a vector for
        Returns:
        the inferred vector
      • inferVector

        public org.nd4j.linalg.api.ndarray.INDArray inferVector(@NonNull List<VocabWord> document)
        Calculates the inferred vector for the given list of words, using default values for the learning rate and the number of iterations.
        Parameters:
        document - the words of the document to infer a vector for
        Returns:
        the inferred vector
      • inferVectorBatched

        public Future<org.nd4j.common.primitives.Pair<String,org.nd4j.linalg.api.ndarray.INDArray>> inferVectorBatched(@NonNull LabelledDocument document)
        Implements batched inference based on the Java Future parallelism model. PLEASE NOTE: to use this method, the LabelledDocument passed in must have its Id field defined.
        Parameters:
        document - the document to infer a vector for, with its Id field defined
        Returns:
        a Future holding the inferred vector paired with a String (presumably the document Id, given the requirement above)
      • inferVectorBatched

        public Future<org.nd4j.linalg.api.ndarray.INDArray> inferVectorBatched(@NonNull String document)
        Implements batched inference based on the Java Future parallelism model. PLEASE NOTE: this method returns a Future<INDArray>, so tracking the relation between each document and its INDArray is your responsibility.
        Parameters:
        document - the raw text of the document
        Returns:
        a Future holding the inferred vector
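        One way to handle the tracking responsibility noted above is to keep each Future next to a caller-chosen document id. The sketch below is JDK-only and does not call ParagraphVectors: fakeInfer is a hypothetical stand-in for inference (it is not a real embedding), double[] stands in for INDArray, and the class name, method names, and pool size of 4 are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class FutureTrackingSketch {

    /** Hypothetical stand-in for model inference: returns {character count, word count}. */
    static double[] fakeInfer(String text) {
        return new double[] {text.length(), text.trim().split("\\s+").length};
    }

    /** Submits every text for inference and returns the results keyed by document id. */
    static Map<String, double[]> inferAll(Map<String, String> textsById,
                                          Function<String, double[]> infer) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Submit everything first, remembering which Future belongs to which id.
            Map<String, Future<double[]>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : textsById.entrySet()) {
                String text = entry.getValue();
                Callable<double[]> task = () -> infer.apply(text);
                pending.put(entry.getKey(), pool.submit(task));
            }
            // Collect results; get() blocks until the corresponding task has finished.
            Map<String, double[]> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<double[]>> entry : pending.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for inference", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Inference task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

        Keying by an id rather than by the text itself keeps duplicate texts distinct; submitting all tasks before calling get() lets them run concurrently.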
      • inferVectorBatched

        public List<org.nd4j.linalg.api.ndarray.INDArray> inferVectorBatched(@NonNull List<String> documents)
        Runs inference on the given List<String>.
        Parameters:
        documents - the texts to infer vectors for
        Returns:
        INDArrays in the same order as the input texts
      • predict

        public String predict​(List<VocabWord> document)
        Predicts the label of the document. Computes a similarity with respect to the mean of the representations of the words in the document.
        Parameters:
        document - the document
        Returns:
        the predicted label
      • predictSeveral

        public Collection<String> predictSeveral(@NonNull LabelledDocument document,
                                                 int limit)
        Predicts several labels based on the document. Computes a similarity with respect to the mean of the representations of the words in the document.
        Parameters:
        document - the document
        limit - the maximum number of labels to return
        Returns:
        possible labels in descending order
      • predictSeveral

        public Collection<String> predictSeveral​(String rawText,
                                                 int limit)
        Predicts several labels based on the document. Computes a similarity with respect to the mean of the representations of the words in the document.
        Parameters:
        rawText - raw text of the document
        limit - the maximum number of labels to return
        Returns:
        possible labels in descending order
      • predictSeveral

        public Collection<String> predictSeveral​(List<VocabWord> document,
                                                 int limit)
        Predicts several labels based on the document. Computes a similarity with respect to the mean of the representations of the words in the document.
        Parameters:
        document - the document
        limit - the maximum number of labels to return
        Returns:
        possible labels in descending order
      • nearestLabels

        public Collection<String> nearestLabels​(LabelledDocument document,
                                                int topN)
        Returns the top N labels nearest to the specified document.
        Parameters:
        document - the document
        topN - the number of labels to return
        Returns:
        the top N nearest labels
      • nearestLabels

        public Collection<String> nearestLabels(@NonNull String rawText,
                                                int topN)
        Returns the top N labels nearest to the specified text.
        Parameters:
        rawText - raw text of the document
        topN - the number of labels to return
        Returns:
        the top N nearest labels
      • nearestLabels

        public Collection<String> nearestLabels(@NonNull Collection<VocabWord> document,
                                                int topN)
        Returns the top N labels nearest to the specified collection of vocab words.
        Parameters:
        document - the vocab words of the document
        topN - the number of labels to return
        Returns:
        the top N nearest labels
      • nearestLabels

        public Collection<String> nearestLabels​(org.nd4j.linalg.api.ndarray.INDArray labelVector,
                                                int topN)
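        Returns the top N labels nearest to the specified feature vector.
        Parameters:
        labelVector - the feature vector to compare against the labels
        topN - the number of labels to return
        Returns:
        the top N nearest labels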
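        A top-N nearest-label lookup of the kind described above can be sketched by scoring every label vector against the query vector and keeping the N highest scores. This JDK-only sketch is not the library's implementation: double[] stands in for INDArray, cosine similarity is an assumed measure, and the names NearestLabelsSketch, cosine, and nearestLabels are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NearestLabelsSketch {

    /** Cosine similarity; returns 0 when either vector has zero norm. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to topN labels, most similar to the query vector first. */
    static List<String> nearestLabels(double[] query, Map<String, double[]> labelVectors, int topN) {
        // Score every label once so the comparator does not recompute similarities.
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, double[]> entry : labelVectors.entrySet()) {
            scores.put(entry.getKey(), cosine(query, entry.getValue()));
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Sort by descending similarity.
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        int count = Math.max(0, Math.min(topN, ranked.size()));
        return new ArrayList<>(ranked.subList(0, count));
    }
}
```

        Sorting the full label list is the simplest approach for a small label set; a bounded priority queue would avoid the full sort when there are many labels and topN is small.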
      • similarityToLabel

        @Deprecated
        public double similarityToLabel​(String rawText,
                                        String label)
        Deprecated.
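        Returns the similarity of the document to the specified label, based on the mean value.
        Parameters:
        rawText - raw text of the document
        label - the label to compare the document against
        Returns:
        the similarity between the document and the label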
      • similarityToLabel

        public double similarityToLabel​(LabelledDocument document,
                                        String label)
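        Returns the similarity of the document to the specified label, based on the mean value.
        Parameters:
        document - the document
        label - the label to compare the document against
        Returns:
        the similarity between the document and the label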
      • similarityToLabel

        public double similarityToLabel​(List<VocabWord> document,
                                        String label)
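        Returns the similarity of the document to the specified label, based on the mean value.
        Parameters:
        document - the document, as a list of words
        label - the label to compare the document against
        Returns:
        the similarity between the document and the label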
      • toJson

        public String toJson()
                      throws org.nd4j.shade.jackson.core.JsonProcessingException
        Overrides:
        toJson in class Word2Vec
        Throws:
        org.nd4j.shade.jackson.core.JsonProcessingException