Class SequenceVectors.Builder<T extends SequenceElement>

    • Field Detail

      • existingVectors

        protected WordVectors existingVectors
      • lockFactor

        protected boolean lockFactor
      • sampling

        protected double sampling
      • negative

        protected double negative
      • learningRate

        protected double learningRate
      • minLearningRate

        protected double minLearningRate
      • minWordFrequency

        protected int minWordFrequency
      • iterations

        protected int iterations
      • numEpochs

        protected int numEpochs
      • layerSize

        protected int layerSize
      • window

        protected int window
      • hugeModelExpected

        protected boolean hugeModelExpected
      • batchSize

        protected int batchSize
      • learningRateDecayWords

        protected int learningRateDecayWords
      • seed

        protected long seed
      • useAdaGrad

        protected boolean useAdaGrad
      • resetModel

        protected boolean resetModel
      • workers

        protected int workers
      • useUnknown

        protected boolean useUnknown
      • useHierarchicSoftmax

        protected boolean useHierarchicSoftmax
      • variableWindows

        protected int[] variableWindows
      • trainSequenceVectors

        protected boolean trainSequenceVectors
      • trainElementsVectors

        protected boolean trainElementsVectors
      • preciseWeightInit

        protected boolean preciseWeightInit
      • enableScavenger

        protected boolean enableScavenger
      • vocabLimit

        protected int vocabLimit
      • preciseMode

        protected boolean preciseMode
        Experimental field. Switches on precise mode for batch operations.
    • Constructor Detail

      • Builder

        public Builder()
    • Method Detail

      • useExistingWordVectors

        protected SequenceVectors.Builder<T> useExistingWordVectors(@NonNull WordVectors vec)
        Uses a pre-built WordVectors model (e.g. SkipGram) for DBOW sequence learning. The existing model is transferred into the new model before training starts. PLEASE NOTE: The supplied model has no effect on elements learning algorithms; only sequence learning is affected. PLEASE NOTE: A non-normalized model is recommended here.
        Parameters:
        vec - existing WordVectors model
        Returns:
      • sequenceLearningAlgorithm

        public SequenceVectors.Builder<T> sequenceLearningAlgorithm(@NonNull String algoName)
        Sets the specified LearningAlgorithm as the sequence learning algorithm.
        Parameters:
        algoName - fully qualified class name
        Returns:
      • sequenceLearningAlgorithm

        public SequenceVectors.Builder<T> sequenceLearningAlgorithm(@NonNull SequenceLearningAlgorithm<T> algorithm)
        Sets the specified LearningAlgorithm as the sequence learning algorithm.
        Parameters:
        algorithm - SequenceLearningAlgorithm implementation
        Returns:
      • elementsLearningAlgorithm

        public SequenceVectors.Builder<T> elementsLearningAlgorithm(@NonNull String algoName)
        Sets the specified LearningAlgorithm as the elements learning algorithm.
        Parameters:
        algoName - fully qualified class name
        Returns:
      • elementsLearningAlgorithm

        public SequenceVectors.Builder<T> elementsLearningAlgorithm(@NonNull ElementsLearningAlgorithm<T> algorithm)
        Sets the specified LearningAlgorithm as the elements learning algorithm.
        Parameters:
        algorithm - ElementsLearningAlgorithm implementation
        Returns:
      • batchSize

        public SequenceVectors.Builder<T> batchSize(int batchSize)
        Defines the batchSize option, which takes effect only if iterations > 1.
        Parameters:
        batchSize -
        Returns:
      • iterations

        public SequenceVectors.Builder<T> iterations(int iterations)
        Defines how many iterations should be done over batched sequences.
        Parameters:
        iterations -
        Returns:
      • epochs

        public SequenceVectors.Builder<T> epochs(int numEpochs)
        Defines how many iterations (epochs) should be done over the whole training corpus during modelling.
        Parameters:
        numEpochs -
        Returns:
      • workers

        public SequenceVectors.Builder<T> workers(int numWorkers)
        Sets the number of worker threads to be used in calculations.
        Parameters:
        numWorkers -
        Returns:
      • useHierarchicSoftmax

        public SequenceVectors.Builder<T> useHierarchicSoftmax(boolean reallyUse)
        Enables or disables hierarchic softmax.
        Parameters:
        reallyUse -
        Returns:
      • useAdaGrad

        @Deprecated
        public SequenceVectors.Builder<T> useAdaGrad(boolean reallyUse)
        Deprecated.
        Defines whether adaptive gradients (AdaGrad) should be used in calculations.
        Parameters:
        reallyUse -
        Returns:
      • layerSize

        public SequenceVectors.Builder<T> layerSize(int layerSize)
        Defines the number of dimensions for the output vectors. Please note: this option has effect only if lookupTable wasn't defined during the building process.
        Parameters:
        layerSize -
        Returns:
      • learningRate

        public SequenceVectors.Builder<T> learningRate(double learningRate)
        Defines the initial learning rate. Default value is 0.025.
        Parameters:
        learningRate -
        Returns:
      • minWordFrequency

        public SequenceVectors.Builder<T> minWordFrequency(int minWordFrequency)
        Defines the minimal frequency for elements found in the training corpus. All elements with a frequency below this threshold will be removed before training. Please note: this method has effect only if the vocabulary is built internally.
        Parameters:
        minWordFrequency -
        Returns:
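
        The frequency threshold described above can be illustrated with a small standalone sketch. It does not call this library; the class MinFrequencyFilter and its methods are hypothetical names, and the exact point at which the builder prunes its vocabulary is not shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of SequenceVectors: shows what
// "remove elements below minWordFrequency" means on a toy corpus.
final class MinFrequencyFilter {

    /** Counts occurrences of each token, preserving first-seen order. */
    static Map<String, Integer> count(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Keeps only entries whose count is at least minWordFrequency. */
    static Map<String, Integer> prune(Map<String, Integer> counts, int minWordFrequency) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minWordFrequency) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("a", "b", "a", "c", "a", "b"));
        System.out.println(prune(counts, 2));
    }
}
```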
      • limitVocabularySize

        public SequenceVectors.Builder limitVocabularySize(int limit)
        Sets the vocabulary size limit applied during construction. Default value: 0, which means no limit.
        Parameters:
        limit -
        Returns:
      • minLearningRate

        public SequenceVectors.Builder<T> minLearningRate(double minLearningRate)
        Defines the minimum learning rate after decay is applied. Default value is 0.01.
        Parameters:
        minLearningRate -
        Returns:
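
        To show how an initial learning rate and a minimum learning rate can interact, here is a minimal sketch that assumes a linear decay over the number of processed words, floored at the minimum. The linear schedule, the class LearningRateDecay, and its parameter names are assumptions for illustration; the schedule actually used by this class may differ.

```java
// Hypothetical helper, not part of SequenceVectors.
final class LearningRateDecay {

    /**
     * Linearly decays learningRate as wordsProcessed approaches totalWords,
     * never returning less than minLearningRate.
     */
    static double rate(double learningRate, double minLearningRate,
                       long wordsProcessed, long totalWords) {
        if (totalWords <= 0) {
            return learningRate; // nothing to decay against
        }
        double progress = Math.min(1.0, (double) wordsProcessed / totalWords);
        return Math.max(minLearningRate, learningRate * (1.0 - progress));
    }

    public static void main(String[] args) {
        long total = 1000;
        for (long done = 0; done <= total; done += 250) {
            System.out.println(done + " words: " + rate(0.025, 0.0001, done, total));
        }
    }
}
```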
      • resetModel

        public SequenceVectors.Builder<T> resetModel(boolean reallyReset)
        Defines whether the whole model should be reset before training. If set to true, the vocabulary and WeightLookupTable will be reset before training and built from scratch.
        Parameters:
        reallyReset -
        Returns:
      • vocabCache

        public SequenceVectors.Builder<T> vocabCache(@NonNull VocabCache<T> vocabCache)
        Accepts an externally built vocabCache object containing the vocabulary.
        Parameters:
        vocabCache -
        Returns:
      • lookupTable

        public SequenceVectors.Builder<T> lookupTable(@NonNull WeightLookupTable<T> lookupTable)
        Accepts an externally built WeightLookupTable containing model weights and vocabulary.
        Parameters:
        lookupTable -
        Returns:
      • sampling

        public SequenceVectors.Builder<T> sampling(double sampling)
        Defines the sub-sampling threshold.
        Parameters:
        sampling -
        Returns:
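
        For readers unfamiliar with sub-sampling, the sketch below computes the probability of keeping one occurrence of an element, using the formula from the original word2vec C implementation. Whether this class applies exactly that formula is an assumption; SubSampling and keepProbability are hypothetical names.

```java
// Hypothetical helper, not part of SequenceVectors.
final class SubSampling {

    /**
     * Probability of keeping one occurrence of an element that appears
     * count times in a corpus of totalWords elements, for threshold sampling.
     * With f = count / totalWords and t = sampling, the value is
     * (sqrt(f / t) + 1) * (t / f), capped at 1.
     */
    static double keepProbability(long count, long totalWords, double sampling) {
        if (sampling <= 0 || count <= 0 || totalWords <= 0) {
            return 1.0; // sub-sampling disabled or nothing to sample
        }
        double ratio = (sampling * totalWords) / count; // t / f
        double probability = (Math.sqrt(1.0 / ratio) + 1.0) * ratio;
        return Math.min(1.0, probability);
    }

    public static void main(String[] args) {
        // A very frequent element is kept less often than a rare one.
        System.out.println(keepProbability(100_000, 1_000_000, 1e-3));
        System.out.println(keepProbability(10, 1_000_000, 1e-3));
    }
}
```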
      • negativeSample

        public SequenceVectors.Builder<T> negativeSample(double negative)
        Defines the negative sampling value for the skip-gram algorithm.
        Parameters:
        negative -
        Returns:
      • stopWords

        public SequenceVectors.Builder<T> stopWords(@NonNull List<String> stopList)
        Provides a collection of objects to be ignored and excluded from the model. Please note: object labels and hashCode will be used for filtering.
        Parameters:
        stopList -
        Returns:
      • trainElementsRepresentation

        public SequenceVectors.Builder<T> trainElementsRepresentation(boolean trainElements)
        Parameters:
        trainElements -
        Returns:
      • trainSequencesRepresentation

        public SequenceVectors.Builder<T> trainSequencesRepresentation(boolean trainSequences)
      • stopWords

        public SequenceVectors.Builder<T> stopWords(@NonNull Collection<T> stopList)
        Provides a collection of objects to be ignored and excluded from the model. Please note: object labels and hashCode will be used for filtering.
        Parameters:
        stopList -
        Returns:
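
        The effect of a stop list can be sketched with plain strings standing in for element labels. This is only an illustration of hash-based filtering by label; StopListFilter is a hypothetical name and the sketch does not reproduce how this class stores or applies its stop list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of SequenceVectors.
final class StopListFilter {

    /** Returns the labels that are not in stopList, in their original order. */
    static List<String> filter(List<String> labels, Collection<String> stopList) {
        Set<String> stop = new HashSet<>(stopList); // lookup relies on hashCode/equals
        List<String> kept = new ArrayList<>();
        for (String label : labels) {
            if (!stop.contains(label)) {
                kept.add(label);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("the", "cat", "sat", "on", "the", "mat");
        System.out.println(filter(sentence, Arrays.asList("the", "on")));
    }
}
```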
      • windowSize

        public SequenceVectors.Builder<T> windowSize(int windowSize)
        Sets the window size for skip-gram training.
        Parameters:
        windowSize -
        Returns:
      • seed

        public SequenceVectors.Builder<T> seed(long randomSeed)
        Sets the seed for the random number generator. Please note: this has effect only if the vocabulary and WeightLookupTable are built internally.
        Parameters:
        randomSeed -
        Returns:
      • modelUtils

        public SequenceVectors.Builder<T> modelUtils(@NonNull ModelUtils<T> modelUtils)
        ModelUtils implementation that will be used to access the model. Methods such as similarity, wordsNearest, and accuracy are provided by the user-defined ModelUtils.
        Parameters:
        modelUtils - model utils to be used
        Returns:
      • useUnknown

        public SequenceVectors.Builder<T> useUnknown(boolean reallyUse)
        Specifies whether the UNK word should be used internally.
        Parameters:
        reallyUse -
        Returns:
      • unknownElement

        public SequenceVectors.Builder<T> unknownElement(@NonNull T element)
        Specifies the SequenceElement that will be used as the UNK element, if UNK is used.
        Parameters:
        element -
        Returns:
      • useVariableWindow

        public SequenceVectors.Builder<T> useVariableWindow(int... windows)
        Enables variable window size. In this case, every batch is processed using one of the predefined window sizes.
        Parameters:
        windows -
        Returns:
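
        One way to picture "one of the predefined window sizes per batch" is the sketch below, which draws a window uniformly at random from the supplied values using a seeded generator. The uniform random choice and the class VariableWindowPicker are assumptions for illustration; the selection strategy used by this class is not documented here.

```java
import java.util.Random;

// Hypothetical helper, not part of SequenceVectors.
final class VariableWindowPicker {
    private final int[] windows;
    private final Random random;

    VariableWindowPicker(long seed, int... windows) {
        if (windows == null || windows.length == 0) {
            throw new IllegalArgumentException("at least one window size is required");
        }
        this.windows = windows.clone(); // defensive copy of the varargs array
        this.random = new Random(seed); // seeded so runs are repeatable
    }

    /** Picks the window size to use for the next batch. */
    int nextWindow() {
        return windows[random.nextInt(windows.length)];
    }

    public static void main(String[] args) {
        VariableWindowPicker picker = new VariableWindowPicker(42L, 3, 5, 7);
        for (int batch = 0; batch < 5; batch++) {
            System.out.println("batch " + batch + " uses window " + picker.nextWindow());
        }
    }
}
```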
      • usePreciseWeightInit

        public SequenceVectors.Builder<T> usePreciseWeightInit(boolean reallyUse)
        If set to true, initial weights for elements/sequences will be derived from the elements themselves. However, this implies an additional cycle through the input iterator. Default value: FALSE
        Parameters:
        reallyUse -
        Returns:
      • presetTables

        protected void presetTables()
        Creates a new WeightLookupTable and VocabCache if none were set.
      • enableScavenger

        public SequenceVectors.Builder<T> enableScavenger(boolean reallyEnable)
        Enables or disables periodic vocabulary truncation during construction. Default value: disabled.
        Parameters:
        reallyEnable -
        Returns:
      • build

        public SequenceVectors<T> build()
        Builds a SequenceVectors instance with the defined settings/options.
        Returns: