Class VocabConstructor<T extends SequenceElement>

    • Field Detail

      • log

        protected static final org.slf4j.Logger log
    • Method Detail

      • buildExtendedLookupTable

        protected WeightLookupTable<T> buildExtendedLookupTable()
        Placeholder for a future implementation.
      • buildExtendedVocabulary

        protected VocabCache<T> buildExtendedVocabulary()
        Placeholder for a future implementation.
      • buildMergedVocabulary

        public VocabCache<T> buildMergedVocabulary(@NonNull WordVectors wordVectors,
                                                   boolean fetchLabels)
        Transfers an existing WordVectors model into the current one.
        Parameters:
        wordVectors - the existing WordVectors model to transfer; must not be null
        Returns:
        the merged vocabulary
      • getNumberOfSequences

        public long getNumberOfSequences()
        Returns the total number of sequences passed through this VocabConstructor.
        Returns:
        the total number of sequences processed
      • buildMergedVocabulary

        public VocabCache<T> buildMergedVocabulary(@NonNull VocabCache<T> vocabCache,
                                                   boolean fetchLabels)
        Transfers an existing vocabulary into the current one. Please note: this method expects the source vocabulary to have Huffman tree indexes applied.
        Parameters:
        vocabCache - the source vocabulary to transfer; must not be null
        Returns:
        the merged vocabulary
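        The merge described above can be pictured with plain Java collections. This is a minimal sketch, not the library's implementation. The Entry class, the isLabel flag, and the merge helper are hypothetical. The reading of fetchLabels (label elements are copied only when it is true) is an assumption inferred from the parameter name, and Huffman indexes are not modelled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeSketch {
    // Hypothetical stand-in for a vocabulary element; not a DL4J type.
    static final class Entry {
        final String label;
        final long frequency;
        final boolean isLabel;

        Entry(String label, long frequency, boolean isLabel) {
            this.label = label;
            this.frequency = frequency;
            this.isLabel = isLabel;
        }
    }

    // Copies entries from source into target, keyed by label.
    // ASSUMPTION: fetchLabels == false means label entries are skipped.
    static Map<String, Entry> merge(Map<String, Entry> target,
                                    Iterable<Entry> source,
                                    boolean fetchLabels) {
        for (Entry e : source) {
            if (!fetchLabels && e.isLabel) {
                continue; // caller did not ask for labels
            }
            target.putIfAbsent(e.label, e); // entries already in target are kept
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, Entry> current = new LinkedHashMap<>();
        current.put("cat", new Entry("cat", 3, false));

        List<Entry> source = List.of(
                new Entry("dog", 5, false),
                new Entry("DOC_1", 1, true));

        merge(current, source, false);
        System.out.println(current.keySet()); // prints [cat, dog]
    }
}
```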
      • transferVocabulary

        public VocabCache<T> transferVocabulary(@NonNull VocabCache<T> vocabCache,
                                                boolean buildHuffman)
      • buildJointVocabulary

        public VocabCache<T> buildJointVocabulary(boolean resetCounters,
                                                  boolean buildHuffmanTree)
        Scans all sources passed through the builder and returns all words as a vocabulary. If a TargetVocabCache was set during instance creation, it is filled as well.
        Returns:
        the vocabulary built from all scanned sources
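        The scan over all sources can be illustrated with a small, self-contained sketch. It is not the library's code: sources are modelled as nested lists of strings, the class and method names are hypothetical, and the resetCounters and buildHuffmanTree flags are deliberately left out because their behavior is not documented here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class JointVocabSketch {
    // Scans every sequence of every source and counts occurrences per word.
    // A TreeMap is used only to make the printed order deterministic.
    static Map<String, Long> buildJointVocabulary(List<List<List<String>>> sources) {
        Map<String, Long> counts = new TreeMap<>();
        for (List<List<String>> source : sources) {
            for (List<String> sequence : source) {
                for (String word : sequence) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> sourceA = List.of(List.of("the", "cat"), List.of("the", "dog"));
        List<List<String>> sourceB = List.of(List.of("a", "cat"));

        System.out.println(buildJointVocabulary(List.of(sourceA, sourceB)));
        // prints {a=1, cat=2, dog=1, the=2}
    }
}
```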
      • filterVocab

        protected void filterVocab(AbstractCache<T> cache,
                                   int minWordFrequency)
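
        No description is given for filterVocab, but the signature suggests frequency-based pruning. The sketch below shows that idea on a plain map of word counts. It assumes that elements seen fewer than minWordFrequency times are dropped; whether the real threshold is inclusive is not stated here. The class name and the map-based cache are hypothetical stand-ins for AbstractCache<T>.

```java
import java.util.Map;
import java.util.TreeMap;

public class FilterSketch {
    // Removes, in place, every word seen fewer than minWordFrequency times.
    // ASSUMPTION: a word with frequency equal to the threshold is kept.
    static void filterVocab(Map<String, Long> counts, int minWordFrequency) {
        counts.values().removeIf(freq -> freq < minWordFrequency);
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new TreeMap<>();
        counts.put("a", 1L);
        counts.put("cat", 2L);
        counts.put("dog", 1L);
        counts.put("the", 2L);

        filterVocab(counts, 2);
        System.out.println(counts); // prints {cat=2, the=2}
    }
}
```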