Interface VocabCache<T extends SequenceElement>

    • Method Detail

      • loadVocab

        void loadVocab()
        Loads the vocabulary
      • vocabExists

        boolean vocabExists()
        Returns whether the vocabulary already exists
        Returns:
        true if the vocabulary already exists, false otherwise
      • saveVocab

        void saveVocab()
        Saves the vocabulary; this allows the word frequencies to be reused
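        The vocabExists / saveVocab / loadVocab trio can be pictured with a minimal file-backed sketch. FileVocabStore, its constructor Path and its tab-separated "word, count" format are hypothetical illustrations, not DL4J's persistence mechanism. I/O errors are rethrown unchecked because the interface methods declare no checked exceptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical file-backed vocabulary (not DL4J's format): one "word<TAB>count" line per word.
class FileVocabStore {
    private final Path file; // assumed backing file
    private final Map<String, Integer> counts = new HashMap<>();

    FileVocabStore(Path file) {
        this.file = file;
    }

    void incrementWordCount(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    int wordFrequency(String word) {
        return counts.getOrDefault(word, 0);
    }

    // In this sketch the vocab "exists" once it has been saved to the backing file.
    boolean vocabExists() {
        return Files.exists(file);
    }

    // Persists the counts so the word frequencies can be reused later.
    void saveVocab() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex); // interface methods declare no checked exceptions
        }
    }

    // Replaces the in-memory counts with the saved ones.
    void loadVocab() {
        try {
            counts.clear();
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                int tab = line.lastIndexOf('\t');
                counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1)));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```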
      • incrementWordCount

        void incrementWordCount(String word)
        Increments the count for the given word
        Parameters:
        word - the word to increment the count for
      • incrementWordCount

        void incrementWordCount(String word,
                                int increment)
        Increments the count for the given word by increment
        Parameters:
        word - the word to increment the count for
        increment - the amount to increment by
      • wordFrequency

        int wordFrequency(String word)
        Returns the number of times the word has occurred
        Parameters:
        word - the word to retrieve the occurrence frequency for
        Returns:
        the number of times the word has occurred, or 0 if it has not occurred
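        The counting contract described above (incrementWordCount in both forms, wordFrequency returning 0 for unseen words) can be sketched with a plain HashMap. WordCounter is a hypothetical stand-in, not a VocabCache implementation; that the one-argument overload increments by one, and that a running total is what totalWordOccurrences reports, are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory counter mirroring a few VocabCache methods; not the DL4J implementation.
class WordCounter {
    private final Map<String, Integer> counts = new HashMap<>();
    private long total = 0; // running sum of all increments (assumed meaning of totalWordOccurrences)

    void incrementWordCount(String word) {
        incrementWordCount(word, 1); // assumed: the one-argument form increments by one
    }

    void incrementWordCount(String word, int increment) {
        counts.merge(word, increment, Integer::sum);
        total += increment;
    }

    // Returns 0 for a word that has never been counted.
    int wordFrequency(String word) {
        return counts.getOrDefault(word, 0);
    }

    boolean containsWord(String word) {
        return counts.containsKey(word);
    }

    long totalWordOccurrences() {
        return total;
    }
}
```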
      • containsWord

        boolean containsWord(String word)
        Returns true if the cache contains the given word
        Parameters:
        word - the word to check for
        Returns:
        true if the cache contains the word, false otherwise
      • wordAtIndex

        String wordAtIndex(int index)
        Returns the word at the given index, or null if there is none
        Parameters:
        index - the index of the word to get
        Returns:
        the word at the given index, or null
      • elementAtIndex

        T elementAtIndex(int index)
        Returns the SequenceElement at the given index, or null if there is none
        Parameters:
        index - the index of the element to get
        Returns:
        the element at the given index, or null
      • indexOf

        int indexOf(String word)
        Returns the index of the given word
        Parameters:
        word - the word to look up
        Returns:
        the index of the given word, or -1 if it is not found
      • vocabWords

        Collection<T> vocabWords()
        Returns all of the vocabulary word elements
        Returns:
        all vocabulary elements in the cache
      • totalWordOccurrences

        long totalWordOccurrences()
        Returns the total number of word occurrences
        Returns:
        the total number of word occurrences
      • wordFor

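        T wordFor(String word)
        Returns the vocabulary element for the given word
        Parameters:
        word - the word to look up
        Returns:
        the vocabulary element for the word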
      • wordFor

        T wordFor(long id)
      • addWordToIndex

        void addWordToIndex(int index,
                            String word)
        Assigns the given index to the given word
        Parameters:
        index - the index to assign
        word - the word to place at that index
      • addWordToIndex

        void addWordToIndex(int index,
                            long elementId)
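        A hypothetical two-way index illustrates how wordAtIndex, indexOf and addWordToIndex(int, String) relate: a list maps index to word and a map maps word to index. WordIndex and its slot-reassignment policy are assumptions made for illustration, not DL4J's internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bidirectional word index; not the DL4J implementation.
class WordIndex {
    private final List<String> byIndex = new ArrayList<>();   // index -> word (null = empty slot)
    private final Map<String, Integer> byWord = new HashMap<>(); // word -> index

    void addWordToIndex(int index, String word) {
        Integer oldIndex = byWord.get(word);
        if (oldIndex != null) byIndex.set(oldIndex, null);   // word moves: clear its old slot
        while (byIndex.size() <= index) byIndex.add(null);   // grow with empty slots
        String previous = byIndex.set(index, word);
        if (previous != null && !previous.equals(word)) {
            byWord.remove(previous);                          // displaced word loses its index
        }
        byWord.put(word, index);
    }

    // Returns null when the index is out of range or the slot is empty.
    String wordAtIndex(int index) {
        return (index >= 0 && index < byIndex.size()) ? byIndex.get(index) : null;
    }

    // Returns -1 when the word has no index.
    int indexOf(String word) {
        return byWord.getOrDefault(word, -1);
    }
}
```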
      • putVocabWord

        @Deprecated
        void putVocabWord(String word)
        Deprecated.
        Inserts the word as a vocabulary word, taking the element from the internal token store. Note that the index must already be set on the token.
        Parameters:
        word - the word to add to the vocab
      • numWords

        int numWords()
        Returns the number of words in the cache
        Returns:
        the number of words in the cache
      • docAppearedIn

        int docAppearedIn(String word)
        Returns the number of documents the given word appeared in
        Parameters:
        word - the word to look up
        Returns:
        the number of documents the word appeared in
      • incrementDocCount

        void incrementDocCount(String word,
                               long howMuch)
        Increments the number of documents the given word appears in
        Parameters:
        word - the word whose document count to increment
        howMuch - the amount to increment by
      • setCountForDoc

        void setCountForDoc(String word,
                            long count)
        Sets the number of documents the word appears in
        Parameters:
        word - the word to set the count for
        count - the number of documents the word appears in
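        The per-word document counts (docAppearedIn, incrementDocCount, setCountForDoc) amount to a second table keyed by word. DocFrequencies is a hypothetical sketch, not DL4J code; returning 0 for a word never seen is an assumption, and docAppearedIn keeps the interface's int return type even though the counts are stored as long.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-word document-frequency table; not the DL4J implementation.
class DocFrequencies {
    private final Map<String, Long> docCounts = new HashMap<>();

    // Narrowed to int to match the interface signature; 0 for unseen words (assumed).
    int docAppearedIn(String word) {
        return docCounts.getOrDefault(word, 0L).intValue();
    }

    // Adds howMuch to the word's document count.
    void incrementDocCount(String word, long howMuch) {
        docCounts.merge(word, howMuch, Long::sum);
    }

    // Overwrites the word's document count.
    void setCountForDoc(String word, long count) {
        docCounts.put(word, count);
    }
}
```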
      • totalNumberOfDocs

        long totalNumberOfDocs()
        Returns the total number of documents encountered in the corpus
        Returns:
        the total number of docs in the corpus
      • incrementTotalDocCount

        void incrementTotalDocCount()
        Increments the total document count
      • incrementTotalDocCount

        void incrementTotalDocCount(long by)
        Increments the total document count by the given amount
        Parameters:
        by - the number to increment by
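        One plausible way to drive these counters during a corpus pass is to call incrementTotalDocCount() once per document and incrementDocCount(word, 1) once per distinct word in that document. CorpusCounter and its addDocument helper are hypothetical, and this usage pattern is an assumption rather than a documented requirement of the interface.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical corpus-level document counter; not the DL4J implementation.
class CorpusCounter {
    private final Map<String, Long> docCounts = new HashMap<>();
    private long totalDocs = 0;

    void incrementDocCount(String word, long howMuch) {
        docCounts.merge(word, howMuch, Long::sum);
    }

    void incrementTotalDocCount() {
        incrementTotalDocCount(1);
    }

    void incrementTotalDocCount(long by) {
        totalDocs += by;
    }

    long totalNumberOfDocs() {
        return totalDocs;
    }

    int docAppearedIn(String word) {
        return docCounts.getOrDefault(word, 0L).intValue();
    }

    // Hypothetical helper: counts one document, and each distinct word in it once.
    void addDocument(List<String> tokens) {
        incrementTotalDocCount();
        Set<String> distinct = new HashSet<>(tokens); // repeats within a document count once
        for (String word : distinct) {
            incrementDocCount(word, 1);
        }
    }
}
```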
      • tokens

        Collection<T> tokens()
        Returns all of the tokens in the cache (not necessarily part of the vocabulary)
        Returns:
        the tokens for this cache
      • addToken

        boolean addToken(T element)
        Adds a token to the cache
        Parameters:
        element - the element to add
        Returns:
        true if the token was added, false if an existing token was updated
      • tokenFor

        T tokenFor(String word)
        Returns the token for this word (again, not necessarily in the vocabulary)
        Parameters:
        word - the word to get the token for
        Returns:
        the token for this word
      • tokenFor

        T tokenFor(long id)
      • hasToken

        boolean hasToken(String token)
        Returns whether the cache contains this token
        Parameters:
        token - the token to test
        Returns:
        whether the token exists in the cache or not
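        The add-or-update behaviour of addToken, together with tokenFor and hasToken, can be sketched with a map keyed by label. SketchToken stands in for a SequenceElement and TokenStore for the cache; both are hypothetical, and merging frequencies on update is an assumed policy, since the documentation says only that the token is "updated".

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a SequenceElement: a label plus a frequency.
class SketchToken {
    final String label;
    long frequency;

    SketchToken(String label, long frequency) {
        this.label = label;
        this.frequency = frequency;
    }
}

// Hypothetical token store keyed by label; not the DL4J implementation.
class TokenStore {
    private final Map<String, SketchToken> tokens = new HashMap<>();

    // Returns true if the label is new (added), false if an existing token was updated.
    boolean addToken(SketchToken element) {
        SketchToken existing = tokens.get(element.label);
        if (existing == null) {
            tokens.put(element.label, element);
            return true;
        }
        existing.frequency += element.frequency; // assumed update policy: merge frequencies
        return false;
    }

    SketchToken tokenFor(String word) {
        return tokens.get(word); // null if the word has no token
    }

    boolean hasToken(String token) {
        return tokens.containsKey(token);
    }

    Collection<SketchToken> tokens() {
        return tokens.values();
    }
}
```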
      • importVocabulary

        void importVocabulary(VocabCache<T> vocabCache)
        Imports the vocabulary of another cache into this one
        Parameters:
        vocabCache - the cache to import from
      • updateWordsOccurrences

        void updateWordsOccurrences()
        Updates the word occurrence counters
      • removeElement

        void removeElement(String label)
        Removes the element with the specified label from the vocabulary. Note: the Huffman index should be updated after an element is removed
        Parameters:
        label - label of the element to be removed
      • removeElement

        void removeElement(T element)
        Removes the specified element from the vocabulary. Note: the Huffman index should be updated after an element is removed
        Parameters:
        element - SequenceElement to be removed
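        Why the index needs refreshing after removeElement can be illustrated with a small model: removing a word leaves a gap in the indices, so a hypothetical reindex() reassigns contiguous indices by descending frequency. ReindexingVocab, add and reindex are hypothetical; this is a simplified stand-in that does not build Huffman codes, and ordering by frequency is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical vocabulary with explicit re-indexing; not DL4J's Huffman implementation.
class ReindexingVocab {
    private final Map<String, Integer> freq = new HashMap<>();
    private final Map<String, Integer> index = new HashMap<>();

    // Hypothetical helper for populating the sketch.
    void add(String word, int frequency) {
        freq.put(word, frequency);
    }

    // Removes the word; the remaining words keep their old, now non-contiguous, indices.
    void removeElement(String label) {
        freq.remove(label);
        index.remove(label);
    }

    // Reassigns indices 0..n-1, most frequent first, ties broken alphabetically.
    void reindex() {
        List<String> words = new ArrayList<>(freq.keySet());
        words.sort((a, b) -> {
            int byFreq = Integer.compare(freq.get(b), freq.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        index.clear();
        for (int i = 0; i < words.size(); i++) {
            index.put(words.get(i), i);
        }
    }

    int indexOf(String word) {
        return index.getOrDefault(word, -1);
    }
}
```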