Interface VocabCache<T extends SequenceElement>
-
- All Superinterfaces:
  Serializable
- All Known Implementing Classes:
  AbstractCache, InMemoryLookupCache
public interface VocabCache<T extends SequenceElement> extends Serializable
-
-
Method Summary
boolean addToken(T element)
    Adds a token to the cache.
void addWordToIndex(int index, long elementId)
void addWordToIndex(int index, String word)
boolean containsWord(String word)
    Returns true if the cache contains the given word.
int docAppearedIn(String word)
    Count of documents a word appeared in.
T elementAtIndex(int index)
    Returns the SequenceElement at the given index, or null.
boolean hasToken(String token)
    Returns whether the cache contains this token.
void importVocabulary(VocabCache<T> vocabCache)
    Imports vocabulary.
void incrementDocCount(String word, long howMuch)
    Increment the document count.
void incrementTotalDocCount()
    Increment the doc count.
void incrementTotalDocCount(long by)
    Increment the doc count.
void incrementWordCount(String word)
    Increment the count for the given word.
void incrementWordCount(String word, int increment)
    Increment the count for the given word by the given increment.
int indexOf(String word)
    Returns the index of a given word.
void loadVocab()
    Load vocab.
int numWords()
    Returns the number of words in the cache.
void putVocabWord(String word)
    Deprecated.
void removeElement(String label)
    Removes the element with the specified label from the vocabulary. Please note: the Huffman index should be updated after element removal.
void removeElement(T element)
    Removes the specified element from the vocabulary. Please note: the Huffman index should be updated after element removal.
void saveVocab()
    Saves the vocab; this allows word frequencies to be reused.
void setCountForDoc(String word, long count)
    Set the count for the number of documents the word appears in.
T tokenFor(long id)
T tokenFor(String word)
    Returns the token (again, not necessarily in the vocab) for this word.
Collection<T> tokens()
    All of the tokens in the cache (not necessarily a part of the vocab).
long totalNumberOfDocs()
    Returns the total number of documents encountered in the corpus.
long totalWordOccurrences()
    The total number of word occurrences.
void updateWordsOccurrences()
    Updates counters.
boolean vocabExists()
    Returns whether the vocab already exists.
Collection<T> vocabWords()
    Returns all of the vocab word nodes.
String wordAtIndex(int index)
    Returns the word contained at the given index, or null.
T wordFor(long id)
T wordFor(String word)
int wordFrequency(String word)
    Returns the number of times the word has occurred.
Collection<String> words()
    Returns all of the words in the vocab.
-
-
-
Method Detail
-
loadVocab
void loadVocab()
Load vocab
-
vocabExists
boolean vocabExists()
Returns whether the vocab already exists.
- Returns:
  true if the vocab already exists, false otherwise
-
saveVocab
void saveVocab()
Saves the vocab; this allows word frequencies to be reused.
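`VocabCache` extends `Serializable`. One way to picture what `saveVocab()` and `loadVocab()` provide is a serialization round trip of the word frequencies. The sketch below illustrates the idea under stated assumptions. It is not how DL4J actually stores vocabularies: `WordCounts` and `VocabPersistence` are hypothetical names, and a byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for word frequencies; not DL4J's actual storage format.
class WordCounts implements Serializable {
    private static final long serialVersionUID = 1L;
    final Map<String, Integer> counts = new HashMap<>();
}

class VocabPersistence {
    // Writes the counts with standard Java serialization
    // (a FileOutputStream could replace the byte array).
    static byte[] save(WordCounts vocab) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(vocab);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the counts back so they can be reused without rescanning the corpus.
    static WordCounts load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (WordCounts) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A restored `WordCounts` carries the same frequencies as the one that was saved.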
-
words
Collection<String> words()
Returns all of the words in the vocab
-
incrementWordCount
void incrementWordCount(String word)
Increment the count for the given word.
- Parameters:
  word - the word to increment the count for
-
incrementWordCount
void incrementWordCount(String word, int increment)
Increment the count for the given word by the given increment.
- Parameters:
  word - the word to increment the count for
  increment - the amount to increment by
-
wordFrequency
int wordFrequency(String word)
Returns the number of times the word has occurred.
- Parameters:
  word - the word to retrieve the occurrence frequency for
- Returns:
  the number of times the word occurs, or 0 if it has not occurred
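The following is a minimal sketch of the counting contract described by `incrementWordCount` and `wordFrequency`. It assumes the counts live in a plain `HashMap`. The `WordCounter` class is hypothetical and not part of DL4J. Unseen words report 0, which matches the documented return value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the counting part of a VocabCache.
class WordCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Mirrors incrementWordCount(String word): adds one occurrence.
    void incrementWordCount(String word) {
        incrementWordCount(word, 1);
    }

    // Mirrors incrementWordCount(String word, int increment).
    void incrementWordCount(String word, int increment) {
        counts.merge(word, increment, Integer::sum);
    }

    // Mirrors wordFrequency(String word): 0 for words never seen.
    int wordFrequency(String word) {
        return counts.getOrDefault(word, 0);
    }
}
```

For example, after counting the tokens of "the cat saw the dog", `wordFrequency("the")` is 2 and `wordFrequency("bird")` is 0.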
-
containsWord
boolean containsWord(String word)
Returns true if the cache contains the given word.
- Parameters:
  word - the word to check for
- Returns:
  true if the cache contains the word, false otherwise
-
wordAtIndex
String wordAtIndex(int index)
Returns the word contained at the given index, or null.
- Parameters:
  index - the index of the word to get
- Returns:
  the word at the given index, or null
-
elementAtIndex
T elementAtIndex(int index)
Returns the SequenceElement at the given index, or null.
- Parameters:
  index - the index of the element to get
- Returns:
  the element at the given index, or null
-
indexOf
int indexOf(String word)
Returns the index of a given word.
- Parameters:
  word - the word to look up
- Returns:
  the index of the given word, or -1 if not found
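`wordAtIndex` and `indexOf` are inverse lookups. The hypothetical `WordIndex` class below sketches one way to keep them consistent. It assumes indices are assigned in insertion order. This is only an illustration; real implementations may assign indices differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-way word/index table illustrating wordAtIndex and indexOf.
class WordIndex {
    private final List<String> byIndex = new ArrayList<>();
    private final Map<String, Integer> byWord = new HashMap<>();

    // Assigns the next free index to a new word; returns the existing index otherwise.
    int add(String word) {
        return byWord.computeIfAbsent(word, w -> {
            byIndex.add(w);
            return byIndex.size() - 1;
        });
    }

    // Mirrors wordAtIndex(int index): null when the index is out of range.
    String wordAtIndex(int index) {
        return (index >= 0 && index < byIndex.size()) ? byIndex.get(index) : null;
    }

    // Mirrors indexOf(String word): -1 when the word is unknown.
    int indexOf(String word) {
        return byWord.getOrDefault(word, -1);
    }
}
```

Adding a word that is already present returns its existing index. The two lookups therefore never disagree.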
-
vocabWords
Collection<T> vocabWords()
Returns all of the vocab word nodes.
- Returns:
  the vocab word nodes in this cache
-
totalWordOccurrences
long totalWordOccurrences()
The total number of word occurrences.
- Returns:
  the total number of word occurrences
-
wordFor
T wordFor(long id)
-
addWordToIndex
void addWordToIndex(int index, String word)
- Parameters:
  index - the index to associate with the word
  word - the word to add to the index
-
addWordToIndex
void addWordToIndex(int index, long elementId)
-
putVocabWord
@Deprecated void putVocabWord(String word)
Deprecated. Inserts the word as a vocab word (it gets the vocab word from the internal token store). Note that the index must be set on the token.
- Parameters:
  word - the word to add to the vocab
-
numWords
int numWords()
Returns the number of words in the cache.
- Returns:
  the number of words in the cache
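`numWords()` and `totalWordOccurrences()` measure different things. The sketch below assumes `numWords()` counts distinct entries while `totalWordOccurrences()` counts every occurrence. The hypothetical `CorpusStats` class makes the difference concrete.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical statistics over a token stream: distinct words vs. total occurrences.
class CorpusStats {
    private final Map<String, Integer> counts = new HashMap<>();
    private long totalOccurrences = 0;

    // Records one occurrence of a word.
    void observe(String word) {
        counts.merge(word, 1, Integer::sum);
        totalOccurrences++;
    }

    // Analogous to numWords(): number of distinct words seen.
    int numWords() {
        return counts.size();
    }

    // Analogous to totalWordOccurrences(): every occurrence counts.
    long totalWordOccurrences() {
        return totalOccurrences;
    }
}
```

Consider the tokens "a b a c a". Under this sketch, `numWords()` is 3 and `totalWordOccurrences()` is 5.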
-
docAppearedIn
int docAppearedIn(String word)
Count of documents a word appeared in.
- Parameters:
  word - the word to look up
- Returns:
  the number of documents the word appeared in
-
incrementDocCount
void incrementDocCount(String word, long howMuch)
Increment the document count.
- Parameters:
  word - the word whose document count to increment
  howMuch - the amount to increment by
-
setCountForDoc
void setCountForDoc(String word, long count)
Set the count for the number of documents the word appears in.
- Parameters:
  word - the word to set the count for
  count - the number of documents the word appears in
-
totalNumberOfDocs
long totalNumberOfDocs()
Returns the total number of documents encountered in the corpus.
- Returns:
  the total number of docs in the corpus
-
incrementTotalDocCount
void incrementTotalDocCount()
Increment the doc count
-
incrementTotalDocCount
void incrementTotalDocCount(long by)
Increment the doc count.
- Parameters:
  by - the number to increment by
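The document-level counters track how many documents a word appears in, not how often it occurs. The hypothetical `DocFrequencies` class below sketches that bookkeeping. It assumes each distinct word is counted at most once per document. Only the method names mirror `VocabCache`; the class is not a DL4J implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical document-frequency tracker; method names mirror VocabCache.
class DocFrequencies {
    private final Map<String, Long> docCounts = new HashMap<>();
    private long totalDocs = 0;

    // Counts each distinct word once per document, then bumps the total doc count.
    void addDocument(List<String> tokens) {
        Set<String> seen = new HashSet<>(tokens);
        for (String word : seen) {
            incrementDocCount(word, 1);
        }
        incrementTotalDocCount();
    }

    void incrementDocCount(String word, long howMuch) {
        docCounts.merge(word, howMuch, Long::sum);
    }

    void setCountForDoc(String word, long count) {
        docCounts.put(word, count);
    }

    // The interface returns int here, so the long count is narrowed (assumed to fit).
    int docAppearedIn(String word) {
        return docCounts.getOrDefault(word, 0L).intValue();
    }

    void incrementTotalDocCount() {
        incrementTotalDocCount(1);
    }

    void incrementTotalDocCount(long by) {
        totalDocs += by;
    }

    long totalNumberOfDocs() {
        return totalDocs;
    }
}
```

Here, "the" in a document like ["the", "cat", "the"] raises its document count by one, not two.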
-
tokens
Collection<T> tokens()
All of the tokens in the cache (not necessarily a part of the vocab).
- Returns:
  the tokens for this cache
-
addToken
boolean addToken(T element)
Adds a token to the cache.
- Parameters:
  element - the element to add
- Returns:
  true if the token was added, false if an existing token was updated
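The documented return value of `addToken` distinguishes a new entry from an update. This sketch assumes an update merges frequencies into the existing entry. `Token` and `TokenStore` are hypothetical stand-ins for a `SequenceElement` and a cache; they are not DL4J classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical element type standing in for a SequenceElement.
class Token {
    final String label;
    long frequency;

    Token(String label, long frequency) {
        this.label = label;
        this.frequency = frequency;
    }
}

class TokenStore {
    private final Map<String, Token> tokens = new HashMap<>();

    // Returns true if the token was new, false if an existing entry was updated.
    // Assumption: "updated" means the frequencies are summed.
    boolean addToken(Token element) {
        Token existing = tokens.get(element.label);
        if (existing == null) {
            tokens.put(element.label, element);
            return true;
        }
        existing.frequency += element.frequency;
        return false;
    }

    // Mirrors tokenFor(String word): null if the token is unknown.
    Token tokenFor(String word) {
        return tokens.get(word);
    }
}
```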
-
tokenFor
T tokenFor(String word)
Returns the token (again, not necessarily in the vocab) for this word.
- Parameters:
  word - the word to get the token for
- Returns:
  the token for this word
-
tokenFor
T tokenFor(long id)
-
hasToken
boolean hasToken(String token)
Returns whether the cache contains this token.
- Parameters:
  token - the token to test
- Returns:
  whether the token exists in the cache
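The text distinguishes tokens (everything the cache has seen) from vocab words. A common reason for that split is a minimum-frequency cutoff. The sketch below assumes one, with `minWordFrequency` as a hypothetical parameter, to show how `hasToken` and `containsWord` could disagree. `FilteredVocab` is illustrative only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical cache: every seen token is kept, but only tokens meeting an
// assumed minimum frequency count as vocab words.
class FilteredVocab {
    private final Map<String, Integer> tokenCounts = new HashMap<>();
    private final int minWordFrequency;

    FilteredVocab(int minWordFrequency) {
        this.minWordFrequency = minWordFrequency;
    }

    void observe(String token) {
        tokenCounts.merge(token, 1, Integer::sum);
    }

    // Analogous to hasToken: any token seen at least once.
    boolean hasToken(String token) {
        return tokenCounts.containsKey(token);
    }

    // Analogous to containsWord: only tokens that passed the frequency cutoff.
    boolean containsWord(String word) {
        return hasToken(word) && tokenCounts.get(word) >= minWordFrequency;
    }

    // Analogous to words(): the vocab words, sorted for stable output.
    List<String> words() {
        return tokenCounts.entrySet().stream()
                .filter(e -> e.getValue() >= minWordFrequency)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

With a cutoff of 2, a token seen only once is still a token but not a vocab word.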
-
importVocabulary
void importVocabulary(VocabCache<T> vocabCache)
Imports vocabulary from another cache.
- Parameters:
  vocabCache - the cache whose vocabulary is imported
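What "importing" means for words present in both caches is not spelled out above. The sketch below assumes shared words have their counts summed and new words are simply added. `MergeableVocab` is hypothetical, and the actual implementations may behave differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical vocabulary whose import merges word counts.
class MergeableVocab {
    final Map<String, Long> counts = new HashMap<>();

    // Words unique to `other` are added; shared words have their counts summed.
    void importVocabulary(MergeableVocab other) {
        other.counts.forEach((word, count) -> counts.merge(word, count, Long::sum));
    }
}
```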
-
updateWordsOccurrences
void updateWordsOccurrences()
Updates counters
-
removeElement
void removeElement(String label)
Removes the element with the specified label from the vocabulary. Please note: the Huffman index should be updated after element removal.
- Parameters:
  label - label of the element to be removed
-
removeElement
void removeElement(T element)
Removes the specified element from the vocabulary. Please note: the Huffman index should be updated after element removal.
- Parameters:
  element - SequenceElement to be removed
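The note about the Huffman index refers to binary codes assigned from word frequencies; in word2vec, hierarchical softmax uses such a tree. The sketch below builds Huffman codes with a `PriorityQueue` to show why they must be recomputed after a removal: dropping one word changes the code lengths of the others. `HuffmanCodes` is a hypothetical helper, not DL4J's own implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: builds Huffman codes from word frequencies, the kind of
// index that must be rebuilt after removeElement(...) changes the vocabulary.
class HuffmanCodes {
    private static final class Node {
        final String word; // null for internal nodes
        final long freq;
        final Node left, right;

        Node(String word, long freq, Node left, Node right) {
            this.word = word;
            this.freq = freq;
            this.left = left;
            this.right = right;
        }
    }

    static Map<String, String> build(Map<String, Long> freqs) {
        PriorityQueue<Node> queue = new PriorityQueue<>((x, y) -> Long.compare(x.freq, y.freq));
        freqs.forEach((w, f) -> queue.add(new Node(w, f, null, null)));
        Map<String, String> codes = new HashMap<>();
        if (queue.isEmpty()) return codes;
        if (queue.size() == 1) { // a lone word still gets a one-bit code
            codes.put(queue.poll().word, "0");
            return codes;
        }
        // Repeatedly merge the two least frequent nodes into a parent.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node(null, a.freq + b.freq, a, b));
        }
        assign(queue.poll(), "", codes);
        return codes;
    }

    // Walks the tree: left edges append '0', right edges append '1'.
    private static void assign(Node node, String prefix, Map<String, String> codes) {
        if (node.word != null) {
            codes.put(node.word, prefix);
            return;
        }
        assign(node.left, prefix + "0", codes);
        assign(node.right, prefix + "1", codes);
    }
}
```

Take frequencies a=5, b=2, c=1 and d=1. The code lengths are 1, 2, 3 and 3. After "a" is removed and the codes are rebuilt, "b" gets a 1-bit code, so any stored codes from before the removal would be stale.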
-
-