Class AbstractCache<T extends SequenceElement>
- java.lang.Object
- org.deeplearning4j.models.word2vec.wordstore.inmemory.AbstractCache<T>
- All Implemented Interfaces: Serializable, VocabCache<T>
public class AbstractCache<T extends SequenceElement> extends Object implements VocabCache<T>
- See Also: Serialized Form
Nested Class Summary
Modifier and Type, Class:
- static class AbstractCache.Builder<T extends SequenceElement>
-
Constructor Summary
Constructor:
- AbstractCache()
-
Method Summary
Modifier and Type, Method, Description:
- boolean addToken(T element): Adds the specified SequenceElement to the vocabulary.
- void addToken(T element, boolean lockf)
- void addWordToIndex(int index, long elementId)
- void addWordToIndex(int index, String label): Inserts the specified label at the specified Huffman tree position.
- boolean containsElement(T element): Checks whether the specified element exists in the vocabulary.
- boolean containsWord(String word): Checks whether the specified label exists in the vocabulary.
- int docAppearedIn(String word): Returns the number of documents (if applicable) the label was observed in.
- T elementAtIndex(int index): Returns the SequenceElement at the specified index.
- static <T extends SequenceElement> AbstractCache<T> fromJson(String jsonString)
- boolean hasToken(String label): Checks whether the specified label already exists in the vocabulary.
- void importVocabulary(@NonNull VocabCache<T> vocabCache): Imports all elements from the VocabCache passed as argument. If element already exists,
- void incrementDocCount(String word, long howMuch): Increments the number of documents the label was observed in. Please note: this method is NOT thread-safe.
- void incrementTotalDocCount(): Increments the total number of documents observed by 1.
- void incrementTotalDocCount(long by): Increments the total number of documents observed by the specified value.
- void incrementWordCount(String word): Increments the frequency for the specified label by 1.
- void incrementWordCount(String word, int increment): Increments the frequency for the specified label by the specified value.
- int indexOf(String label): Returns the Huffman index for the specified label.
- void loadVocab(): Deserialize vocabulary from specified path.
- int numWords(): Returns the number of elements in this vocabulary.
- void putVocabWord(String word): Deprecated.
- void removeElement(String label): Removes the element with the specified label from the vocabulary. Please note: the Huffman index should be updated after element removal.
- void removeElement(T element): Removes the specified element from the vocabulary. Please note: the Huffman index should be updated after element removal.
- void saveVocab(): Serialize vocabulary to specified path.
- void setCountForDoc(String word, long count): Sets the exact number of observed documents that contain the specified word. Please note: this method is NOT thread-safe.
- void setTotalDocCount(long by): Sets the total number of documents.
- void setTotalWordOccurences(long value)
- String toJson()
- T tokenFor(long id)
- T tokenFor(String label): Returns the SequenceElement for the specified label.
- Collection<T> tokens(): Returns the collection of SequenceElements from this vocabulary.
- long totalNumberOfDocs(): Returns the total number of documents observed (if applicable).
- long totalWordOccurrences(): Returns the total number of elements observed.
- void updateWordsOccurrences(): Updates counters.
- boolean vocabExists(): Returns true if the number of elements in the vocabulary is greater than 0, false otherwise.
- Collection<T> vocabWords(): Returns the collection of SequenceElements stored in this vocabulary.
- String wordAtIndex(int index): Returns the label of the element at the specified Huffman index.
- T wordFor(long id)
- T wordFor(@NonNull String label): Returns the SequenceElement for the specified label.
- int wordFrequency(@NonNull String word): Returns the SequenceElement's frequency over the training corpus.
- Collection<String> words(): Returns the collection of labels available in this vocabulary.
-
-
-
Method Detail
-
loadVocab
public void loadVocab()
Deserialize vocabulary from specified path.
- Specified by: loadVocab in interface VocabCache<T extends SequenceElement>
-
vocabExists
public boolean vocabExists()
Returns true if the number of elements in the vocabulary is greater than 0, false otherwise.
- Specified by: vocabExists in interface VocabCache<T extends SequenceElement>
-
saveVocab
public void saveVocab()
Serialize vocabulary to specified path.
- Specified by: saveVocab in interface VocabCache<T extends SequenceElement>
-
words
public Collection<String> words()
Returns the collection of labels available in this vocabulary.
- Specified by: words in interface VocabCache<T extends SequenceElement>
-
incrementWordCount
public void incrementWordCount(String word)
Increments the frequency for the specified label by 1.
- Specified by: incrementWordCount in interface VocabCache<T extends SequenceElement>
- Parameters:
word - the word to increment the count for
-
incrementWordCount
public void incrementWordCount(String word, int increment)
Increments the frequency for the specified label by the specified value.
- Specified by: incrementWordCount in interface VocabCache<T extends SequenceElement>
- Parameters:
word - the word to increment the count for
increment - the amount to increment by
-
wordFrequency
public int wordFrequency(@NonNull String word)
Returns the SequenceElement's frequency over the training corpus.
- Specified by: wordFrequency in interface VocabCache<T extends SequenceElement>
- Parameters:
word - the word to retrieve the occurrence frequency for
-
containsWord
public boolean containsWord(String word)
Checks whether the specified label exists in the vocabulary.
- Specified by: containsWord in interface VocabCache<T extends SequenceElement>
- Parameters:
word - the word to check for
-
containsElement
public boolean containsElement(T element)
Checks whether the specified element exists in the vocabulary.
- Parameters:
element
-
wordAtIndex
public String wordAtIndex(int index)
Returns the label of the element at the specified Huffman index.
- Specified by: wordAtIndex in interface VocabCache<T extends SequenceElement>
- Parameters:
index - the index of the word to get
-
elementAtIndex
public T elementAtIndex(int index)
Returns the SequenceElement at the specified index.
- Specified by: elementAtIndex in interface VocabCache<T extends SequenceElement>
- Parameters:
index
-
indexOf
public int indexOf(String label)
Returns the Huffman index for the specified label.
- Specified by: indexOf in interface VocabCache<T extends SequenceElement>
- Parameters:
label - the label to get index for
- Returns: >=0 if label exists, -1 if Huffman tree wasn't built yet, -2 if specified label wasn't found
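The three kinds of return value documented for indexOf can be told apart on the caller side. The following is a minimal sketch of such a check; the class IndexOfResult, its constants and its describe method are illustrative names that are not part of DL4J, and the code only interprets an int that a caller would obtain from indexOf.

```java
// Caller-side helper for the documented indexOf return codes.
// IndexOfResult and its members are hypothetical names, not DL4J API.
final class IndexOfResult {
    // Documented: -1 means the Huffman tree wasn't built yet.
    static final int TREE_NOT_BUILT = -1;
    // Documented: -2 means the specified label wasn't found.
    static final int LABEL_NOT_FOUND = -2;

    private IndexOfResult() {
    }

    // Turns a raw indexOf result into a human-readable description.
    static String describe(int code) {
        if (code >= 0) {
            return "index " + code;
        }
        if (code == TREE_NOT_BUILT) {
            return "Huffman tree not built yet";
        }
        if (code == LABEL_NOT_FOUND) {
            return "label not found";
        }
        // Anything else is outside the documented contract.
        return "unexpected code " + code;
    }

    public static void main(String[] args) {
        // In real use the argument would come from cache.indexOf(label).
        System.out.println(describe(5));
        System.out.println(describe(-1));
        System.out.println(describe(-2));
    }
}
```

Checking for -1 separately from -2 lets a caller distinguish "build the Huffman tree first" from "this label is unknown", which call for different remedies.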
-
vocabWords
public Collection<T> vocabWords()
Returns the collection of SequenceElements stored in this vocabulary.
- Specified by: vocabWords in interface VocabCache<T extends SequenceElement>
-
totalWordOccurrences
public long totalWordOccurrences()
Returns the total number of elements observed.
- Specified by: totalWordOccurrences in interface VocabCache<T extends SequenceElement>
-
setTotalWordOccurences
public void setTotalWordOccurences(long value)
-
wordFor
public T wordFor(@NonNull String label)
Returns the SequenceElement for the specified label.
- Specified by: wordFor in interface VocabCache<T extends SequenceElement>
- Parameters:
label - the label to fetch the element for
-
wordFor
public T wordFor(long id)
- Specified by: wordFor in interface VocabCache<T extends SequenceElement>
-
addWordToIndex
public void addWordToIndex(int index, String label)
Inserts the specified label at the specified Huffman tree position. CAUTION: Never use this unless you are 100% sure what you are doing.
- Specified by: addWordToIndex in interface VocabCache<T extends SequenceElement>
- Parameters:
index
label
-
addWordToIndex
public void addWordToIndex(int index, long elementId)
- Specified by: addWordToIndex in interface VocabCache<T extends SequenceElement>
-
putVocabWord
@Deprecated public void putVocabWord(String word)
Deprecated.
Description copied from interface: VocabCache
Inserts the word as a vocab word (it gets the vocab word from the internal token store). Note that the index must be set on the token.
- Specified by: putVocabWord in interface VocabCache<T extends SequenceElement>
- Parameters:
word - the word to add to the vocab
-
numWords
public int numWords()
Returns the number of elements in this vocabulary.
- Specified by: numWords in interface VocabCache<T extends SequenceElement>
-
docAppearedIn
public int docAppearedIn(String word)
Returns the number of documents (if applicable) the label was observed in.
- Specified by: docAppearedIn in interface VocabCache<T extends SequenceElement>
- Parameters:
word - the word to look up
- Returns: the number of documents the word appeared in
-
incrementDocCount
public void incrementDocCount(String word, long howMuch)
Increments the number of documents the label was observed in. Please note: this method is NOT thread-safe.
- Specified by: incrementDocCount in interface VocabCache<T extends SequenceElement>
- Parameters:
word - the word whose document count is incremented
howMuch
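Because incrementDocCount is documented as NOT thread-safe, callers that update document counts from several threads need their own synchronization. The sketch below shows one way to do that with a shared lock. It uses a hypothetical stand-in class (UnsafeDocCounts) instead of AbstractCache so that it runs without DL4J on the classpath; the stand-in's internals and its long return type are assumptions made for illustration, not a description of the real implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Wraps an unsynchronized counter store behind a single lock.
// All names here are hypothetical; none of this is DL4J code.
final class GuardedDocCounts {

    // Stand-in for a vocabulary whose document counters are not thread-safe.
    private static final class UnsafeDocCounts {
        private final Map<String, Long> docCounts = new HashMap<>();

        // Unsynchronized read-modify-write: concurrent callers can lose updates.
        void incrementDocCount(String word, long howMuch) {
            docCounts.put(word, docCounts.getOrDefault(word, 0L) + howMuch);
        }

        long docAppearedIn(String word) {
            return docCounts.getOrDefault(word, 0L);
        }
    }

    private final UnsafeDocCounts delegate = new UnsafeDocCounts();
    private final Object lock = new Object();

    // Every access to the delegate goes through the same lock.
    void incrementDocCount(String word, long howMuch) {
        synchronized (lock) {
            delegate.incrementDocCount(word, howMuch);
        }
    }

    long docAppearedIn(String word) {
        synchronized (lock) {
            return delegate.docAppearedIn(word);
        }
    }

    // Two threads each add 10,000 documents for the same word.
    static long runDemo() {
        GuardedDocCounts counts = new GuardedDocCounts();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                counts.incrementDocCount("day", 1);
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.docAppearedIn("day");
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The same pattern applies to setCountForDoc, which carries the same thread-safety note: guard every read and write of the counters with one lock, or confine updates to a single thread.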
-
setCountForDoc
public void setCountForDoc(String word, long count)
Sets the exact number of observed documents that contain the specified word. Please note: this method is NOT thread-safe.
- Specified by: setCountForDoc in interface VocabCache<T extends SequenceElement>
- Parameters:
word - the word to set the count for
count - the count of the word
-
totalNumberOfDocs
public long totalNumberOfDocs()
Returns the total number of documents observed (if applicable).
- Specified by: totalNumberOfDocs in interface VocabCache<T extends SequenceElement>
-
incrementTotalDocCount
public void incrementTotalDocCount()
Increments the total number of documents observed by 1.
- Specified by: incrementTotalDocCount in interface VocabCache<T extends SequenceElement>
-
incrementTotalDocCount
public void incrementTotalDocCount(long by)
Increments the total number of documents observed by the specified value.
- Specified by: incrementTotalDocCount in interface VocabCache<T extends SequenceElement>
- Parameters:
by - the number to increment by
-
setTotalDocCount
public void setTotalDocCount(long by)
Sets the total number of documents.
- Parameters:
by
-
tokens
public Collection<T> tokens()
Returns the collection of SequenceElements from this vocabulary. The same as the vocabWords() method.
- Specified by: tokens in interface VocabCache<T extends SequenceElement>
- Returns: collection of SequenceElements
-
addToken
public boolean addToken(T element)
Adds the specified SequenceElement to the vocabulary.
- Specified by: addToken in interface VocabCache<T extends SequenceElement>
- Parameters:
element - the word to add
- Returns: true if token was added, false if updated
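The added-versus-updated return value of addToken can be illustrated with a small stand-in. The sketch below is not the AbstractCache implementation: TinyVocab is a hypothetical class, and it assumes that "updated" means the existing entry's frequency is increased by the incoming element's frequency, which this page does not spell out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical miniature vocabulary; names and merge rule are assumptions.
final class TinyVocab {
    private final Map<String, Integer> frequencies = new HashMap<>();

    // Returns true if the label was new (added), false if an existing
    // entry was updated by merging the frequencies (assumed behavior).
    boolean addToken(String label, int frequency) {
        Integer previous = frequencies.putIfAbsent(label, frequency);
        if (previous == null) {
            return true;
        }
        frequencies.put(label, previous + frequency);
        return false;
    }

    // Frequency recorded for the label, or 0 if the label is unknown.
    int wordFrequency(String label) {
        return frequencies.getOrDefault(label, 0);
    }

    public static void main(String[] args) {
        TinyVocab vocab = new TinyVocab();
        System.out.println(vocab.addToken("day", 2));
        System.out.println(vocab.addToken("day", 3));
        System.out.println(vocab.wordFrequency("day"));
    }
}
```

A caller can use the boolean to count how many distinct labels were actually introduced while streaming tokens into a vocabulary.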
-
addToken
public void addToken(T element, boolean lockf)
-
tokenFor
public T tokenFor(String label)
Returns the SequenceElement for the specified label. The same as the wordFor() method.
- Specified by: tokenFor in interface VocabCache<T extends SequenceElement>
- Parameters:
label - the label to get the token for
-
tokenFor
public T tokenFor(long id)
- Specified by: tokenFor in interface VocabCache<T extends SequenceElement>
-
hasToken
public boolean hasToken(String label)
Checks whether the specified label already exists in the vocabulary. The same as the containsWord() method.
- Specified by: hasToken in interface VocabCache<T extends SequenceElement>
- Parameters:
label - the token to test
-
importVocabulary
public void importVocabulary(@NonNull VocabCache<T> vocabCache)
Imports all elements from the VocabCache passed as argument. If element already exists,
- Specified by: importVocabulary in interface VocabCache<T extends SequenceElement>
- Parameters:
vocabCache
-
updateWordsOccurrences
public void updateWordsOccurrences()
Description copied from interface: VocabCache
Updates counters.
- Specified by: updateWordsOccurrences in interface VocabCache<T extends SequenceElement>
-
removeElement
public void removeElement(String label)
Description copied from interface: VocabCache
Removes the element with the specified label from the vocabulary. Please note: the Huffman index should be updated after element removal.
- Specified by: removeElement in interface VocabCache<T extends SequenceElement>
- Parameters:
label - label of the element to be removed
-
removeElement
public void removeElement(T element)
Description copied from interface: VocabCache
Removes the specified element from the vocabulary. Please note: the Huffman index should be updated after element removal.
- Specified by: removeElement in interface VocabCache<T extends SequenceElement>
- Parameters:
element - SequenceElement to be removed
-
toJson
public String toJson() throws org.nd4j.shade.jackson.core.JsonProcessingException
- Throws: org.nd4j.shade.jackson.core.JsonProcessingException
-
fromJson
public static <T extends SequenceElement> AbstractCache<T> fromJson(String jsonString) throws IOException
- Throws: IOException