Word2Vec (deeplearning4j-nlp 0.4-rc1 API)

java.lang.Object
- org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl
  - org.deeplearning4j.models.word2vec.Word2Vec

All Implemented Interfaces:

Serializable, WordVectors

Direct Known Subclasses:

ParagraphVectors
```
public class Word2Vec
extends WordVectorsImpl
```
Converts a word into a numeric vector based on its context in the training examples, using a three-layer neural network with a softmax output layer.
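Once words are numeric vectors, they can be compared numerically. The sketch below shows cosine similarity over plain `double[]` arrays, as one common way to compare word vectors. It assumes nothing about how the inherited `similarity` method is implemented; the class name `CosineSketch` and its method are hypothetical and do not call deeplearning4j.

```java
public class CosineSketch {

    /**
     * Cosine similarity of two equal-length vectors.
     * Hypothetical helper for illustration; not part of the Word2Vec API.
     */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
            normA += a[k] * a[k];
            normB += b[k] * b[k];
        }
        // A zero vector has no direction; report 0 rather than dividing by zero.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] king = {0.5, 0.1, 0.4};   // made-up example values
        double[] queen = {0.4, 0.2, 0.4};  // made-up example values
        System.out.println(cosine(king, queen));
    }
}
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values close to 0 indicate nearly orthogonal vectors.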

Author:

Adam Gibson

See Also:

Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`Word2Vec.Builder`

Field Summary

Fields
Modifier and Type	Field and Description
`protected com.google.common.util.concurrent.AtomicDouble`	`alpha`
`protected int`	`batchSize`
`protected DocumentIterator`	`docIter`
`protected org.apache.commons.math3.random.RandomGenerator`	`g`
`protected InvertedIndex`	`invertedIndex`
`protected int`	`learningRateDecayWords`
`protected static org.slf4j.Logger`	`log`
`protected double`	`minLearningRate`
`protected int`	`numIterations`
`protected double`	`sample`
`protected boolean`	`saveVocab`
`protected long`	`seed`
`protected SentenceIterator`	`sentenceIter`
`protected static long`	`serialVersionUID`
`protected boolean`	`shouldReset`
`protected TokenizerFactory`	`tokenizerFactory`
`protected long`	`totalWords`
`static String`	`UNK`
`protected boolean`	`useAdaGrad`
`protected TextVectorizer`	`vectorizer`
`protected int`	`window`
`protected int`	`workers`

Fields inherited from class org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl
layerSize, lookupTable, minWordFrequency, stopWords, vocab

Constructor Summary

Constructors
Constructor and Description
`Word2Vec()`

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`protected void`	`addWords(List<VocabWord> sentence, AtomicLong nextRandom, List<VocabWord> currMiniBatch)`
`protected void`	`buildBinaryTree()`
`boolean`	`buildVocab()` Builds the vocabulary for training
`void`	`fit()` Train the model
`SentenceIterator`	`getSentenceIter()`
`List<String>`	`getStopWords()`
`TokenizerFactory`	`getTokenizerFactory()`
`TextVectorizer`	`getVectorizer()`
`int`	`getWindow()`
`void`	`iterate(VocabWord w1, VocabWord w2, AtomicLong nextRandom, double alpha)` Train the word vector on the given words
`protected void`	`readStopWords()`
`protected void`	`resetWeights()`
`void`	`resetWeightsOnSetup()` Restarts training on the next call to fit().
`void`	`setSentenceIter(SentenceIterator sentenceIter)` Note that calling this setter assumes that training is being continued, so the weights are not reset.
`void`	`setTokenizerFactory(TokenizerFactory tokenizerFactory)`
`void`	`setup()` Builds the binary tree and resets the weights.
`void`	`setVectorizer(TextVectorizer vectorizer)`
`void`	`skipGram(int i, List<VocabWord> sentence, int b, AtomicLong nextRandom, double alpha)` Train via skip gram
`void`	`trainSentence(List<VocabWord> sentence, AtomicLong nextRandom, double alpha)` Train on a list of vocab words

Methods inherited from class org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl
accuracy, getWordVector, getWordVectorMatrix, getWordVectorMatrixNormalized, hasWord, indexOf, lookupTable, setLookupTable, setVocab, similarity, similarWordsInVocabTo, vocab, wordsNearest, wordsNearest, wordsNearest, wordsNearestSum, wordsNearestSum, wordsNearestSum

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

serialVersionUID
```
protected static final long serialVersionUID
```
See Also:

Constant Field Values

tokenizerFactory

```
protected transient TokenizerFactory tokenizerFactory
```

sentenceIter

```
protected transient SentenceIterator sentenceIter
```

docIter

```
protected transient DocumentIterator docIter
```

batchSize
```
protected int batchSize
```

sample
```
protected double sample
```

totalWords
```
protected long totalWords
```

alpha

```
protected com.google.common.util.concurrent.AtomicDouble alpha
```

window
```
protected int window
```

g

```
protected transient org.apache.commons.math3.random.RandomGenerator g
```

log

```
protected static final org.slf4j.Logger log
```

shouldReset
```
protected boolean shouldReset
```

numIterations
```
protected int numIterations
```

UNK
```
public static final String UNK
```
See Also:

Constant Field Values

seed
```
protected long seed
```

saveVocab
```
protected boolean saveVocab
```

minLearningRate
```
protected double minLearningRate
```

vectorizer

```
protected transient TextVectorizer vectorizer
```

learningRateDecayWords
```
protected int learningRateDecayWords
```

invertedIndex
```
protected InvertedIndex invertedIndex
```

useAdaGrad
```
protected boolean useAdaGrad
```

workers
```
protected int workers
```

Constructor Detail
Word2Vec
```
public Word2Vec()
```

Method Detail

getVectorizer
```
public TextVectorizer getVectorizer()
```

setVectorizer

```
public void setVectorizer(TextVectorizer vectorizer)
```

fit

```
public void fit()
         throws IOException
```
Train the model

Throws:

IOException

addWords

```
protected void addWords(List<VocabWord> sentence,
                        AtomicLong nextRandom,
                        List<VocabWord> currMiniBatch)
```

setup
```
public void setup()
```
Builds the binary tree and resets the weights.

buildVocab
```
public boolean buildVocab()
```
Builds the vocabulary for training

trainSentence

```
public void trainSentence(List<VocabWord> sentence,
                          AtomicLong nextRandom,
                          double alpha)
```
Train on a list of vocab words

Parameters:

sentence - the list of vocab words to train on

skipGram

```
public void skipGram(int i,
                     List<VocabWord> sentence,
                     int b,
                     AtomicLong nextRandom,
                     double alpha)
```
Train via skip gram

Parameters:

i -

sentence -
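The parameters of `skipGram` are not described here, so the following is only a sketch of how a skip-gram window can select context positions. It assumes that `i` is the position of the center word in `sentence` and that `b` shrinks the window on both sides, as in the reference word2vec implementation; neither meaning is confirmed by this page. The class `SkipGramWindowSketch` and the explicit `window` argument are hypothetical, and the code does not call deeplearning4j.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipGramWindowSketch {

    /**
     * Returns the sentence positions that would be paired with the center
     * position i, for a window of the given size shrunk by b on each side.
     * Hypothetical helper; the meaning of i and b is an assumption.
     */
    static List<Integer> contextPositions(int i, int sentenceLength, int window, int b) {
        List<Integer> positions = new ArrayList<>();
        int end = window * 2 + 1 - b;
        for (int a = b; a < end; a++) {
            if (a == window) {
                continue; // offset of the center word itself
            }
            int c = i - window + a;
            if (c < 0 || c >= sentenceLength) {
                continue; // context position falls outside the sentence
            }
            positions.add(c);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Six-word sentence, center word at position 2, window of 2.
        System.out.println(contextPositions(2, 6, 2, 0)); // prints [0, 1, 3, 4]
        System.out.println(contextPositions(2, 6, 2, 1)); // prints [1, 3]
    }
}
```

Under these assumptions, each returned position would be paired with the center word in one training step, which is the kind of word pair that `iterate(w1, w2, ...)` accepts.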

iterate

```
public void iterate(VocabWord w1,
                    VocabWord w2,
                    AtomicLong nextRandom,
                    double alpha)
```
Train the word vector on the given words

Parameters:

w1 - the first word to fit

buildBinaryTree
```
protected void buildBinaryTree()
```
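This page does not say what kind of binary tree is built. A minimal sketch, assuming it is a Huffman tree over word frequencies as used for hierarchical softmax in the reference word2vec implementation: frequent words get short codes, rare words get long ones. The class `HuffmanSketch`, its methods, and the word counts are hypothetical and do not call deeplearning4j.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanSketch {

    /** Tree node; leaves carry a word, inner nodes carry only a summed count. */
    static final class Node {
        final String word;
        final long count;
        final Node left;
        final Node right;

        Node(String word, long count, Node left, Node right) {
            this.word = word;
            this.count = count;
            this.left = left;
            this.right = right;
        }
    }

    /** Builds a Huffman tree from word counts and returns each word's bit string. */
    static Map<String, String> buildCodes(Map<String, Long> counts) {
        PriorityQueue<Node> queue =
                new PriorityQueue<>((x, y) -> Long.compare(x.count, y.count));
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            queue.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        // Repeatedly merge the two least frequent nodes until one root remains.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node(null, a.count + b.count, a, b));
        }
        Map<String, String> codes = new HashMap<>();
        if (!queue.isEmpty()) {
            assign(queue.poll(), "", codes);
        }
        return codes;
    }

    /** Walks the tree, appending 0 for left branches and 1 for right branches. */
    private static void assign(Node node, String prefix, Map<String, String> codes) {
        if (node.left == null) {
            codes.put(node.word, prefix);
            return;
        }
        assign(node.left, prefix + "0", codes);
        assign(node.right, prefix + "1", codes);
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("the", 10L); // made-up counts
        counts.put("cat", 4L);
        counts.put("sat", 2L);
        counts.put("mat", 1L);
        System.out.println(buildCodes(counts));
    }
}
```

With these counts, "the" receives a one-bit code and the two rarest words receive three-bit codes, so predicting a frequent word touches fewer tree nodes.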

resetWeights
```
protected void resetWeights()
```

readStopWords
```
protected void readStopWords()
```

setSentenceIter
```
public void setSentenceIter(SentenceIterator sentenceIter)
```
Note that calling this setter assumes that training is being continued, so the weights are not reset.

Parameters:

sentenceIter -

resetWeightsOnSetup
```
public void resetWeightsOnSetup()
```
Restarts training on the next call to fit(). Use this when the sentence iterator has been set for a new training run.

getWindow
```
public int getWindow()
```

getStopWords
```
public List<String> getStopWords()
```

getSentenceIter

```
public SentenceIterator getSentenceIter()
```

getTokenizerFactory

```
public TokenizerFactory getTokenizerFactory()
```

setTokenizerFactory

```
public void setTokenizerFactory(TokenizerFactory tokenizerFactory)
```
