ELMo model wrapper around the TensorFlow Hub ELMo module.
The size of each batch of sentences to process.
Configuration for the TensorFlow session.

Sources:
https://tfhub.dev/google/elmo/3
https://arxiv.org/abs/1802.05365
Calculate the embeddings for a sequence of tokens and create WordPieceEmbeddingsSentence objects from them.
A sequence of Tokenized Sentences for which embeddings will be calculated
Defines which output layer to take from the model: word_emb, lstm_outputs1, lstm_outputs2, or elmo. See https://tfhub.dev/google/elmo/3 for reference.
A Seq of WordPieceEmbeddingsSentence, one element for each input sentence.
The dimension of the chosen layer:
word_emb: the character-based word representations, with shape [batch_size, max_length, 512]; dimension 512.
lstm_outputs1: the first LSTM hidden state, with shape [batch_size, max_length, 1024]; dimension 1024.
lstm_outputs2: the second LSTM hidden state, with shape [batch_size, max_length, 1024]; dimension 1024.
elmo: the weighted sum of the 3 layers, where the weights are trainable; shape [batch_size, max_length, 1024]; dimension 1024.
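The layer-to-dimension mapping above can be sketched as a simple Scala match. This is an illustrative sketch only; the method name layerDimension is assumed, not taken from the actual source.

```scala
// Sketch (hypothetical name): dimension of each ELMo output layer,
// mirroring the shapes documented above.
def layerDimension(poolingLayer: String): Int = poolingLayer match {
  case "word_emb"      => 512  // character-based word representations
  case "lstm_outputs1" => 1024 // first LSTM hidden state
  case "lstm_outputs2" => 1024 // second LSTM hidden state
  case "elmo"          => 1024 // trainable weighted sum of the 3 layers
  case other =>
    throw new IllegalArgumentException(s"Unknown pooling layer: $other")
}
```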
Tag a sequence of TokenizedSentences, retrieving the embeddings according to the chosen layer key.
The Tokens for which we calculate embeddings
Specification of the output embedding for Elmo
ELMo's embedding dimension: either 512 or 1024, depending on the chosen layer.
The embeddings vector: each element of the Seq is a sentence, each sentence holds an array of its words, and each word is represented by a float array holding its embedding.
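The nested structure described above can be sketched as follows; the sentence contents and the dimension (1024, as for the elmo layer) are illustrative assumptions.

```scala
// Sketch of the nested embeddings structure:
// Seq of sentences -> array of words -> per-word float vector.
val embeddings: Seq[Array[Array[Float]]] = Seq(
  Array(                    // one sentence with two words
    Array.fill(1024)(0.0f), // embedding of word 1 (placeholder values)
    Array.fill(1024)(0.0f)  // embedding of word 2 (placeholder values)
  )
)
```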
ELMo model wrapper around the TensorFlow Hub ELMo module.
Embeddings from a language model trained on the 1 Billion Word Benchmark.
Note that this is a very computationally expensive module compared to word embedding modules that only perform embedding lookups. The use of an accelerator is recommended.
word_emb: the character-based word representations with shape [batch_size, max_length, 512].
lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024].
lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024].
elmo: the weighted sum of the 3 layers, where the weights are trainable. This tensor has shape [batch_size, max_length, 1024].
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/ElmoEmbeddingsTestSpec.scala for further reference on how to use this API.
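A minimal usage sketch, following the pattern in the linked test spec. It assumes the standard Spark NLP annotator API (DocumentAssembler, Tokenizer, ElmoEmbeddings.pretrained, setPoolingLayer) and requires a running SparkSession and the spark-nlp dependency, so it is not self-contained.

```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.{Tokenizer, ElmoEmbeddings}
import org.apache.spark.ml.Pipeline

// Assemble raw text into documents.
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split documents into tokens.
val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Load the pretrained ELMo model and pick an output layer
// (word_emb, lstm_outputs1, lstm_outputs2, or elmo).
val embeddings = ElmoEmbeddings.pretrained()
  .setInputCols("document", "token")
  .setOutputCol("embeddings")
  .setPoolingLayer("elmo")

val pipeline = new Pipeline()
  .setStages(Array(documentAssembler, tokenizer, embeddings))
```

Fitting this pipeline on a DataFrame with a "text" column produces an "embeddings" column holding the per-word vectors described above.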