class TensorflowElmo extends Serializable
Embeddings from a language model trained on the 1 Billion Word Benchmark.
Note that this is a very computationally expensive module compared to word embedding modules that only perform embedding lookups. The use of an accelerator is recommended.
The module exposes four output layers:
- word_emb: the character-based word representations with shape [batch_size, max_length, 512]
- lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024]
- lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024]
- elmo: the weighted sum of the 3 layers, where the weights are trainable; this tensor has shape [batch_size, max_length, 1024]

See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/ElmoEmbeddingsTestSpec.scala for further reference on how to use this API.
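In practice this class is driven by the ElmoEmbeddings annotator rather than used directly. A minimal pipeline sketch, assuming Spark NLP is on the classpath and a SparkSession is active (column names here are illustrative):

```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.{ElmoEmbeddings, Tokenizer}
import org.apache.spark.ml.Pipeline

// Wrap raw text into a DOCUMENT annotation
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Tokenize the document
val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Download a pretrained ELMo model; "elmo" selects the trainable
// weighted sum of the 3 layers (1024 dimensions)
val embeddings = ElmoEmbeddings.pretrained()
  .setInputCols("document", "token")
  .setOutputCol("embeddings")
  .setPoolingLayer("elmo")

val pipeline = new Pipeline()
  .setStages(Array(documentAssembler, tokenizer, embeddings))
```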
Linear Supertypes: Serializable, Serializable, AnyRef, Any
Instance Constructors
- new TensorflowElmo(tensorflow: TensorflowWrapper, batchSize: Int, configProtoBytes: Option[Array[Byte]] = None)
  - tensorflow: ELMo model wrapper with a TensorFlow wrapper
  - batchSize: size of each batch
  - configProtoBytes: configuration for the TensorFlow session

Sources:
- https://tfhub.dev/google/elmo/3
- https://arxiv.org/abs/1802.05365
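For illustration only, a hedged construction sketch; the TensorflowWrapper (called `wrapper` below) is assumed to have been loaded elsewhere from a saved ELMo model:

```scala
// Hypothetical sketch: `wrapper` is assumed to be a TensorflowWrapper
// already loaded from a saved ELMo model bundle
val elmoModel = new TensorflowElmo(
  tensorflow = wrapper,
  batchSize = 32,          // sentences per TensorFlow session call
  configProtoBytes = None  // default session configuration
)
```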
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def getDimensions: (String) ⇒ Int
  Returns the embedding dimension for the chosen output layer:
  - word_emb: the character-based word representations with shape [batch_size, max_length, 512] ⇒ 512
  - lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024] ⇒ 1024
  - lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024] ⇒ 1024
  - elmo: the weighted sum of the 3 layers, where the weights are trainable; shape [batch_size, max_length, 1024] ⇒ 1024
  - returns: The dimension of the chosen layer
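A small sketch of the mapping, assuming an initialized instance named `elmoModel` (a hypothetical name):

```scala
// getDimensions returns a String => Int function
val dimOf: String => Int = elmoModel.getDimensions

dimOf("word_emb")      // 512: character-based word representations
dimOf("lstm_outputs1") // 1024: first LSTM hidden state
dimOf("elmo")          // 1024: trainable weighted sum of the 3 layers
```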
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- def predict(sentences: Seq[TokenizedSentence], poolingLayer: String): Seq[WordpieceEmbeddingsSentence]
  Calculates the embeddings for a sequence of tokens and creates WordpieceEmbeddingsSentence objects from them.
  - sentences: A sequence of tokenized sentences for which embeddings will be calculated
  - poolingLayer: The output layer you want from the model: word_emb, lstm_outputs1, lstm_outputs2, or elmo. See https://tfhub.dev/google/elmo/3 for reference.
  - returns: A Seq of WordpieceEmbeddingsSentence, one element for each input sentence
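A sketch of a direct call, assuming the `elmoModel` instance from the construction example; the TokenizedSentence and IndexedToken constructor arguments shown are assumptions based on Spark NLP's common annotator types:

```scala
import com.johnsnowlabs.nlp.annotators.common.{IndexedToken, TokenizedSentence}

// Assumed construction of a tokenized sentence: each token carries its
// character offsets, and the sentence carries its index
val sentence = TokenizedSentence(
  Array(IndexedToken("Hello", 0, 4), IndexedToken("world", 6, 10)),
  sentenceIndex = 0
)

// One WordpieceEmbeddingsSentence per input sentence, embedded with
// the trainable "elmo" layer (1024 dimensions)
val embedded = elmoModel.predict(Seq(sentence), poolingLayer = "elmo")
```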
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def tag(batch: Seq[TokenizedSentence], embeddingsKey: String, dimension: Int): Seq[Array[Array[Float]]]
  Tags a sequence of TokenizedSentences, retrieving the embeddings that correspond to the given key.
  - batch: The tokens for which embeddings are calculated
  - embeddingsKey: Specification of the output embedding for ELMo (word_emb, lstm_outputs1, lstm_outputs2, or elmo)
  - dimension: ELMo's embedding dimension: either 512 or 1024
  - returns: The embedding vectors: one outer element per sentence, containing one Array[Float] per word that holds that word's embedding
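A sketch of this lower-level call, reusing the assumed `elmoModel` and `sentence` from the predict example; embeddingsKey and dimension must agree as listed under getDimensions:

```scala
// One entry per sentence; for each sentence, one Array[Float] per word
val vectors: Seq[Array[Array[Float]]] =
  elmoModel.tag(Seq(sentence), embeddingsKey = "lstm_outputs1", dimension = 1024)

val firstWordVector: Array[Float] = vectors.head.head  // 1024 floats
```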
- val tensorflow: TensorflowWrapper
- def toString(): String
  - Definition Classes: AnyRef → Any
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()