ELMo model wrapper around the TensorFlow Hub ELMo module.
The size of each batch of sentences to process.
Configuration for the TensorFlow session.

Sources:
https://tfhub.dev/google/elmo/3
https://arxiv.org/abs/1802.05365
Calculate the embeddings for a sequence of tokens and create WordPieceEmbeddingsSentence objects from them.
A sequence of Tokenized Sentences for which embeddings will be calculated
Defines which output layer to take from the model: word_emb, lstm_outputs1, lstm_outputs2, or elmo. See https://tfhub.dev/google/elmo/3 for reference.
A Seq of WordPieceEmbeddingsSentence, one element for each input sentence.
The dimension of the chosen layer:
word_emb: the character-based word representations, with shape [batch_size, max_length, 512]; dimension 512.
lstm_outputs1: the first LSTM hidden state, with shape [batch_size, max_length, 1024]; dimension 1024.
lstm_outputs2: the second LSTM hidden state, with shape [batch_size, max_length, 1024]; dimension 1024.
elmo: the weighted sum of the 3 layers, where the weights are trainable; shape [batch_size, max_length, 1024]; dimension 1024.
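The layer-to-dimension mapping above can be sketched as a simple Scala match. This is an illustrative sketch only; the method name layerDimension is assumed, not taken from the actual source.

```scala
// Sketch (hypothetical name): dimension of each ELMo output layer,
// mirroring the shapes documented above.
def layerDimension(poolingLayer: String): Int = poolingLayer match {
  case "word_emb"      => 512  // character-based word representations
  case "lstm_outputs1" => 1024 // first LSTM hidden state
  case "lstm_outputs2" => 1024 // second LSTM hidden state
  case "elmo"          => 1024 // trainable weighted sum of the 3 layers
  case other =>
    throw new IllegalArgumentException(s"Unknown pooling layer: $other")
}
```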
Tag a sequence of TokenizedSentences, retrieving the embeddings according to the chosen layer key.
The Tokens for which we calculate embeddings
Specification of the output embedding for Elmo
ELMo's embedding dimension: either 512 or 1024, depending on the chosen layer.
The embeddings vector: each element of the Seq is a sentence, each sentence holds an array of its words, and each word is represented by a float array holding its embedding.
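The nested structure described above can be sketched as follows; the sentence contents and the dimension (1024, as for the elmo layer) are illustrative assumptions.

```scala
// Sketch of the nested embeddings structure:
// Seq of sentences -> array of words -> per-word float vector.
val embeddings: Seq[Array[Array[Float]]] = Seq(
  Array(                    // one sentence with two words
    Array.fill(1024)(0.0f), // embedding of word 1 (placeholder values)
    Array.fill(1024)(0.0f)  // embedding of word 2 (placeholder values)
  )
)
```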
ELMo model wrapper around the TensorFlow Hub ELMo module.
Embeddings from a language model trained on the 1 Billion Word Benchmark.
Note that this is a very computationally expensive module compared to word embedding modules that only perform embedding lookups. The use of an accelerator is recommended.
word_emb: the character-based word representations with shape [batch_size, max_length, 512].
lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024].
lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024].
elmo: the weighted sum of the 3 layers, where the weights are trainable. This tensor has shape [batch_size, max_length, 1024].
See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/ElmoEmbeddingsTestSpec.scala for further reference on how to use this API.
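A minimal usage sketch, following the pattern in the linked test spec. It assumes the standard Spark NLP annotator API (DocumentAssembler, Tokenizer, ElmoEmbeddings.pretrained, setPoolingLayer) and requires a running SparkSession and the spark-nlp dependency, so it is not self-contained.

```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.{Tokenizer, ElmoEmbeddings}
import org.apache.spark.ml.Pipeline

// Assemble raw text into documents.
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split documents into tokens.
val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Load the pretrained ELMo model and pick an output layer
// (word_emb, lstm_outputs1, lstm_outputs2, or elmo).
val embeddings = ElmoEmbeddings.pretrained()
  .setInputCols("document", "token")
  .setOutputCol("embeddings")
  .setPoolingLayer("elmo")

val pipeline = new Pipeline()
  .setStages(Array(documentAssembler, tokenizer, embeddings))
```

Fitting this pipeline on a DataFrame with a "text" column produces an "embeddings" column holding the per-word vectors described above.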