lamp.nn.languagemodel

Members list

Type members

Classlikes

case class LanguageModelInput(tokens: Constant, maxLength: Option[STen], positions: Option[STen])

Input to the language model

Value parameters

maxLength

batch x sequence OR batch, see maskedSoftmax. Used to define the masking of the attention matrix. Use cases:

  • Left-to-right (causal) attention with uniform sequence length: use a batch x sequence 2D matrix with arange(0,sequence) in each row (a construction for this case is sketched below, after the parameter list).
  • Variable-length sequences with bidirectional attention: use a 1D [batch] vector holding the real length of each sequence (the rest is padding).
  • If empty, the attention matrix is not masked.
positions

batch x sequence, type long, values in [0,sequence]; selects positions. Final LM logits are computed only at the selected positions. If empty, all positions are selected.

tokens

batch x sequence, type long
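
A minimal sketch of assembling a LanguageModelInput for the first maxLength use case above (causal attention with uniform sequence length). Only the constructor signature is taken from this page; the import paths and the placeholder tensors are assumptions.

```scala
import lamp.STen
import lamp.autograd.Constant
import lamp.nn.languagemodel.LanguageModelInput

// Placeholder tensors; shapes follow the parameter documentation above.
val tokens: Constant = ???   // batch x sequence, type long, token ids
val causalMask: STen = ???   // batch x sequence, each row = arange(0, sequence)

val input = LanguageModelInput(
  tokens = tokens,
  maxLength = Some(causalMask), // causal masking with uniform sequence length
  positions = None              // None: compute logits at all positions
)
```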

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

object LanguageModelInput

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
LanguageModelInput.type

Module with the language model and a loss

Main training entry point of the language model

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
case class LanguageModelModule(tokenEmbedding: Embedding, positionEmbedding: Embedding, encoder: TransformerEncoder, finalNorm: LayerNorm) extends GenericModule[LanguageModelInput, LanguageModelOutput]

Transformer-based language model module

The initial embedding is the sum of the token and position embeddings. The token embedding is a learned embedding, and the position embedding is also learned (not a fixed sinusoidal encoding).

The initial embeddings are fed into layers of transformer blocks. Attention masking is governed by the input, similarly to what is described in chapter 11.3.2.1 of d2l v1.0.0-beta0.

The selected sequence positions in the output of the transformer chain are linearly mapped back to the desired vocabulary size.
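
A minimal sketch of running the module, assuming only what the signature above states (that it is a GenericModule[LanguageModelInput, LanguageModelOutput]); the forward method name, the implicit lamp Scope, and the import paths are assumptions, and building the submodules is left out.

```scala
import lamp.Scope
import lamp.nn.languagemodel.{
  LanguageModelInput,
  LanguageModelModule,
  LanguageModelOutput
}

// Placeholders: constructing the Embedding / TransformerEncoder / LayerNorm
// submodules and the input tensors is out of scope for this sketch.
val module: LanguageModelModule = ???
val input: LanguageModelInput = ???

Scope.root { implicit scope =>
  // Maps LanguageModelInput to LanguageModelOutput (see the output type below).
  val output: LanguageModelOutput = module.forward(input)
  val perTokenEmbeddings = output.encoded // (batch, sequence, embedding dim)
  val logits = output.languageModelLogits // (batch, sequence, vocabulary size)
}
```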

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

object LanguageModelModule

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
LanguageModelModule.type
case class LanguageModelOutput(encoded: Variable, languageModelLogits: Variable)

Output of the language model

Value parameters

encoded

float tensor of size (batch, sequence length, embedding dimension); holds the per-token embeddings

languageModelLogits

float tensor of size (batch, sequence length, vocabulary size); holds the per-token logits. Use logSoftMax(dim=2) to get log probabilities.
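
A minimal sketch of turning the logits into log probabilities with the logSoftMax(dim=2) call mentioned above; the import paths, the implicit lamp Scope, and the placeholder output value are assumptions.

```scala
import lamp.Scope
import lamp.nn.languagemodel.LanguageModelOutput

val output: LanguageModelOutput = ??? // obtained from the module's forward pass

Scope.root { implicit scope =>
  // Per-token log probabilities over the vocabulary; same shape as the logits:
  // (batch, sequence length, vocabulary size).
  val logProbs = output.languageModelLogits.logSoftMax(dim = 2)
}
```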

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class LanguageModelOutputNonVariable(encoded: STen, languageModelLogits: STen)

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

object LanguageModelOutputNonVariable

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
LanguageModelOutputNonVariable.type
case class LossInput(input: LanguageModelInput, languageModelTarget: STen)

Language model input and target for loss calculation

Value parameters

languageModelTarget

batch x sequence
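
A minimal sketch of assembling a LossInput; only the constructor signature is taken from this page. The import paths and placeholders are assumptions, and the shift-by-one pairing of tokens and target is the usual causal language modelling convention, not something this API mandates.

```scala
import lamp.STen
import lamp.nn.languagemodel.{LanguageModelInput, LossInput}

// Placeholders: `target` is batch x sequence, aligned with input.tokens.
// A common convention is target(b, t) == token at position t + 1 of sequence b.
val input: LanguageModelInput = ???
val target: STen = ???

val lossInput = LossInput(input = input, languageModelTarget = target)
```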

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object LossInput

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
LossInput.type