lamp.nn.languagemodel

Members list

Type members

Classlikes

case class LanguageModelInput(tokens: Constant, maxLength: Option[STen], positions: Option[STen])

Input to the language model

Value parameters

maxLength

batch x sequence OR batch, see maskedSoftmax. Used to define the masking of the attention matrix. Use cases:

  • Left-to-right (causal) attention with uniform sequence length: use a batch x sequence 2D matrix with arange(0,sequence) in each row (a construction for this case is sketched below, after the parameter list).
  • Variable-length sequences with bidirectional attention: use a 1D [batch] vector holding the real length of each sequence (the rest is padding).
  • If empty, the attention matrix is not masked.
positions

batch x sequence, type long, values in [0,sequence]; selects positions. Final LM logits are computed only at the selected positions. If empty, all positions are selected.

tokens

batch x sequence, type long
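
A minimal sketch of assembling a LanguageModelInput for the first maxLength use case above (causal attention with uniform sequence length). Only the constructor signature is taken from this page; the import paths and the placeholder tensors are assumptions.

```scala
import lamp.STen
import lamp.autograd.Constant
import lamp.nn.languagemodel.LanguageModelInput

// Placeholder tensors; shapes follow the parameter documentation above.
val tokens: Constant = ???   // batch x sequence, type long, token ids
val causalMask: STen = ???   // batch x sequence, each row = arange(0, sequence)

val input = LanguageModelInput(
  tokens = tokens,
  maxLength = Some(causalMask), // causal masking with uniform sequence length
  positions = None              // None: compute logits at all positions
)
```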

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

object LanguageModelInput

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
LanguageModelInput.type

Module with the language model and a loss

Main training entry point of the language model

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
case class LanguageModelModule(tokenEmbedding: Embedding, positionEmbedding: Embedding, encoder: TransformerEncoder, finalNorm: LayerNorm) extends GenericModule[LanguageModelInput, LanguageModelOutput]

Transformer-based language model module

The initial embedding is the sum of the token and position embeddings. The token embedding is a learned embedding, and the position embedding is also learned (not a fixed sinusoidal encoding).

The initial embeddings are fed into layers of transformer blocks. Attention masking is governed by the input, similarly to what is described in chapter 11.3.2.1 of d2l v1.0.0-beta0.

The selected sequence positions in the output of the transformer chain are linearly mapped back to the desired vocabulary size.
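
A minimal sketch of running the module, assuming only what the signature above states (that it is a GenericModule[LanguageModelInput, LanguageModelOutput]); the forward method name, the implicit lamp Scope, and the import paths are assumptions, and building the submodules is left out.

```scala
import lamp.Scope
import lamp.nn.languagemodel.{
  LanguageModelInput,
  LanguageModelModule,
  LanguageModelOutput
}

// Placeholders: constructing the Embedding / TransformerEncoder / LayerNorm
// submodules and the input tensors is out of scope for this sketch.
val module: LanguageModelModule = ???
val input: LanguageModelInput = ???

Scope.root { implicit scope =>
  // Maps LanguageModelInput to LanguageModelOutput (see the output type below).
  val output: LanguageModelOutput = module.forward(input)
  val perTokenEmbeddings = output.encoded // (batch, sequence, embedding dim)
  val logits = output.languageModelLogits // (batch, sequence, vocabulary size)
}
```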

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

object LanguageModelModule

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
LanguageModelModule.type
case class LanguageModelOutput(encoded: Variable, languageModelLogits: Variable)

Output of the language model

Value parameters

encoded

float tensor of size (batch, sequence length, embedding dimension); holds the per-token embeddings

languageModelLogits

float tensor of size (batch, sequence length, vocabulary size); holds the per-token logits. Use logSoftMax(dim=2) to get log probabilities.
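
A minimal sketch of turning the logits into log probabilities with the logSoftMax(dim=2) call mentioned above; the import paths, the implicit lamp Scope, and the placeholder output value are assumptions.

```scala
import lamp.Scope
import lamp.nn.languagemodel.LanguageModelOutput

val output: LanguageModelOutput = ??? // obtained from the module's forward pass

Scope.root { implicit scope =>
  // Per-token log probabilities over the vocabulary; same shape as the logits:
  // (batch, sequence length, vocabulary size).
  val logProbs = output.languageModelLogits.logSoftMax(dim = 2)
}
```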

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
case class LanguageModelOutputNonVariable(encoded: STen, languageModelLogits: STen)

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

object LanguageModelOutputNonVariable

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
LanguageModelOutputNonVariable.type
case class LossInput(input: LanguageModelInput, languageModelTarget: STen)

Language model input and target for loss calculation

Value parameters

languageModelTarget

batch x sequence
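
A minimal sketch of assembling a LossInput; only the constructor signature is taken from this page. The import paths and placeholders are assumptions, and the shift-by-one pairing of tokens and target is the usual causal language modelling convention, not something this API mandates.

```scala
import lamp.STen
import lamp.nn.languagemodel.{LanguageModelInput, LossInput}

// Placeholders: `target` is batch x sequence, aligned with input.tokens.
// A common convention is target(b, t) == token at position t + 1 of sequence b.
val input: LanguageModelInput = ???
val target: STen = ???

val lossInput = LossInput(input = input, languageModelTarget = target)
```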

Attributes

Companion
object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
object LossInput

Attributes

Companion
class
Supertypes
trait Product
trait Mirror
class Object
trait Matchable
class Any
Self type
LossInput.type