lamp.nn
Provides building blocks for neural networks
Notable types:
- nn.GenericModule is an abstraction on parametric functions
- nn.Optimizer is an abstraction of gradient based optimizers
- nn.LossFunction is an abstraction of loss functions, see the companion object for the implemented losses
- nn.SupervisedModel combines a module with a loss
Optimizers:
- implementations of nn.Optimizer; the rectified Adam, Yogi, and the AdamW- and Shampoo-style algorithms are documented among the type members below
Modules facilitating composing other modules:
- nn.Sequential composes a homogeneous list of modules (analogous to List)
- nn.sequence composes a heterogeneous list of modules (analogous to tuples)
- nn.EitherModule composes two modules in a scala.Either
Examples of neural network building blocks, layers etc.:
- nn.Linear implements W X + b with parameters W and b and input X
- nn.BatchNorm, nn.LayerNorm implement batch and layer normalization
- nn.MLP is a factory of a multilayer perceptron architecture
Type members
Classlikes
Optimizer implementing the AdamW algorithm (Adam with decoupled weight decay)
- See also:
https://arxiv.org/pdf/1711.05101.pdf Algorithm 2
Learnable mapping from classes to dense vectors. Equivalent to L * W where L is the n x C one-hot encoded matrix of the classes, * is matrix multiplication, and W is the C x dim dense matrix. W is learnable; L is never computed directly. C is the number of classes, n is the size of the batch.
Input is a long tensor with values in [0, C-1]. Input shape is arbitrary (*). Output shape is (* x D) where D is the embedding dimension.
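A minimal, dependency-free sketch of this equivalence: multiplying the one-hot matrix L with W selects rows of W, so the mapping reduces to a row lookup (the names and values here are illustrative, not lamp's API):
{{{
// One-hot times W equals indexing rows of W.
val C = 4   // number of classes
val dim = 3 // embedding dimension
val W = Array.tabulate(C, dim)((i, j) => (i * dim + j).toDouble) // C x dim
val classes = Array(2, 0, 2) // batch of class ids, values in [0, C-1]

// Explicit L * W, with L the n x C one-hot matrix (never materialized in lamp):
val viaMatmul = classes.map { c =>
  val oneHot = Array.tabulate(C)(i => if (i == c) 1.0 else 0.0)
  Array.tabulate(dim)(j => oneHot.zip(W).map { case (l, row) => l * row(j) }.sum)
}
// The equivalent cheap lookup:
val viaLookup = classes.map(c => W(c))
// viaMatmul and viaLookup contain identical rows of W
}}}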
Wraps a (sequence x batch) long -> (sequence x batch x dim) double stateful module and runs it in greedy (argmax) generation mode over timeSteps steps.
Inputs of size (sequence length * batch * in dim). Outputs of size (sequence length * batch * hidden dim).
- Companion:
- object
Base type of modules
Modules are functions of type (Seq[lamp.autograd.Constant], A) => B, where the Seq[lamp.autograd.Constant] arguments are optimizable parameters and A is a non-optimizable input.
Modules provide a way to build composite functions while also keeping track of the parameter list of the composite function.
===Example===
{{{
// Imports assuming lamp's package layout
import lamp._
import lamp.autograd.{Constant, Variable}
import lamp.nn._

// Tags identifying the parameters of this module
case object Weights extends LeafTag
case object Bias extends LeafTag

case class Linear(weights: Constant, bias: Option[Constant]) extends Module {
  // The optimizable parameters, each paired with its tag
  override val state = List(
    weights -> Weights
  ) ++ bias.toList.map(b => (b, Bias))

  // x.mm(weights), plus the bias if present
  def forward[S: Sc](x: Variable): Variable = {
    val v = x.mm(weights)
    bias.map(_ + v).getOrElse(v)
  }
}
}}}
Some other attributes of modules are attached by type classes, e.g. the nn.TrainingMode and nn.Load type classes.
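For illustration, such a type class has roughly the following shape; this is a sketch of the pattern only, not the actual nn.TrainingMode signature:
{{{
// Sketch of the type-class pattern: behaviour is attached to a module type M
// via an implicit instance rather than via inheritance.
trait TrainingModeSketch[M] {
  def asEval(m: M): M     // switch the module to evaluation mode
  def asTraining(m: M): M // switch the module to training mode
}
}}}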
Type class about how to initialize recurrent neural networks
Inputs of size (sequence length * batch * vocab). Outputs of size (sequence length * batch * output dim).
Type class about how to load the contents of the state of modules from external tensors
Loss and Gradient calculation
Takes samples, target, module and loss function, and computes the loss and the gradients
Factory for multilayer fully connected feed forward networks
Returned network has the following repeated structure: [linear -> batchnorm -> nonlinearity -> dropout]*
The last block does not include the nonlinearity and the dropout (see the sketch after the parameter list).
- Value parameters:
- dropout
dropout applied to each block
- hidden
list of hidden dimensions
- in
input dimensions
- out
output dimensions
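A minimal, dependency-free sketch of the linear layer shapes such a factory produces; the concrete numbers are illustrative and this is not lamp's actual factory signature:
{{{
// Enumerate the (in, out) shapes of the linear layers for a hypothetical
// network with in = 784, hidden = List(64, 32), out = 10.
val in = 784
val hidden = List(64, 32)
val out = 10
val dims = in +: hidden :+ out
val linearShapes = dims.zip(dims.tail)
// linearShapes == List((784,64), (64,32), (32,10)); each block except the
// last is followed by batchnorm -> nonlinearity -> dropout.
}}}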
Multi-head scaled dot product attention module
Input: (query, key, value, tokens) where
- query: batch x num queries x query dim
- key: batch x num k-v x key dim
- value: batch x num k-v x value dim
- tokens: batch x num queries, long type
Tokens is used to carry over padding information and ignore the padding
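For reference, each head computes standard scaled dot-product attention. Below is a minimal, dependency-free single-head sketch to fix the shape conventions; lamp's batched, multi-head, padding-masked implementation is more involved:
{{{
// q: num queries x key dim, k: num k-v x key dim, v: num k-v x value dim
// returns: num queries x value dim
def scaledDotProductAttention(
    q: Array[Array[Double]],
    k: Array[Array[Double]],
    v: Array[Array[Double]]
): Array[Array[Double]] = {
  val dk = q.head.length.toDouble
  // scores: num queries x num k-v, scaled by sqrt(key dim)
  val scores = q.map { qi =>
    k.map { kj => qi.zip(kj).map { case (a, b) => a * b }.sum / math.sqrt(dk) }
  }
  // row-wise softmax (numerically stabilized by subtracting the row max)
  val weights = scores.map { row =>
    val m = row.max
    val e = row.map(s => math.exp(s - m))
    val z = e.sum
    e.map(_ / z)
  }
  // weighted sum of the value rows
  weights.map { w =>
    v.head.indices.map(j => w.zip(v).map { case (wi, vi) => wi * vi(j) }.sum).toArray
  }
}
}}}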
A small trait to mark parameters for unique identification
Evaluates the gradient at the current point + eps, where eps ~ N(0, noiseLevel * I), i.e. isotropic Gaussian noise
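A sketch of the idea, assuming noiseLevel is the variance of the per-coordinate Gaussian noise; the helper names are illustrative, not lamp's API:
{{{
import scala.util.Random

// Evaluate a gradient function at a randomly jittered point:
// x + eps with eps_i ~ N(0, noiseLevel), independently per coordinate.
def perturbedGradient(
    gradient: Array[Double] => Array[Double],
    x: Array[Double],
    noiseLevel: Double,
    rng: Random
): Array[Double] =
  gradient(x.map(_ + rng.nextGaussian() * math.sqrt(noiseLevel)))
}}}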
Rectified Adam optimizer algorithm
Inputs of size (sequence length * batch * in dim). Outputs of size (sequence length * batch * hidden dim).
Inputs of size (sequence length * batch * in dim). Outputs of size (sequence length * batch * output dim). Applies a linear function to each time step, as sketched below.
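A minimal, dependency-free sketch of applying the same affine map to every time step; array-based and illustrative, not lamp's implementation:
{{{
// input: seq x batch x in, w: in x out, b: out
// returns: seq x batch x out, the same linear map applied at each time step
def seqLinear(
    input: Array[Array[Array[Double]]],
    w: Array[Array[Double]],
    b: Array[Double]
): Array[Array[Array[Double]]] =
  input.map(_.map { x =>
    Array.tabulate(b.length) { j =>
      x.zip(w).map { case (xi, wRow) => xi * wRow(j) }.sum + b(j)
    }
  })
}}}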
Optimizer implementing the Shampoo algorithm
- See also:
https://arxiv.org/pdf/1802.09568.pdf Algorithm 1
Type class about how to switch a module into training or evaluation mode
Gradients are not computed for positionalEmbedding
TransformerEncoder module
Input is (data, tokens) where data is a (batch, num tokens, in dimension) double tensor and tokens is a (batch, num tokens) long tensor.
Output is (batch, num tokens, out dimension).
The sole purpose of tokens is to carry over the padding.
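For example, with batch = 2, num tokens = 5 and in dimension = 16, data is a 2 x 5 x 16 double tensor, tokens is a 2 x 5 long tensor, and the output is a 2 x 5 x (out dimension) double tensor; positions that tokens marks as padding are ignored by the attention layers.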
A single block of the transformer encoder as defined in Fig 10.7.1 in d2l v0.16
The Yogi optimizer algorithm, with the decoupled weight decay term added following https://arxiv.org/pdf/1711.05101.pdf
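For reference, Yogi differs from Adam only in the second-moment accumulator. A plain-text sketch of the update, following the Yogi paper (Zaheer et al., 2018) plus the decoupled weight decay referenced above; the notation (lr, wd, beta1, beta2, eps) is assumed, not taken from this documentation:
{{{
m_t = beta1 * m_(t-1) + (1 - beta1) * g_t                        // first moment, as in Adam
v_t = v_(t-1) - (1 - beta2) * sign(v_(t-1) - g_t^2) * g_t^2      // Yogi second moment
x_t = x_(t-1) - lr * m_t / (sqrt(v_t) + eps) - lr * wd * x_(t-1) // decoupled weight decay
}}}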