lamp.nn
Provides building blocks for neural networks
Notable types:
- nn.GenericModule is an abstraction on parametric functions
- nn.Optimizer is an abstraction of gradient based optimizers
- nn.LossFunction is an abstraction of loss functions, see the companion object for the implemented losses
- nn.SupervisedModel combines a module with a loss
Optimizers:
- implementations of nn.Optimizer; the rectified Adam, Yogi, and the AdamW- and Shampoo-style algorithms are documented among the type members below
Modules facilitating composing other modules:
- nn.Sequential composes a homogeneous list of modules (analogous to List)
- nn.sequence composes a heterogeneous list of modules (analogous to tuples)
- nn.EitherModule composes two modules in a scala.Either
Examples of neural network building blocks, layers etc.:
- nn.Linear implements W X + b with parameters W and b and input X
- nn.BatchNorm, nn.LayerNorm implement batch and layer normalization
- nn.MLP is a factory of a multilayer perceptron architecture
Type members
Classlikes
Optimizer implementing the AdamW algorithm (Adam with decoupled weight decay)
- See also:
https://arxiv.org/pdf/1711.05101.pdf Algorithm 2
Learnable mapping from classes to dense vectors. Equivalent to L * W where L is the n x C one-hot encoded matrix of the classes, * is matrix multiplication, and W is the C x dim dense matrix. W is learnable; L is never computed directly. C is the number of classes, n is the size of the batch.
Input is a long tensor with values in [0, C-1]. Input shape is arbitrary (*). Output shape is (* x D) where D is the embedding dimension.
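A minimal, dependency-free sketch of this equivalence: multiplying the one-hot matrix L with W selects rows of W, so the mapping reduces to a row lookup (the names and values here are illustrative, not lamp's API):
{{{
// One-hot times W equals indexing rows of W.
val C = 4   // number of classes
val dim = 3 // embedding dimension
val W = Array.tabulate(C, dim)((i, j) => (i * dim + j).toDouble) // C x dim
val classes = Array(2, 0, 2) // batch of class ids, values in [0, C-1]

// Explicit L * W, with L the n x C one-hot matrix (never materialized in lamp):
val viaMatmul = classes.map { c =>
  val oneHot = Array.tabulate(C)(i => if (i == c) 1.0 else 0.0)
  Array.tabulate(dim)(j => oneHot.zip(W).map { case (l, row) => l * row(j) }.sum)
}
// The equivalent cheap lookup:
val viaLookup = classes.map(c => W(c))
// viaMatmul and viaLookup contain identical rows of W
}}}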
Wraps a (sequence x batch) long -> (sequence x batch x dim) double stateful module and runs it in greedy (argmax) generation mode over timeSteps steps.
Inputs of size (sequence length * batch * in dim). Outputs of size (sequence length * batch * hidden dim).
- Companion:
- object
Base type of modules
Modules are functions of type (Seq[lamp.autograd.Constant], A) => B, where the Seq[lamp.autograd.Constant] arguments are optimizable parameters and A is a non-optimizable input.
Modules provide a way to build composite functions while also keeping track of the parameter list of the composite function.
===Example===
{{{
// Imports assuming lamp's package layout
import lamp._
import lamp.autograd.{Constant, Variable}
import lamp.nn._

// Tags identifying the parameters of this module
case object Weights extends LeafTag
case object Bias extends LeafTag

case class Linear(weights: Constant, bias: Option[Constant]) extends Module {
  // The optimizable parameters, each paired with its tag
  override val state = List(
    weights -> Weights
  ) ++ bias.toList.map(b => (b, Bias))

  // x.mm(weights), plus the bias if present
  def forward[S: Sc](x: Variable): Variable = {
    val v = x.mm(weights)
    bias.map(_ + v).getOrElse(v)
  }
}
}}}
Some other attributes of modules are attached by type classes, e.g. the nn.TrainingMode and nn.Load type classes.
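For illustration, such a type class has roughly the following shape; this is a sketch of the pattern only, not the actual nn.TrainingMode signature:
{{{
// Sketch of the type-class pattern: behaviour is attached to a module type M
// via an implicit instance rather than via inheritance.
trait TrainingModeSketch[M] {
  def asEval(m: M): M     // switch the module to evaluation mode
  def asTraining(m: M): M // switch the module to training mode
}
}}}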
Type class about how to initialize recurrent neural networks
Inputs of size (sequence length * batch * vocab). Outputs of size (sequence length * batch * output dim).
Type class about how to load the contents of the state of modules from external tensors
Loss and Gradient calculation
Takes samples, target, module and loss function, and computes the loss and the gradients
Factory for multilayer fully connected feed forward networks
Returned network has the following repeated structure: [linear -> batchnorm -> nonlinearity -> dropout]*
The last block does not include the nonlinearity and the dropout (see the sketch after the parameter list).
- Value parameters:
- dropout
dropout applied to each block
- hidden
list of hidden dimensions
- in
input dimensions
- out
output dimensions
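A minimal, dependency-free sketch of the linear layer shapes such a factory produces; the concrete numbers are illustrative and this is not lamp's actual factory signature:
{{{
// Enumerate the (in, out) shapes of the linear layers for a hypothetical
// network with in = 784, hidden = List(64, 32), out = 10.
val in = 784
val hidden = List(64, 32)
val out = 10
val dims = in +: hidden :+ out
val linearShapes = dims.zip(dims.tail)
// linearShapes == List((784,64), (64,32), (32,10)); each block except the
// last is followed by batchnorm -> nonlinearity -> dropout.
}}}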
Multi-head scaled dot product attention module
Input: (query, key, value, tokens) where
- query: batch x num queries x query dim
- key: batch x num k-v x key dim
- value: batch x num k-v x value dim
- tokens: batch x num queries, long type
Tokens is used to carry over padding information and ignore the padding
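For reference, each head computes standard scaled dot-product attention. Below is a minimal, dependency-free single-head sketch to fix the shape conventions; lamp's batched, multi-head, padding-masked implementation is more involved:
{{{
// q: num queries x key dim, k: num k-v x key dim, v: num k-v x value dim
// returns: num queries x value dim
def scaledDotProductAttention(
    q: Array[Array[Double]],
    k: Array[Array[Double]],
    v: Array[Array[Double]]
): Array[Array[Double]] = {
  val dk = q.head.length.toDouble
  // scores: num queries x num k-v, scaled by sqrt(key dim)
  val scores = q.map { qi =>
    k.map { kj => qi.zip(kj).map { case (a, b) => a * b }.sum / math.sqrt(dk) }
  }
  // row-wise softmax (numerically stabilized by subtracting the row max)
  val weights = scores.map { row =>
    val m = row.max
    val e = row.map(s => math.exp(s - m))
    val z = e.sum
    e.map(_ / z)
  }
  // weighted sum of the value rows
  weights.map { w =>
    v.head.indices.map(j => w.zip(v).map { case (wi, vi) => wi * vi(j) }.sum).toArray
  }
}
}}}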
A small trait to mark parameters for unique identification
Evaluates the gradient at the current point + eps, where eps ~ N(0, noiseLevel * I), i.e. isotropic Gaussian noise
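A sketch of the idea, assuming noiseLevel is the variance of the per-coordinate Gaussian noise; the helper names are illustrative, not lamp's API:
{{{
import scala.util.Random

// Evaluate a gradient function at a randomly jittered point:
// x + eps with eps_i ~ N(0, noiseLevel), independently per coordinate.
def perturbedGradient(
    gradient: Array[Double] => Array[Double],
    x: Array[Double],
    noiseLevel: Double,
    rng: Random
): Array[Double] =
  gradient(x.map(_ + rng.nextGaussian() * math.sqrt(noiseLevel)))
}}}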
Rectified Adam optimizer algorithm
Inputs of size (sequence length * batch * in dim). Outputs of size (sequence length * batch * hidden dim).
Inputs of size (sequence length * batch * in dim). Outputs of size (sequence length * batch * output dim). Applies a linear function to each time step, as sketched below.
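A minimal, dependency-free sketch of applying the same affine map to every time step; array-based and illustrative, not lamp's implementation:
{{{
// input: seq x batch x in, w: in x out, b: out
// returns: seq x batch x out, the same linear map applied at each time step
def seqLinear(
    input: Array[Array[Array[Double]]],
    w: Array[Array[Double]],
    b: Array[Double]
): Array[Array[Array[Double]]] =
  input.map(_.map { x =>
    Array.tabulate(b.length) { j =>
      x.zip(w).map { case (xi, wRow) => xi * wRow(j) }.sum + b(j)
    }
  })
}}}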
Optimizer implementing the Shampoo algorithm
- See also:
https://arxiv.org/pdf/1802.09568.pdf Algorithm 1
Type class about how to switch a module into training or evaluation mode
Gradients are not computed for positionalEmbedding
TransformerEncoder module
Input is (data, tokens) where data is a (batch, num tokens, in dimension) double tensor and tokens is a (batch, num tokens) long tensor.
Output is (batch, num tokens, out dimension).
The sole purpose of tokens is to carry over the padding.
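For example, with batch = 2, num tokens = 5 and in dimension = 16, data is a 2 x 5 x 16 double tensor, tokens is a 2 x 5 long tensor, and the output is a 2 x 5 x (out dimension) double tensor; positions that tokens marks as padding are ignored by the attention layers.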
A single block of the transformer encoder as defined in Fig 10.7.1 in d2l v0.16
The Yogi optimizer algorithm, with the decoupled weight decay term added following https://arxiv.org/pdf/1711.05101.pdf
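For reference, Yogi differs from Adam only in the second-moment accumulator. A plain-text sketch of the update, following the Yogi paper (Zaheer et al., 2018) plus the decoupled weight decay referenced above; the notation (lr, wd, beta1, beta2, eps) is assumed, not taken from this documentation:
{{{
m_t = beta1 * m_(t-1) + (1 - beta1) * g_t                        // first moment, as in Adam
v_t = v_(t-1) - (1 - beta2) * sign(v_(t-1) - g_t^2) * g_t^2      // Yogi second moment
x_t = x_(t-1) - lr * m_t / (sqrt(v_t) + eps) - lr * wd * x_(t-1) // decoupled weight decay
}}}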