lamp.nn.MultiheadAttention
See the MultiheadAttention companion object
case class MultiheadAttention(wQ: Constant, wK: Constant, wV: Constant, wO: Constant, dropout: Double, train: Boolean, numHeads: Int, linearized: Boolean, causalMask: Boolean) extends GenericModule[(Variable, Variable, Variable, Option[STen]), Variable]
Multi-head scaled dot product attention module
Input: (query, key, value, maxLength) where
- query: batch x num queries x query dim
- key: batch x num k-v x key dim
- value: batch x num k-v x value dim
- maxLength: 1D or 2D long tensor for attention masking
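A minimal sketch of constructing and invoking the module, built directly through the case-class constructor shown above. The tensor-creation helpers (`Scope.root`, `STen.rand`, `STenOptions.f`, `const`) and the square projection-weight shapes are assumptions and may differ across lamp versions; only the case-class fields and the forward input/output types come from this page.

```scala
import lamp._
import lamp.autograd.{const, Variable}
import lamp.nn.MultiheadAttention

Scope.root { implicit scope =>
  val opts = STenOptions.f // float32 on CPU; assumed helper
  val (batch, nQ, nKv, d, heads) = (2, 5, 7, 16, 4)

  // Projection weights as non-trainable constants; square d x d
  // shapes are an illustrative assumption.
  def w(): lamp.autograd.Constant = const(STen.rand(List(d, d), opts))

  val attn = MultiheadAttention(
    wQ = w(), wK = w(), wV = w(), wO = w(),
    dropout = 0.0, train = false, numHeads = heads,
    linearized = false, causalMask = false
  )

  // Shapes follow the input contract above.
  val q = const(STen.rand(List(batch, nQ, d), opts))  // batch x num queries x query dim
  val k = const(STen.rand(List(batch, nKv, d), opts)) // batch x num k-v x key dim
  val v = const(STen.rand(List(batch, nKv, d), opts)) // batch x num k-v x value dim

  // No masking: maxLength = None
  val out: Variable = attn.forward((q, k, v, None))
  // out: batch x num queries x output dim of wO
}
```

Passing `Some(maxLength)` instead of `None` enables attention masking with a 1D or 2D long tensor, per the input description above.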
Attributes
Companion: object
Supertypes
- trait Serializable
- trait Product
- trait Equals
- class Object
- trait Matchable
- class Any
Members list
The implementation of the function.
In addition to x, it can also use all the `state` to compute its value.
Computes the gradient of loss with respect to the parameters.
Attributes
Inherited from: GenericModule
Returns the total number of optimizable parameters.
Attributes
Inherited from: GenericModule
Returns the state variables which need gradient computation.
Attributes
Inherited from: GenericModule
Attributes
Inherited from: Product
Attributes
Inherited from: Product
List of optimizable, or non-optimizable but stateful, parameters.
Stateful means that the state is carried over across repeated forward calls.
Attributes