MultiheadAttention

case class MultiheadAttention(wQ: Constant, wK: Constant, wV: Constant, wO: Constant, dropout: Double, train: Boolean, numHeads: Int, padToken: Long, linearized: Boolean) extends GenericModule[(Variable, Variable, Variable, STen), Variable]

Multi-head scaled dot product attention module

Input: (query, key, value, tokens) where
- query: batch x num queries x query dim
- key: batch x num k-v x key dim
- value: batch x num k-v x value dim
- tokens: batch x num queries, long type

The tokens tensor carries padding information so that padded positions (those equal to padToken) are ignored by the attention.
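
A minimal construction sketch, assuming the usual lamp package layout (lamp.nn for the module, lamp.autograd for Constant) and that the four projection weights have already been created as Constant parameters; the ??? placeholders stand in for that initialization, which is not covered on this page:

  import lamp.autograd.Constant
  import lamp.nn.MultiheadAttention

  // Hypothetical projection weights; in practice these are created with
  // lamp's parameter initialization utilities inside a Scope.
  val wQ: Constant = ???
  val wK: Constant = ???
  val wV: Constant = ???
  val wO: Constant = ???

  val attention = MultiheadAttention(
    wQ = wQ,
    wK = wK,
    wV = wV,
    wO = wO,
    dropout = 0.1,      // dropout probability used by the module
    train = true,       // training mode; set to false for inference
    numHeads = 8,       // number of attention heads
    padToken = 0L,      // token id marking padded positions in `tokens`
    linearized = false  // whether to use the linearized attention variant
  )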

Companion:
object

Supertypes:
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any

Value members

Concrete methods

override def forward[S : Sc](x: (Variable, Variable, Variable, STen)): Variable
Definition Classes: GenericModule

Inherited methods

def apply[S : Sc](a: (Variable, Variable, Variable, STen)): Variable

Alias of forward

Inherited from:
GenericModule
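
Continuing the construction sketch above, a usage example assuming query, key and value are lamp.autograd.Variable values and tokens is a lamp.STen with the shapes listed in the class description, and that an implicit lamp Scope is available (required by the [S : Sc] context bound), e.g. inside Scope.root { implicit scope => ... }:

  // query:  batch x num queries x query dim (Variable)
  // key:    batch x num k-v x key dim       (Variable)
  // value:  batch x num k-v x value dim     (Variable)
  // tokens: batch x num queries, long type  (STen)
  val output: Variable = attention.forward((query, key, value, tokens))

  // apply is an alias of forward, so this is equivalent:
  val output2: Variable = attention((query, key, value, tokens))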
final def gradients(loss: Variable, zeroGrad: Boolean): Seq[Option[STen]]

Computes the gradient of loss with respect to the parameters.

Inherited from:
GenericModule
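
A sketch of how gradients might fit into a training step for the attention module above, assuming a scalar loss Variable has been computed from its output; the optimizer update itself is out of scope here:

  // zeroGrad = true clears any previously accumulated gradients first
  val grads: Seq[Option[STen]] = attention.gradients(loss, zeroGrad = true)
  // presumably one entry per parameter (see `parameters` below);
  // None marks a parameter for which no gradient was produced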
final def learnableParameters: Long

Returns the total number of optimizable parameters.

Inherited from:
GenericModule
final def parameters: Seq[(Constant, PTag)]

Returns the state variables which need gradient computation.

Inherited from:
GenericModule
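
A small inspection sketch built only on the signatures above, listing the tagged parameters of the attention module and the total number of optimizable scalar values:

  // state variables that require gradient computation, with their tags
  attention.parameters.foreach { case (_, tag) => println(tag) }

  // total count of optimizable scalar parameters
  println(s"learnable parameters: ${attention.learnableParameters}")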
def productElementNames: Iterator[String]
Inherited from:
Product
def productIterator: Iterator[Any]
Inherited from:
Product
final def zeroGrad(): Unit

Clears the accumulated gradients of the parameters.

Inherited from:
GenericModule

Concrete fields

override val state: Seq[(Constant, PTag)]