AttentionDecoder

case class AttentionDecoder[T, M <: StatefulModule[Variable, Variable, T], M0 <: Module](
  decoder: M & StatefulModule[Variable, Variable, T],
  embedding: M0 & Module,
  stateToKey: T => Variable,
  keyValue: Variable,
  tokens: Variable,
  padToken: Long
) extends StatefulModule[Variable, Variable, T]

Supertypes

trait Serializable

trait Product

trait Equals

class Object

trait Matchable

class Any

Value members

Alias of forward.

Computes the gradient of loss with respect to the parameters.

Returns the total number of optimizable parameters.

Returns the state variables which need gradient computation.
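
The member descriptions above can be illustrated with a small, self-contained sketch. It does not use the library: `Tensor`, `ToyModule`, `Affine`, and the member names `state`, `parameters`, and `learnableParameters` are hypothetical stand-ins chosen for this example, and gradient computation is left out. It only shows one plausible reading of "alias of forward", "state variables which need gradient computation", and "total number of optimizable parameters".

```scala
// Hypothetical stand-ins for illustration only; these are NOT the library's types.
final case class Tensor(values: Vector[Double], needsGrad: Boolean)

trait ToyModule:
  // Every state tensor held by the module, trainable or not.
  def state: Seq[Tensor]

  def forward(x: Double): Double

  // Alias of forward, so a module can be called like a function.
  def apply(x: Double): Double = forward(x)

  // The state tensors which need gradient computation.
  def parameters: Seq[Tensor] = state.filter(_.needsGrad)

  // Total number of optimizable scalar values across those tensors.
  def learnableParameters: Long = parameters.map(_.values.size.toLong).sum

// A toy module with two trainable tensors and one non-trainable buffer.
final case class Affine(weight: Tensor, bias: Tensor, runningMean: Tensor)
    extends ToyModule:
  def state: Seq[Tensor] = Seq(weight, bias, runningMean)
  def forward(x: Double): Double =
    weight.values.map(_ * x).sum + bias.values.sum

@main def demo(): Unit =
  val m = Affine(
    weight = Tensor(Vector(0.5, 1.5), needsGrad = true),
    bias = Tensor(Vector(1.0), needsGrad = true),
    runningMean = Tensor(Vector(0.0, 0.0), needsGrad = false)
  )
  println(m(2.0) == m.forward(2.0)) // apply delegates to forward
  println(m.parameters.size)        // 2: the buffer is excluded
  println(m.learnableParameters)    // 3: two weights plus one bias
```

In this sketch the non-trainable buffer is still part of `state` but is excluded from both `parameters` and the parameter count, which is the distinction the last two descriptions draw.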