case class MultiheadAttention(wQ: Constant, wK: Constant, wV: Constant, wO: Constant, dropout: Double, train: Boolean, numHeads: Int, padToken: Long, linearized: Boolean) extends GenericModule[(Variable, Variable, Variable, STen), Variable] with Product with Serializable
Multi-head scaled dot product attention module
Input: (query, key, value, tokens) where
- query: batch x num queries x query dim
- key: batch x num k-v x key dim
- value: batch x num k-v x value dim
- tokens: batch x num queries, long type

tokens is used to carry over padding information so that attention to padded positions is ignored.
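A minimal usage sketch (not generated from the source): it constructs the module directly from its case class constructor and runs one forward pass. The import paths, the STen/STenOptions factory calls, and the square dModel x dModel weight shapes are assumptions and should be checked against the companion factory before use.

    // Hedged sketch: factory signatures and weight shapes are assumed, not documented here.
    import lamp._
    import lamp.autograd.{Variable, param}
    import lamp.nn.MultiheadAttention

    Scope.root { implicit scope =>
      val dModel = 16L
      val (batch, numQueries, numKV) = (2L, 5L, 7L)

      // assumption: each projection weight is dModel x dModel
      def weight() = param(STen.rand(List(dModel, dModel), STenOptions.d))

      val attention = MultiheadAttention(
        wQ = weight(), wK = weight(), wV = weight(), wO = weight(),
        dropout = 0.1,
        train = true,
        numHeads = 4,        // dModel must be divisible by numHeads
        padToken = -1L,      // token value that marks padded positions
        linearized = false   // false: exact softmax attention
      )

      // query: batch x num queries x query dim
      val query  = param(STen.rand(List(batch, numQueries, dModel), STenOptions.d))
      // key, value: batch x num k-v x key/value dim
      val key    = param(STen.rand(List(batch, numKV, dModel), STenOptions.d))
      val value  = param(STen.rand(List(batch, numKV, dModel), STenOptions.d))
      // tokens: batch x num queries, long type; entries equal to padToken are ignored
      val tokens = STen.zeros(List(batch, numQueries), STenOptions.l)

      val output: Variable = attention.forward((query, key, value, tokens))
      println(output.shape) // batch x num queries x output dim
    }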
Linear Supertypes: Serializable, Product, Equals, GenericModule[(Variable, Variable, Variable, STen), Variable], AnyRef, Any
Instance Constructors
- new MultiheadAttention(wQ: Constant, wK: Constant, wV: Constant, wO: Constant, dropout: Double, train: Boolean, numHeads: Int, padToken: Long, linearized: Boolean)
Value Members
- def apply[S](a: (Variable, Variable, Variable, STen))(implicit arg0: Sc[S]): Variable
  Alias of forward.
  - Definition Classes: GenericModule
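Since apply delegates to forward, both call forms below run the same code path; this is a continuation of the sketch above (same Scope, same values).

    // continuation of the sketch above, inside the same Scope
    val input      = (query, key, value, tokens)
    val viaForward = attention.forward(input)
    val viaApply   = attention(input) // same code path; outputs can still differ
                                      // between calls because dropout is active in train mode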
- val dropout: Double
- def forward[S](x: (Variable, Variable, Variable, STen))(implicit arg0: Sc[S]): Variable
  The implementation of the function. In addition to x, it can also use all the state to compute its value.
  - Definition Classes: MultiheadAttention → GenericModule
- final def gradients(loss: Variable, zeroGrad: Boolean = true): Seq[Option[STen]]
  Computes the gradient of loss with respect to the parameters.
  - Definition Classes: GenericModule
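Continuing the same sketch, a hedged example of gradient computation: Variable.sum is assumed to be available for reducing the output to a scalar loss, and the returned sequence is assumed to line up with parameters.

    // continuation of the sketch above, inside the same Scope
    val loss = output.sum                       // assumption: any scalar-valued Variable works
    val grads: Seq[Option[STen]] = attention.gradients(loss, zeroGrad = true)
    grads.zip(attention.parameters).foreach { case (grad, (_, tag)) =>
      println(s"$tag -> ${grad.map(_.shape)}")  // assumed to be in parameters order
    }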
- final def learnableParameters: Long
  Returns the total number of optimizable parameters.
  - Definition Classes: GenericModule
- val linearized: Boolean
- val numHeads: Int
- val padToken: Long
- final def parameters: Seq[(Constant, PTag)]
  Returns the state variables which need gradient computation.
  - Definition Classes: GenericModule
- val state: List[(Constant, LeafTag with Product with Serializable)]
  List of optimizable, or non-optimizable but stateful, parameters. Stateful means that the state is carried over across repeated forward calls.
  - Definition Classes: MultiheadAttention → GenericModule
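Continuing the same sketch: for this module, state should pair the four projection weights (wQ, wK, wV, wO) with their tags, parameters narrows that list to the entries needing gradients, and learnableParameters counts their scalar elements.

    // continuation of the sketch above
    attention.state.foreach { case (w, tag) =>
      println(s"$tag: ${w.shape}")              // the four projection weights with their tags
    }
    val trainable = attention.parameters        // Seq[(Constant, PTag)]
    println(attention.learnableParameters)      // total number of optimizable scalars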
- val train: Boolean
- val wK: Constant
- val wO: Constant
- val wQ: Constant
- val wV: Constant
- final def zeroGrad(): Unit
  - Definition Classes: GenericModule
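Continuing the same sketch (reusing loss from the gradients example): gradients(loss, zeroGrad = true) clears old gradients before backpropagating; with zeroGrad = false the new gradients are accumulated instead, and zeroGrad() resets the buffers manually.

    // continuation of the sketch above
    val accumulated = attention.gradients(loss, zeroGrad = false) // adds to existing gradients
    attention.zeroGrad()                                          // clear before the next step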