
object MultiheadAttention extends Serializable

Linear Supertypes
Serializable, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def apply[S](dQ: Int, dK: Int, dV: Int, hiddenPerHead: Int, out: Int, dropout: Double, numHeads: Int, padToken: Long, tOpt: STenOptions, linearized: Boolean)(implicit arg0: Sc[S]): MultiheadAttention
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  7. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  8. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  9. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  10. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  11. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  12. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  13. def linearizedAttention[S](query: Variable, keys: Variable, values: Variable, tokens: STen, padToken: Long, dropout: Double, trainDropout: Boolean)(implicit arg0: Sc[S]): Variable

    Linearized dot product attention (https://arxiv.org/pdf/2006.16236.pdf)

    Replaces exp(a dot b) with f(a) dot f(b), where f is any elementwise function; in the paper f(x) = elu(x) + 1, here f(x) = swish1(x) + 1. This decomposition permits a more efficient ordering of the chained matrix multiplications: (Q Kt) V = Q (Kt V)

    (batch,query) locations where tokens(batch,query) == pad are ignored

    query: batch x num queries x key dim

    tokens: batch x num queries, type long

    returns: batch x num queries x value dim
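
    The efficiency gain comes from associativity. The sketch below, in plain Scala arrays rather than the library's Variable/STen types (all names are illustrative), applies the f(x) = swish1(x) + 1 feature map and checks that both multiplication orders agree; the right-hand grouping computes a d x dV inner product whose size is independent of sequence length.

    ```scala
    object LinearizedSketch {
      type Mat = Array[Array[Double]]

      // f(x) = swish1(x) + 1, the elementwise feature map named in the doc
      def swish1Plus1(x: Double): Double = x / (1.0 + math.exp(-x)) + 1.0

      def mapMat(m: Mat)(f: Double => Double): Mat = m.map(_.map(f))

      def transpose(m: Mat): Mat =
        Array.tabulate(m.head.length, m.length)((i, j) => m(j)(i))

      def matmul(a: Mat, b: Mat): Mat =
        a.map { row =>
          Array.tabulate(b.head.length)(j => row.indices.map(i => row(i) * b(i)(j)).sum)
        }

      // Largest absolute difference between the two multiplication orders;
      // it should be numerically negligible.
      def maxGroupingDiff(q: Mat, k: Mat, v: Mat): Double = {
        val fq = mapMat(q)(swish1Plus1)
        val fk = mapMat(k)(swish1Plus1)
        val left = matmul(matmul(fq, transpose(fk)), v)  // (Q Kt) V
        val right = matmul(fq, matmul(transpose(fk), v)) // Q (Kt V)
        left.flatten.zip(right.flatten).map { case (a, b) => math.abs(a - b) }.max
      }
    }
    ```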

  14. implicit val load: Load[MultiheadAttention]
  15. def maskedSoftmax[S](input: Variable, pad: Long, tokens: STen)(implicit arg0: Sc[S]): Variable

    input: batch x seq x ???

    tokens: batch x seq, type long

    returns: batch x seq x ???
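
    A minimal sketch of the idea over a single batch row, in plain Scala (not the library API): positions whose token equals pad receive zero attention weight, which is equivalent to setting their logits to negative infinity before the softmax.

    ```scala
    object MaskedSoftmaxSketch {
      // logits and tokens are aligned along the sequence dimension;
      // entries where tokens(i) == pad get weight exactly 0.
      def maskedSoftmax(logits: Array[Double], tokens: Array[Long], pad: Long): Array[Double] = {
        val masked = logits.zip(tokens).map { case (x, t) =>
          if (t == pad) Double.NegativeInfinity else x
        }
        val m = masked.max // subtract the max for numerical stability
        val exps = masked.map(x => math.exp(x - m))
        val s = exps.sum
        exps.map(_ / s)
      }
    }
    ```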

  16. def multiheadAttention[S](query: Variable, keys: Variable, values: Variable, tokens: STen, padToken: Long, dropout: Double, trainDropout: Boolean, wQuery: Variable, wKeys: Variable, wValues: Variable, wOutput: Variable, numHeads: Int, linearized: Boolean)(implicit arg0: Sc[S]): Variable

    Multi-head scaled dot product attention

    (batch,query) locations where tokens(batch,query) == pad are ignored

    query: batch x num queries x dq

    tokens: batch x num queries, type long

    wQuery: dq x hidden

    wKeys: dk x hidden

    wValues: dv x hidden

    wOutput: hidden x po

    numHeads: number of attention heads; hidden must be divisible by numHeads

    returns: batch x num queries x po
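
    The shape bookkeeping behind the divisibility requirement can be sketched in plain Scala (illustrative names, not the library API): a hidden vector is split into numHeads slices of size hidden / numHeads, attended per head, and concatenated back.

    ```scala
    object HeadSplitSketch {
      // Split one hidden vector into numHeads equal slices.
      def splitHeads(x: Array[Double], numHeads: Int): Array[Array[Double]] = {
        require(x.length % numHeads == 0, "hidden must be divisible by numHeads")
        x.grouped(x.length / numHeads).toArray
      }

      // Concatenate the per-head slices back into one hidden vector.
      def mergeHeads(heads: Array[Array[Double]]): Array[Double] = heads.flatten
    }
    ```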

  17. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  18. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  19. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  20. def scaledDotProductAttention[S](query: Variable, keys: Variable, values: Variable, tokens: STen, padToken: Long, dropout: Double, trainDropout: Boolean)(implicit arg0: Sc[S]): Variable

    Scaled dot product attention

    (batch,query) locations where tokens(batch,query) == pad are ignored

    query: batch x num queries x key dim

    tokens: batch x num queries, type long

    returns: batch x num queries x value dim
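
    The computation for one batch element can be sketched in plain Scala arrays (illustrative, not the library's Variable/STen API): scores are q dot k scaled by 1/sqrt(key dim), softmaxed over the keys, then used as weights over the value rows.

    ```scala
    object ScaledDotProductSketch {
      def softmax(xs: Array[Double]): Array[Double] = {
        val m = xs.max
        val exps = xs.map(x => math.exp(x - m))
        val s = exps.sum
        exps.map(_ / s)
      }

      // query: numQ x dK, keys: numK x dK, values: numK x dV; returns numQ x dV
      def attention(
          query: Array[Array[Double]],
          keys: Array[Array[Double]],
          values: Array[Array[Double]]
      ): Array[Array[Double]] = {
        val scale = 1.0 / math.sqrt(keys.head.length.toDouble)
        query.map { q =>
          // scaled dot products of q with every key, then softmax
          val weights = softmax(keys.map(k => q.zip(k).map { case (a, b) => a * b }.sum * scale))
          // weighted sum of value rows
          Array.tabulate(values.head.length)(j => weights.zip(values).map { case (w, v) => w * v(j) }.sum)
        }
      }
    }
    ```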

  21. def sequenceMask[S](tokens: STen, maskable: Variable, pad: Long, fill: Double)(implicit arg0: Sc[S]): Variable

    tokens: batch x seq, type long

    maskable: batch x seq x ???

    returns: batch x seq x ??? where (batch,seq,:) is set to fill if tokens(batch,seq) == pad
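
    A plain-Scala sketch of this masking (illustrative, not the library API), with tokens as batch x seq and maskable as batch x seq x features:

    ```scala
    object SequenceMaskSketch {
      // Replace every feature row whose token equals pad with a constant fill,
      // leaving the remaining rows untouched.
      def sequenceMask(
          tokens: Array[Array[Long]],
          maskable: Array[Array[Array[Double]]],
          pad: Long,
          fill: Double
      ): Array[Array[Array[Double]]] =
        tokens.zip(maskable).map { case (tokRow, featRows) =>
          tokRow.zip(featRows).map { case (t, feats) =>
            if (t == pad) Array.fill(feats.length)(fill) else feats
          }
        }
    }
    ```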

  22. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  23. def toString(): String
    Definition Classes
    AnyRef → Any
  24. implicit val trainingMode: TrainingMode[MultiheadAttention]
  25. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  26. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  27. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  28. case object WeightsK extends LeafTag with Product with Serializable
  29. case object WeightsO extends LeafTag with Product with Serializable
  30. case object WeightsQ extends LeafTag with Product with Serializable
  31. case object WeightsV extends LeafTag with Product with Serializable
