MultiheadAttention

Companion:
class MultiheadAttention

Supertypes:
trait Product
trait Mirror
class Object
trait Matchable
class Any

Type members

Classlikes

case object WeightsK extends LeafTag
case object WeightsO extends LeafTag
case object WeightsQ extends LeafTag
case object WeightsV extends LeafTag
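
These four tags identify the module's projection weight matrices (query, key, value, and output). As a hedged sketch, assuming the module exposes its parameters as (value, tag) pairs, which is the usual lamp convention but is not shown on this page, a tag can pick out one projection:

// Hypothetical sketch: `params` stands in for the module's (value, tag)
// pairs; the member that exposes them is an assumption, not documented here.
def findQueryWeights[A](params: Seq[(A, Any)]): Option[A] =
  params.collectFirst { case (w, MultiheadAttention.WeightsQ) => w }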

Inherited types

type MirroredElemLabels <: Tuple

The names of the product elements

Inherited from:
Mirror
type MirroredLabel <: String

The name of the type

Inherited from:
Mirror

Value members

Concrete methods

def apply[S : Sc](dQ: Int, dK: Int, dV: Int, hiddenPerHead: Int, out: Int, dropout: Double, numHeads: Int, padToken: Long, tOpt: STenOptions, linearized: Boolean): MultiheadAttention
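
For orientation, a hedged construction sketch follows. Only the parameter list comes from the signature above; Scope.root (to supply the implicit scope required by the [S : Sc] context bound) and STenOptions.f (float tensor options) are assumptions about the surrounding lamp API.

import lamp._
import lamp.nn._

Scope.root { implicit scope =>
  val attention = MultiheadAttention(
    dQ = 64,            // query embedding dimension
    dK = 64,            // key embedding dimension
    dV = 64,            // value embedding dimension
    hiddenPerHead = 16, // hidden = numHeads * hiddenPerHead = 64
    out = 64,           // output dimension (po)
    dropout = 0.1,
    numHeads = 4,
    padToken = 0L,      // token id marking padded positions
    tOpt = STenOptions.f, // assumed float tensor options
    linearized = false  // true switches to linearizedAttention
  )
  ()                    // tensors are released when the scope closes
}
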
def linearizedAttention[S : Sc](query: Variable, keys: Variable, values: Variable, tokens: STen, padToken: Long, dropout: Double, trainDropout: Boolean): Variable

Linearized dot product attention, https://arxiv.org/pdf/2006.16236.pdf

Replaces exp(a dot b) with f(a) dot f(b), where f is an elementwise function; the paper uses f(x) = elu(x)+1, while here f(x) = swish1(x)+1. Thanks to this decomposition, a more efficient grouping of the chained matrix multiplication may be used: (Q Kt) V = Q (Kt V). A plain-Scala illustration of this associativity follows the parameter list below.

(batch,query) locations where tokens(batch,query) == pad are ignored

Value parameters:

query: batch x num queries x key dim
keys: batch x num k-v pairs x key dim
values: batch x num k-v pairs x value dim
tokens: batch x num queries, type long
padToken: scalar long

Returns:

batch x num queries x value dim
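
The promised illustration of the associativity that makes the linearization efficient, in plain Scala with toy dense matrices (not part of the library API): both groupings produce the same result, but Q (Kt V) never materialises the num queries x num k-v pairs score matrix.

object LinearizedAttentionOrder extends App {
  type M = Array[Array[Double]]
  def matmul(a: M, b: M): M =
    Array.tabulate(a.length, b(0).length)((i, j) =>
      a(i).indices.map(x => a(i)(x) * b(x)(j)).sum
    )
  def transpose(a: M): M =
    Array.tabulate(a(0).length, a.length)((i, j) => a(j)(i))

  val rng = new scala.util.Random(0)
  def rand(r: Int, c: Int): M = Array.fill(r, c)(rng.nextDouble())

  val (numQ, numKV, dk, dv) = (5, 7, 3, 2)
  // f is applied elementwise beforehand (f(x) = swish1(x)+1 in this module),
  // so plain random matrices stand in for f(Q) and f(K) here.
  val q = rand(numQ, dk); val k = rand(numKV, dk); val v = rand(numKV, dv)

  val left  = matmul(matmul(q, transpose(k)), v) // materialises numQ x numKV scores
  val right = matmul(q, matmul(transpose(k), v)) // only a dk x dv intermediate
  val maxDiff = (for (i <- 0 until numQ; j <- 0 until dv)
    yield math.abs(left(i)(j) - right(i)(j))).max
  println(f"max abs difference: $maxDiff%.2e") // equal up to floating point rounding
}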

def maskedSoftmax[S : Sc](input: Variable, pad: Long, tokens: STen): Variable
Value parameters:

input: batch x seq x ???
pad: scalar long
tokens: batch x seq, type long

Returns:

batch x seq x ???
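
A behavioural sketch in plain Scala for a single batch element: softmax over the trailing dimension, with rows whose token equals pad excluded. Zero-filling the excluded rows is an assumption of this sketch, not necessarily what the library writes into them.

def maskedSoftmaxSketch(
    input: Array[Array[Double]], // seq x n
    pad: Long,
    tokens: Array[Long]          // seq
): Array[Array[Double]] =
  input.zip(tokens).map {
    case (_, t) if t == pad => Array.fill(input.head.length)(0.0)
    case (row, _) =>
      val max = row.max // subtract the row max for numerical stability
      val exps = row.map(x => math.exp(x - max))
      val z = exps.sum
      exps.map(_ / z)
  }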

def multiheadAttention[S : Sc](query: Variable, keys: Variable, values: Variable, tokens: STen, padToken: Long, dropout: Double, trainDropout: Boolean, wQuery: Variable, wKeys: Variable, wValues: Variable, wOutput: Variable, numHeads: Int, linearized: Boolean): Variable

Multi-head scaled dot product attention

(batch,query) locations where tokens(batch,query) == pad are ignored

Value parameters:

query: batch x num queries x dq
keys: batch x num k-v pairs x dk
values: batch x num k-v pairs x dv
tokens: batch x num queries, type long
padToken: scalar long
wQuery: dq x hidden
wKeys: dk x hidden
wValues: dv x hidden
wOutput: hidden x po
numHeads: number of attention heads; hidden must be divisible by numHeads

Returns:

batch x num queries x po
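
A runnable shape-bookkeeping sketch for the projections above; the names follow the parameter list, the concrete numbers are illustrative only.

object MultiheadShapes extends App {
  val (batch, numQ, numKV) = (8, 10, 12)
  val (dq, dk, dv) = (64, 64, 64)
  val (numHeads, hidden, po) = (4, 32, 64)
  require(hidden % numHeads == 0, "hidden must be divisible by numHeads")
  val perHead = hidden / numHeads
  // query  (batch x numQ  x dq) * wQuery  (dq x hidden) -> batch x numQ  x hidden
  // keys   (batch x numKV x dk) * wKeys   (dk x hidden) -> batch x numKV x hidden
  // values (batch x numKV x dv) * wValues (dv x hidden) -> batch x numKV x hidden
  // hidden splits into numHeads heads of size perHead, attention runs per head,
  // the heads are concatenated back to hidden, and wOutput (hidden x po)
  // projects to the final batch x numQ x po output.
  println(s"per-head dim = $perHead, output shape = $batch x $numQ x $po")
}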

def scaledDotProductAttention[S : Sc](query: Variable, keys: Variable, values: Variable, tokens: STen, padToken: Long, dropout: Double, trainDropout: Boolean): Variable

Scaled dot product attention

(batch,query) locations where tokens(batch,query) == pad are ignored

Value parameters:

query: batch x num queries x key dim
keys: batch x num k-v pairs x key dim
values: batch x num k-v pairs x value dim
tokens: batch x num queries, type long
padToken: scalar long

Returns:

batch x num queries x value dim
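
A reference sketch in plain Scala for one batch element and one head, without dropout: softmax(Q Kt / sqrt(key dim)) V, with padded query positions ignored as documented. Zero-filling the ignored rows is an assumption of this sketch.

object ScaledDotProductSketch extends App {
  def attend(
      q: Array[Array[Double]],   // num queries x key dim
      k: Array[Array[Double]],   // num k-v pairs x key dim
      v: Array[Array[Double]],   // num k-v pairs x value dim
      queryIsPad: Array[Boolean] // num queries
  ): Array[Array[Double]] = {
    val scale = math.sqrt(q.head.length.toDouble)
    q.zip(queryIsPad).map {
      case (_, true) =>
        Array.fill(v.head.length)(0.0) // ignored (padded) query position
      case (qRow, _) =>
        val scores =
          k.map(kRow => qRow.zip(kRow).map { case (a, b) => a * b }.sum / scale)
        val max = scores.max
        val w = scores.map(s => math.exp(s - max))
        val z = w.sum
        Array.tabulate(v.head.length)(j => v.indices.map(i => w(i) / z * v(i)(j)).sum)
    }
  }

  val out = attend(
    q = Array(Array(1.0, 0.0), Array(0.0, 1.0)),
    k = Array(Array(1.0, 0.0), Array(0.0, 1.0)),
    v = Array(Array(1.0, 2.0), Array(3.0, 4.0)),
    queryIsPad = Array(false, true) // second query position is padding
  )
  out.foreach(row => println(row.mkString("[", ", ", "]")))
}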

def sequenceMask[S : Sc](tokens: STen, maskable: Variable, pad: Long, fill: Double): Variable
Value parameters:

tokens: batch x seq, type long
maskable: batch x seq x ???
pad: the token value marking masked positions
fill: the value written into masked locations

Returns:

batch x seq x ??? where (batch,seq,:) is set to fill if tokens(batch,seq) == pad
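
A plain-Scala sketch of the fill semantics for a single batch element: every sequence position whose token equals pad has its whole trailing slice overwritten with fill; everything else passes through unchanged.

def sequenceMaskSketch(
    tokens: Array[Long],            // seq
    maskable: Array[Array[Double]], // seq x n
    pad: Long,
    fill: Double
): Array[Array[Double]] =
  maskable.zip(tokens).map { case (slice, t) =>
    if (t == pad) Array.fill(slice.length)(fill) else slice
  }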

Implicits