Attributes
- Companion: class
- Supertypes: trait Product, trait Mirror, class Object, trait Matchable, class Any
- Self type: MultiheadAttention.type
Members list
Type members
Classlikes
Inherited types
The names of the product elements
Attributes
- Inherited from: Mirror
The name of the type
Attributes
- Inherited from: Mirror
Value members
Concrete methods
Linearized dot product attention https://arxiv.org/pdf/2006.16236.pdf
Replaces exp(a dot b) with f(a) dot f(b), where f is any elementwise function. In the paper f(x) = elu(x)+1; here f(x) = swish1(x)+1. Because of this decomposition, a more efficient ordering of the chained matrix multiplication may be used: (Q Kt) V = Q (Kt V).
Applies masking according to maskedSoftmax.
Value parameters
- key: batch x num k-v pairs x key dim
- maxLength: batch x num queries, or batch; type Long
- query: batch x num queries x key dim
- value: batch x num k-v pairs x value dim
Attributes
- Returns: batch x num queries x value dim
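The reordering (Q Kt) V = Q (Kt V) can be illustrated with a minimal sketch on plain Scala collections for a single batch element. This is not the library's implementation: the object and method names are hypothetical, swish1 is assumed to be x * sigmoid(x), the row normalization follows the cited paper and is an assumption about this method, and masking is omitted.

```scala
object LinearizedAttentionSketch {
  type Matrix = Vector[Vector[Double]]

  /** Assumed feature map: swish1(x) + 1, with swish1(x) = x * sigmoid(x). */
  def featureMap(x: Double): Double = x / (1.0 + math.exp(-x)) + 1.0

  private def dot(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  /** (f(Q) f(K)^T) V: builds the full num queries x num k-v pairs weight matrix. */
  def quadratic(query: Matrix, key: Matrix, value: Matrix): Matrix = {
    val fq = query.map(_.map(featureMap))
    val fk = key.map(_.map(featureMap))
    fq.map { q =>
      val weights = fk.map(k => dot(q, k))
      val norm = weights.sum
      value.transpose.map(col => dot(weights, col) / norm)
    }
  }

  /** f(Q) (f(K)^T V): only a key dim x value dim matrix is materialized. */
  def linear(query: Matrix, key: Matrix, value: Matrix): Matrix = {
    val fq = query.map(_.map(featureMap))
    val fk = key.map(_.map(featureMap))
    // kv(d)(e) = sum over pairs j of fk(j)(d) * value(j)(e)
    val kv: Matrix =
      fk.transpose.map(kCol => value.transpose.map(vCol => dot(kCol, vCol)))
    // kSum(d) = sum over pairs j of fk(j)(d), used for the normalizer
    val kSum = fk.transpose.map(_.sum)
    fq.map { q =>
      val norm = dot(q, kSum)
      kv.transpose.map(col => dot(q, col) / norm)
    }
  }
}
```

Both orderings return the same num queries x value dim result up to floating point error; the second avoids the num queries x num k-v pairs intermediate, which is where the efficiency gain comes from for long sequences.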
Value parameters
- input: batch x seq x ???
- maxLength: batch x seq, or batch; type Long
Attributes
- Returns: batch x seq x ???
Multi-head scaled dot product attention
See chapter 11.5 in d2l v1.0.0-beta0
Attention masking is implemented similarly to chapter 11.3.2.1 in d2l.ai v1.0.0-beta0. It supports unmasked attention, attention on variable length input, and left-to-right attention.
Value parameters
- key: batch x num k-v pairs x dk
- linearized: if true, uses linearized attention; if false, uses scaled dot product attention
- maxLength: batch x num queries, or batch; type Long
- numHeads: number of output heads; hidden must be divisible by numHeads
- query: batch x num queries x dq
- value: batch x num k-v pairs x dv
- wKeys: dk x hidden
- wOutput: hidden x po
- wQuery: dq x hidden
- wValues: dv x hidden
Attributes
- Returns: batch x num queries x po
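The relationship between hidden and numHeads can be sketched as follows. This is a hedged illustration on plain Scala collections, not the library's code: `splitHeads` and `mergeHeads` are hypothetical helpers, and the sketch assumes the projected tensor (rows x hidden, for one batch element) is cut into numHeads contiguous slices of width hidden / numHeads, which is why hidden has to be a multiple of numHeads.

```scala
object HeadSplitSketch {
  type Matrix = Vector[Vector[Double]]

  /** Splits a rows x hidden matrix into numHeads matrices of shape rows x (hidden / numHeads). */
  def splitHeads(projected: Matrix, numHeads: Int): Vector[Matrix] = {
    val hidden = projected.head.length
    require(
      hidden % numHeads == 0,
      s"hidden ($hidden) must be divisible by numHeads ($numHeads)"
    )
    val headDim = hidden / numHeads
    Vector.tabulate(numHeads) { h =>
      projected.map(row => row.slice(h * headDim, (h + 1) * headDim))
    }
  }

  /** Concatenates the per-head outputs back into a rows x hidden matrix. */
  def mergeHeads(heads: Vector[Matrix]): Matrix =
    heads.transpose.map(_.flatten)
}
```

In this picture, attention would run independently on each head, and the merged rows x hidden result would then be multiplied by wOutput (hidden x po) to give the documented output shape.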
Scaled dot product attention
If maxLength is 2D: (batch,query,key) locations where key >= maxLength(batch,query) are ignored.
If maxLength is 1D: (batch,query,key) locations where key >= maxLength(batch) are ignored.
See chapter 11.3.3 in d2l v1.0.0-beta0
Value parameters
- key: batch x num k-v pairs x key dim
- maxLength: batch x num queries, or batch; type Long
- query: batch x num queries x key dim
- value: batch x num k-v pairs x value dim
Attributes
- Returns: batch x num queries x value dim
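As a rough illustration of the computation described here, the sketch below evaluates softmax(Q Kt / sqrt(key dim)) V for a single batch element on plain Scala collections. The names are hypothetical and not part of the library; `validKeys` stands in for one entry of a 1D maxLength, and the sketch assumes masked key positions receive zero attention weight.

```scala
object ScaledDotProductSketch {
  type Matrix = Vector[Vector[Double]]

  private def dot(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  /** Softmax over the first validKeys scores; later positions get weight 0. */
  private def maskedSoftmax(scores: Vector[Double], validKeys: Int): Vector[Double] = {
    require(validKeys >= 1, "at least one key must be visible")
    val kept = scores.take(validKeys)
    val max = kept.max // subtract the max for numerical stability
    val exps = kept.map(s => math.exp(s - max))
    val total = exps.sum
    exps.map(_ / total) ++ Vector.fill(scores.length - kept.length)(0.0)
  }

  /** query: num queries x key dim, key: num k-v pairs x key dim,
    * value: num k-v pairs x value dim. Returns num queries x value dim.
    */
  def attention(query: Matrix, key: Matrix, value: Matrix, validKeys: Int): Matrix = {
    val scale = math.sqrt(query.head.length.toDouble)
    query.map { q =>
      val scores = key.map(k => dot(q, k) / scale)
      val weights = maskedSoftmax(scores, validKeys)
      // Weighted sum of the value rows, one output entry per value column.
      value.transpose.map(col => dot(weights, col))
    }
  }
}
```

With `validKeys = 1` every query attends only to the first key-value pair, so each output row equals the first row of `value`.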
Masks on the 3rd axis of maskable depending on the dimensions of maxLength
If maxLength is 2D: (batch,query,key) locations where key >= maxLength(batch,query) are ignored.
If maxLength is 1D: (batch,query,key) locations where key >= maxLength(batch) are ignored.
Attributes
Masks the maskable(i,j,k) cell iff k >= maxLength(i)
Value parameters
- fill: scalar
- maskable: batch x seq x ???
- maxLength: batch, type Long
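The rule "mask maskable(i,j,k) iff k >= maxLength(i)" can be sketched on nested Scala vectors as below. The object and method names are hypothetical, and nested `Vector`s stand in for the library's tensor type.

```scala
object Mask1DSketch {
  type Tensor3 = Vector[Vector[Vector[Double]]]

  /** Replaces maskable(i)(j)(k) with fill iff k >= maxLength(i). */
  def mask1D(maskable: Tensor3, maxLength: Vector[Long], fill: Double): Tensor3 =
    maskable.zipWithIndex.map { case (rows, i) =>
      rows.map { row =>
        row.zipWithIndex.map { case (x, k) =>
          if (k >= maxLength(i)) fill else x
        }
      }
    }
}
```

Every row j of batch element i shares the same cutoff, which matches the variable-length input case where each sequence in the batch has its own length.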
Attributes
Masks the maskable(i,j,k) cell iff k >= maxLength(i,j)
Masks some elements on the last (3rd) axis of maskable
Value parameters
- fill: scalar
- maskable: batch x seq x ???
- maxLength: batch x seq, type Long
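A minimal sketch of the 2D rule "mask maskable(i,j,k) iff k >= maxLength(i,j)" follows, again on nested Scala vectors with hypothetical names. The `causalLengths` helper is an illustrative assumption, not a library function: it shows one way a per-query length of j + 1 would produce a left-to-right mask.

```scala
object Mask2DSketch {
  type Tensor3 = Vector[Vector[Vector[Double]]]

  /** Replaces maskable(i)(j)(k) with fill iff k >= maxLength(i)(j). */
  def mask2D(maskable: Tensor3, maxLength: Vector[Vector[Long]], fill: Double): Tensor3 =
    maskable.zipWithIndex.map { case (rows, i) =>
      rows.zipWithIndex.map { case (row, j) =>
        row.zipWithIndex.map { case (x, k) =>
          if (k >= maxLength(i)(j)) fill else x
        }
      }
    }

  /** Hypothetical helper: query j may see keys 0..j, so its length is j + 1. */
  def causalLengths(batch: Int, numQueries: Int): Vector[Vector[Long]] =
    Vector.fill(batch)(Vector.tabulate(numQueries)(j => (j + 1).toLong))
}
```

Applied to a 1 x 3 x 3 tensor of ones with fill 0, the causal lengths leave a lower-triangular pattern: row 0 keeps one entry, row 1 keeps two, and row 2 keeps all three.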