Masked Language Model
Input of (embedding, positions):
- Embedding of size (batch, num tokens, embedding dim)
- Positions of size (batch, max num tokens): long tensor indicating which positions to make predictions on
Output of size (batch, len(Positions), vocabulary size)
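The shape contract above can be illustrated with a minimal sketch in plain Scala collections. This is not the library's implementation: `MlmShapeSketch`, `gather`, `project`, and the `weights` matrix are hypothetical names, and the single linear projection to vocabulary logits is an assumption made only to show how the stated output shape could arise from the stated inputs.

```scala
// Illustrative only: nested Vectors stand in for tensors.
// All names here are hypothetical and not part of the documented API.
object MlmShapeSketch {
  type Matrix = Vector[Vector[Double]]

  /** Select the embedding rows at the requested positions, per batch item.
    * embedding: batch x numTokens x embeddingDim
    * positions: batch x numPositions
    * result:    batch x numPositions x embeddingDim
    */
  def gather(embedding: Vector[Matrix], positions: Vector[Vector[Int]]): Vector[Matrix] =
    embedding.zip(positions).map { case (tokens, pos) => pos.map(i => tokens(i)) }

  /** Project every selected row to vocabulary logits.
    * weights: embeddingDim x vocabularySize (assumed output projection)
    * result:  batch x numPositions x vocabularySize
    */
  def project(selected: Vector[Matrix], weights: Matrix): Vector[Matrix] = {
    val columns = weights.transpose // vocabularySize x embeddingDim
    selected.map(_.map { row =>
      columns.map(col => row.zip(col).map { case (a, b) => a * b }.sum)
    })
  }

  /** Shape of a non-empty rank-3 nested Vector. */
  def shape(t: Vector[Matrix]): (Int, Int, Int) =
    (t.length, t.head.length, t.head.head.length)
}
```

With a batch of 2, 4 tokens, embedding dim 3, 2 positions per item, and a vocabulary of 5, `gather` yields shape (2, 2, 3) and `project` yields (2, 2, 5), matching the (batch, len(Positions), vocabulary size) output described above.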
- Companion:
- object
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
Value members
Inherited methods
Computes the gradient of loss with respect to the parameters.
- Inherited from:
- GenericModule
Returns the total number of optimizable parameters.
- Inherited from:
- GenericModule
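As a rough illustration of what a total parameter count usually means, the sketch below sums the element counts of a list of parameter tensor shapes. The page does not say how the count is computed, so this is an assumption, and `ParameterCountSketch`, `numel`, and `totalParameters` are hypothetical names rather than library members.

```scala
// Illustrative only: shapes are given as sequences of dimension sizes.
// All names here are hypothetical and not part of the documented API.
object ParameterCountSketch {
  /** Number of scalar entries in a tensor of the given shape.
    * An empty shape denotes a scalar and counts as 1.
    */
  def numel(shape: Seq[Long]): Long = shape.product

  /** Sum of entries across all parameter tensors. */
  def totalParameters(shapes: Seq[Seq[Long]]): Long = shapes.map(numel).sum
}
```

For example, a 3 x 5 weight matrix plus a bias of length 5 gives 15 + 5 = 20 optimizable parameters under this assumption.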
Returns the state variables which need gradient computation.
- Inherited from:
- GenericModule