Input for BERT pretrain module
- Tokens: Long tensor of size (batch, sequence length). The sequence length includes the cls and sep tokens. Values are tokens of the input vocabulary plus 4 additional control tokens: cls, sep, pad, mask. The first token must be cls.
- Segments: Long tensor of size (batch, sequence length), the same shape as Tokens. Values are segment tokens.
- Positions: Long tensor of size (batch, mask size), where the mask size is variable. Values are indices in [0, sequence length) selecting the masked sequence positions. They never select the positions of cls, sep, or pad tokens.
- Companion: object
- Supertypes: trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
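
The shape and value constraints listed for Tokens, Segments, and Positions can be checked mechanically. The following is a minimal sketch only: it uses plain nested arrays instead of the library's tensor type, and `BertInputSketch`, `validate`, and the control-token ids `Cls`, `Sep`, `Pad`, `Mask` are hypothetical names and values chosen for illustration, not part of the documented API.

```scala
// Hypothetical control-token ids; the real ids depend on the vocabulary in use.
object BertInputSketch {
  val Cls = 0L
  val Sep = 1L
  val Pad = 2L
  val Mask = 3L

  // Positions must never point at one of these tokens.
  private val forbidden = Set(Cls, Sep, Pad)

  /** Returns the violated constraints; an empty list means the batch looks valid. */
  def validate(
      tokens: Array[Array[Long]],
      segments: Array[Array[Long]],
      positions: Array[Array[Long]]
  ): List[String] = {
    val errors = List.newBuilder[String]
    val seqLen = tokens.headOption.map(_.length).getOrElse(0)

    // Tokens and Segments share the shape (batch, sequence length).
    if (tokens.exists(_.length != seqLen))
      errors += "tokens rows differ in length"
    if (segments.length != tokens.length || segments.exists(_.length != seqLen))
      errors += "segments shape differs from tokens shape"

    // The first token of every sequence must be cls.
    if (tokens.exists(row => !row.headOption.contains(Cls)))
      errors += "first token is not cls"

    // Positions has one row per batch entry.
    if (positions.length != tokens.length)
      errors += "positions batch size differs from tokens batch size"

    // Every position lies in [0, sequence length) and avoids cls, sep, pad.
    for (b <- positions.indices if b < tokens.length; p <- positions(b)) {
      val rowLen = tokens(b).length
      if (p < 0 || p >= rowLen)
        errors += s"position $p in row $b is out of range"
      else if (forbidden.contains(tokens(b)(p.toInt)))
        errors += s"position $p in row $b selects cls, sep or pad"
    }
    errors.result()
  }
}
```

For example, with a single sequence `Array(Cls, 10L, Mask, 12L, Sep, Pad)` and all-zero segments, `positions = Array(Array(2L))` passes the check, while `Array(Array(4L))` is rejected because index 4 holds the sep token.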