lamp.data.languagemodel

Data loader and inference utilities for the language model module in lamp.nn.languagemodel.

Members list

Value members

Concrete methods

def autoregressiveInference(model: LanguageModelModule, modelBlockSize: Int, prefix: Array[Char], length: Int, temperature: Double)(scope: Scope): IO[Array[Char]]

Recursive single next-token inference of a LanguageModelModule.

Value parameters

length

Number of tokens to infer. Each inferred token is appended to the prefix and fed back to the model after the sequence is truncated to its rightmost modelBlockSize tokens.

modelBlockSize

Also known as the context length, i.e. the maximum sequence length the model accepts.

prefix

The inference starts from this prefix sequence.

temperature

Sampling temperature. A value of 1.0 leaves the model's output distribution unchanged; values below 1.0 make sampling less random, values above 1.0 more random.

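Example

A minimal usage sketch, not part of this API: trainedModel stands for an already trained model, and the prefix text, the numeric settings, and the package paths in the imports are illustrative assumptions.

import cats.effect.IO
import lamp.Scope
import lamp.nn.languagemodel.LanguageModelModule
import lamp.data.languagemodel.autoregressiveInference

// Sketch only: `trainedModel` stands for an already trained
// LanguageModelModule; the prefix and all numeric settings below
// are illustrative assumptions.
def generate(trainedModel: LanguageModelModule)(scope: Scope): IO[Array[Char]] =
  autoregressiveInference(
    model = trainedModel,
    modelBlockSize = 128, // context length the model was trained with
    prefix = "Once upon a time".toCharArray, // inference starts from this sequence
    length = 64, // number of tokens to infer and append
    temperature = 0.8 // < 1.0 samples less randomly
  )(scope)

Running the returned IO with a cats-effect runtime yields the generated character sequence.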
def autoregressiveMinibatchesFromCorpus(minibatchSize: Int, numBatches: Int, corpus: STen, blockLength: Int, createMaxLength: Boolean): BatchStream[(LossInput, STen), Int, Unit]

Creates random minibatches of fixed size from an in-memory corpus. The attention mask is set up for autoregressive (left-to-right / causal) attention.

Value parameters

blockLength

Length of each sequence, also known as the context length.

corpus

Tokens of the corpus as a 1D int32 tensor.

minibatchSize

Number of sequences in the minibatch.

numBatches

Number of minibatches to generate.

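Example

A minimal sketch of building a batch stream from a tokenized corpus: tokens stands for the tokenized corpus, the numeric settings are illustrative, and since createMaxLength is not documented above, its value here is an assumption.

import lamp.STen
import lamp.data.languagemodel.autoregressiveMinibatchesFromCorpus

// Sketch only: `tokens` stands for a 1D int32 STen holding the token
// ids of the corpus; all numeric settings are illustrative.
def trainingBatches(tokens: STen) =
  autoregressiveMinibatchesFromCorpus(
    minibatchSize = 32, // sequences per minibatch
    numBatches = 1000, // minibatches the stream will emit
    corpus = tokens, // 1D int32 tensor of token ids
    blockLength = 256, // context length of each sequence
    createMaxLength = false // assumption; this flag is not documented above
  )

The resulting BatchStream pairs a LossInput, carrying the causal attention mask, with a target STen, and can be fed to a training loop.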