package lamp.data.bytesegmentencoding

Greedy contraction of consecutive n-grams
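
The summary above does not spell out how a trained table is applied. The following is a minimal, self-contained sketch of one way greedy contraction could work, assuming that encoding scans left to right and always replaces the longest matching segment. The object name GreedySegmentSketch, the Table alias, and the helper matchesAt are hypothetical and are not part of this package; only the parameter shapes mirror the signatures listed under Value Members.

```scala
object GreedySegmentSketch {
  // Hypothetical alias for the table shape used in the signatures:
  // each entry maps a byte segment to the 16-bit code that replaces it.
  type Table = Vector[(Vector[Byte], Char)]

  // True if `segment` occurs in `corpus` starting at index `at`.
  private def matchesAt(corpus: Array[Byte], segment: Vector[Byte], at: Int): Boolean =
    at + segment.length <= corpus.length &&
      segment.indices.forall(j => corpus(at + j) == segment(j))

  // Assumption: at each position the longest matching segment wins;
  // a byte with no matching segment is emitted as unknownToken.
  def encode(corpus: Array[Byte], table: Table, unknownToken: Char): Array[Char] = {
    // Longest segments first, so the first hit is the greedy choice.
    val byLength = table.sortBy { case (segment, _) => -segment.length }
    val out = Array.newBuilder[Char]
    var i = 0
    while (i < corpus.length) {
      val hit = byLength.find { case (segment, _) =>
        segment.nonEmpty && matchesAt(corpus, segment, i)
      }
      hit match {
        case Some((segment, code)) =>
          out += code
          i += segment.length
        case None =>
          out += unknownToken
          i += 1
      }
    }
    out.result()
  }

  // Expands each code back to its segment; codes missing from the
  // table become a single unknownByte.
  def decode(encoded: Array[Char], table: Table, unknownByte: Byte): Array[Byte] = {
    val segmentOf: Map[Char, Vector[Byte]] =
      table.map { case (segment, code) => code -> segment }.toMap
    encoded.flatMap(code => segmentOf.getOrElse(code, Vector(unknownByte)))
  }
}
```

With a two-entry table mapping "ab" to code 1 and "a" to code 2, the sketch contracts the four bytes of "abac" into three codes: 1, 2, and the unknown token for "c". Decoding is lossy for unknown bytes, since each unknown token expands to the single placeholder byte.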


Type Members

  1. case class ByteSegmentCodec(trained: Vector[(Vector[Byte], Char)], unknownToken: Char, unknownByte: Byte) extends Codec with Product with Serializable
  2. case class ByteSegmentCodecFactory(vocabularyMin: Char, vocabularyMax: Char, maxMergedSegmentLength: Int, unknownToken: Char, unknownByte: Byte) extends CodecFactory[ByteSegmentCodec] with Product with Serializable

Value Members

  1. def decode(encoded: Array[Char], encoding: Vector[(Vector[Byte], Char)], unknown: Byte): Array[Byte]
  2. def encode(corpus: Array[Byte], encoding: Vector[(Vector[Byte], Char)], unknownToken: Char): Array[Char]
  3. def readEncodingFromFile(file: File): ByteSegmentEncoding
  4. def saveEncodingToFile(file: File, encoding: Vector[(Vector[Byte], Char)], unknownToken: Char, unknownByte: Byte): Unit
  5. def train(corpus: Array[Byte], vocabularyMin: Char, vocabularyMax: Char, maxMergedSegmentLength: Int): Vector[(Vector[Byte], Char)]

    Trains a byte pair encoding (BPE).

    Char is used here as an unsigned 16-bit integer.
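
The note about Char can be illustrated with plain Scala, independent of this package. The sketch below shows the value range of Char and how many codes fit between two bounds. The bounds 256 and 1023 are hypothetical values chosen for illustration, and treating vocabularyMin and vocabularyMax as inclusive is an assumption that the signature alone does not confirm.

```scala
object CharAsUnsignedSketch {
  // Char spans 0 to 65535, unlike Short, which is signed.
  val lowest: Int = Char.MinValue.toInt
  val highest: Int = Char.MaxValue.toInt

  // Hypothetical vocabulary bounds, for illustration only.
  val vocabularyMin: Char = 256.toChar
  val vocabularyMax: Char = 1023.toChar

  // Number of distinct codes if both bounds are inclusive (assumption).
  val capacity: Int = vocabularyMax.toInt - vocabularyMin.toInt + 1

  // Converting an Int to Char keeps only the low 16 bits.
  val wrapped: Char = 65536.toChar
}
```

Under these assumptions, capacity is 768 and wrapped is the code 0, which is why bounds passed as Char can never exceed 65535.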
