
class BartTokenizer extends Gpt2Tokenizer

Linear Supertypes
Gpt2Tokenizer, BpeTokenizer, AnyRef, Any

Instance Constructors

  1. new BartTokenizer(merges: Map[(String, String), Int], vocab: Map[String, Int], specialTokens: SpecialTokens, padWithSentenceTokens: Boolean = false, addPrefixSpace: Boolean = false)
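As a sketch only, direct construction might look like the following. The `vocab` and `merges` values here are tiny illustrative placeholders (real values come from a model's vocab.json and merges.txt), and `specialTokens` stands in for a `SpecialTokens` value obtained however the surrounding library expects:

```scala
// Illustrative placeholders, not real BART vocabulary data.
val vocab: Map[String, Int] = Map("Ġlow" -> 0, "er" -> 1, "<s>" -> 2, "</s>" -> 3)
val merges: Map[(String, String), Int] = Map(("Ġ", "l") -> 0, ("Ġl", "ow") -> 1)

// `specialTokens` is assumed to be built elsewhere for the BART model.
val tokenizer = new BartTokenizer(
  merges,
  vocab,
  specialTokens,
  padWithSentenceTokens = true
)
```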

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val addPrefixSpace: Boolean
    Definition Classes
    BpeTokenizer
  5. val appendForPieceId: Option[String]
    Attributes
    protected
    Definition Classes
    BpeTokenizer
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def bpe(indToken: IndexedToken): Array[TokenPiece]

    Do the BPE algorithm. The goal is to match the token to the largest words in the known vocabulary. If that is not possible, the word is split into smaller subwords until all pieces are known.

    returns

    Array of TokenPieces, corresponding to encoded token

    Attributes
    protected
    Definition Classes
    BpeTokenizer
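The merge loop this describes can be sketched independently of the class. The following is a minimal, self-contained illustration of greedy BPE merging, not the library's implementation; `ranks` plays the role of `bpeRanks`:

```scala
// Minimal greedy BPE sketch: repeatedly merge the adjacent pair with the
// lowest rank until no known pair remains.
def bpeSketch(word: Array[String], ranks: Map[(String, String), Int]): Array[String] = {
  var pieces = word
  var done = false
  while (!done && pieces.length > 1) {
    val pairs = pieces.sliding(2).map(p => (p(0), p(1))).toArray
    val known = pairs.filter(ranks.contains)
    if (known.isEmpty) done = true
    else {
      val (a, b) = known.minBy(ranks) // best-ranked (lowest) pair
      val merged = scala.collection.mutable.ArrayBuffer[String]()
      var i = 0
      while (i < pieces.length) {
        if (i < pieces.length - 1 && pieces(i) == a && pieces(i + 1) == b) {
          merged += a + b; i += 2 // merge this occurrence of the pair
        } else { merged += pieces(i); i += 1 }
      }
      pieces = merged.toArray
    }
  }
  pieces
}

// bpeSketch("lower".map(_.toString).toArray, Map(("l", "o") -> 0, ("lo", "w") -> 1))
// yields Array("low", "e", "r")
```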
  8. val bpeRanks: Map[(String, String), Int]
    Attributes
    protected
    Definition Classes
    BpeTokenizer
  9. val cache: Map[String, Array[String]]

    Cache for already encoded tokens.

    Attributes
    protected
    Definition Classes
    BpeTokenizer
  10. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  11. def decodeTokens(tokens: Array[Int]): String
    Definition Classes
    Gpt2Tokenizer
  12. val decoderVocab: Map[Int, String]
    Attributes
    protected
    Definition Classes
    Gpt2Tokenizer
  13. def encode(indTokens: Array[IndexedToken]): Array[TokenPiece]
    Definition Classes
    BpeTokenizer
  14. def encode(indToken: IndexedToken): Array[TokenPiece]
    Definition Classes
    BpeTokenizer
  15. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  16. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  17. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  18. def getBpeRanking: ((String, String)) ⇒ Int

    Rankings for the byte pairs, derived from merges.txt.

    Attributes
    protected
    Definition Classes
    BpeTokenizer
  19. def getBytePairs(word: Array[String]): Array[(String, String)]

    Creates a sequence of byte pairs for the word.

    Attributes
    protected
    Definition Classes
    BpeTokenizer
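For intuition, extracting the adjacent pairs of a split word can be sketched as follows (illustrative, not the library's code):

```scala
// Adjacent byte pairs of a word's current pieces, as used for BPE ranking.
def bytePairsSketch(word: Array[String]): Array[(String, String)] =
  word.sliding(2).collect { case Array(a, b) => (a, b) }.toArray

// bytePairsSketch(Array("l", "o", "w")) yields Array(("l", "o"), ("o", "w"))
```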
  20. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  21. def getTokenPieces(indToken: IndexedToken, word: Array[String]): Array[TokenPiece]
    Attributes
    protected
    Definition Classes
    BpeTokenizer
  22. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  23. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  24. val merges: Map[(String, String), Int]
    Definition Classes
    BpeTokenizer
  25. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  26. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  27. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  28. val padWithSentenceTokens: Boolean
    Definition Classes
    BpeTokenizer
  29. def performMerges(wordChars: Array[String], charPairs: Array[(String, String)]): Array[String]
    Attributes
    protected
    Definition Classes
    BpeTokenizer
  30. def preProcessTokenForBpe(token: String): String
    Definition Classes
    Gpt2Tokenizer → BpeTokenizer
  31. val prependForPieceId: Option[String]
    Definition Classes
    Gpt2Tokenizer → BpeTokenizer
  32. val sentencePadding: (String, String)

    Special tokens of the model for processing.

    Definition Classes
    BpeTokenizer
  33. val specialTokens: SpecialTokens
    Definition Classes
    BpeTokenizer
  34. def splitOnSpecialToken(specialToken: SpecialToken, text: String): ListBuffer[String]

    Splits the individual sub-texts on special tokens, e.g. masking.

    Attributes
    protected
    Definition Classes
    BpeTokenizer
  35. val splitPattern: Regex
    Definition Classes
    Gpt2Tokenizer
  36. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  37. def toString(): String
    Definition Classes
    AnyRef → Any
  38. def tokenize(sentence: Sentence): Array[IndexedToken]

    Tokenize, considering special tokens and the split algorithm.

    Definition Classes
    BpeTokenizer
  39. def tokenizeSubText(text: String, indexOffset: Int): Array[IndexedToken]

    Needs to be implemented.

    Definition Classes
    Gpt2Tokenizer → BpeTokenizer
  40. val unicodeToByteMapping: Map[String, Int]
    Attributes
    protected
    Definition Classes
    Gpt2Tokenizer
  41. val vocab: Map[String, Int]
    Definition Classes
    BpeTokenizer
  42. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  43. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  44. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
