All Classes and Interfaces

Class
Description
 
Determines the class of a given character.
CharacterUtils provides a unified interface to Character-related operations, allowing them to be implemented in a backwards-compatible way.
A simple IO buffer to use with CharacterUtils.fill(CharacterBuffer, Reader).
A simple class that stores key Strings as char[]'s in a hash table.
A simple class that stores Strings as char[]'s in a hash table.
 
Exception that is thrown when detection fails.
Abstract superclass of all Detectors used for language and encoding detection.
An embedder converts a text string to a tensor.
 
 
A class which splits consecutive word character sequences into overlapping character n-grams.
An immutable start index and length pair.
 
A hint that can be given to a Detector.
A stemmer implementing the Kstem algorithm by Bob Krovetz.
 
Factory of linguistic processors.
 
This class provides a case normalization operation to be used e.g.
 
This interface provides NFKC normalization of Strings through the underlying linguistics library.
A StringBuilder that allows one to access the array.
Exception class indicating that a fatal error occurred during linguistic processing.
Interface providing segmentation, i.e.
 
Includes functionality for determining the langCode from a sample or from the encoding.
Factory of simple linguistic processor implementations.
 
 
A tokenizer which splits on whitespace, normalizes and transforms using the given implementations and stems using the kstem algorithm.
 
Converts all accented characters into their de-accented counterparts followed by their combining diacritics, then strips off the diacritics using a regex.
Immutable named lists of "special tokens" - strings which should override the normal tokenizer semantics and be tokenized into a single token.
An immutable list of special tokens - strings which should override the normal tokenizer semantics and be tokenized into a single token.
An immutable special token.
A list of strings which does not allow for duplicate elements.
Interface providing stemming of single words.
 
An enum of the stemming modes which can be requested.
A single token produced by the tokenizer.
Language-sensitive tokenization of a text string.
List of token scripts (e.g.
An enumeration of token types.
Interface for providers of text transformations such as accent removal.
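
The accent-removal entry above describes decomposing accented characters into a base character plus combining diacritics, then stripping the diacritics with a regex. A minimal sketch of that general technique, using only the JDK's `java.text.Normalizer`, might look like the following. The class name `AccentStripSketch` and the method `stripAccents` are hypothetical names chosen for illustration; this is not the library's own implementation, and the exact regex the library uses is not given in this listing.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the documented library.
public class AccentStripSketch {

    // Assumption: diacritics are matched via the Unicode block
    // "Combining Diacritical Marks" (U+0300 to U+036F).
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Decomposes each accented character into base character + combining
    // marks (NFD), then removes the combining marks.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return DIACRITICS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // "crème brûlée" written with Unicode escapes to avoid
        // source-encoding issues.
        System.out.println(stripAccents("cr\u00e8me br\u00fbl\u00e9e")); // prints "creme brulee"
    }
}
```

Characters that have no canonical decomposition (for example "ø" or "ß") pass through this sketch unchanged, so a production transformer would likely need extra handling for them.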