Index


A

ABKHAZIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "ab".
AbstractDetector - Class in com.yahoo.language.detect
 
AbstractDetector() - Constructor for class com.yahoo.language.detect.AbstractDetector
 
accentDrop(String, Language) - Method in interface com.yahoo.language.process.Transformer
Remove accents from input text.
accentDrop(String, Language) - Method in class com.yahoo.language.simple.SimpleTransformer
 
add(char[]) - Method in class com.yahoo.language.simple.kstem.CharArraySet
Add this char[] directly to the set.
add(int, String) - Method in class com.yahoo.language.process.StemList
 
add(CharSequence) - Method in class com.yahoo.language.simple.kstem.CharArraySet
Add this CharSequence into the set
add(Object) - Method in class com.yahoo.language.simple.kstem.CharArraySet
 
add(String) - Method in class com.yahoo.language.simple.kstem.CharArraySet
Add this String into the set
addComponent(Token) - Method in class com.yahoo.language.simple.SimpleToken
 
AFAR - Enum constant in enum class com.yahoo.language.Language
Language tag "aa".
AFRIKAANS - Enum constant in enum class com.yahoo.language.Language
Language tag "af".
ALBANIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "sq".
ALL - Enum constant in enum class com.yahoo.language.process.StemMode
 
ALPHABETIC - Enum constant in enum class com.yahoo.language.process.TokenType
 
AMHARIC - Enum constant in enum class com.yahoo.language.Language
Language tag "am".
append(char) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
append(CharSequence) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
append(CharSequence, int, int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
ARABIC - Enum constant in enum class com.yahoo.language.Language
Language tag "ar".
ARABIC - Enum constant in enum class com.yahoo.language.process.TokenScript
 
ARMENIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "hy".
ARMENIAN - Enum constant in enum class com.yahoo.language.process.TokenScript
 
ASCII - Enum constant in enum class com.yahoo.language.process.TokenScript
 
asMap() - Method in interface com.yahoo.language.process.Embedder
Returns this embedder instance as a map with the default embedder name
asMap() - Method in class com.yahoo.language.process.SpecialTokens
Returns the tokens of this as an immutable map from token to replacement.
asMap(String) - Method in interface com.yahoo.language.process.Embedder
Returns this embedder instance as a map with the given name
ASSAMESE - Enum constant in enum class com.yahoo.language.Language
Language tag "as".
AYMARA - Enum constant in enum class com.yahoo.language.Language
Language tag "ay".
AZERBAIJANI - Enum constant in enum class com.yahoo.language.Language
Language tag "az".
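The accentDrop entries in this section describe removing accents from input text. As a rough illustration of that idea using only the JDK, the sketch below decomposes the text to NFD and deletes the combining marks. The class AccentDropSketch and the method dropAccents are hypothetical names; this is not the Transformer implementation, and unlike accentDrop(String, Language) it takes no Language argument.

```java
import java.text.Normalizer;

final class AccentDropSketch {
    private AccentDropSketch() {}

    // Hypothetical helper, not the library's accentDrop:
    // decompose to NFD so each accent becomes a separate combining mark,
    // then delete every combining mark (Unicode category M).
    static String dropAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Input is "Creme brulee" written with accented letters via Unicode escapes.
        System.out.println(dropAccents("Cr\u00e8me br\u00fbl\u00e9e"));
    }
}
```

Letters that have no canonical decomposition (for example the Danish "ø") pass through this sketch unchanged, which is one reason a language-aware transformer may behave differently.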

B

BASHKIR - Enum constant in enum class com.yahoo.language.Language
Language tag "ba".
BASQUE - Enum constant in enum class com.yahoo.language.Language
Language tag "eu".
BENGALI - Enum constant in enum class com.yahoo.language.Language
Language tag "bn".
BENGALI - Enum constant in enum class com.yahoo.language.process.TokenScript
 
BEST - Enum constant in enum class com.yahoo.language.process.StemMode
 
BHUTANI - Enum constant in enum class com.yahoo.language.Language
Language tag "dz".
BIHARI - Enum constant in enum class com.yahoo.language.Language
Language tag "bh".
BISLAMA - Enum constant in enum class com.yahoo.language.Language
Language tag "bi".
BRAILLE - Enum constant in enum class com.yahoo.language.process.TokenScript
 
BRETON - Enum constant in enum class com.yahoo.language.Language
Language tag "br".
buf - Variable in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
BUGINESE - Enum constant in enum class com.yahoo.language.Language
Language tag "bug".
BUGINESE - Enum constant in enum class com.yahoo.language.process.TokenScript
 
BUHID - Enum constant in enum class com.yahoo.language.process.TokenScript
 
BULGARIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "bg".
BURMESE - Enum constant in enum class com.yahoo.language.Language
Language tag "my".
BYELORUSSIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "be".

C

CAMBODIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "km".
CANADIAN - Enum constant in enum class com.yahoo.language.process.TokenScript
 
capacity() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
CATALAN - Enum constant in enum class com.yahoo.language.Language
Language tag "ca".
CHARACTER_CLASSES - Enum constant in enum class com.yahoo.language.Linguistics.Component
 
CharacterClasses - Class in com.yahoo.language.process
Determines the class of a given character.
CharacterClasses() - Constructor for class com.yahoo.language.process.CharacterClasses
 
CharacterUtils - Class in com.yahoo.language.simple.kstem
CharacterUtils provides a unified interface to Character-related operations to implement backwards compatible character operations.
CharacterUtils() - Constructor for class com.yahoo.language.simple.kstem.CharacterUtils
 
CharacterUtils.CharacterBuffer - Class in com.yahoo.language.simple.kstem
A simple IO buffer to use with CharacterUtils.fill(CharacterBuffer, Reader).
CharArrayMap<V> - Class in com.yahoo.language.simple.kstem
A simple class that stores key Strings as char[] arrays in a hash table.
CharArrayMap(int, boolean) - Constructor for class com.yahoo.language.simple.kstem.CharArrayMap
Create map with enough capacity to hold startSize terms
CharArrayMap(Map<?, ? extends V>, boolean) - Constructor for class com.yahoo.language.simple.kstem.CharArrayMap
Creates a map from the mappings in another map.
CharArrayMap.EntryIterator - Class in com.yahoo.language.simple.kstem
public iterator class so efficient methods are exposed to users
CharArrayMap.EntrySet - Class in com.yahoo.language.simple.kstem
public EntrySet class so efficient methods are exposed to users
CharArraySet - Class in com.yahoo.language.simple.kstem
A simple class that stores Strings as char[] arrays in a hash table.
CharArraySet(int, boolean) - Constructor for class com.yahoo.language.simple.kstem.CharArraySet
Create set with enough capacity to hold startSize terms
CharArraySet(Collection<?>, boolean) - Constructor for class com.yahoo.language.simple.kstem.CharArraySet
Creates a set from a Collection of objects.
charAt(int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
CHEROKEE - Enum constant in enum class com.yahoo.language.Language
Language tag "chr".
CHEROKEE - Enum constant in enum class com.yahoo.language.process.TokenScript
 
CHINESE - Enum constant in enum class com.yahoo.language.process.TokenScript
 
CHINESE_SIMPLIFIED - Enum constant in enum class com.yahoo.language.Language
Language tag "zh-hans".
CHINESE_TRADITIONAL - Enum constant in enum class com.yahoo.language.Language
Language tag "zh-hant".
clear() - Method in class com.yahoo.language.simple.kstem.CharArrayMap
Clears all entries in this map.
clear() - Method in class com.yahoo.language.simple.kstem.CharArrayMap.EntrySet
 
clear() - Method in class com.yahoo.language.simple.kstem.CharArraySet
Clears all entries in this set.
codePointAt(char[], int, int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
Returns the code point at the given index of the char array where only elements with index less than the limit are used.
codePointAt(CharSequence, int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
Returns the code point at the given index of the CharSequence.
codePointCount(CharSequence) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
Return the number of characters in seq.
com.yahoo.language - package com.yahoo.language
 
com.yahoo.language.detect - package com.yahoo.language.detect
 
com.yahoo.language.opennlp - package com.yahoo.language.opennlp
 
com.yahoo.language.process - package com.yahoo.language.process
 
com.yahoo.language.simple - package com.yahoo.language.simple
 
com.yahoo.language.simple.kstem - package com.yahoo.language.simple.kstem
 
COMMON - Enum constant in enum class com.yahoo.language.process.TokenScript
 
compareTo(SpecialTokens.Token) - Method in class com.yahoo.language.process.SpecialTokens.Token
 
contains(char[], int, int) - Method in class com.yahoo.language.simple.kstem.CharArraySet
true if the len chars of text starting at off are in the set
contains(CharSequence) - Method in class com.yahoo.language.simple.kstem.CharArraySet
true if the CharSequence is in the set
contains(Object) - Method in class com.yahoo.language.simple.kstem.CharArrayMap.EntrySet
 
contains(Object) - Method in class com.yahoo.language.simple.kstem.CharArraySet
 
containsKey(char[], int, int) - Method in class com.yahoo.language.simple.kstem.CharArrayMap
true if the len chars of text starting at off are in the CharArrayMap.keySet()
containsKey(CharSequence) - Method in class com.yahoo.language.simple.kstem.CharArrayMap
true if the CharSequence is in the CharArrayMap.keySet()
containsKey(Object) - Method in class com.yahoo.language.simple.kstem.CharArrayMap
 
Context(String) - Constructor for class com.yahoo.language.process.Embedder.Context
 
COPTIC - Enum constant in enum class com.yahoo.language.Language
Language tag "cop".
COPTIC - Enum constant in enum class com.yahoo.language.process.TokenScript
 
copy(Map<?, ? extends V>) - Static method in class com.yahoo.language.simple.kstem.CharArrayMap
Returns a copy of the given map as a CharArrayMap.
copy(Set<?>) - Static method in class com.yahoo.language.simple.kstem.CharArraySet
Returns a copy of the given set as a CharArraySet.
CORSICAN - Enum constant in enum class com.yahoo.language.Language
Language tag "co".
CROATIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "hr".
currentValue() - Method in class com.yahoo.language.simple.kstem.CharArrayMap.EntryIterator
returns the value associated with the last key returned
CYPRIOT - Enum constant in enum class com.yahoo.language.process.TokenScript
 
CYRILLIC - Enum constant in enum class com.yahoo.language.process.TokenScript
 
CZECH - Enum constant in enum class com.yahoo.language.Language
Language tag "cs".
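The codePointAt and codePointCount entries in this section work on Unicode code points rather than UTF-16 chars, with codePointAt(char[], int, int) using only elements below a limit. The sketch below shows the same distinction with the JDK's own java.lang.Character methods. CodePointSketch is a hypothetical class written for illustration; whether CharacterUtils delegates to these JDK calls is an assumption, not something this index states.

```java
final class CodePointSketch {
    private CodePointSketch() {}

    // Counts Unicode code points (not UTF-16 chars) in the whole sequence.
    static int codePointCount(CharSequence seq) {
        return Character.codePointCount(seq, 0, seq.length());
    }

    // Reads the code point starting at index, using only elements below limit.
    static int codePointAt(char[] chars, int index, int limit) {
        return Character.codePointAt(chars, index, limit);
    }

    public static void main(String[] args) {
        // 'a', one supplementary character stored as a surrogate pair, then 'b'.
        String text = "a\uD83D\uDE00b";
        char[] chars = text.toCharArray();
        System.out.println(text.length());                        // UTF-16 chars
        System.out.println(codePointCount(text));                 // code points
        System.out.println(Integer.toHexString(codePointAt(chars, 1, chars.length)));
    }
}
```

When the limit cuts a surrogate pair in half, the JDK method returns the lone high surrogate instead of the combined code point, so callers that fill buffers in chunks need to keep pairs together.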

D

DANISH - Enum constant in enum class com.yahoo.language.Language
Language tag "da".
DEFAULT - Enum constant in enum class com.yahoo.language.process.StemMode
 
defaultEmbedderId - Static variable in interface com.yahoo.language.process.Embedder
Name of embedder when none is explicitly given
DefaultLanguageDetectorContextGenerator - Class in com.yahoo.language.opennlp
Avoids using the unnecessarily slow NGramCharModel.
DefaultLanguageDetectorContextGenerator(int, int, CharSequenceNormalizer...) - Constructor for class com.yahoo.language.opennlp.DefaultLanguageDetectorContextGenerator
 
DESERET - Enum constant in enum class com.yahoo.language.process.TokenScript
 
detect(byte[], int, int, Hint) - Method in interface com.yahoo.language.detect.Detector
Detects language and encoding of the supplied byte array, possibly using a language/encoding hint.
detect(byte[], int, int, Hint) - Method in class com.yahoo.language.simple.SimpleDetector
 
detect(String, Hint) - Method in class com.yahoo.language.detect.AbstractDetector
 
detect(String, Hint) - Method in interface com.yahoo.language.detect.Detector
Detects language of the supplied String, possibly using a language hint.
detect(String, Hint) - Method in class com.yahoo.language.simple.SimpleDetector
 
detect(ByteBuffer, Hint) - Method in class com.yahoo.language.detect.AbstractDetector
 
detect(ByteBuffer, Hint) - Method in interface com.yahoo.language.detect.Detector
Detects language and encoding of the supplied ByteBuffer, possibly using a language/encoding hint.
detect(ByteBuffer, Hint) - Method in class com.yahoo.language.simple.SimpleDetector
 
Detection - Class in com.yahoo.language.detect
 
Detection(Language, String, boolean) - Constructor for class com.yahoo.language.detect.Detection
 
DetectionException - Exception in com.yahoo.language.detect
Exception that is thrown when detection fails.
DetectionException(String) - Constructor for exception com.yahoo.language.detect.DetectionException
 
Detector - Interface in com.yahoo.language.detect
Common interface of all Detectors used for language and encoding detection.
DETECTOR - Enum constant in enum class com.yahoo.language.Linguistics.Component
 
DEVANAGARI - Enum constant in enum class com.yahoo.language.process.TokenScript
 
DIVEHI - Enum constant in enum class com.yahoo.language.Language
Language tag "div".
DUTCH - Enum constant in enum class com.yahoo.language.Language
Language tag "nl".

E

embed(String, Embedder.Context) - Method in interface com.yahoo.language.process.Embedder
Converts text into a list of token ids (a vector embedding)
embed(String, Embedder.Context) - Method in class com.yahoo.language.process.Embedder.FailingEmbedder
 
embed(String, Embedder.Context, TensorType) - Method in interface com.yahoo.language.process.Embedder
Converts text into tokens in a tensor.
embed(String, Embedder.Context, TensorType) - Method in class com.yahoo.language.process.Embedder.FailingEmbedder
 
Embedder - Interface in com.yahoo.language.process
An embedder converts a text string to a tensor
Embedder.Context - Class in com.yahoo.language.process
 
Embedder.FailingEmbedder - Class in com.yahoo.language.process
 
empty() - Static method in class com.yahoo.language.process.SpecialTokens
 
EMPTY_SET - Static variable in class com.yahoo.language.simple.kstem.CharArraySet
 
emptyMap() - Static method in class com.yahoo.language.simple.kstem.CharArrayMap
Returns an empty, unmodifiable map.
ENGLISH - Enum constant in enum class com.yahoo.language.Language
Language tag "en".
entrySet() - Method in class com.yahoo.language.simple.kstem.CharArrayMap
 
equals(Linguistics) - Method in interface com.yahoo.language.Linguistics
Check if another instance is equivalent to this one
equals(Linguistics) - Method in class com.yahoo.language.opennlp.OpenNlpLinguistics
 
equals(Linguistics) - Method in class com.yahoo.language.simple.SimpleLinguistics
 
equals(Object) - Method in class com.yahoo.language.process.GramSplitter.Gram
 
equals(Object) - Method in class com.yahoo.language.process.SpecialTokens.Token
 
equals(Object) - Method in class com.yahoo.language.simple.SimpleToken
 
ESPERANTO - Enum constant in enum class com.yahoo.language.Language
Language tag "eo".
ESTONIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "et".
ETHIOPIC - Enum constant in enum class com.yahoo.language.process.TokenScript
 
extractFrom(GramSplitter.UnicodeString) - Method in class com.yahoo.language.process.GramSplitter.Gram
Returns this gram as a string from the input string
extractFrom(String) - Method in class com.yahoo.language.process.GramSplitter.Gram
Returns this gram as a string from the input string
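The extractFrom entries above return a gram as a string from the input, and elsewhere this index describes GramSplitter.Gram as an immutable start index and length pair and GramSplitter as splitting consecutive word character sequences into overlapping character n-grams. The sketch below illustrates that general idea with the JDK only. GramSketch and its nested Gram are hypothetical stand-ins, not the library classes. Two assumptions are made: a run shorter than n yields one gram covering the whole run, and indexes are UTF-16 chars rather than code points.

```java
import java.util.ArrayList;
import java.util.List;

final class GramSketch {
    private GramSketch() {}

    // Hypothetical stand-in for an immutable start index and length pair.
    static final class Gram {
        final int start;
        final int length;

        Gram(int start, int length) {
            this.start = start;
            this.length = length;
        }

        // Returns the text this gram covers in the given input.
        String extractFrom(String input) {
            return input.substring(start, start + length);
        }
    }

    // Splits each run of letters or digits into overlapping grams of size n.
    // Assumption: a run of n or fewer chars becomes a single gram.
    static List<Gram> split(String input, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1");
        List<Gram> grams = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (!Character.isLetterOrDigit(input.charAt(i))) {
                i++; // skip separators between word character runs
                continue;
            }
            int runStart = i;
            while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
            int runLength = i - runStart;
            if (runLength <= n) {
                grams.add(new Gram(runStart, runLength));
            } else {
                for (int s = runStart; s + n <= i; s++) grams.add(new Gram(s, n));
            }
        }
        return grams;
    }
}
```

Storing only offsets, as the index's description of Gram suggests, avoids allocating a substring per gram until a caller asks for one through extractFrom.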

F

FailingEmbedder() - Constructor for class com.yahoo.language.process.Embedder.FailingEmbedder
 
FailingEmbedder(String) - Constructor for class com.yahoo.language.process.Embedder.FailingEmbedder
 
FAROESE - Enum constant in enum class com.yahoo.language.Language
Language tag "fo".
FIJI - Enum constant in enum class com.yahoo.language.Language
Language tag "fj".
fill(CharacterUtils.CharacterBuffer, Reader) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
Convenience method which calls fill(buffer, reader, buffer.buffer.length).
fill(CharacterUtils.CharacterBuffer, Reader, int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
Fills the CharacterUtils.CharacterBuffer with characters read from the given Reader.
FINNISH - Enum constant in enum class com.yahoo.language.Language
Language tag "fi".
flush() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
FRENCH - Enum constant in enum class com.yahoo.language.Language
Language tag "fr".
FRISIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "fy".
from(String) - Static method in enum class com.yahoo.language.Language
Returns the Language from a language tag
fromEncoding(String) - Static method in enum class com.yahoo.language.Language
Returns the language from an encoding, or Language.UNKNOWN if it cannot be determined.
fromLanguageTag(String) - Static method in enum class com.yahoo.language.Language
Convenience method for calling fromLocale(LocaleFactory.fromLanguageTag(languageTag)).
fromLanguageTag(String) - Static method in class com.yahoo.language.LocaleFactory
Implements a simple parser for RFC5646 language tags.
fromLocale(Locale) - Static method in enum class com.yahoo.language.Language
Returns the Language whose Language.languageCode() is equal to locale.getLanguage(), with the following additions:

G

GALICIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "gl".
GEORGIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "ka".
GEORGIAN - Enum constant in enum class com.yahoo.language.process.TokenScript
 
GERMAN - Enum constant in enum class com.yahoo.language.Language
Language tag "de".
get(char[], int, int) - Method in class com.yahoo.language.simple.kstem.CharArrayMap
returns the value of the mapping of len chars of text starting at off
get(int) - Method in class com.yahoo.language.process.StemList
 
get(CharSequence) - Method in class com.yahoo.language.simple.kstem.CharArrayMap
returns the value of the mapping of the chars inside this CharSequence
get(Object) - Method in class com.yahoo.language.simple.kstem.CharArrayMap
 
getArray() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
getBuffer() - Method in class com.yahoo.language.simple.kstem.CharacterUtils.CharacterBuffer
Returns the internal buffer
getCharacterClasses() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe character classes instance.
getCharacterClasses() - Method in class com.yahoo.language.simple.SimpleLinguistics
 
getCodePointCount() - Method in class com.yahoo.language.process.GramSplitter.Gram
 
getComponent(int) - Method in interface com.yahoo.language.process.Token
Returns a component token of this
getComponent(int) - Method in class com.yahoo.language.simple.SimpleToken
 
getContext(CharSequence) - Method in class com.yahoo.language.opennlp.DefaultLanguageDetectorContextGenerator
 
getContextGenerator() - Method in class com.yahoo.language.opennlp.LanguageDetectorFactory
 
getCountry() - Method in class com.yahoo.language.detect.Hint
 
getDestination() - Method in class com.yahoo.language.process.Embedder.Context
Returns the name of the recipient of this tensor.
getDetector() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe detector.
getDetector() - Method in class com.yahoo.language.opennlp.OpenNlpLinguistics
 
getDetector() - Method in class com.yahoo.language.simple.SimpleLinguistics
 
getEncoding() - Method in class com.yahoo.language.detect.Detection
 
getEncodingName() - Method in class com.yahoo.language.detect.Detection
 
getGramSplitter() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe gram splitter.
getGramSplitter() - Method in class com.yahoo.language.simple.SimpleLinguistics
 
getInstance() - Static method in class com.yahoo.language.opennlp.UrlCharSequenceNormalizer
 
getInstance() - Static method in class com.yahoo.language.simple.kstem.CharacterUtils
Returns a CharacterUtils implementation.
getLanguage() - Method in class com.yahoo.language.detect.Detection
 
getLanguage() - Method in class com.yahoo.language.process.Embedder.Context
Returns the language of the text, or UNKNOWN (default) to use a language independent embedding
getLength() - Method in class com.yahoo.language.simple.kstem.CharacterUtils.CharacterBuffer
Return the length of the data in the internal buffer starting at CharacterUtils.CharacterBuffer.getOffset()
getMarket() - Method in class com.yahoo.language.detect.Hint
 
getNormalizer() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe normalizer.
getNormalizer() - Method in class com.yahoo.language.simple.SimpleLinguistics
 
getNumComponents() - Method in interface com.yahoo.language.process.Token
Returns the number of components, if this token is a compound word (e.g.
getNumComponents() - Method in class com.yahoo.language.simple.SimpleToken
 
getNumStems() - Method in interface com.yahoo.language.process.Token
Returns the number of stem forms available for this token.
getNumStems() - Method in class com.yahoo.language.simple.SimpleToken
 
getOffset() - Method in interface com.yahoo.language.process.Token
Returns the offset position of this token
getOffset() - Method in class com.yahoo.language.simple.kstem.CharacterUtils.CharacterBuffer
Returns the data offset in the internal buffer.
getOffset() - Method in class com.yahoo.language.simple.SimpleToken
 
getOrig() - Method in interface com.yahoo.language.process.Token
Returns the original form of this token
getOrig() - Method in class com.yahoo.language.simple.SimpleToken
 
getScript() - Method in interface com.yahoo.language.process.Token
Returns the script of this token
getScript() - Method in class com.yahoo.language.simple.SimpleToken
 
getSegmenter() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe segmenter.
getSegmenter() - Method in class com.yahoo.language.simple.SimpleLinguistics
 
getSpecialTokens(String) - Method in class com.yahoo.language.process.SpecialTokenRegistry
Returns the list of special tokens for a given name.
getStart() - Method in class com.yahoo.language.process.GramSplitter.Gram
 
getStem(int) - Method in interface com.yahoo.language.process.Token
Returns the stem at position i
getStem(int) - Method in class com.yahoo.language.simple.SimpleToken
 
getStemmer() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe stemmer or lemmatizer.
getStemmer() - Method in class com.yahoo.language.simple.SimpleLinguistics
 
getTokenizer() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe tokenizer.
getTokenizer() - Method in class com.yahoo.language.opennlp.OpenNlpLinguistics
 
getTokenizer() - Method in class com.yahoo.language.simple.SimpleLinguistics
 
getTokenString() - Method in interface com.yahoo.language.process.Token
Returns the token string in a form suitable for indexing: the most lowercased variant of the most processed token form available. If called on a compound token, this returns a lowercased form of the entire word.
getTokenString() - Method in class com.yahoo.language.simple.SimpleToken
 
getTransformer() - Method in interface com.yahoo.language.Linguistics
Returns a thread-unsafe transformer.
getTransformer() - Method in class com.yahoo.language.simple.SimpleLinguistics
 
getType() - Method in interface com.yahoo.language.process.Token
Returns the type of this token - word, space or punctuation etc.
getType() - Method in class com.yahoo.language.simple.SimpleToken
 
getValue() - Method in enum class com.yahoo.language.process.TokenType
Returns an int code for this type
GLAGOLITIC - Enum constant in enum class com.yahoo.language.process.TokenScript
 
GOTHIC - Enum constant in enum class com.yahoo.language.Language
Language tag "got".
GOTHIC - Enum constant in enum class com.yahoo.language.process.TokenScript
 
Gram(int, int) - Constructor for class com.yahoo.language.process.GramSplitter.Gram
 
GRAM_SPLITTER - Enum constant in enum class com.yahoo.language.Linguistics.Component
 
GramSplitter - Class in com.yahoo.language.process
A class which splits consecutive word character sequences into overlapping character n-grams.
GramSplitter(CharacterClasses) - Constructor for class com.yahoo.language.process.GramSplitter
 
GramSplitter.Gram - Class in com.yahoo.language.process
An immutable start index and length pair
GramSplitter.GramSplitterIterator - Class in com.yahoo.language.process
 
GramSplitterIterator(String, int, CharacterClasses) - Constructor for class com.yahoo.language.process.GramSplitter.GramSplitterIterator
 
GREEK - Enum constant in enum class com.yahoo.language.Language
Language tag "el".
GREEK - Enum constant in enum class com.yahoo.language.process.TokenScript
 
GREENLANDIC - Enum constant in enum class com.yahoo.language.Language
Language tag "kl".
GUARANI - Enum constant in enum class com.yahoo.language.Language
Language tag "gn".
guessEncoding(byte[]) - Method in class com.yahoo.language.simple.SimpleDetector
 
guessEncoding(byte[], int, int) - Method in class com.yahoo.language.simple.SimpleDetector
 
guessLanguage(byte[], int, int) - Method in class com.yahoo.language.simple.SimpleDetector
 
guessLanguage(String) - Method in class com.yahoo.language.simple.SimpleDetector
 
GUJARATI - Enum constant in enum class com.yahoo.language.Language
Language tag "gu".
GUJARATI - Enum constant in enum class com.yahoo.language.process.TokenScript
 
GURMUKHI - Enum constant in enum class com.yahoo.language.process.TokenScript
 

H

HAN - Enum constant in enum class com.yahoo.language.process.TokenScript
 
HANGUL - Enum constant in enum class com.yahoo.language.process.TokenScript
 
HANUNOO - Enum constant in enum class com.yahoo.language.process.TokenScript
 
hashCode() - Method in class com.yahoo.language.process.GramSplitter.Gram
 
hashCode() - Method in class com.yahoo.language.process.SpecialTokens.Token
 
hashCode() - Method in class com.yahoo.language.simple.SimpleToken
 
hasNext() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
 
hasNext() - Method in class com.yahoo.language.simple.kstem.CharArrayMap.EntryIterator
 
HAUSA - Enum constant in enum class com.yahoo.language.Language
Language tag "ha".
HEBREW - Enum constant in enum class com.yahoo.language.Language
Language tag "he".
HEBREW - Enum constant in enum class com.yahoo.language.process.TokenScript
 
HINDI - Enum constant in enum class com.yahoo.language.Language
Language tag "hi".
Hint - Class in com.yahoo.language.detect
A hint that can be given to a Detector.
HIRAGANA - Enum constant in enum class com.yahoo.language.process.TokenScript
 
HUNGARIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "hu".

I

ICELANDIC - Enum constant in enum class com.yahoo.language.Language
Language tag "is".
INDONESIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "id".
INHERITED - Enum constant in enum class com.yahoo.language.process.TokenScript
 
INTERLINGUA - Enum constant in enum class com.yahoo.language.Language
Language tag "ia".
INTERLINGUE - Enum constant in enum class com.yahoo.language.Language
Language tag "ie".
INUKTITUT - Enum constant in enum class com.yahoo.language.Language
Language tag "iu".
INUPIAK - Enum constant in enum class com.yahoo.language.Language
Language tag "ik".
IRISH - Enum constant in enum class com.yahoo.language.Language
Language tag "ga".
isCjk() - Method in enum class com.yahoo.language.Language
Returns whether this is a "cjk" language.
isDigit(int) - Method in class com.yahoo.language.process.CharacterClasses
Returns true for code points which should be considered digits - same as java.lang.Character.isDigit
isIndexable() - Method in interface com.yahoo.language.process.Token
Whether this token should be indexed
isIndexable() - Method in enum class com.yahoo.language.process.TokenType
Marker for whether this type of token can be indexed for search.
isIndexable() - Method in class com.yahoo.language.simple.SimpleToken
 
isLatin(int) - Method in class com.yahoo.language.process.CharacterClasses
Returns true if this is a Latin character
isLatinDigit(int) - Method in class com.yahoo.language.process.CharacterClasses
Returns true if this is a Latin digit (other digits are not consistently parsed into numbers by Java)
isLetter(int) - Method in class com.yahoo.language.process.CharacterClasses
Returns true for code points which are letters in Unicode 3 or 4, plus some additional characters which are useful to view as letters even though not defined as such in Unicode.
isLetterOrDigit(int) - Method in class com.yahoo.language.process.CharacterClasses
Convenience, returns isLetter(c) || isDigit(c)
isLocal() - Method in class com.yahoo.language.detect.Detection
 
isSpecialToken() - Method in interface com.yahoo.language.process.Token
Returns whether this is an instance of a declared special token (e.g.
isSpecialToken() - Method in class com.yahoo.language.simple.SimpleToken
 
ITALIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "it".
iterator() - Method in class com.yahoo.language.simple.kstem.CharArrayMap.EntrySet
 
iterator() - Method in class com.yahoo.language.simple.kstem.CharArraySet
Returns an Iterator for char[] instances in this set.
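The isDigit and isLatinDigit entries in this section separate digits in general from Latin digits, noting that other digits are not consistently parsed into numbers by Java. The sketch below shows one way that distinction can be drawn with the JDK. DigitSketch is a hypothetical class, and treating "Latin digit" as the ASCII range '0' to '9' is an assumption rather than the documented definition in CharacterClasses.

```java
final class DigitSketch {
    private DigitSketch() {}

    // Assumption: "Latin digit" means the ASCII digits '0' through '9'.
    static boolean isLatinDigit(int codePoint) {
        return codePoint >= '0' && codePoint <= '9';
    }

    // Broader check: any code point the JDK classifies as a decimal digit.
    static boolean isDigit(int codePoint) {
        return Character.isDigit(codePoint);
    }

    public static void main(String[] args) {
        int arabicIndicThree = 0x0663; // ARABIC-INDIC DIGIT THREE
        System.out.println(isDigit('7') + " " + isLatinDigit('7'));
        System.out.println(isDigit(arabicIndicThree) + " " + isLatinDigit(arabicIndicThree));
    }
}
```

A caller that plans to convert a token to a number can gate on the narrower check and treat everything else as text.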

J

JAPANESE - Enum constant in enum class com.yahoo.language.Language
Language tag "ja".
JAVANESE - Enum constant in enum class com.yahoo.language.Language
Language tag "jw".

K

KANNADA - Enum constant in enum class com.yahoo.language.Language
Language tag "kn".
KANNADA - Enum constant in enum class com.yahoo.language.process.TokenScript
 
KASHMIRI - Enum constant in enum class com.yahoo.language.Language
Language tag "ks".
KATAKANA - Enum constant in enum class com.yahoo.language.process.TokenScript
 
KAZAKH - Enum constant in enum class com.yahoo.language.Language
Language tag "kk".
keySet() - Method in class com.yahoo.language.simple.kstem.CharArrayMap
Returns a CharArraySet view on the map's keys.
KHAROSHTHI - Enum constant in enum class com.yahoo.language.process.TokenScript
 
KHMER - Enum constant in enum class com.yahoo.language.process.TokenScript
 
KINYARWANDA - Enum constant in enum class com.yahoo.language.Language
Language tag "rw".
KIRGHIZ - Enum constant in enum class com.yahoo.language.Language
Language tag "ky".
KIRUNDI - Enum constant in enum class com.yahoo.language.Language
Language tag "rn".
KOREAN - Enum constant in enum class com.yahoo.language.Language
Language tag "ko".
KStemmer - Class in com.yahoo.language.simple.kstem
A stemmer implementing the Kstem algorithm by Bob Krovetz.
KStemmer() - Constructor for class com.yahoo.language.simple.kstem.KStemmer
 
KURDISH - Enum constant in enum class com.yahoo.language.Language
Language tag "ku".

L

Language - Enum Class in com.yahoo.language
 
languageCode() - Method in enum class com.yahoo.language.Language
 
LanguageDetectorFactory - Class in com.yahoo.language.opennlp
Overrides the UrlCharSequenceNormalizer, which has a bad regex, until fixed: https://issues.apache.org/jira/browse/OPENNLP-1350
LanguageDetectorFactory() - Constructor for class com.yahoo.language.opennlp.LanguageDetectorFactory
 
LAO - Enum constant in enum class com.yahoo.language.process.TokenScript
 
LAOTHIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "lo".
LATIN - Enum constant in enum class com.yahoo.language.Language
Language tag "la".
LATIN - Enum constant in enum class com.yahoo.language.process.TokenScript
 
LATVIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "lv".
len - Variable in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
length() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
LIMBU - Enum constant in enum class com.yahoo.language.process.TokenScript
 
LINEARB - Enum constant in enum class com.yahoo.language.process.TokenScript
 
LINGALA - Enum constant in enum class com.yahoo.language.Language
Language tag "ln".
Linguistics - Interface in com.yahoo.language
Factory of linguistic processors.
Linguistics.Component - Enum Class in com.yahoo.language
 
LinguisticsCase - Class in com.yahoo.language
This class provides a case normalization operation to be used e.g.
LinguisticsCase() - Constructor for class com.yahoo.language.LinguisticsCase
 
LITHUANIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "lt".
LocaleFactory - Class in com.yahoo.language
 

M

MACEDONIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "mk".
MALAGASY - Enum constant in enum class com.yahoo.language.Language
Language tag "mg".
MALAY - Enum constant in enum class com.yahoo.language.Language
Language tag "ms".
MALAYALAM - Enum constant in enum class com.yahoo.language.Language
Language tag "ml".
MALAYALAM - Enum constant in enum class com.yahoo.language.process.TokenScript
 
MALTESE - Enum constant in enum class com.yahoo.language.Language
Language tag "mt".
MANIPURI - Enum constant in enum class com.yahoo.language.Language
Language tag "mni".
MAORI - Enum constant in enum class com.yahoo.language.Language
Language tag "mi".
MARATHI - Enum constant in enum class com.yahoo.language.Language
Language tag "mr".
MARKER - Enum constant in enum class com.yahoo.language.process.TokenType
 
MOLDAVIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "mo".
MONGOLIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "mn".
MONGOLIAN - Enum constant in enum class com.yahoo.language.process.TokenScript
 
MUNDA - Enum constant in enum class com.yahoo.language.Language
Language tag "mun".
MYANMAR - Enum constant in enum class com.yahoo.language.process.TokenScript
 

N

name() - Method in class com.yahoo.language.process.SpecialTokens
Returns the name of this special tokens list
NAURU - Enum constant in enum class com.yahoo.language.Language
Language tag "na".
NEPALI - Enum constant in enum class com.yahoo.language.Language
Language tag "ne".
newCharacterBuffer(int) - Static method in class com.yahoo.language.simple.kstem.CharacterUtils
Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize.
newCountryHint(String) - Static method in class com.yahoo.language.detect.Hint
 
newInstance(String, String) - Static method in class com.yahoo.language.detect.Hint
 
newMarketHint(String) - Static method in class com.yahoo.language.detect.Hint
 
next() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
 
next() - Method in class com.yahoo.language.simple.kstem.CharArrayMap.EntryIterator
use nextCharArray() + currentValue() for better efficiency.
nextKey() - Method in class com.yahoo.language.simple.kstem.CharArrayMap.EntryIterator
gets the next key...
nextKeyString() - Method in class com.yahoo.language.simple.kstem.CharArrayMap.EntryIterator
gets the next key as a newly created String object
NONE - Enum constant in enum class com.yahoo.language.process.StemMode
 
normalize(CharSequence) - Method in class com.yahoo.language.opennlp.UrlCharSequenceNormalizer
 
normalize(String) - Method in interface com.yahoo.language.process.Normalizer
NFKC normalizes a String.
normalize(String) - Method in class com.yahoo.language.simple.SimpleNormalizer
 
Normalizer - Interface in com.yahoo.language.process
This interface provides NFKC normalization of Strings through the underlying linguistics library.
NORMALIZER - Enum constant in enum class com.yahoo.language.Linguistics.Component
 
NORWEGIAN_BOKMAL - Enum constant in enum class com.yahoo.language.Language
Language tag "nb".
NORWEGIAN_NYNORSK - Enum constant in enum class com.yahoo.language.Language
Language tag "nn".
NUMERIC - Enum constant in enum class com.yahoo.language.process.TokenType
 
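
The Normalizer entries above describe NFKC normalization of strings. As a rough illustration of what NFKC does, the sketch below uses the JDK's own java.text.Normalizer rather than any class from this index; the class name NfkcSketch and the method nfkc are hypothetical, and the Vespa implementations may differ in detail.

```java
import java.text.Normalizer;

/** Hypothetical helper for illustration; not part of com.yahoo.language. */
final class NfkcSketch {

    /** Applies Unicode NFKC normalization using the JDK's java.text.Normalizer. */
    static String nfkc(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // The ligature U+FB01 is a compatibility character and folds to the letters "fi".
        System.out.println(nfkc("\uFB01"));
        // Fullwidth Latin capital A (U+FF21) folds to ASCII "A".
        System.out.println(nfkc("\uFF21"));
        // "e" followed by a combining acute accent composes into U+00E9.
        System.out.println(nfkc("e\u0301"));
    }
}
```

NFKC both folds compatibility variants and composes combining sequences, which is why it is a common first step before tokenization.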

O

OCCITAN - Enum constant in enum class com.yahoo.language.Language
Language tag "oc".
offsetByCodePoints(char[], int, int, int, int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
Returns the index within buf[start:start+count] that is offset from index by offset code points.
OGHAM - Enum constant in enum class com.yahoo.language.process.TokenScript
 
OLDITALIC - Enum constant in enum class com.yahoo.language.process.TokenScript
 
OLDPERSIAN - Enum constant in enum class com.yahoo.language.process.TokenScript
 
OpenNlpLinguistics - Class in com.yahoo.language.opennlp
Returns a linguistics implementation based on OpenNlp.
OpenNlpLinguistics() - Constructor for class com.yahoo.language.opennlp.OpenNlpLinguistics
 
OpenNlpTokenizer - Class in com.yahoo.language.opennlp
Tokenizer using OpenNlp
OpenNlpTokenizer() - Constructor for class com.yahoo.language.opennlp.OpenNlpTokenizer
 
OpenNlpTokenizer(Normalizer, Transformer) - Constructor for class com.yahoo.language.opennlp.OpenNlpTokenizer
 
OpenNlpTokenizer(Normalizer, Transformer, SpecialTokenRegistry) - Constructor for class com.yahoo.language.opennlp.OpenNlpTokenizer
 
OpenStringBuilder - Class in com.yahoo.language.simple.kstem
A StringBuilder that allows one to access the array.
OpenStringBuilder() - Constructor for class com.yahoo.language.simple.kstem.OpenStringBuilder
 
OpenStringBuilder(int) - Constructor for class com.yahoo.language.simple.kstem.OpenStringBuilder
 
ORIYA - Enum constant in enum class com.yahoo.language.Language
Language tag "or".
ORIYA - Enum constant in enum class com.yahoo.language.process.TokenScript
 
OROMO - Enum constant in enum class com.yahoo.language.Language
Language tag "om".
OSMANYA - Enum constant in enum class com.yahoo.language.process.TokenScript
 

P

PASHTO - Enum constant in enum class com.yahoo.language.Language
Language tag "ps".
PERSIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "fa".
POLISH - Enum constant in enum class com.yahoo.language.Language
Language tag "pl".
PORTUGUESE - Enum constant in enum class com.yahoo.language.Language
Language tag "pt".
ProcessingException - Exception in com.yahoo.language.process
Exception class indicating that a fatal error occurred during linguistic processing.
ProcessingException(String) - Constructor for exception com.yahoo.language.process.ProcessingException
 
ProcessingException(String, Throwable) - Constructor for exception com.yahoo.language.process.ProcessingException
 
PUNCTUATION - Enum constant in enum class com.yahoo.language.process.TokenType
 
PUNJABI - Enum constant in enum class com.yahoo.language.Language
Language tag "pa".
put(char[], V) - Method in class com.yahoo.language.simple.kstem.CharArrayMap
Add the given mapping.
put(CharSequence, V) - Method in class com.yahoo.language.simple.kstem.CharArrayMap
Add the given mapping.
put(Object, V) - Method in class com.yahoo.language.simple.kstem.CharArrayMap
 
put(String, V) - Method in class com.yahoo.language.simple.kstem.CharArrayMap
Add the given mapping.

Q

QUECHUA - Enum constant in enum class com.yahoo.language.Language
Language tag "qu".

R

remove() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
 
remove() - Method in class com.yahoo.language.simple.kstem.CharArrayMap.EntryIterator
 
remove(int) - Method in class com.yahoo.language.process.StemList
 
remove(Object) - Method in class com.yahoo.language.simple.kstem.CharArrayMap.EntrySet
 
remove(Object) - Method in class com.yahoo.language.simple.kstem.CharArrayMap
 
replacement() - Method in class com.yahoo.language.process.SpecialTokens.Token
Returns the token to replace occurrences of this by, which equals token() unless this has a replacement.
reserve(int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
reset() - Method in class com.yahoo.language.simple.kstem.CharacterUtils.CharacterBuffer
Resets the CharacterBuffer.
reset() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
resize(int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
RHAETO_ROMANCE - Enum constant in enum class com.yahoo.language.Language
Language tag "rm".
ROMANIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "ro".
RUNIC - Enum constant in enum class com.yahoo.language.process.TokenScript
 
RUSSIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "ru".

S

SAMOAN - Enum constant in enum class com.yahoo.language.Language
Language tag "sm".
SANGHO - Enum constant in enum class com.yahoo.language.Language
Language tag "sg".
SANSKRIT - Enum constant in enum class com.yahoo.language.Language
Language tag "sa".
SCOTS_GAELIC - Enum constant in enum class com.yahoo.language.Language
Language tag "gd".
segment(String, Language) - Method in interface com.yahoo.language.process.Segmenter
Splits the input string into tokens and returns a list of tokens in unprocessed form (i.e.
segment(String, Language) - Method in class com.yahoo.language.process.SegmenterImpl
 
Segmenter - Interface in com.yahoo.language.process
Interface providing segmentation, i.e.
SEGMENTER - Enum constant in enum class com.yahoo.language.Linguistics.Component
 
SegmenterImpl - Class in com.yahoo.language.process
 
SegmenterImpl(Tokenizer) - Constructor for class com.yahoo.language.process.SegmenterImpl
 
SERBIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "sr".
SERBO_CROATIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "sh".
SESOTHO - Enum constant in enum class com.yahoo.language.Language
Language tag "st".
set(char[], int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
set(int, String) - Method in class com.yahoo.language.process.StemList
 
setCharAt(int, char) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
setDestination(String) - Method in class com.yahoo.language.process.Embedder.Context
Sets the name of the recipient of this tensor.
setLanguage(Language) - Method in class com.yahoo.language.process.Embedder.Context
Sets the language of the text, or UNKNOWN to use language independent embedding
setLength(int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
setOffset(long) - Method in class com.yahoo.language.simple.SimpleToken
 
setScript(TokenScript) - Method in class com.yahoo.language.simple.SimpleToken
 
setSpecialToken(boolean) - Method in class com.yahoo.language.simple.SimpleToken
 
SETSWANA - Enum constant in enum class com.yahoo.language.Language
Language tag "tn".
setTokenString(String) - Method in class com.yahoo.language.simple.SimpleToken
 
setType(TokenType) - Method in class com.yahoo.language.simple.SimpleToken
 
setValue(V) - Method in class com.yahoo.language.simple.kstem.CharArrayMap.EntryIterator
sets the value associated with the last key returned
SHAVIAN - Enum constant in enum class com.yahoo.language.process.TokenScript
 
SHONA - Enum constant in enum class com.yahoo.language.Language
Language tag "sn".
SHORTEST - Enum constant in enum class com.yahoo.language.process.StemMode
 
SICHUAN_YI - Enum constant in enum class com.yahoo.language.Language
Language tag "ii".
SimpleDetector - Class in com.yahoo.language.simple
Includes functionality for determining the langCode from a sample or from the encoding.
SimpleDetector() - Constructor for class com.yahoo.language.simple.SimpleDetector
 
SimpleLinguistics - Class in com.yahoo.language.simple
Factory of simple linguistic processor implementations.
SimpleLinguistics() - Constructor for class com.yahoo.language.simple.SimpleLinguistics
 
SimpleNormalizer - Class in com.yahoo.language.simple
 
SimpleNormalizer() - Constructor for class com.yahoo.language.simple.SimpleNormalizer
 
SimpleToken - Class in com.yahoo.language.simple
 
SimpleToken(String) - Constructor for class com.yahoo.language.simple.SimpleToken
 
SimpleToken(String, String) - Constructor for class com.yahoo.language.simple.SimpleToken
 
SimpleTokenizer - Class in com.yahoo.language.simple
A tokenizer which splits on whitespace, normalizes and transforms using the given implementations and stems using the kstem algorithm.
SimpleTokenizer() - Constructor for class com.yahoo.language.simple.SimpleTokenizer
 
SimpleTokenizer(Normalizer) - Constructor for class com.yahoo.language.simple.SimpleTokenizer
 
SimpleTokenizer(Normalizer, Transformer) - Constructor for class com.yahoo.language.simple.SimpleTokenizer
 
SimpleTokenizer(Normalizer, Transformer, SpecialTokenRegistry) - Constructor for class com.yahoo.language.simple.SimpleTokenizer
 
SimpleTokenType - Class in com.yahoo.language.simple
 
SimpleTokenType() - Constructor for class com.yahoo.language.simple.SimpleTokenType
 
SimpleTransformer - Class in com.yahoo.language.simple
Converts all accented characters into their de-accented counterparts followed by their combining diacritics, then strips off the diacritics using a regex.
SimpleTransformer() - Constructor for class com.yahoo.language.simple.SimpleTransformer
 
SINDHI - Enum constant in enum class com.yahoo.language.Language
Language tag "sd".
SINHALA - Enum constant in enum class com.yahoo.language.process.TokenScript
 
SINHALESE - Enum constant in enum class com.yahoo.language.Language
Language tag "si".
SISWATI - Enum constant in enum class com.yahoo.language.Language
Language tag "ss".
size() - Method in class com.yahoo.language.process.StemList
 
size() - Method in class com.yahoo.language.simple.kstem.CharArrayMap.EntrySet
 
size() - Method in class com.yahoo.language.simple.kstem.CharArrayMap
 
size() - Method in class com.yahoo.language.simple.kstem.CharArraySet
 
size() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
SLOVAK - Enum constant in enum class com.yahoo.language.Language
Language tag "sk".
SLOVENIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "sl".
SOMALI - Enum constant in enum class com.yahoo.language.Language
Language tag "so".
SPACE - Enum constant in enum class com.yahoo.language.process.TokenType
 
SPANISH - Enum constant in enum class com.yahoo.language.Language
Language tag "es".
SpecialTokenRegistry - Class in com.yahoo.language.process
Immutable named lists of "special tokens" - strings which should override the normal tokenizer semantics and be tokenized into a single token.
SpecialTokenRegistry() - Constructor for class com.yahoo.language.process.SpecialTokenRegistry
Creates an empty special token registry
SpecialTokenRegistry(SpecialtokensConfig) - Constructor for class com.yahoo.language.process.SpecialTokenRegistry
Create a special token registry from a configuration object.
SpecialTokenRegistry(List<SpecialTokens>) - Constructor for class com.yahoo.language.process.SpecialTokenRegistry
 
SpecialTokens - Class in com.yahoo.language.process
An immutable list of special tokens - strings which should override the normal tokenizer semantics and be tokenized into a single token.
SpecialTokens(String, List<SpecialTokens.Token>) - Constructor for class com.yahoo.language.process.SpecialTokens
 
SpecialTokens.Token - Class in com.yahoo.language.process
An immutable special token
split(String, int) - Method in class com.yahoo.language.process.GramSplitter
Splits the input into grams of size n and returns an iterator over grams represented as [start index,length] pairs into the input string.
stem(String) - Method in class com.yahoo.language.simple.kstem.KStemmer
 
stem(String, StemMode, Language) - Method in interface com.yahoo.language.process.Stemmer
Stem input according to specified stemming mode.
stem(String, StemMode, Language) - Method in class com.yahoo.language.process.StemmerImpl
 
StemList - Class in com.yahoo.language.process
A list of strings which does not allow for duplicate elements.
StemList() - Constructor for class com.yahoo.language.process.StemList
 
StemList(String...) - Constructor for class com.yahoo.language.process.StemList
 
Stemmer - Interface in com.yahoo.language.process
Interface providing stemming of single words.
STEMMER - Enum constant in enum class com.yahoo.language.Linguistics.Component
 
StemmerImpl - Class in com.yahoo.language.process
 
StemmerImpl(Tokenizer) - Constructor for class com.yahoo.language.process.StemmerImpl
 
StemMode - Enum Class in com.yahoo.language.process
An enum of the stemming modes which can be requested.
subSequence(int, int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
SUNDANESE - Enum constant in enum class com.yahoo.language.Language
Language tag "su".
SWAHILI - Enum constant in enum class com.yahoo.language.Language
Language tag "sw".
SWEDISH - Enum constant in enum class com.yahoo.language.Language
Language tag "sv".
SYLOTINAGRI - Enum constant in enum class com.yahoo.language.process.TokenScript
 
SYMBOL - Enum constant in enum class com.yahoo.language.process.TokenType
 
SYRIAC - Enum constant in enum class com.yahoo.language.Language
Language tag "syr".
SYRIAC - Enum constant in enum class com.yahoo.language.process.TokenScript
 
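
The SimpleTransformer entry above describes accent removal as decomposing accented characters into a base character plus combining diacritics, then stripping the diacritics with a regex. A minimal sketch of that two-step idea follows, using only the JDK. The class name AccentDropSketch, the method signature and the exact regex (\p{M}+) are assumptions for illustration and are not taken from the Vespa source.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of com.yahoo.language. */
final class AccentDropSketch {

    // Assumption: match any Unicode combining mark. The actual regex may differ.
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    /** Decomposes the input (NFD) and removes the combining marks that result. */
    static String accentDrop(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // U+00E9 decomposes to "e" + U+0301, and the mark is then stripped.
        System.out.println(accentDrop("caf\u00E9"));
    }
}
```

Letters that have no canonical decomposition, such as U+00F8, pass through this sketch unchanged.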

T

TAGALOG - Enum constant in enum class com.yahoo.language.Language
Language tag "fil".
TAGALOG - Enum constant in enum class com.yahoo.language.process.TokenScript
 
TAGBANWA - Enum constant in enum class com.yahoo.language.process.TokenScript
 
TAILE - Enum constant in enum class com.yahoo.language.process.TokenScript
 
TAILUE - Enum constant in enum class com.yahoo.language.process.TokenScript
 
TAJIK - Enum constant in enum class com.yahoo.language.Language
Language tag "tg".
TAMIL - Enum constant in enum class com.yahoo.language.Language
Language tag "ta".
TAMIL - Enum constant in enum class com.yahoo.language.process.TokenScript
 
TATAR - Enum constant in enum class com.yahoo.language.Language
Language tag "tt".
TELUGU - Enum constant in enum class com.yahoo.language.Language
Language tag "te".
TELUGU - Enum constant in enum class com.yahoo.language.process.TokenScript
 
THAANA - Enum constant in enum class com.yahoo.language.process.TokenScript
 
THAI - Enum constant in enum class com.yahoo.language.Language
Language tag "th".
THAI - Enum constant in enum class com.yahoo.language.process.TokenScript
 
throwsOnUse - Static variable in interface com.yahoo.language.process.Embedder
An instance of this which throws IllegalStateException on any attempt to use it
TIBETAN - Enum constant in enum class com.yahoo.language.Language
Language tag "bo".
TIBETAN - Enum constant in enum class com.yahoo.language.process.TokenScript
 
TIFINAGH - Enum constant in enum class com.yahoo.language.process.TokenScript
 
TIGRINYA - Enum constant in enum class com.yahoo.language.Language
Language tag "ti".
toCharArray() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
toChars(int[], int, int, char[], int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
Converts a sequence of Unicode code points to a sequence of Java characters.
toCodePoints(char[], int, int, int[], int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
Converts a sequence of Java characters to a sequence of Unicode code points.
toExtractedList() - Method in class com.yahoo.language.process.GramSplitter.GramSplitterIterator
Convenience list which splits the remaining items in this iterator into a list of gram strings
token() - Method in class com.yahoo.language.process.SpecialTokens.Token
Returns the special token
Token - Interface in com.yahoo.language.process
A single token produced by the tokenizer.
Token(String) - Constructor for class com.yahoo.language.process.SpecialTokens.Token
Creates a special token
Token(String, String) - Constructor for class com.yahoo.language.process.SpecialTokens.Token
Creates a special token which will be represented by the given replacement token
tokenize(String, boolean) - Method in class com.yahoo.language.process.SpecialTokens
Returns the special token at the start of the given string, or null if the string does not start with a special token
tokenize(String, Language, StemMode, boolean) - Method in class com.yahoo.language.opennlp.OpenNlpTokenizer
 
tokenize(String, Language, StemMode, boolean) - Method in interface com.yahoo.language.process.Tokenizer
Returns the tokens produced from an input string under the rules of the given Language and additional options
tokenize(String, Language, StemMode, boolean) - Method in class com.yahoo.language.simple.SimpleTokenizer
 
Tokenizer - Interface in com.yahoo.language.process
Language-sensitive tokenization of a text string.
TOKENIZER - Enum constant in enum class com.yahoo.language.Linguistics.Component
 
TokenScript - Enum Class in com.yahoo.language.process
List of token scripts (e.g.
TokenType - Enum Class in com.yahoo.language.process
An enumeration of token types.
toLowerCase(char[], int, int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
Converts each Unicode code point to lowercase via Character.toLowerCase(int) starting at the given offset.
toLowerCase(String) - Static method in class com.yahoo.language.LinguisticsCase
The lower casing method to use in Vespa when doing language independent processing of natural language data.
TONGA - Enum constant in enum class com.yahoo.language.Language
Language tag "to".
toString() - Method in class com.yahoo.language.process.SpecialTokens.Token
 
toString() - Method in class com.yahoo.language.simple.kstem.CharArrayMap
 
toString() - Method in class com.yahoo.language.simple.kstem.CharArraySet
 
toString() - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
toString() - Method in class com.yahoo.language.simple.SimpleToken
 
toUpperCase(char[], int, int) - Method in class com.yahoo.language.simple.kstem.CharacterUtils
Converts each Unicode code point to uppercase via Character.toUpperCase(int) starting at the given offset.
Transformer - Interface in com.yahoo.language.process
Interface for providers of text transformations such as accent removal.
TRANSFORMER - Enum constant in enum class com.yahoo.language.Linguistics.Component
 
TSONGA - Enum constant in enum class com.yahoo.language.Language
Language tag "ts".
TURKISH - Enum constant in enum class com.yahoo.language.Language
Language tag "tr".
TURKMEN - Enum constant in enum class com.yahoo.language.Language
Language tag "tk".
TWI - Enum constant in enum class com.yahoo.language.Language
Language tag "tw".
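
The split(String, int) and toExtractedList() entries describe grams as [start index, length] pairs into the input string, which can later be extracted as strings. The sketch below shows that representation with a plain sliding window. It is a simplification under stated assumptions: GramSketch and its methods are hypothetical names, and the sketch does not reproduce how GramSplitter treats whitespace, punctuation or inputs shorter than n.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of com.yahoo.language. */
final class GramSketch {

    /** Returns one {start, length} pair per window of n chars in the input. */
    static List<int[]> split(String input, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be positive");
        List<int[]> grams = new ArrayList<>();
        for (int start = 0; start + n <= input.length(); start++) {
            grams.add(new int[] { start, n });
        }
        return grams;
    }

    /** Materializes the pairs as substrings of the original input. */
    static List<String> extract(String input, List<int[]> grams) {
        List<String> out = new ArrayList<>();
        for (int[] gram : grams) {
            out.add(input.substring(gram[0], gram[0] + gram[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "vespa";
        // Prints the 2-grams of "vespa" in order of their start index.
        System.out.println(extract(text, split(text, 2)));
    }
}
```

Keeping grams as index pairs avoids allocating a substring for every gram until a caller actually needs the text.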

U

UGARITIC - Enum constant in enum class com.yahoo.language.Language
Language tag "uga".
UGARITIC - Enum constant in enum class com.yahoo.language.process.TokenScript
 
UIGHUR - Enum constant in enum class com.yahoo.language.Language
Language tag "ug".
UKRAINIAN - Enum constant in enum class com.yahoo.language.Language
Language tag "uk".
UNKNOWN - Enum constant in enum class com.yahoo.language.Language
Language tag "un".
UNKNOWN - Enum constant in enum class com.yahoo.language.process.TokenScript
 
UNKNOWN - Enum constant in enum class com.yahoo.language.process.TokenType
 
unmodifiableMap(CharArrayMap<V>) - Static method in class com.yahoo.language.simple.kstem.CharArrayMap
Returns an unmodifiable CharArrayMap.
unmodifiableSet(CharArraySet) - Static method in class com.yahoo.language.simple.kstem.CharArraySet
Returns an unmodifiable CharArraySet.
unsafeWrite(char) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
unsafeWrite(char[], int, int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
URDU - Enum constant in enum class com.yahoo.language.Language
Language tag "ur".
UrlCharSequenceNormalizer - Class in com.yahoo.language.opennlp
Modifies UrlCharSequenceNormalizer to avoid the bad email regex.
UrlCharSequenceNormalizer() - Constructor for class com.yahoo.language.opennlp.UrlCharSequenceNormalizer
 
UZBEK - Enum constant in enum class com.yahoo.language.Language
Language tag "uz".

V

valueOf(int) - Static method in enum class com.yahoo.language.process.TokenType
Returns the TokenType corresponding to the int code representation returned from TokenType.getValue()
valueOf(int) - Static method in class com.yahoo.language.simple.SimpleTokenType
 
valueOf(String) - Static method in enum class com.yahoo.language.Language
Returns the enum constant of this class with the specified name.
valueOf(String) - Static method in enum class com.yahoo.language.Linguistics.Component
Returns the enum constant of this class with the specified name.
valueOf(String) - Static method in enum class com.yahoo.language.process.StemMode
Returns the enum constant of this class with the specified name.
valueOf(String) - Static method in enum class com.yahoo.language.process.TokenScript
Returns the enum constant of this class with the specified name.
valueOf(String) - Static method in enum class com.yahoo.language.process.TokenType
Returns the enum constant of this class with the specified name.
values() - Static method in enum class com.yahoo.language.Language
Returns an array containing the constants of this enum class, in the order they are declared.
values() - Static method in enum class com.yahoo.language.Linguistics.Component
Returns an array containing the constants of this enum class, in the order they are declared.
values() - Static method in enum class com.yahoo.language.process.StemMode
Returns an array containing the constants of this enum class, in the order they are declared.
values() - Static method in enum class com.yahoo.language.process.TokenScript
Returns an array containing the constants of this enum class, in the order they are declared.
values() - Static method in enum class com.yahoo.language.process.TokenType
Returns an array containing the constants of this enum class, in the order they are declared.
VIETNAMESE - Enum constant in enum class com.yahoo.language.Language
Language tag "vi".
VIETNAMESE - Enum constant in enum class com.yahoo.language.process.TokenScript
 
VOLAPUK - Enum constant in enum class com.yahoo.language.Language
Language tag "vo".

W

WELSH - Enum constant in enum class com.yahoo.language.Language
Language tag "cy".
WOLOF - Enum constant in enum class com.yahoo.language.Language
Language tag "wo".
write(char) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
write(char[]) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
write(char[], int, int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
write(int) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
write(OpenStringBuilder) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 
write(String) - Method in class com.yahoo.language.simple.kstem.OpenStringBuilder
 

X

XHOSA - Enum constant in enum class com.yahoo.language.Language
Language tag "xh".

Y

YI - Enum constant in enum class com.yahoo.language.process.TokenScript
 
YIDDISH - Enum constant in enum class com.yahoo.language.Language
Language tag "yi".
YORUBA - Enum constant in enum class com.yahoo.language.Language
Language tag "yo".

Z

ZHUANG - Enum constant in enum class com.yahoo.language.Language
Language tag "za".
ZULU - Enum constant in enum class com.yahoo.language.Language
Language tag "zu".