public final class Tokenizer extends Object
| Constructor and Description |
|---|
| `Tokenizer(com.yahoo.language.Linguistics linguistics)` Creates a tokenizer which initializes from a given Linguistics. |
| Modifier and Type | Method and Description |
|---|---|
| `void` | `setSpecialTokens(SpecialTokens specialTokens)` Sets a list of tokens (Strings) which should be returned as WORD tokens regardless of their content. |
| `void` | `setSubstringSpecialTokens(boolean substringSpecialTokens)` Sets whether special tokens should also be recognized as substrings of other tokens; needed for CJK. |
| `List<Token>` | `tokenize(String string)` Resets this tokenizer and creates tokens from the given string, using "default" as the default index and no index information. |
| `List<Token>` | `tokenize(String string, IndexFacts.Session indexFacts)` Resets this tokenizer and creates tokens from the given string, using "default" as the default index. |
| `List<Token>` | `tokenize(String string, String defaultIndexName, IndexFacts.Session indexFacts)` Resets this tokenizer and creates tokens from the given string. |
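
The effect of special tokens and of the substring setting can be illustrated with a small toy model. The sketch below is not the Tokenizer implementation and does not use its API: `SpecialTokenSketch`, `SketchToken`, `Kind`, and the boundary rule (a special token must not touch a letter or digit unless substring matching is on) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, NOT the real Tokenizer: it only illustrates the idea
// of special tokens being returned as WORD tokens regardless of their content.
final class SpecialTokenSketch {

    enum Kind { WORD, OTHER }

    static final class SketchToken {
        final Kind kind;
        final String image;
        SketchToken(Kind kind, String image) { this.kind = kind; this.image = image; }
        @Override public String toString() { return kind + ":" + image; }
    }

    // Splits input into tokens. Letter/digit runs become WORD tokens, any other
    // non-whitespace character becomes an OTHER token, and a matching special
    // token becomes a single WORD token whatever characters it contains.
    static List<SketchToken> tokenize(String input, List<String> specialTokens,
                                      boolean substringSpecialTokens) {
        List<SketchToken> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            String special = matchSpecial(input, i, specialTokens, substringSpecialTokens);
            if (special != null) {
                tokens.add(new SketchToken(Kind.WORD, special));
                i += special.length();
            } else if (Character.isLetterOrDigit(c)) {
                int start = i++;
                // In substring mode, end the run where a special token begins.
                while (i < input.length()
                        && Character.isLetterOrDigit(input.charAt(i))
                        && !(substringSpecialTokens
                             && matchSpecial(input, i, specialTokens, true) != null)) {
                    i++;
                }
                tokens.add(new SketchToken(Kind.WORD, input.substring(start, i)));
            } else {
                tokens.add(new SketchToken(Kind.OTHER, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    // Returns the longest special token starting at pos, or null if none applies.
    private static String matchSpecial(String input, int pos, List<String> specialTokens,
                                       boolean substring) {
        String best = null;
        for (String candidate : specialTokens) {
            if (candidate.isEmpty() || !input.startsWith(candidate, pos)) continue;
            if (!substring && !isolated(input, pos, pos + candidate.length())) continue;
            if (best == null || candidate.length() > best.length()) best = candidate;
        }
        return best;
    }

    // True if the span [start, end) does not touch a letter or digit on either side.
    private static boolean isolated(String input, int start, int end) {
        boolean startOk = start == 0 || !Character.isLetterOrDigit(input.charAt(start - 1));
        boolean endOk = end == input.length() || !Character.isLetterOrDigit(input.charAt(end));
        return startOk && endOk;
    }

    public static void main(String[] args) {
        List<String> none = new ArrayList<>();
        System.out.println(tokenize("learn c++ now", none, false));
        // prints [WORD:learn, WORD:c, OTHER:+, OTHER:+, WORD:now]
        System.out.println(tokenize("learn c++ now", Arrays.asList("c++"), false));
        // prints [WORD:learn, WORD:c++, WORD:now]
        // ASCII stand-in for unsegmented text such as CJK:
        System.out.println(tokenize("tokyotower", Arrays.asList("tokyo"), true));
        // prints [WORD:tokyo, WORD:tower]
    }
}
```

In this model, turning substring matching off makes `tokyotower` come back as a single WORD token, which is why a setting of this kind matters for text that has no whitespace between words.
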
public Tokenizer(com.yahoo.language.Linguistics linguistics)
public void setSpecialTokens(SpecialTokens specialTokens)
public void setSubstringSpecialTokens(boolean substringSpecialTokens)
public List<Token> tokenize(String string)
public List<Token> tokenize(String string, IndexFacts.Session indexFacts)
public List<Token> tokenize(String string, String defaultIndexName, IndexFacts.Session indexFacts)
string - the string to tokenize
defaultIndexName - the name of the index to use as default
indexFacts - information about the indexes we will search

Copyright © 2018. All rights reserved.