Interface TokenizerFactory
All Known Implementing Classes:
BertWordPieceTokenizerFactory, DefaultTokenizerFactory, NGramTokenizerFactory

public interface TokenizerFactory

Generates a tokenizer for a given string or input stream.

Author:
Adam Gibson

Method Summary
Modifier and Type  Method                                              Description
Tokenizer          create(InputStream toTokenize)                      Creates a tokenizer based on an input stream.
Tokenizer          create(String toTokenize)                           Creates a tokenizer for the given string.
TokenPreProcess    getTokenPreProcessor()                              Returns the TokenPreProcess set for this TokenizerFactory instance.
void               setTokenPreProcessor(TokenPreProcess preProcessor)  Sets a token preprocessor to be used with every tokenizer.

Method Detail
create
Tokenizer create(String toTokenize)
Creates a tokenizer for the given string.
Parameters:
toTokenize - the string to create the tokenizer with
Returns:
the new tokenizer

create
Tokenizer create(InputStream toTokenize)
Creates a tokenizer based on an input stream.
Parameters:
toTokenize - the input stream to create the tokenizer with
Returns:
the new tokenizer

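The two create overloads can be illustrated with a minimal sketch. It does not use the library's classes: the nested Tokenizer and TokenizerFactory interfaces below are reduced stand-ins declared only for this example, and WhitespaceTokenizerFactory, drain, and the choice to read the stream as UTF-8 are assumptions made for illustration, not the behaviour of any implementing class listed above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

public class CreateOverloadsSketch {

    // Stand-in for the library's Tokenizer type (assumption); only the two
    // methods this sketch needs are declared.
    interface Tokenizer {
        boolean hasMoreTokens();
        String nextToken();
    }

    // Reduced copy of the documented interface: just the two create overloads.
    interface TokenizerFactory {
        Tokenizer create(String toTokenize);
        Tokenizer create(InputStream toTokenize);
    }

    // Hypothetical implementation that splits on runs of whitespace.
    static class WhitespaceTokenizerFactory implements TokenizerFactory {
        @Override
        public Tokenizer create(String toTokenize) {
            String trimmed = toTokenize.trim();
            List<String> tokens = trimmed.isEmpty()
                    ? Collections.<String>emptyList()
                    : Arrays.asList(trimmed.split("\\s+"));
            final Iterator<String> it = tokens.iterator();
            return new Tokenizer() {
                @Override public boolean hasMoreTokens() { return it.hasNext(); }
                @Override public String nextToken() { return it.next(); }
            };
        }

        @Override
        public Tokenizer create(InputStream toTokenize) {
            // Read the whole stream as UTF-8 text (an assumption of this sketch),
            // join the lines with spaces, then reuse the String overload.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(toTokenize, StandardCharsets.UTF_8))) {
                return create(reader.lines().collect(Collectors.joining(" ")));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Drains a tokenizer into a list so the result is easy to inspect.
    static List<String> drain(Tokenizer tokenizer) {
        List<String> out = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            out.add(tokenizer.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        TokenizerFactory factory = new WhitespaceTokenizerFactory();
        System.out.println(drain(factory.create("the quick brown fox")));

        InputStream in = new ByteArrayInputStream(
                "one line\nanother line".getBytes(StandardCharsets.UTF_8));
        System.out.println(drain(factory.create(in)));
    }
}
```

Delegating the InputStream overload to the String overload keeps the splitting logic in one place; a real implementation may instead tokenize the stream incrementally.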
setTokenPreProcessor
void setTokenPreProcessor(TokenPreProcess preProcessor)
Sets a token preprocessor to be used with every tokenizer.
Parameters:
preProcessor - the token preprocessor to use

getTokenPreProcessor
TokenPreProcess getTokenPreProcessor()
Returns the TokenPreProcess set for this TokenizerFactory instance.
Returns:
the TokenPreProcess instance, or null if no preprocessor was defined
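
The preprocessor accessors can be sketched the same way. Everything here is a stand-in written for this example: the nested TokenPreProcess interface with a single preProcess(String) method, the reduced Tokenizer and TokenizerFactory interfaces (the InputStream overload is left out for brevity), and the hypothetical WhitespaceTokenizerFactory. The sketch assumes the preprocessor is applied to each token as it is returned, and shows the documented null result when none has been set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PreProcessorSketch {

    // Stand-in for the library's TokenPreProcess type (assumption):
    // a single-method transformer applied to each token.
    interface TokenPreProcess {
        String preProcess(String token);
    }

    // Stand-in for the library's Tokenizer type (assumption).
    interface Tokenizer {
        boolean hasMoreTokens();
        String nextToken();
    }

    // Reduced copy of the documented interface; create(InputStream) is omitted.
    interface TokenizerFactory {
        Tokenizer create(String toTokenize);
        void setTokenPreProcessor(TokenPreProcess preProcessor);
        TokenPreProcess getTokenPreProcessor();
    }

    // Hypothetical implementation that splits on runs of whitespace.
    static class WhitespaceTokenizerFactory implements TokenizerFactory {
        private TokenPreProcess preProcessor; // stays null until a preprocessor is set

        @Override
        public Tokenizer create(String toTokenize) {
            // Capture the preprocessor that is current when the tokenizer is created.
            final TokenPreProcess pre = preProcessor;
            String trimmed = toTokenize.trim();
            final String[] parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            return new Tokenizer() {
                private int next = 0;
                @Override public boolean hasMoreTokens() { return next < parts.length; }
                @Override public String nextToken() {
                    String raw = parts[next++];
                    return pre == null ? raw : pre.preProcess(raw);
                }
            };
        }

        @Override
        public void setTokenPreProcessor(TokenPreProcess preProcessor) {
            this.preProcessor = preProcessor;
        }

        @Override
        public TokenPreProcess getTokenPreProcessor() {
            return preProcessor;
        }
    }

    // Drains a tokenizer into a list so the result is easy to inspect.
    static List<String> drain(Tokenizer tokenizer) {
        List<String> out = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            out.add(tokenizer.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        TokenizerFactory factory = new WhitespaceTokenizerFactory();
        // No preprocessor has been set yet, so the getter returns null.
        System.out.println(factory.getTokenPreProcessor() == null);

        // Lower-case every token produced by tokenizers created from now on.
        factory.setTokenPreProcessor(token -> token.toLowerCase(Locale.ROOT));
        System.out.println(drain(factory.create("Hello WORLD")));
    }
}
```

Callers that read the preprocessor back should check for null before using it, since the getter is documented to return null when none was defined.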