Interface TokenizerFactory
All Known Implementing Classes:
BertWordPieceTokenizerFactory, DefaultTokenizerFactory, NGramTokenizerFactory
public interface TokenizerFactory
Generates a tokenizer for a given string.
Author:
Adam Gibson

Method Summary
Modifier and Type    Method                                               Description
Tokenizer            create(InputStream toTokenize)                       Creates a tokenizer based on an input stream.
Tokenizer            create(String toTokenize)                            Creates a tokenizer for the given string.
TokenPreProcess      getTokenPreProcessor()                               Returns the TokenPreProcess set for this TokenizerFactory instance.
void                 setTokenPreProcessor(TokenPreProcess preProcessor)   Sets a token preprocessor to be used with every tokenizer.

Method Detail
create
Tokenizer create(String toTokenize)
Creates a tokenizer for the given string.
Parameters:
toTokenize - the string to create the tokenizer with
Returns:
the new tokenizer

create
Tokenizer create(InputStream toTokenize)
Creates a tokenizer based on an input stream.
Parameters:
toTokenize - the input stream to tokenize
Returns:
the new tokenizer

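The two create overloads can be illustrated with a minimal, self-contained sketch. It does not use the library's classes: the Tokenizer and TokenizerFactory interfaces below are simplified stand-ins, the getTokens() method is an assumption about what a tokenizer exposes, and WhitespaceTokenizerFactory is a hypothetical implementation that splits on whitespace and assumes UTF-8 input streams.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class CreateOverloadsSketch {

    // Stand-in for the library's Tokenizer; the single method is an assumption.
    interface Tokenizer {
        List<String> getTokens();
    }

    // Stand-in trimmed to the two create overloads documented above.
    interface TokenizerFactory {
        Tokenizer create(String toTokenize);
        Tokenizer create(InputStream toTokenize);
    }

    // Hypothetical factory that splits its input on runs of whitespace.
    static class WhitespaceTokenizerFactory implements TokenizerFactory {
        @Override
        public Tokenizer create(String toTokenize) {
            List<String> tokens = new ArrayList<>();
            for (String raw : toTokenize.trim().split("\\s+")) {
                if (!raw.isEmpty()) {
                    tokens.add(raw);
                }
            }
            return () -> tokens;
        }

        @Override
        public Tokenizer create(InputStream toTokenize) {
            // Decode the stream as UTF-8 text, then reuse the String overload.
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(toTokenize, StandardCharsets.UTF_8));
            String text = reader.lines().collect(Collectors.joining("\n"));
            return create(text);
        }
    }

    public static void main(String[] args) {
        TokenizerFactory factory = new WhitespaceTokenizerFactory();
        String text = "the quick  brown fox";

        List<String> fromString = factory.create(text).getTokens();
        InputStream stream = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        List<String> fromStream = factory.create(stream).getTokens();

        System.out.println(fromString);                    // [the, quick, brown, fox]
        System.out.println(fromString.equals(fromStream)); // true
    }
}
```

Delegating the InputStream overload to the String overload keeps the two entry points consistent; a real implementation may instead read the stream incrementally.
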
setTokenPreProcessor
void setTokenPreProcessor(TokenPreProcess preProcessor)
Sets a token preprocessor to be used with every tokenizer.
Parameters:
preProcessor - the token preprocessor to use

getTokenPreProcessor
TokenPreProcess getTokenPreProcessor()
Returns the TokenPreProcess set for this TokenizerFactory instance.
Returns:
the TokenPreProcess instance, or null if no preprocessor was defined
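
Because getTokenPreProcessor() may return null, callers need a null check before applying the preprocessor. The sketch below shows one way to do that. It is self-contained and uses stand-in types rather than the library's own: the preProcess(String) method on TokenPreProcess is an assumption, and SimpleFactory and normalize are hypothetical names introduced for illustration.

```java
import java.util.Locale;

class PreProcessorSketch {

    // Stand-in for TokenPreProcess; the preProcess method is an assumption.
    interface TokenPreProcess {
        String preProcess(String token);
    }

    // Stand-in trimmed to the two accessor methods documented above.
    interface TokenizerFactory {
        void setTokenPreProcessor(TokenPreProcess preProcessor);
        TokenPreProcess getTokenPreProcessor();
    }

    // Hypothetical factory that only stores the preprocessor.
    static class SimpleFactory implements TokenizerFactory {
        private TokenPreProcess preProcessor; // stays null until a caller sets one

        @Override
        public void setTokenPreProcessor(TokenPreProcess preProcessor) {
            this.preProcessor = preProcessor;
        }

        @Override
        public TokenPreProcess getTokenPreProcessor() {
            return preProcessor;
        }
    }

    // Applies the factory's preprocessor if one is set; otherwise returns the token unchanged.
    static String normalize(TokenizerFactory factory, String token) {
        TokenPreProcess pre = factory.getTokenPreProcessor();
        return pre == null ? token : pre.preProcess(token);
    }

    public static void main(String[] args) {
        TokenizerFactory factory = new SimpleFactory();
        System.out.println(normalize(factory, "Hello")); // Hello (no preprocessor set)

        factory.setTokenPreProcessor(token -> token.toLowerCase(Locale.ROOT));
        System.out.println(normalize(factory, "Hello")); // hello
    }
}
```
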