Interface Tokenizer
-
- All Known Implementing Classes:
BertWordPieceStreamTokenizer, BertWordPieceTokenizer, DefaultStreamTokenizer, DefaultTokenizer, NGramTokenizer
public interface Tokenizer
-
Method Summary
Modifier and Type   Method                                                   Description
int                 countTokens()                                            The number of tokens in the tokenizer
List<String>        getTokens()                                              Returns a list of all the tokens
boolean             hasMoreTokens()                                          Reports whether any tokens are left to iterate over
String              nextToken()                                              The next token (usually a word) in the string
void                setTokenPreProcessor(TokenPreProcess tokenPreProcessor)  Sets the token preprocessor
-
Method Detail
-
hasMoreTokens
boolean hasMoreTokens()
Reports whether any tokens are left to iterate over.
- Returns:
- true if there are more tokens to iterate over, false otherwise
-
countTokens
int countTokens()
The number of tokens in the tokenizer.
- Returns:
- the number of tokens
-
nextToken
String nextToken()
The next token (usually a word) in the string.
- Returns:
- the next token in the string, if any
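hasMoreTokens() and nextToken() are typically used together as a loop guard and an accessor. The sketch below illustrates that pattern only. To stay self-contained it does not use the library: SimpleTokenizer is a local stand-in that copies the two documented signatures, and whitespace(...) is a hypothetical implementation backed by java.util.StringTokenizer, not one of the implementing classes listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenLoopSketch {

    // Local stand-in mirroring the two iteration methods documented above.
    interface SimpleTokenizer {
        boolean hasMoreTokens();
        String nextToken();
    }

    // Hypothetical whitespace-based implementation, for illustration only.
    static SimpleTokenizer whitespace(String text) {
        StringTokenizer st = new StringTokenizer(text);
        return new SimpleTokenizer() {
            @Override public boolean hasMoreTokens() { return st.hasMoreTokens(); }
            @Override public String nextToken() { return st.nextToken(); }
        };
    }

    // Drain the tokenizer: check hasMoreTokens() before every nextToken() call.
    static List<String> drain(SimpleTokenizer tokenizer) {
        List<String> out = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            out.add(tokenizer.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [the, quick, brown, fox]
        System.out.println(drain(whitespace("the quick brown fox")));
    }
}
```

Guarding each nextToken() call with hasMoreTokens() avoids relying on what an implementation does once its tokens are exhausted, which this page does not specify.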
-
getTokens
List<String> getTokens()
Returns a list of all the tokens.
- Returns:
- a list of all the tokens
-
setTokenPreProcessor
void setTokenPreProcessor(TokenPreProcess tokenPreProcessor)
Sets the token preprocessor.
- Parameters:
- tokenPreProcessor - the token preprocessor to set
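A token preprocessor is usually a per-token transformation, such as lower-casing or stripping punctuation, applied before tokens are handed back to the caller. The sketch below shows one way that could fit together. Everything in it is a local stand-in: PreProcessor assumes TokenPreProcess has a single String-to-String method (the name preProcess is an assumption; check the TokenPreProcess Javadoc), and WhitespaceTokenizer is a hypothetical class, not a library implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PreProcessSketch {

    // Stand-in for TokenPreProcess; the single-method shape is an assumption.
    interface PreProcessor {
        String preProcess(String token);
    }

    // Hypothetical tokenizer that applies the preprocessor to each token it returns.
    static class WhitespaceTokenizer {
        private final List<String> raw;
        private PreProcessor preProcessor; // null means "return tokens unchanged"

        WhitespaceTokenizer(String text) {
            String trimmed = text.trim();
            this.raw = trimmed.isEmpty()
                    ? new ArrayList<>()
                    : Arrays.asList(trimmed.split("\\s+"));
        }

        void setTokenPreProcessor(PreProcessor preProcessor) {
            this.preProcessor = preProcessor;
        }

        List<String> getTokens() {
            List<String> out = new ArrayList<>();
            for (String token : raw) {
                out.add(preProcessor == null ? token : preProcessor.preProcess(token));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        WhitespaceTokenizer tokenizer = new WhitespaceTokenizer("Hello, World!");
        // Lower-case each token, then drop everything except ASCII letters and digits.
        tokenizer.setTokenPreProcessor(
                tok -> tok.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", ""));
        // prints [hello, world]
        System.out.println(tokenizer.getTokens());
    }
}
```

Keeping normalization in the preprocessor rather than in the tokenizer lets the same splitting logic be reused with different cleanup rules.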