Interface Tokenizer
All Known Implementing Classes:
BertWordPieceStreamTokenizer, BertWordPieceTokenizer, DefaultStreamTokenizer, DefaultTokenizer, NGramTokenizer
public interface Tokenizer
-
Method Summary
Modifier and Type | Method | Description
int | countTokens() | Returns the number of tokens in the tokenizer.
List<String> | getTokens() | Returns a list of all the tokens.
boolean | hasMoreTokens() | Indicates whether any tokens remain to be iterated over.
String | nextToken() | Returns the next token (usually a word) in the string.
void | setTokenPreProcessor(TokenPreProcess tokenPreProcessor) | Sets the token pre-processor.
-
Method Detail
-
hasMoreTokens
boolean hasMoreTokens()
Indicates whether any tokens remain to be iterated over.
Returns:
- whether there are any more tokens to iterate over
-
countTokens
int countTokens()
Returns the number of tokens in the tokenizer.
Returns:
- the number of tokens
-
nextToken
String nextToken()
Returns the next token (usually a word) in the string.
Returns:
- the next token in the string, if any
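The hasMoreTokens() and nextToken() pair is normally consumed in a while loop. The following is a minimal sketch of that pattern, not the library's implementation: SimpleTokenizer, WhitespaceTokenizer and TokenizerLoopDemo are hypothetical stand-ins that mirror only three of the documented method signatures, and both the whitespace splitting and the behaviour after exhaustion are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors three of the documented method signatures.
interface SimpleTokenizer {
    boolean hasMoreTokens();
    int countTokens();
    String nextToken();
}

// Assumption: tokens are produced by splitting the input on whitespace.
class WhitespaceTokenizer implements SimpleTokenizer {
    private final List<String> tokens = new ArrayList<>();
    private int position = 0;

    WhitespaceTokenizer(String text) {
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
    }

    @Override
    public boolean hasMoreTokens() {
        return position < tokens.size();
    }

    @Override
    public int countTokens() {
        // Assumption: reports the total number of tokens, not the remaining ones.
        return tokens.size();
    }

    @Override
    public String nextToken() {
        // Assumption: returns null once exhausted; a real implementation may differ.
        return hasMoreTokens() ? tokens.get(position++) : null;
    }
}

class TokenizerLoopDemo {
    // Drains a tokenizer using the hasMoreTokens()/nextToken() pattern.
    static List<String> drain(SimpleTokenizer tokenizer) {
        List<String> out = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            out.add(tokenizer.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleTokenizer tokenizer = new WhitespaceTokenizer("the quick brown fox");
        System.out.println(tokenizer.countTokens());
        System.out.println(drain(tokenizer));
    }
}
```

Guarding every nextToken() call with hasMoreTokens() keeps the caller independent of what a particular implementation does once its tokens run out.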
-
getTokens
List<String> getTokens()
Returns a list of all the tokens.
Returns:
- a list of all the tokens
-
setTokenPreProcessor
void setTokenPreProcessor(TokenPreProcess tokenPreProcessor)
Sets the token pre-processor.
Parameters:
- tokenPreProcessor - the token pre-processor to set
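To illustrate how a pre-processor set through setTokenPreProcessor might affect the result of getTokens(), here is a minimal sketch. It does not use the library's classes: SimplePreProcess is a hypothetical stand-in for TokenPreProcess (its single-method shape is an assumption), PreProcessingTokenizer is a hypothetical tokenizer that splits on whitespace, and the lower-casing and punctuation-stripping rule is chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for TokenPreProcess; the single-method shape is an assumption.
interface SimplePreProcess {
    String preProcess(String token);
}

// Hypothetical tokenizer exposing only the two methods discussed here.
class PreProcessingTokenizer {
    private final List<String> rawTokens = new ArrayList<>();
    private SimplePreProcess preProcessor; // null means tokens are returned unchanged

    // Assumption: tokens are produced by splitting the input on whitespace.
    PreProcessingTokenizer(String text) {
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                rawTokens.add(piece);
            }
        }
    }

    void setTokenPreProcessor(SimplePreProcess tokenPreProcessor) {
        this.preProcessor = tokenPreProcessor;
    }

    // Returns every token, passed through the pre-processor if one is set.
    List<String> getTokens() {
        List<String> out = new ArrayList<>();
        for (String token : rawTokens) {
            out.add(preProcessor == null ? token : preProcessor.preProcess(token));
        }
        return out;
    }
}

class PreProcessDemo {
    public static void main(String[] args) {
        PreProcessingTokenizer tokenizer = new PreProcessingTokenizer("Hello, World!");
        // Illustrative rule: lower-case each token and strip non-alphanumeric characters.
        tokenizer.setTokenPreProcessor(
                token -> token.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", ""));
        System.out.println(tokenizer.getTokens()); // prints [hello, world]
    }
}
```

In this sketch the pre-processor is applied lazily when the tokens are read, so it can be set at any point before getTokens() is called; whether a given implementation applies it eagerly or lazily is not stated in this page.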