Class DefaultStreamTokenizer
- java.lang.Object
-
- org.deeplearning4j.text.tokenization.tokenizer.DefaultStreamTokenizer
-
- All Implemented Interfaces:
Tokenizer
public class DefaultStreamTokenizer extends Object implements Tokenizer
Tokenizer based on the StreamTokenizer
- Author:
- Adam Gibson
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- protected static org.slf4j.Logger log
-
Constructor Summary
Constructors (Constructor, Description):
- DefaultStreamTokenizer(InputStream is)
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description):
- int countTokens(): Returns the number of tokens. Note: this method effectively preloads all tokens.
- List<String> getTokens(): Returns all tokens as a list of Strings.
- boolean hasMoreTokens(): Checks whether any prebuffered tokens are left; otherwise checks the underlying stream.
- String nextToken(): Returns the next token from the prebuffered list of tokens or from the underlying InputStream.
- void setTokenPreProcessor(TokenPreProcess tokenPreProcessor): Sets the token preprocessor.
-
-
-
Constructor Detail
-
DefaultStreamTokenizer
public DefaultStreamTokenizer(InputStream is)
-
-
Method Detail
-
hasMoreTokens
public boolean hasMoreTokens()
Checks whether any prebuffered tokens are left; otherwise checks the underlying stream.
- Specified by:
hasMoreTokens in interface Tokenizer
- Returns:
true if at least one more token is available
-
countTokens
public int countTokens()
Returns the number of tokens. Note: this method effectively preloads all tokens, so use it with caution; on large streams it will consume a large amount of memory.
- Specified by:
countTokens in interface Tokenizer
- Returns:
the number of tokens
-
nextToken
public String nextToken()
Returns the next token, taken from the prebuffered list of tokens or from the underlying InputStream.
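The interplay between lazy reads, the prebuffered list, and the memory cost of counting can be illustrated with a minimal sketch built on the JDK's java.io.StreamTokenizer. This is not the DL4J implementation: the class name PrebufferingTokenizerSketch, the whitespace-only splitting, and the choice to have countTokens report the tokens not yet consumed are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

// Hypothetical stand-in for illustration only; NOT the DL4J class.
class PrebufferingTokenizerSketch {
    private final StreamTokenizer in;
    // Tokens already read from the stream but not yet handed out.
    private final Deque<String> buffer = new ArrayDeque<>();

    PrebufferingTokenizerSketch(InputStream is) {
        in = new StreamTokenizer(
                new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8)));
        // Assumption: split on whitespace only, treat everything else as word characters.
        in.resetSyntax();
        in.wordChars(33, 255);
        in.whitespaceChars(0, ' ');
    }

    // Reads one token from the stream into the buffer; returns false at end of stream.
    private boolean readOne() {
        try {
            while (in.nextToken() != StreamTokenizer.TT_EOF) {
                if (in.ttype == StreamTokenizer.TT_WORD) {
                    buffer.add(in.sval);
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Prebuffered tokens first; otherwise try the underlying stream.
    boolean hasMoreTokens() {
        return !buffer.isEmpty() || readOne();
    }

    String nextToken() {
        if (!hasMoreTokens()) {
            throw new NoSuchElementException("no more tokens");
        }
        return buffer.poll();
    }

    // Drains the whole stream into memory, which is why counting is costly on large inputs.
    int countTokens() {
        while (readOne()) {
            // keep buffering until end of stream
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        InputStream is = new ByteArrayInputStream(
                "the quick brown fox".getBytes(StandardCharsets.UTF_8));
        PrebufferingTokenizerSketch t = new PrebufferingTokenizerSketch(is);
        System.out.println(t.nextToken());   // read lazily from the stream
        System.out.println(t.countTokens()); // buffers everything that remains
        while (t.hasMoreTokens()) {
            System.out.println(t.nextToken()); // now served from the buffer
        }
    }
}
```

In this sketch, iterating with hasMoreTokens and nextToken alone keeps at most one token buffered, while a single call to countTokens holds every remaining token in memory at once.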
-
setTokenPreProcessor
public void setTokenPreProcessor(TokenPreProcess tokenPreProcessor)
Description copied from interface: Tokenizer
Sets the token preprocessor.
- Specified by:
setTokenPreProcessor in interface Tokenizer
- Parameters:
tokenPreProcessor - the token preprocessor to set
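To make the role of a token preprocessor concrete, the following sketch applies a per-token transformation to a list of tokens. It does not use the DL4J types: TokenTransform is a hypothetical single-method interface standing in for TokenPreProcess, and the lowercase-and-strip rule is just an example transformation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative only; the names below are assumptions, not DL4J API.
class PreProcessSketch {
    // Hypothetical stand-in for a token preprocessor: one token in, one token out.
    interface TokenTransform {
        String preProcess(String token);
    }

    // Applies the preprocessor to every token; a null preprocessor leaves tokens unchanged.
    static List<String> apply(List<String> tokens, TokenTransform pre) {
        List<String> out = new ArrayList<>();
        for (String tok : tokens) {
            out.add(pre == null ? tok : pre.preProcess(tok));
        }
        return out;
    }

    public static void main(String[] args) {
        // Example rule: lowercase, then drop everything except ASCII letters and digits.
        TokenTransform lowerAlnum =
                tok -> tok.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
        System.out.println(apply(Arrays.asList("Hello,", "World!"), lowerAlnum));
        // prints [hello, world]
    }
}
```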
-
-