Class DefaultStreamTokenizer
- java.lang.Object
- org.deeplearning4j.text.tokenization.tokenizer.DefaultStreamTokenizer
- All Implemented Interfaces:
Tokenizer
public class DefaultStreamTokenizer extends Object implements Tokenizer
Tokenizer based on the StreamTokenizer
- Author:
- Adam Gibson
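The class description names StreamTokenizer as the basis for this tokenizer. As a minimal sketch, assuming this refers to the JDK's java.io.StreamTokenizer with its default syntax table, the following shows how such a tokenizer splits an InputStream into word tokens. The names StreamTokenizerSketch and tokenize are hypothetical and are not part of DL4J; the actual class may configure the tokenizer differently.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StreamTokenizer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the DL4J implementation.
class StreamTokenizerSketch {

    /** Collects every word token that java.io.StreamTokenizer finds in the stream. */
    public static List<String> tokenize(InputStream is) throws IOException {
        StreamTokenizer st = new StreamTokenizer(
                new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8)));
        List<String> tokens = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                tokens.add(st.sval);
            }
            // With the default syntax table, numbers arrive as TT_NUMBER (a double in
            // st.nval) and punctuation as single-character tokens; this sketch skips both.
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new ByteArrayInputStream(
                "the quick brown fox".getBytes(StandardCharsets.UTF_8));
        System.out.println(tokenize(is)); // prints [the, quick, brown, fox]
    }
}
```

The input is wrapped in an InputStreamReader because the StreamTokenizer constructor that accepts an InputStream directly is deprecated in the JDK.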
-
Field Summary
Fields
protected static org.slf4j.Logger log
-
Constructor Summary
Constructors
DefaultStreamTokenizer(InputStream is)
-
Method Summary
int countTokens()
Returns the number of tokens. PLEASE NOTE: this method effectively preloads all tokens.
List<String> getTokens()
Returns all tokens as a list of Strings.
boolean hasMoreTokens()
Checks whether any prebuffered tokens are left; otherwise checks the underlying stream.
String nextToken()
Returns the next token from the prebuffered list of tokens or from the underlying InputStream.
void setTokenPreProcessor(TokenPreProcess tokenPreProcessor)
Sets the token pre-processor.
-
Constructor Detail
-
DefaultStreamTokenizer
public DefaultStreamTokenizer(InputStream is)
-
Method Detail
-
hasMoreTokens
public boolean hasMoreTokens()
Checks whether any prebuffered tokens are left; otherwise checks the underlying stream.
- Specified by:
hasMoreTokens in interface Tokenizer
- Returns:
-
countTokens
public int countTokens()
Returns the number of tokens. PLEASE NOTE: this method effectively preloads all tokens, so use it with caution: on large streams it will consume a large amount of memory.
- Specified by:
countTokens in interface Tokenizer
- Returns:
-
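The preloading caveat for countTokens can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the DL4J source: PrebufferingTokenizerSketch, readOne and bufferedSize are hypothetical names, and the choice to count only tokens that have not yet been returned is specific to this sketch. It shows that lazy iteration keeps at most one token buffered, while counting drains the whole stream into memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

// Hypothetical stand-in, not the DL4J class: contrasts lazy reads with preloading.
class PrebufferingTokenizerSketch {
    private final StreamTokenizer source;
    private final Deque<String> buffer = new ArrayDeque<>(); // tokens read but not yet returned
    private boolean exhausted = false;

    PrebufferingTokenizerSketch(InputStream is) {
        source = new StreamTokenizer(
                new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8)));
    }

    /** Pulls one more word from the stream into the buffer; returns false at end of stream. */
    private boolean readOne() {
        if (exhausted) {
            return false;
        }
        try {
            while (source.nextToken() != StreamTokenizer.TT_EOF) {
                if (source.ttype == StreamTokenizer.TT_WORD) {
                    buffer.add(source.sval);
                    return true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        exhausted = true;
        return false;
    }

    /** Checks the buffer first and falls back to the stream only when the buffer is empty. */
    public boolean hasMoreTokens() {
        return !buffer.isEmpty() || readOne();
    }

    /** Returns the next buffered token, reading from the stream if necessary. */
    public String nextToken() {
        if (!hasMoreTokens()) {
            throw new NoSuchElementException("no more tokens");
        }
        return buffer.poll();
    }

    /** Drains the rest of the stream into the buffer, so memory grows with stream size. */
    public int countTokens() {
        while (readOne()) {
            // keep reading until the stream is exhausted
        }
        return buffer.size();
    }

    /** Number of tokens currently held in memory. */
    public int bufferedSize() {
        return buffer.size();
    }
}
```

When only a single pass over a large stream is needed, looping with hasMoreTokens and nextToken avoids the preload, since each token is released as soon as it is returned.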
nextToken
public String nextToken()
Returns the next token from the prebuffered list of tokens or from the underlying InputStream.
-
setTokenPreProcessor
public void setTokenPreProcessor(TokenPreProcess tokenPreProcessor)
Description copied from interface: Tokenizer
Sets the token pre-processor.
- Specified by:
setTokenPreProcessor in interface Tokenizer
- Parameters:
tokenPreProcessor - the token pre-processor to set
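To show what a token pre-processor might do, here is a minimal sketch. It assumes that a pre-processor maps one raw token to its cleaned form through a single method; the interface TokenPreProcessSketch below is a hypothetical stand-in declared locally so the example compiles on its own, and the real TokenPreProcess interface should be checked before relying on this shape. The names PreProcessSketch, LOWERCASE_ALNUM and apply are likewise illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; none of these names come from DL4J.
class PreProcessSketch {

    /** Assumed shape of a token pre-processor: one raw token in, one cleaned token out. */
    interface TokenPreProcessSketch {
        String preProcess(String token);
    }

    /** Lower-cases a token and strips every character that is not a letter or a digit. */
    static final TokenPreProcessSketch LOWERCASE_ALNUM =
            token -> token.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");

    /** Applies the pre-processor to each token; a null pre-processor leaves tokens unchanged. */
    public static List<String> apply(List<String> tokens, TokenPreProcessSketch pre) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(pre == null ? token : pre.preProcess(token));
        }
        return out;
    }
}
```

For example, applying LOWERCASE_ALNUM to the tokens "Hello," and "World!" yields "hello" and "world".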
-