java.lang.Object
- org.deeplearning4j.text.tokenization.tokenizer.BertWordPieceTokenizer

All Implemented Interfaces:

Tokenizer

Direct Known Subclasses:

BertWordPieceStreamTokenizer
```
public class BertWordPieceTokenizer
extends Object
implements Tokenizer
```

Field Summary

Fields
Modifier and Type	Field	Description
`static Pattern`	`splitPattern`

Constructor Summary

Constructors
Constructor	Description
`BertWordPieceTokenizer(String tokens, NavigableMap<String,Integer> vocab, TokenPreProcess preTokenizePreProcessor, TokenPreProcess tokenPreProcess)`

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`protected void`	`checkIfEmpty(Map<String,Integer> m, String candidate)`
`int`	`countTokens()`	The number of tokens in the tokenizer
`protected String`	`findLongestSubstring(NavigableMap<String,Integer> vocab, String candidate)`
`List<String>`	`getTokens()`	Returns a list of all the tokens
`boolean`	`hasMoreTokens()`	Whether any tokens are left to iterate over
`String`	`nextToken()`	The next token (usually a word) in the string
`void`	`setTokenPreProcessor(TokenPreProcess tokenPreProcessor)`	Set the token pre-processor

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - splitPattern
```
public static final Pattern splitPattern
```
- Constructor Detail
  - BertWordPieceTokenizer
```
public BertWordPieceTokenizer(String tokens,
                              NavigableMap<String,Integer> vocab,
                              TokenPreProcess preTokenizePreProcessor,
                              TokenPreProcess tokenPreProcess)
```
- Method Detail
  - hasMoreTokens
```
public boolean hasMoreTokens()
```
    Description copied from interface: Tokenizer
    
    Whether any tokens are left to iterate over
    
    Specified by:
    
    hasMoreTokens in interface Tokenizer
    
    Returns:
    
    whether there are any more tokens to iterate over
  - countTokens
```
public int countTokens()
```
    Description copied from interface: Tokenizer
    
    The number of tokens in the tokenizer
    
    Specified by:
    
    countTokens in interface Tokenizer
    
    Returns:
    
    the number of tokens
  - nextToken
```
public String nextToken()
```
    Description copied from interface: Tokenizer
    
    The next token (usually a word) in the string
    
    Specified by:
    
    nextToken in interface Tokenizer
    
    Returns:
    
    the next token in the string if any
  - getTokens
```
public List<String> getTokens()
```
    Description copied from interface: Tokenizer
    
    Returns a list of all the tokens
    
    Specified by:
    
    getTokens in interface Tokenizer
    
    Returns:
    
    a list of all the tokens
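
    The four accessors above (hasMoreTokens, countTokens, nextToken, getTokens) follow a cursor-over-a-list contract. The sketch below illustrates that contract with a hypothetical stand-in class, TokenCursorSketch, which is not part of the library. It assumes that countTokens reports the total number of tokens rather than the number remaining, and that calling nextToken past the end throws; the page does not confirm either behavior for BertWordPieceTokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in that mirrors the method names of the Tokenizer interface.
// It is NOT the library class; it only illustrates the iteration contract.
class TokenCursorSketch {
    private final List<String> tokens;
    private int cursor = 0;

    TokenCursorSketch(List<String> tokens) {
        this.tokens = new ArrayList<>(tokens);
    }

    // True while the cursor has not yet passed the last token.
    boolean hasMoreTokens() {
        return cursor < tokens.size();
    }

    // Assumption: total number of tokens, independent of the cursor position.
    int countTokens() {
        return tokens.size();
    }

    // Returns the token at the cursor and advances it.
    // Assumption: throws IndexOutOfBoundsException when no tokens are left.
    String nextToken() {
        return tokens.get(cursor++);
    }

    // Returns a copy of all tokens without moving the cursor.
    List<String> getTokens() {
        return new ArrayList<>(tokens);
    }

    // Typical consumption loop: guard every nextToken() call with hasMoreTokens().
    static List<String> drain(TokenCursorSketch tokenizer) {
        List<String> out = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            out.add(tokenizer.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        TokenCursorSketch t = new TokenCursorSketch(Arrays.asList("un", "##aff", "##able"));
        System.out.println(t.countTokens());
        System.out.println(drain(t));
    }
}
```

    When the whole token list is needed at once, getTokens avoids the loop; the hasMoreTokens/nextToken pair is the better fit for streaming consumers.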
  - setTokenPreProcessor
```
public void setTokenPreProcessor(TokenPreProcess tokenPreProcessor)
```
    Description copied from interface: Tokenizer
    
    Set the token pre-processor
    
    Specified by:
    
    setTokenPreProcessor in interface Tokenizer
    
    Parameters:
    
    tokenPreProcessor - the token pre-processor to set
  - findLongestSubstring
```
protected String findLongestSubstring(NavigableMap<String,Integer> vocab,
                                      String candidate)
```
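
    This page gives no description for findLongestSubstring. Judging by its name and signature, it plausibly performs the greedy longest-match lookup at the heart of WordPiece tokenization. The sketch below shows that general technique in plain Java and is not the library's implementation. The class WordPieceSketch, the methods longestPrefix and split, the "##" continuation prefix and the "[UNK]" fallback token are assumptions taken from common BERT conventions, and the sketch uses only containsKey, ignoring the ordering features of NavigableMap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative sketch of greedy longest-match-first WordPiece splitting.
// All names here are hypothetical; this is not the DL4J implementation.
class WordPieceSketch {

    // Returns the longest vocabulary entry that is a prefix of candidate, or null if none.
    static String longestPrefix(NavigableMap<String, Integer> vocab, String candidate) {
        for (int end = candidate.length(); end > 0; end--) {
            String prefix = candidate.substring(0, end);
            if (vocab.containsKey(prefix)) {
                return prefix;
            }
        }
        return null;
    }

    // Splits one word into pieces; pieces after the first carry the "##" prefix (assumed convention).
    static List<String> split(NavigableMap<String, Integer> vocab, String word) {
        List<String> pieces = new ArrayList<>();
        String rest = word;
        boolean first = true;
        while (!rest.isEmpty()) {
            String candidate = first ? rest : "##" + rest;
            String match = longestPrefix(vocab, candidate);
            // No match, or a match that consumes nothing beyond the "##" marker:
            // treat the whole word as unknown (assumed "[UNK]" convention).
            if (match == null || (!first && match.length() <= 2)) {
                return Collections.singletonList("[UNK]");
            }
            pieces.add(match);
            rest = candidate.substring(match.length());
            first = false;
        }
        return pieces;
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> vocab = new TreeMap<>();
        vocab.put("un", 0);
        vocab.put("##aff", 1);
        vocab.put("##able", 2);
        System.out.println(split(vocab, "unaffable")); // prints [un, ##aff, ##able]
    }
}
```

    The linear scan from longest to shortest prefix keeps the sketch short. A sorted map makes faster lookups possible, which may be why the real signature takes a NavigableMap, but that is a guess.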
  - checkIfEmpty
```
protected void checkIfEmpty(Map<String,Integer> m,
                            String candidate)
```
