public static class BertIterator.Builder extends Object
Modifier and Type | Field and Description
---|---
protected String | appendToken
protected BertIterator.FeatureArrays | featureArrays
protected BertIterator.LengthHandling | lengthHandling
protected BertSequenceMasker | masker
protected String | maskToken
protected int | maxTokens
protected int | minibatchSize
protected boolean | padMinibatches
protected String | prependToken
protected org.nd4j.linalg.dataset.api.MultiDataSetPreProcessor | preProcessor
protected LabeledPairSentenceProvider | sentencePairProvider
protected LabeledSentenceProvider | sentenceProvider
protected BertIterator.Task | task
protected TokenizerFactory | tokenizerFactory
protected BertIterator.UnsupervisedLabelFormat | unsupervisedLabelFormat
protected Map<String,Integer> | vocabMap
Constructor and Description
---
Builder()
Modifier and Type | Method and Description
---|---
BertIterator.Builder | appendToken(String appendToken): Append the specified token to the sequences, when doing training on sentence pairs. Generally "[SEP]" is used. No token is appended by default.
BertIterator | build()
BertIterator.Builder | featureArrays(BertIterator.FeatureArrays featureArrays): Specify what arrays should be returned.
BertIterator.Builder | lengthHandling(@NonNull BertIterator.LengthHandling lengthHandling, int maxLength): Specifies how the sequence length of the output data should be handled.
BertIterator.Builder | masker(BertSequenceMasker masker): Used only for unsupervised training (i.e., when task is set to BertIterator.Task.UNSUPERVISED) for learning a masked language model.
BertIterator.Builder | maskToken(String maskToken): Used only for unsupervised training (i.e., when task is set to BertIterator.Task.UNSUPERVISED) for learning a masked language model.
BertIterator.Builder | minibatchSize(int minibatchSize): Minibatch size to use (number of examples to train on for each iteration). See also: padMinibatches.
BertIterator.Builder | padMinibatches(boolean padMinibatches): Default: false (disabled). If the dataset is not an exact multiple of the minibatch size, should we pad the smaller final minibatch? For example, with 100 examples total and a minibatch size of 32, subsequent calls of next() in one epoch return the following numbers of examples: padMinibatches = false (default): 32, 32, 32, 4; padMinibatches = true: 32, 32, 32, 32 (the last minibatch has 4 real examples and 28 masked-out padding examples). Both options should result in exactly the same model.
BertIterator.Builder | prependToken(String prependToken): Prepend the specified token to the sequences, when doing supervised training; i.e., any token sequences will have this added at the start. Some BERT/Transformer models may need this - for example, sequences starting with a "[CLS]" token. No token is prepended by default.
BertIterator.Builder | preProcessor(org.nd4j.linalg.dataset.api.MultiDataSetPreProcessor preProcessor): Set the preprocessor to be used on the MultiDataSets before returning them.
BertIterator.Builder | sentencePairProvider(LabeledPairSentenceProvider sentencePairProvider): Specify the source of the data for classification on sentence pairs.
BertIterator.Builder | sentenceProvider(LabeledSentenceProvider sentenceProvider): Specify the source of the data for classification.
BertIterator.Builder | task(BertIterator.Task task): Specify the BertIterator.Task the iterator should be set up for.
BertIterator.Builder | tokenizer(TokenizerFactory tokenizerFactory): Specify the TokenizerFactory to use.
BertIterator.Builder | unsupervisedLabelFormat(BertIterator.UnsupervisedLabelFormat labelFormat): Used only for unsupervised training (i.e., when task is set to BertIterator.Task.UNSUPERVISED) for learning a masked language model.
BertIterator.Builder | vocabMap(Map<String,Integer> vocabMap): Provide the vocabulary as a map.
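The padMinibatches behaviour described above is simple arithmetic over the dataset size. The helper below is a hypothetical illustration (not part of the BertIterator API) that computes the minibatch sizes next() would return over one epoch:

```java
import java.util.ArrayList;
import java.util.List;

public class MinibatchSplit {
    // Returns the number of examples in each minibatch over one epoch.
    // With padding enabled, the final partial minibatch is reported at the
    // full minibatch size (the extra slots are masked-out padding examples).
    static List<Integer> batchSizes(int totalExamples, int minibatchSize, boolean padMinibatches) {
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = totalExamples; remaining > 0; remaining -= minibatchSize) {
            int actual = Math.min(remaining, minibatchSize);
            sizes.add(padMinibatches ? minibatchSize : actual);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 100 examples, minibatch size 32, as in the example above:
        System.out.println(batchSizes(100, 32, false)); // [32, 32, 32, 4]
        System.out.println(batchSizes(100, 32, true));  // [32, 32, 32, 32]
    }
}
```

Either way, the same 100 real examples are seen once per epoch, which is why both settings should produce the same model.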
protected BertIterator.Task task
protected TokenizerFactory tokenizerFactory
protected BertIterator.LengthHandling lengthHandling
protected int maxTokens
protected int minibatchSize
protected boolean padMinibatches
protected org.nd4j.linalg.dataset.api.MultiDataSetPreProcessor preProcessor
protected LabeledSentenceProvider sentenceProvider
protected LabeledPairSentenceProvider sentencePairProvider
protected BertIterator.FeatureArrays featureArrays
protected BertSequenceMasker masker
protected BertIterator.UnsupervisedLabelFormat unsupervisedLabelFormat
protected String maskToken
protected String prependToken
protected String appendToken
public BertIterator.Builder task(BertIterator.Task task)
Specify the BertIterator.Task the iterator should be set up for. See BertIterator for more details.

public BertIterator.Builder tokenizer(TokenizerFactory tokenizerFactory)
Specify the TokenizerFactory to use. Typically a BertWordPieceTokenizerFactory is used.

public BertIterator.Builder lengthHandling(@NonNull BertIterator.LengthHandling lengthHandling, int maxLength)
Specifies how the sequence length of the output data should be handled. See BertIterator for more details.
lengthHandling - Length handling
maxLength - Not used if LengthHandling is set to BertIterator.LengthHandling.ANY_LENGTH

public BertIterator.Builder minibatchSize(int minibatchSize)
Minibatch size to use (number of examples to train on for each iteration). See also: padMinibatches.
minibatchSize - Minibatch size

public BertIterator.Builder padMinibatches(boolean padMinibatches)

public BertIterator.Builder preProcessor(org.nd4j.linalg.dataset.api.MultiDataSetPreProcessor preProcessor)

public BertIterator.Builder sentenceProvider(LabeledSentenceProvider sentenceProvider)

public BertIterator.Builder sentencePairProvider(LabeledPairSentenceProvider sentencePairProvider)

public BertIterator.Builder featureArrays(BertIterator.FeatureArrays featureArrays)
Specify what arrays should be returned. See BertIterator for more details.

public BertIterator.Builder vocabMap(Map<String,Integer> vocabMap)
Provide the vocabulary as a map. For a BertWordPieceTokenizerFactory, this can be obtained using BertWordPieceTokenizerFactory.getVocab().

public BertIterator.Builder masker(BertSequenceMasker masker)
Used only for unsupervised training (i.e., when task is set to BertIterator.Task.UNSUPERVISED) for learning a masked language model. This can be used to customize how the masking is performed. Default: BertMaskedLMMasker.

public BertIterator.Builder unsupervisedLabelFormat(BertIterator.UnsupervisedLabelFormat labelFormat)
Used only for unsupervised training (i.e., when task is set to BertIterator.Task.UNSUPERVISED) for learning a masked language model. Used to specify the format that the labels should be returned in. See BertIterator for more details.

public BertIterator.Builder maskToken(String maskToken)
Used only for unsupervised training (i.e., when task is set to BertIterator.Task.UNSUPERVISED) for learning a masked language model. This specifies the token (such as "[MASK]") that should be used when a value is masked out. Note that this is passed to the BertSequenceMasker defined by masker(BertSequenceMasker), hence the exact behaviour will depend on what masker is used. This token must be present in the vocabulary map set in vocabMap.

public BertIterator.Builder prependToken(String prependToken)
prependToken - The token to start each sequence with (null: no token will be prepended)

public BertIterator.Builder appendToken(String appendToken)
appendToken - Token at end of each sentence for pairs of sentences (null: no token will be appended)

public BertIterator build()
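Putting the builder methods together, a supervised classification setup might look like the sketch below. This is an illustrative example, not a definitive recipe: it assumes the deeplearning4j-nlp dependency is on the classpath and that a WordPiece vocabulary file ("vocab.txt" here) is available; the sample sentences and labels are placeholders.

```java
import org.deeplearning4j.iterator.BertIterator;
import org.deeplearning4j.iterator.provider.CollectionLabeledSentenceProvider;
import org.deeplearning4j.text.tokenization.tokenizerfactory.BertWordPieceTokenizerFactory;

import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BertIteratorExample {
    public static void main(String[] args) throws Exception {
        // Tokenizer backed by a WordPiece vocabulary file (path is a placeholder)
        BertWordPieceTokenizerFactory tf = new BertWordPieceTokenizerFactory(
                new File("vocab.txt"), true, true, StandardCharsets.UTF_8);

        BertIterator iter = BertIterator.builder()
                .tokenizer(tf)
                .vocabMap(tf.getVocab())            // token -> index map from the tokenizer
                .task(BertIterator.Task.SEQ_CLASSIFICATION)
                .lengthHandling(BertIterator.LengthHandling.FIXED_LENGTH, 128)
                .minibatchSize(32)
                .padMinibatches(true)               // pad the final partial minibatch
                .featureArrays(BertIterator.FeatureArrays.INDICES_MASK)
                .sentenceProvider(new CollectionLabeledSentenceProvider(
                        Arrays.asList("example sentence"), Arrays.asList("positive")))
                .build();
    }
}
```

For unsupervised masked-LM training, one would instead set task(Task.UNSUPERVISED) together with masker(...), maskToken("[MASK]"), and unsupervisedLabelFormat(...), as described in the method documentation above.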
Copyright © 2022. All rights reserved.