BertIterator.Builder |
BertIterator.Builder.appendToken(String appendToken) |
Append the specified token to the sequences, when doing training on sentence pairs.
Generally "[SEP]" is used.
No token is appended by default.
|
static BertIterator.Builder |
BertIterator.builder() |
|
BertIterator.Builder |
BertIterator.Builder.featureArrays(BertIterator.FeatureArrays featureArrays) |
Specify what arrays should be returned.
|
BertIterator.Builder |
BertIterator.Builder.lengthHandling(@NonNull BertIterator.LengthHandling lengthHandling,
int maxLength) |
Specifies how the sequence length of the output data should be handled.
|
BertIterator.Builder |
BertIterator.Builder.masker(BertSequenceMasker masker) |
|
BertIterator.Builder |
BertIterator.Builder.maskToken(String maskToken) |
|
BertIterator.Builder |
BertIterator.Builder.minibatchSize(int minibatchSize) |
Minibatch size to use (the number of examples to train on for each iteration).
See also: padMinibatches
|
BertIterator.Builder |
BertIterator.Builder.padMinibatches(boolean padMinibatches) |
Default: false (disabled)
If the dataset is not an exact multiple of the minibatch size, should we pad the smaller final minibatch?
For example, if we have 100 examples total, and 32 minibatch size, the following number of examples will be returned
for successive calls of next() within one epoch:
padMinibatches = false (default): 32, 32, 32, 4.
padMinibatches = true: 32, 32, 32, 32 (note: the last minibatch will have 4 real examples, and 28 masked out padding examples).
Both options should result in exactly the same model.
|
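The minibatch counts in the padMinibatches example above can be reproduced with a small self-contained sketch (the class and method names here are illustrative, not part of the BertIterator API):

```java
import java.util.ArrayList;
import java.util.List;

public class MinibatchSizes {
    // Returns the number of examples in each minibatch for a dataset of
    // 'total' examples, with the given minibatch size, with or without padding.
    static List<Integer> sizes(int total, int batch, boolean pad) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < total; i += batch) {
            // Without padding, the final minibatch may be smaller than 'batch'.
            out.add(pad ? batch : Math.min(batch, total - i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sizes(100, 32, false)); // [32, 32, 32, 4]
        System.out.println(sizes(100, 32, true));  // [32, 32, 32, 32]
    }
}
```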
BertIterator.Builder |
BertIterator.Builder.prependToken(String prependToken) |
Prepend the specified token to the sequences, when doing supervised training;
i.e., any token sequence will have this token added at the start.
Some BERT/Transformer models may require this - for example, sequences starting with a "[CLS]" token.
No token is prepended by default.
|
BertIterator.Builder |
BertIterator.Builder.preProcessor(org.nd4j.linalg.dataset.api.MultiDataSetPreProcessor preProcessor) |
Set the preprocessor to be used on the MultiDataSets before returning them.
|
BertIterator.Builder |
BertIterator.Builder.sentencePairProvider(LabeledPairSentenceProvider sentencePairProvider) |
Specify the source of the data for classification on sentence pairs.
|
BertIterator.Builder |
BertIterator.Builder.sentenceProvider(LabeledSentenceProvider sentenceProvider) |
Specify the source of the data for classification.
|
BertIterator.Builder |
BertIterator.Builder.task(BertIterator.Task task) |
|
BertIterator.Builder |
BertIterator.Builder.tokenizer(TokenizerFactory tokenizerFactory) |
Specify the TokenizerFactory to use.
|
BertIterator.Builder |
BertIterator.Builder.unsupervisedLabelFormat(BertIterator.UnsupervisedLabelFormat labelFormat) |
|
BertIterator.Builder |
BertIterator.Builder.vocabMap(Map<String,Integer> vocabMap) |
Provide the vocabulary as a map.
|
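Taken together, the methods above are typically chained into a single builder call. The following is a sketch only, assuming the deeplearning4j-nlp BertWordPieceTokenizerFactory, a local vocab.txt file, and a user-supplied LabeledSentenceProvider implementation (mySentenceProvider):

```java
import java.io.File;
import java.nio.charset.StandardCharsets;

import org.deeplearning4j.iterator.BertIterator;
import org.deeplearning4j.text.tokenization.tokenizerfactory.BertWordPieceTokenizerFactory;

// Tokenizer built from a BERT WordPiece vocabulary file (path is an assumption).
BertWordPieceTokenizerFactory tokenizer =
        new BertWordPieceTokenizerFactory(new File("vocab.txt"), true, true, StandardCharsets.UTF_8);

// Configure a BertIterator for sequence classification using the builder
// methods listed in this table.
BertIterator iter = BertIterator.builder()
        .tokenizer(tokenizer)
        .vocabMap(tokenizer.getVocab())
        .lengthHandling(BertIterator.LengthHandling.FIXED_LENGTH, 128)
        .minibatchSize(32)
        .padMinibatches(true)
        .featureArrays(BertIterator.FeatureArrays.INDICES_MASK)
        .sentenceProvider(mySentenceProvider)   // your LabeledSentenceProvider
        .task(BertIterator.Task.SEQ_CLASSIFICATION)
        .build();
```

For sentence-pair tasks, sentencePairProvider(...) would replace sentenceProvider(...), typically together with appendToken("[SEP]") and prependToken("[CLS]").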