Package | Description |
---|---|
org.deeplearning4j.iterator | |
Modifier and Type | Method and Description |
---|---|
BertIterator.Builder | BertIterator.Builder.appendToken(String appendToken) Append the specified token to the sequences when training on sentence pairs. Generally "[SEP]" is used. No token is appended by default. |
static BertIterator.Builder | BertIterator.builder() |
BertIterator.Builder | BertIterator.Builder.featureArrays(BertIterator.FeatureArrays featureArrays) Specify what arrays should be returned. |
BertIterator.Builder | BertIterator.Builder.lengthHandling(@NonNull BertIterator.LengthHandling lengthHandling, int maxLength) Specifies how the sequence length of the output data should be handled. |
BertIterator.Builder | BertIterator.Builder.masker(BertSequenceMasker masker) Used only for unsupervised training (i.e., when task is set to BertIterator.Task.UNSUPERVISED) for learning a masked language model. |
BertIterator.Builder | BertIterator.Builder.maskToken(String maskToken) Used only for unsupervised training (i.e., when task is set to BertIterator.Task.UNSUPERVISED) for learning a masked language model. |
BertIterator.Builder | BertIterator.Builder.minibatchSize(int minibatchSize) Minibatch size to use (number of examples to train on for each iteration). See also: padMinibatches. |
BertIterator.Builder | BertIterator.Builder.padMinibatches(boolean padMinibatches) Default: false (disabled). If the dataset is not an exact multiple of the minibatch size, should we pad the smaller final minibatch? For example, with 100 examples total and a minibatch size of 32, subsequent calls of next() in one epoch will return the following numbers of examples: padMinibatches = false (default): 32, 32, 32, 4; padMinibatches = true: 32, 32, 32, 32 (the last minibatch will have 4 real examples and 28 masked-out padding examples). Both options should result in exactly the same model. |
BertIterator.Builder | BertIterator.Builder.prependToken(String prependToken) Prepend the specified token to the sequences when doing supervised training; i.e., any token sequences will have this added at the start. Some BERT/Transformer models may need this, for example sequences starting with a "[CLS]" token. No token is prepended by default. |
BertIterator.Builder | BertIterator.Builder.preProcessor(org.nd4j.linalg.dataset.api.MultiDataSetPreProcessor preProcessor) Set the preprocessor to be used on the MultiDataSets before returning them. |
BertIterator.Builder | BertIterator.Builder.sentencePairProvider(LabeledPairSentenceProvider sentencePairProvider) Specify the source of the data for classification on sentence pairs. |
BertIterator.Builder | BertIterator.Builder.sentenceProvider(LabeledSentenceProvider sentenceProvider) Specify the source of the data for classification. |
BertIterator.Builder | BertIterator.Builder.task(BertIterator.Task task) Specify the BertIterator.Task the iterator should be set up for. |
BertIterator.Builder | BertIterator.Builder.tokenizer(TokenizerFactory tokenizerFactory) Specify the TokenizerFactory to use. |
BertIterator.Builder | BertIterator.Builder.unsupervisedLabelFormat(BertIterator.UnsupervisedLabelFormat labelFormat) Used only for unsupervised training (i.e., when task is set to BertIterator.Task.UNSUPERVISED) for learning a masked language model. |
BertIterator.Builder | BertIterator.Builder.vocabMap(Map<String,Integer> vocabMap) Provide the vocabulary as a map. |
Constructor and Description |
---|
BertIterator(BertIterator.Builder b) |
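For the unsupervised path (Task.UNSUPERVISED, i.e. masked language model pretraining), the masker, maskToken and unsupervisedLabelFormat methods described above come into play. The sketch below assumes the BertMaskedLMMasker implementation of BertSequenceMasker, its (Random, double, double, double) constructor, and UnsupervisedLabelFormat.RANK2_IDX; these come from the broader deeplearning4j-nlp API rather than this page and should be verified against the installed version.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

import org.deeplearning4j.iterator.BertIterator;
import org.deeplearning4j.iterator.bert.BertMaskedLMMasker;
import org.deeplearning4j.iterator.provider.CollectionLabeledSentenceProvider;
import org.deeplearning4j.text.tokenization.tokenizerfactory.BertWordPieceTokenizerFactory;

public class BertMaskedLmIteratorSketch {

    public static void main(String[] args) throws Exception {
        File vocabFile = new File("vocab.txt");   // hypothetical WordPiece vocab path
        BertWordPieceTokenizerFactory tokenizer =
                new BertWordPieceTokenizerFactory(vocabFile, true, true, StandardCharsets.UTF_8);

        // Labels are ignored for UNSUPERVISED, but a LabeledSentenceProvider still supplies the text
        CollectionLabeledSentenceProvider sentences = new CollectionLabeledSentenceProvider(
                Arrays.asList("bert learns by predicting masked tokens",
                              "each sentence is tokenized to wordpieces"),
                Arrays.asList("none", "none"));

        BertIterator iter = BertIterator.builder()
                .tokenizer(tokenizer)
                .vocabMap(tokenizer.getVocab())
                .task(BertIterator.Task.UNSUPERVISED)                  // masked language model pretraining
                .sentenceProvider(sentences)
                .lengthHandling(BertIterator.LengthHandling.FIXED_LENGTH, 64)
                .minibatchSize(8)
                .featureArrays(BertIterator.FeatureArrays.INDICES_MASK)
                .masker(new BertMaskedLMMasker(new Random(12345), 0.2, 0.5, 0.5))  // mask ~20% of tokens
                .maskToken("[MASK]")
                .unsupervisedLabelFormat(BertIterator.UnsupervisedLabelFormat.RANK2_IDX)
                .build();

        // Labels here are the original token indices at the masked positions
        System.out.println(Arrays.toString(iter.next().getLabels(0).shape()));
    }
}
```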