Class BertMaskedLMMasker

    • Field Detail

      • DEFAULT_MASK_TOKEN_PROB

        public static final double DEFAULT_MASK_TOKEN_PROB
        See Also:
        Constant Field Values
      • DEFAULT_RANDOM_WORD_PROB

        public static final double DEFAULT_RANDOM_WORD_PROB
        See Also:
        Constant Field Values
      • maskProb

        protected final double maskProb
      • maskTokenProb

        protected final double maskTokenProb
      • randomTokenProb

        protected final double randomTokenProb
    • Constructor Detail

      • BertMaskedLMMasker

        public BertMaskedLMMasker()
        Creates a BertMaskedLMMasker with all default probabilities.
      • BertMaskedLMMasker

        public BertMaskedLMMasker(Random r,
                                  double maskProb,
                                  double maskTokenProb,
                                  double randomTokenProb)
        See: BertMaskedLMMasker for details.
        Parameters:
        r - Random number generator
        maskProb - Probability of masking each token
        maskTokenProb - Probability of replacing a selected token with the mask token
        randomTokenProb - Probability of replacing a selected token with a random token
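        How the three probabilities combine per token can be sketched with a short calculation. The values used below (0.15, 0.8, 0.1) follow the usual BERT convention and are assumptions for illustration; this page does not state the defaults. The class MaskingProbabilities is hypothetical, and the sketch assumes that a selected token which receives neither the mask token nor a random token is left unchanged.

```java
public class MaskingProbabilities {

    /**
     * Per-token outcome probabilities, in this order:
     * [0] replaced with the mask token,
     * [1] replaced with a random token,
     * [2] selected but left unchanged (assumed behaviour for the remainder),
     * [3] not selected at all.
     */
    public static double[] outcomes(double maskProb, double maskTokenProb, double randomTokenProb) {
        if (maskTokenProb + randomTokenProb > 1.0) {
            throw new IllegalArgumentException("maskTokenProb + randomTokenProb must not exceed 1.0");
        }
        // Whatever is left after the two replacement cases keeps the original token.
        double keepProb = 1.0 - maskTokenProb - randomTokenProb;
        return new double[] {
            maskProb * maskTokenProb,
            maskProb * randomTokenProb,
            maskProb * keepProb,
            1.0 - maskProb
        };
    }

    public static void main(String[] args) {
        // Assumed example values, not confirmed defaults of the class.
        double[] p = outcomes(0.15, 0.8, 0.1);
        System.out.println("mask token:             " + p[0]);
        System.out.println("random token:           " + p[1]);
        System.out.println("selected but unchanged: " + p[2]);
        System.out.println("not selected:           " + p[3]);
    }
}
```

        With these assumed values, roughly 12% of tokens receive the mask token, 1.5% receive a random token, 1.5% are selected but kept as they are, and 85% are not selected.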
    • Method Detail

      • maskSequence

        public org.nd4j.common.primitives.Pair<List<String>,boolean[]> maskSequence(List<String> input,
                                                                                    String maskToken,
                                                                                    List<String> vocabWords)
        Specified by:
        maskSequence in interface BertSequenceMasker
        Parameters:
        input - Input sequence of tokens
        maskToken - Token to use for masking - usually something like "[MASK]"
        vocabWords - Vocabulary, as a list
        Returns:
        A Pair of the new input tokens (after masking) and a boolean[] with one entry per input token, where entry i is true if token i was masked.
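        The masking contract described above can be sketched in plain Java without any library dependency. This is a minimal illustration, not the class's actual source: MaskedLmSketch and its Result type (a stand-in for Pair<List<String>,boolean[]>) are hypothetical names, the probability values in main are assumed, and the sketch assumes that a selected token which is neither masked nor randomized keeps its original text while still being flagged as masked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class MaskedLmSketch {

    /** Hypothetical stand-in for the Pair of (new tokens, masked flags). */
    public static final class Result {
        public final List<String> tokens;
        public final boolean[] masked;

        public Result(List<String> tokens, boolean[] masked) {
            this.tokens = tokens;
            this.masked = masked;
        }
    }

    public static Result maskSequence(Random r, double maskProb, double maskTokenProb,
                                      double randomTokenProb, List<String> input,
                                      String maskToken, List<String> vocabWords) {
        List<String> out = new ArrayList<>(input.size());
        boolean[] masked = new boolean[input.size()];
        for (int i = 0; i < input.size(); i++) {
            if (r.nextDouble() < maskProb) {
                // Token i is selected; flag it regardless of which replacement is chosen.
                masked[i] = true;
                double d = r.nextDouble();
                if (d < maskTokenProb) {
                    out.add(maskToken);
                } else if (d < maskTokenProb + randomTokenProb) {
                    out.add(vocabWords.get(r.nextInt(vocabWords.size())));
                } else {
                    // Assumed: the remaining probability mass keeps the original token.
                    out.add(input.get(i));
                }
            } else {
                out.add(input.get(i));
            }
        }
        return new Result(out, masked);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("the", "cat", "sat", "on", "the", "mat");
        List<String> vocab = Arrays.asList("the", "cat", "sat", "on", "mat", "dog");
        // Assumed probabilities; a fixed seed makes repeated runs produce the same result.
        Result res = maskSequence(new Random(12345), 0.15, 0.8, 0.1, input, "[MASK]", vocab);
        System.out.println(res.tokens);
        System.out.println(Arrays.toString(res.masked));
    }
}
```

        The output list always has the same length as the input, so the boolean[] can be used directly as a per-position label mask.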