Encode the input sequence to token IDs, adding padding where necessary.
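As a minimal sketch of that encoding step (the vocabulary, `[UNK]` handling, and pad ID of 0 are assumptions, matching BERT's usual `[PAD]` convention):

```python
def encode(tokens, vocab, max_len, pad_id=0):
    """Map tokens to IDs, then truncate or right-pad to max_len.

    vocab is a hypothetical dict; unknown tokens fall back to [UNK],
    and pad_id=0 is assumed to be the [PAD] ID, as in standard BERT vocabs.
    """
    ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens][:max_len]
    return ids + [pad_id] * (max_len - len(ids))

# Toy vocabulary just for illustration.
vocab = {"[UNK]": 100, "[CLS]": 101, "[SEP]": 102, "john": 200, "house": 201}
ids = encode(["[CLS]", "john", "house", "[SEP]"], vocab, max_len=6)
# ids == [101, 200, 201, 102, 0, 0]
```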
Word-level and span-level alignment with Tokenizer https://github.com/google-research/bert#tokenization
### Input
```python
orig_tokens = ["John", "Johanson", "'s", "house"]
labels      = ["NNP", "NNP", "POS", "NN"]

# bert_tokens == ["[CLS]", "john", "johan", "##son", "'", "s", "house", "[SEP]"]
# orig_to_tok_map == [1, 2, 4, 6]
```
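The alignment above can be built by recording, for each original token, the index of its first wordpiece. This is a self-contained sketch: `toy_wordpiece` is a hypothetical stand-in with hard-coded splits reproducing the example, where the real code would call BERT's `FullTokenizer.tokenize` (see the linked repo):

```python
def toy_wordpiece(word):
    # Hypothetical stand-in for BERT's WordPiece tokenizer;
    # splits are hard-coded to reproduce the example above.
    table = {
        "John": ["john"],
        "Johanson": ["johan", "##son"],
        "'s": ["'", "s"],
        "house": ["house"],
    }
    return table[word]

def align(orig_tokens, tokenize=toy_wordpiece):
    """Return wordpiece tokens plus a map from each original token
    to the index of its first wordpiece in bert_tokens."""
    bert_tokens = ["[CLS]"]
    orig_to_tok_map = []
    for tok in orig_tokens:
        orig_to_tok_map.append(len(bert_tokens))  # index of first subword
        bert_tokens.extend(tokenize(tok))
    bert_tokens.append("[SEP]")
    return bert_tokens, orig_to_tok_map

bert_tokens, orig_to_tok_map = align(["John", "Johanson", "'s", "house"])
# bert_tokens == ["[CLS]", "john", "johan", "##son", "'", "s", "house", "[SEP]"]
# orig_to_tok_map == [1, 2, 4, 6]
```

With this map, word-level labels (e.g. the POS tags above) can be projected onto the first wordpiece of each word, which is the usual convention for sequence labeling with BERT.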