Reads the provided dataset file with the given parameters and returns a DataFrame ready for training a part-of-speech tagger.
Current Spark session
Path to the resource
Delimiter used to separate each word from its tag in the text
Name for the output column of the part-of-speech tags
Name for the DocumentAssembler column
Name for the column of the raw text
DataFrame of parsed text
Helper class for creating DataFrames for training a part-of-speech tagger.
The dataset needs to consist of one sentence per line, where each word is joined to its respective tag by the delimiter.
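As a rough illustration of this line format, the following plain-Python sketch splits one tagged line into its raw text, words, and tags. It does not use the library and is not its implementation: the function name parse_tagged_line, the sample sentence, the "|" default delimiter, and the choice to split each token on its last delimiter are all assumptions made for this example.

```python
from typing import List, Tuple


def parse_tagged_line(line: str, delimiter: str = "|") -> Tuple[str, List[str], List[str]]:
    """Split one 'word<delimiter>tag' line into raw text, words, and tags.

    Hypothetical helper for illustration only; the "|" default is an assumption.
    """
    words: List[str] = []
    tags: List[str] = []
    for token in line.split():
        # Split on the last delimiter so a word that itself contains
        # the delimiter keeps everything before the final tag.
        word, sep, tag = token.rpartition(delimiter)
        if not sep:
            raise ValueError(f"token {token!r} has no {delimiter!r} delimiter")
        words.append(word)
        tags.append(tag)
    return " ".join(words), words, tags


# Illustrative sample line (not taken from any real dataset file).
text, words, tags = parse_tagged_line("A|DT few|JJ months|NNS ago|RB")
print(text)
print(list(zip(words, tags)))
```

The raw text and the tag list correspond loosely to the text and part-of-speech columns described above; the actual DataFrame layout is determined by readDataset.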
The sentence can then be parsed with readDataset into a column with annotations of type POS.
Example
In this example, the file test-training.txt has the content of the sentence above.