Matches regex rules that are provided as (regex, identifier) tuples in a text file.
External resource pointing to the rules; requires 'delimiter' in its options.
Input annotation columns currently used.
Gets the name of the annotation column that will be generated.
Strategy for which to match the expressions (Default: "MATCH_ALL").
Input annotator type: DOCUMENT
Columns that contain the annotations necessary to run this annotator. AnnotatorType is used for both input and output columns if not specified.
Output annotator type: CHUNK
External dictionary to be used by the matcher, which needs a delimiter set for parsing the resource.
External dictionary already in the form of ExternalResource, for which the Map member options has "delimiter" defined.
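import com.johnsnowlabs.nlp.annotators.RegexMatcher
import com.johnsnowlabs.nlp.util.io.{ExternalResource, ReadAs}

// `strategy` is a String defined elsewhere, e.g. "MATCH_ALL"
val regexMatcher = new RegexMatcher()
  .setExternalRules(ExternalResource(
    "src/test/resources/regex-matcher/rules.txt",
    ReadAs.TEXT,
    Map("delimiter" -> ",")
  ))
  .setInputCols("sentence")
  .setOutputCol("regex")
  .setStrategy(strategy)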
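The "delimiter" option determines how each line of the rules file is split into a regex and its identifier. The following is a minimal sketch of that idea in plain Scala, not Spark NLP's actual loader: `RuleFileSketch`, `parseRule` and `matchAll` are hypothetical names, and splitting at the first occurrence of the delimiter is an assumption made for illustration.

```scala
import scala.util.matching.Regex

object RuleFileSketch {
  // Hypothetical helper: split one rule line at the FIRST delimiter
  // into (regex, identifier). This split rule is an assumption.
  def parseRule(line: String, delimiter: String): (Regex, String) = {
    val idx = line.indexOf(delimiter)
    require(idx >= 0, s"no delimiter '$delimiter' in rule: $line")
    val pattern = line.substring(0, idx).trim
    val identifier = line.substring(idx + delimiter.length).trim
    (pattern.r, identifier)
  }

  // Apply every rule to the text and pair each matched span with its identifier.
  def matchAll(text: String, rules: Seq[(Regex, String)]): Seq[(String, String)] =
    for {
      (regex, identifier) <- rules
      m <- regex.findAllMatchIn(text).toList
    } yield (m.matched, identifier)

  def main(args: Array[String]): Unit = {
    val lines = Seq("""the\s\w+, followed by 'the'""", "ceremonies, ceremony")
    val rules = lines.map(parseRule(_, ","))
    matchAll("He attended the ceremonies with the mayor.", rules).foreach(println)
  }
}
```

A regex that itself contains the delimiter would be split in the wrong place under this scheme, which is one reason to pick a delimiter that does not occur in the patterns.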
Overrides required annotators column if different than default
Overrides annotation column name when transforming
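Strategy for which to match the expressions (Default: "MATCH_ALL").
Possible values are:
MATCH_ALL brings one-to-many results
MATCH_FIRST catches only the first match
MATCH_COMPLETE returns a result only if the match is the entire target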
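To make the difference between the strategies concrete, here is a small sketch in plain Scala using the standard library's regex support. It assumes the three strategy names MATCH_ALL, MATCH_FIRST and MATCH_COMPLETE with the meanings "all matches", "first match only" and "match must cover the entire target". `StrategySketch` and `applyStrategy` are hypothetical names and this is not Spark NLP's implementation.

```scala
import scala.util.matching.Regex

object StrategySketch {
  // Hypothetical helper, not part of Spark NLP: returns the matched
  // substrings for a given strategy name.
  def applyStrategy(regex: Regex, text: String, strategy: String): List[String] =
    strategy match {
      // every non-overlapping match in the text
      case "MATCH_ALL" => regex.findAllIn(text).toList
      // only the first match, if any
      case "MATCH_FIRST" => regex.findFirstIn(text).toList
      // a result only when the regex matches the whole text
      case "MATCH_COMPLETE" =>
        if (regex.pattern.matcher(text).matches()) List(text) else Nil
      case other =>
        throw new IllegalArgumentException(s"Unknown strategy: $other")
    }

  def main(args: Array[String]): Unit = {
    val rule = """the\s\w+""".r
    val text = "the cat saw the dog"
    Seq("MATCH_ALL", "MATCH_FIRST", "MATCH_COMPLETE").foreach { s =>
      println(s"$s -> ${applyStrategy(rule, text, s)}")
    }
  }
}
```

With the sample text, the first strategy yields two spans, the second yields one, and the third yields none because the pattern does not cover the whole string.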
Requirement for pipeline transformation validation. It is called on fit().
internal element required for storing annotator to disk
Takes a Dataset to be validated and checks whether all the required annotation types are present.
Returns true if all the required types are present, else false.
Required input and expected output annotator types
A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively.
Uses a reference file to match a set of regular expressions and associate them with a provided identifier.
A dictionary of predefined regular expressions must be provided with setExternalRules. The dictionary can be set either in the form of a delimited text file or directly as an ExternalResource.
Pretrained pipelines are available for this module, see Pipelines.
For extended examples of usage, see the Spark NLP Workshop and the RegexMatcherTestSpec.
Example
In this example, rules.txt has the form of

the\s\w+, followed by 'the'
ceremonies, ceremony

where each regex is separated from its identifier by ",".