Whether the TextMatcher should take the CHUNK from TOKEN or not (Default: false)
Whether to match regardless of case (Default: true)
Extracts entities from the target dataset given in a text file
External resource for the entities, e.g. a text file where each line is the string of an entity
Value for the entity metadata field (Default: "entity")
Getter for buildFromTokens param
Whether to match regardless of case (Default: true)
Getter for the entity metadata field value
Input annotation columns currently used
Whether to merge overlapping matched chunks (Default: false)
Gets the annotation column name that will be generated
The Tokenizer to perform tokenization with
Input annotator types: DOCUMENT, TOKEN
Columns that contain annotations necessary to run this annotator. AnnotatorType is used as both input and output columns if not specified
Whether to merge overlapping matched chunks (Default: false)
Output annotator type: CHUNK
Setter for buildFromTokens param
Whether to match regardless of case (Default: true)
Provides a file with phrases to match. Default: Looks up path in configuration.
a path to a file that contains the entities in the specified format.
the format of the file, can be one of {ReadAs.TEXT, ReadAs.SPARK}. Defaults to ReadAs.TEXT.
a map of additional parameters. Defaults to Map("format" -> "text").
this
Provides a file with phrases to match (Default: Looks up path in configuration)
Setter for the entity metadata field value
Overrides required annotators column if different than default
Whether to merge overlapping matched chunks (Default: false)
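What "merging overlapping matched chunks" means can be illustrated with a small self-contained sketch. This is not Spark NLP code; it is a hypothetical helper that treats chunks as (begin, end) character spans and collapses overlapping spans into one covering span, which is the behavior enabled by this parameter:

```scala
// Minimal illustration (not the Spark NLP implementation) of merging
// overlapping matched chunks represented as (begin, end) spans.
object MergeChunks {
  def merge(chunks: Seq[(Int, Int)]): Seq[(Int, Int)] =
    chunks.sortBy(_._1).foldLeft(List.empty[(Int, Int)]) {
      // New chunk overlaps the last kept chunk: extend the kept chunk
      case ((b, e) :: rest, (b2, e2)) if b2 <= e => (b, math.max(e, e2)) :: rest
      // No overlap: keep the new chunk as-is
      case (acc, next) => next :: acc
    }.reverse

  def main(args: Array[String]): Unit = {
    // Spans (0, 7) and (4, 12) overlap and are merged into (0, 12)
    println(merge(Seq((0, 7), (4, 12), (20, 25))))
  }
}
```

With the default (false), both overlapping chunks would be emitted separately; with merging enabled, only the combined span survives.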
Overrides annotation column name when transforming
The Tokenizer to perform tokenization with
requirement for pipeline transformation validation. It is called on fit()
internal uid required to generate writable annotators
takes a Dataset and checks to see if all the required annotation types are present.
to be validated
True if all the required types are present, else false
Required input and expected output annotator types
A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively.
Annotator to match exact phrases (by token) provided in a file against a Document.
A text file of predefined phrases must be provided with setEntities. The text file can also be set directly as an ExternalResource. For extended examples of usage, see the Spark NLP Workshop and the TextMatcherTestSpec.
Example
In this example, each line of the entities file represents an entity phrase to be extracted.
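The code for this example is not included in this extract. As a sketch of how such a pipeline is typically assembled with Spark NLP (the file path "entities.txt" is a placeholder, and the exact stage configuration here is an assumption rather than the original example):

```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.{Tokenizer, TextMatcher}
import com.johnsnowlabs.nlp.util.io.ReadAs
import org.apache.spark.ml.Pipeline

// Turn the raw "text" column into DOCUMENT annotations
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Produce the TOKEN annotations the TextMatcher consumes
val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Match the phrases listed one-per-line in entities.txt (placeholder path)
val entityExtractor = new TextMatcher()
  .setInputCols("document", "token")
  .setEntities("entities.txt", ReadAs.TEXT)
  .setOutputCol("entity")

val pipeline = new Pipeline()
  .setStages(Array(documentAssembler, tokenizer, entityExtractor))
```

Fitting and transforming a DataFrame with this pipeline yields CHUNK annotations in the "entity" column, one per matched phrase.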