com.johnsnowlabs.nlp.annotators.btm
Whether to ignore case in index lookups (Default depends on model)
Input annotation columns currently in use
Gets the name of the annotation column that will be generated
Input annotator Types: DOCUMENT, TOKEN
Columns that contain the annotations necessary to run this annotator. If not specified, the AnnotatorType is used for both input and output columns.
Whether to merge overlapping matched chunks (Default: false)
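As a rough illustration of what merging overlapping chunks can mean, the sketch below collapses chunks whose character ranges overlap into a single chunk spanning both. It is plain Scala and not the annotator's implementation: `Chunk` and `mergeOverlapping` are hypothetical names, and the overlap rule on inclusive offsets is an assumption.

```scala
object MergeOverlapSketch {
  // Hypothetical stand-in for a CHUNK annotation: inclusive character offsets.
  final case class Chunk(begin: Int, end: Int)

  // Sort by start offset, then fold left, extending the most recent chunk
  // whenever the next one starts at or before its end (assumed overlap rule).
  def mergeOverlapping(chunks: Seq[Chunk]): List[Chunk] =
    chunks
      .sortBy(c => (c.begin, c.end))
      .foldLeft(List.empty[Chunk]) {
        case (last :: rest, next) if next.begin <= last.end =>
          Chunk(last.begin, math.max(last.end, next.end)) :: rest
        case (acc, next) =>
          next :: acc
      }
      .reverse
}
```

Under these assumptions, `MergeOverlapSketch.mergeOverlapping(Seq(Chunk(0, 10), Chunk(6, 15), Chunk(20, 25)))` yields `List(Chunk(0, 15), Chunk(20, 25))`. With merging disabled (the default), the overlapping matches would presumably be reported separately.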
Output annotator Types: CHUNK
Overrides required annotators column if different than default
Overrides annotation column name when transforming
Unique identifier for storage (Default: this.uid)
The Tokenizer to perform tokenization with
requirement for pipeline transformation validation. It is called on fit()
internal uid required to generate writable annotators
takes a Dataset and checks to see if all the required annotation types are present.
to be validated
True if all the required types are present, else false
Required input and expected output annotator types
A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively.
Annotator to match exact phrases (by token) provided in a file against a Document.
A text file of predefined phrases must be provided with setStoragePath. The text file can also be set directly as an ExternalResource.
In contrast to the normal TextMatcher, the BigTextMatcher is designed for large corpora.
For extended examples of usage, see the BigTextMatcherTestSpec.
Example
In this example, the entities file is of the form
where each line represents an entity phrase to be extracted.
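The matching idea (one phrase per line, matched token by token) can be sketched in plain Scala without Spark. This is only an illustration under stated assumptions, not how BigTextMatcher is implemented: the phrase list, the whitespace tokenizer, and the names `PhraseMatchSketch`, `loadPhrases`, and `findMatches` are all hypothetical. The real annotator uses the pipeline's tokenizer and the file given to setStoragePath.

```scala
object PhraseMatchSketch {
  // Hypothetical entities file contents: one phrase per line.
  val entitiesFile: String =
    """dolore magna aliqua
      |lorem ipsum dolor
      |laborum""".stripMargin

  // Naive whitespace tokenizer, standing in for the annotator's Tokenizer.
  def tokenize(text: String): Vector[String] =
    text.trim.split("\\s+").filter(_.nonEmpty).toVector

  // Turn every non-empty line into a token sequence; lower-case it
  // when matching should ignore case.
  def loadPhrases(contents: String, caseSensitive: Boolean): Set[Vector[String]] =
    contents
      .split("\\r?\\n")
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(line => tokenize(if (caseSensitive) line else line.toLowerCase))
      .toSet

  // Slide over the document tokens and report each exact phrase occurrence
  // as (first token index, last token index, matched text).
  def findMatches(
      text: String,
      phrases: Set[Vector[String]],
      caseSensitive: Boolean): Vector[(Int, Int, String)] = {
    val tokens = tokenize(text)
    val keys = if (caseSensitive) tokens else tokens.map(_.toLowerCase)
    val lengths = phrases.map(_.length).toVector.sorted
    for {
      start <- tokens.indices.toVector
      len <- lengths
      if start + len <= tokens.length
      if phrases.contains(keys.slice(start, start + len))
    } yield (start, start + len - 1, tokens.slice(start, start + len).mkString(" "))
  }
}
```

For the text `"Hello dolore magna aliqua Lorem ipsum dolor sit in laborum"`, matching with `caseSensitive = false` returns the three phrases at token ranges 1 to 3, 4 to 6, and 9 to 9. With `caseSensitive = true`, the capitalised "Lorem ipsum dolor" no longer matches the lower-case phrase, so only two chunks are returned.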