Column that contains the chunk string. The string must be part of the DOCUMENT.
Requirement for annotator copies.
Override for additional custom schema checks
Whether to fail the job if a chunk is not found within the document; returns empty otherwise (Default: false)
Column that contains the chunk string. The string must be part of the DOCUMENT.
Whether to fail the job if a chunk is not found within the document; returns empty otherwise (Default: false)
Input annotation columns currently used.
Whether the chunkCol is an array of strings (Default: false)
Whether to lowercase the text when matching chunks (Default: true)
Gets the name of the annotation column that will be generated.
Column that has a reference of where the chunk begins
Whether the start column counts whitespace-separated tokens (Default: false)
Input annotator types: DOCUMENT
Columns that contain the annotations necessary to run this annotator. If not specified, AnnotatorType is used for both input and output columns.
Whether the chunkCol is an array of strings (Default: false)
Whether to lowercase the text when matching chunks (Default: true)
Output annotator types: CHUNK
Column that contains the chunk string. The string must be part of the DOCUMENT.
Whether to fail the job if a chunk is not found within the document; returns empty otherwise (Default: false)
Overrides the required annotator columns if different from the default.
Whether the chunkCol is an array of strings (Default: false)
Whether to lowercase the text when matching chunks (Default: true)
Overrides the annotation column name when transforming.
Column that has a reference of where the chunk begins
Whether the start column counts whitespace-separated tokens (Default: false)
Column that has a reference of where the chunk begins
Whether the start column counts whitespace-separated tokens (Default: false)
Requirement for pipeline transformation validation. It is called on fit().
Required uid for storing the annotator to disk.
Takes a Dataset to be validated and checks whether all the required annotation types are present. Returns true if all the required types are present, else false.
A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively.
Required input and expected output annotator types
Converts DOCUMENT type annotations into CHUNK type with the contents of a chunkCol. Chunk text must be contained within the input DOCUMENT. May be either StringType or ArrayType[StringType] (using setIsArray). Useful for annotators that require a CHUNK type input.
For more extended examples on document pre-processing see the Spark NLP Workshop.
Example
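The matching behaviour described by the parameters above can be illustrated with a small, library-free sketch. This is not the Spark NLP implementation: the names doc_to_chunks and Chunk, the keyword arguments, and the choice of ValueError are hypothetical, chosen only to mirror the documented options (chunkCol, isArray, lowerCase, failOnMissing). It assumes that a chunk's position is the first occurrence of its text in the document and that the end index is inclusive.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Chunk:
    """Hypothetical stand-in for a CHUNK annotation (not the Spark NLP class)."""
    begin: int   # index of the first character of the chunk in the document
    end: int     # index of the last character of the chunk (inclusive)
    result: str  # chunk text as given in the chunk column


def doc_to_chunks(document: str,
                  chunk_col: Union[str, List[str]],
                  is_array: bool = False,
                  lower_case: bool = True,
                  fail_on_missing: bool = False) -> List[Chunk]:
    """Locate each chunk string inside the document text (illustrative only)."""
    # With is_array=False the chunk column holds a single string.
    candidates = list(chunk_col) if is_array else [chunk_col]
    haystack = document.lower() if lower_case else document
    chunks = []
    for text in candidates:
        needle = text.lower() if lower_case else text
        # Blank chunks are treated as missing; otherwise take the first match.
        begin = haystack.find(needle) if needle.strip() else -1
        if begin == -1:
            if fail_on_missing:
                raise ValueError(f"Chunk {text!r} not found in document")
            continue  # default: emit nothing for a chunk that is not found
        chunks.append(Chunk(begin, begin + len(text) - 1, text))
    return chunks


text = "Spark NLP is an open-source text processing library."
found = doc_to_chunks(
    text,
    ["spark nlp", "text processing library", "missing"],
    is_array=True,
)
print([(c.begin, c.end, c.result) for c in found])
```

With the defaults, the chunk that does not occur in the text is dropped silently. Passing fail_on_missing=True raises an error instead, and lower_case=False makes "spark nlp" fail to match "Spark NLP".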
See also: Chunk2Doc, for converting CHUNK annotations to DOCUMENT.