Generates all accepted lexical variations for this entity. For example: "insulin receptor substrate 1" => "insulin receptor substrate-1". User: mihais Date: 10/20/16
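As a rough illustration of the kind of variation described above, here is a minimal Scala sketch that covers only the single example given (hyphen-joining a trailing number to the preceding token). The object name `LexicalVariationSketch` and the method `variations` are hypothetical and are not the library's API; the actual set of accepted variations may well be larger.

```scala
object LexicalVariationSketch {
  // Hypothetical helper (not the library's API): if the last token is a plain
  // number, return one variant in which it is hyphen-joined to the previous token.
  // Returns an empty sequence when the rule does not apply.
  def variations(tokens: Seq[String]): Seq[Seq[String]] = {
    val endsInNumber =
      tokens.length >= 2 && tokens.last.nonEmpty && tokens.last.forall(_.isDigit)
    if (endsInNumber) {
      val joined = s"${tokens(tokens.length - 2)}-${tokens.last}"
      Seq(tokens.dropRight(2) :+ joined)
    } else {
      Seq.empty
    }
  }
}
```

Calling `LexicalVariationSketch.variations(Seq("insulin", "receptor", "substrate", "1"))` yields a single variant whose tokens are "insulin", "receptor", "substrate-1".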
Validates whether a span identified as an entity is actually valid. User: mihais Date: 10/24/16
User: mihais Date: 10/22/17
Fixes some common POS tagging mistakes in the bio domain (in place)
Note: this class is used by the CRF-based BioNER to clean up its training data (from BioCreative 2), through org.clulab.processors.bionlp.BioNLPPOSTaggerPostProcessor. This means that whenever anything changes here, the CRF should be retrained. Tell Mihai. User: mihais Date: 9/23/17
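To make "in place" concrete, the sketch below mutates a tag array that is aligned with a word array. The object `PosFixSketch`, the method `fixTagsInPlace`, and the single rule it applies (retagging a mixed letter-and-digit token from CD to NN) are illustrative assumptions; they are not taken from the library, whose actual rules are not listed here.

```scala
object PosFixSketch {
  // Illustrative rule only (an assumption, not the library's rule set):
  // a token that mixes letters and digits, such as "IL-2", is unlikely to be
  // a cardinal number, so a CD tag on it is rewritten to NN.
  // The tags array is modified in place; nothing is returned.
  def fixTagsInPlace(words: Array[String], tags: Array[String]): Unit = {
    require(words.length == tags.length, "words and tags must be aligned")
    for (i <- words.indices) {
      val w = words(i)
      val mixed = w.exists(_.isLetter) && w.exists(_.isDigit)
      if (mixed && tags(i) == "CD") tags(i) = "NN"
    }
  }
}
```

Because the caller's array is overwritten, any component trained on the corrected tags sees the changes directly, which is consistent with the retraining note above.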
Processes tokenization so it suits bio analysis
Preprocesses bio text, including Unicode normalization and removal of figure and table references. User: mihais Date: 9/10/17
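The two preprocessing steps named above could be sketched as follows. This is a minimal illustration under stated assumptions: it uses the JDK's `java.text.Normalizer` with the NFKC form, and a regular expression that only matches parenthesized references such as "(Fig. 1)", "(Figure 2A)", or "(Table 3)". The object `BioPreprocessSketch`, the normalization form, and the pattern are all assumptions; the library may normalize and match differently.

```scala
import java.text.Normalizer

object BioPreprocessSketch {
  // Assumed pattern (not the library's): a parenthesized figure or table
  // reference, together with any whitespace immediately before it.
  private val FigTableRef = """\s*\((?:Fig\.|Figure|Table)\s*\w+\)""".r

  def preprocess(text: String): String = {
    // Step 1: Unicode normalization. NFKC folds compatibility characters,
    // for example the "fi" ligature (U+FB01) becomes the two letters "fi".
    val normalized = Normalizer.normalize(text, Normalizer.Form.NFKC)
    // Step 2: drop figure and table references.
    FigTableRef.replaceAllIn(normalized, "")
  }
}
```

For example, `BioPreprocessSketch.preprocess("Akt phosphorylation increased (Fig. 2A).")` returns "Akt phosphorylation increased.".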