public class TermSuitePipeline
extends java.lang.Object
Modifier and Type | Method and Description |
---|---|
TermSuitePipeline |
addPipelineListener(PipelineListener pipelineListener)
Registers a pipeline listener.
|
TermSuitePipeline |
aeChineseTokenizer()
Tokenizer for Chinese collections.
|
TermSuitePipeline |
aeCompostSplitter() |
TermSuitePipeline |
aeCompoundSplitter()
Deprecated.
Use
aeCompostSplitter() instead |
TermSuitePipeline |
aeContextualizer(int scope,
boolean allTerms)
Computes the
Contextualizer vector of all
single-word terms in the term index. |
TermSuitePipeline |
aeExtensionDetector()
Detects all inclusion/extension relations between terms that have size >= 2.
|
TermSuitePipeline |
aeGraphicalVariantGatherer() |
TermSuitePipeline |
aeMateTaggerLemmatizer() |
TermSuitePipeline |
aeMaxSizeThresholdCleaner(TermProperty property,
int maxSize) |
TermSuitePipeline |
aeMerger()
Merges the variants (only those that are extensions of the base term)
of a term by graphical variation.
|
TermSuitePipeline |
aeNeoClassicalSplitter()
Deprecated.
Use
aeCompostSplitter() instead |
TermSuitePipeline |
aePrefixSplitter()
Deprecated.
Use
aeCompostSplitter() instead |
TermSuitePipeline |
aePrimaryOccurrenceDetector(int detectionStrategy) |
TermSuitePipeline |
aeRanker(TermProperty property,
boolean desc)
|
TermSuitePipeline |
aeRegexSpotter()
The single-word and multi-word term spotter AE
based on UIMA Tokens Regex.
|
TermSuitePipeline |
aeScorer()
Transforms the
TermIndex into a flat one-n scored model. |
TermSuitePipeline |
aeSpecificityComputer()
Computes
TermProperty.WR values (and additional
term properties of type TermProperty in the future). |
TermSuitePipeline |
aeStemmer() |
TermSuitePipeline |
aeStopWordsFilter()
Removes from the term index any term having a
stop word at its boundaries.
|
TermSuitePipeline |
aeSyntacticVariantGatherer()
Gathers terms according to their syntactic structures.
|
TermSuitePipeline |
aeTermClassifier(TermProperty sortingProperty) |
TermSuitePipeline |
aeThresholdCleaner(TermProperty property,
float threshold) |
TermSuitePipeline |
aeThresholdCleaner(TermProperty property,
float threshold,
boolean isPeriodic,
int cleaningPeriod,
int termIndexSizeTrigger) |
TermSuitePipeline |
aeThresholdCleanerPeriodic(TermProperty property,
float threshold,
int cleaningPeriod) |
TermSuitePipeline |
aeThresholdCleanerSizeTrigger(TermProperty property,
float threshold,
int termIndexSizeTrigger) |
TermSuitePipeline |
aeTopNCleaner(TermProperty property,
int n) |
TermSuitePipeline |
aeTopNCleanerPeriodic(TermProperty property,
int n,
boolean isPeriodic,
int cleaningPeriod) |
TermSuitePipeline |
aeTreeTagger() |
TermSuitePipeline |
aeUrlFilter()
Filters out URLs from CAS.
|
TermSuitePipeline |
aeWordTokenizer() |
static TermSuitePipeline |
create(java.lang.String lang) |
static TermSuitePipeline |
create(java.lang.String lang,
java.lang.String urlPrefix)
Starts a chaining
TermSuitePipeline builder and overrides the default
URL prefix (file:). |
static TermSuitePipeline |
create(TermIndex termIndex,
java.lang.String urlPrefix) |
org.apache.uima.analysis_engine.AnalysisEngineDescription |
createDescription() |
TermSuitePipeline |
emptyCollection() |
TermSuitePipeline |
emptyTermIndex(java.lang.String name)
Creates a new in-memory
TermIndex on which this
pipeline will run. |
TermSuitePipeline |
enableSyntacticLabels() |
java.lang.Thread |
getStreamThread() |
TermIndex |
getTermIndex()
Returns the term index produced (or last modified) by this pipeline.
|
TermSuitePipeline |
haeCasStatCounter(java.lang.String statName) |
TermSuitePipeline |
haeCompoundExporter(java.lang.String toFilePath)
Exports all compound words of the terminology to given file path.
|
TermSuitePipeline |
haeEval(java.lang.String refFileURI,
java.lang.String outputFile,
java.lang.String customLogHeader,
java.lang.String rFile,
java.lang.String evalTraceName,
boolean rtlWithVariants) |
TermSuitePipeline |
haeEvalExporter(java.lang.String toFilePath,
boolean withVariants) |
TermSuitePipeline |
haeExportVariationRuleExamples(java.lang.String toFilePath)
Exports examples of matching pairs for each variation rule.
|
TermSuitePipeline |
haeJsonCasExporter(java.lang.String toDirectoryPath) |
TermSuitePipeline |
haeJsonExporter(java.lang.String toFilePath) |
TermSuitePipeline |
haeLogOverlappingRules() |
TermSuitePipeline |
haeSpotterTSVWriter(java.lang.String toDirectoryPath)
Exports all CAS in TSV format to a given directory.
|
TermSuitePipeline |
haeTbxExporter(java.lang.String toFilePath) |
TermSuitePipeline |
haeTraceTimePerf(java.lang.String toFile)
Exports time progress to TSV file.
|
TermSuitePipeline |
haeTsvExporter(java.lang.String toFilePath)
Exports the
TermIndex in tsv format |
TermSuitePipeline |
haeVariantEvalExporter(java.lang.String toFilePath,
int topN,
int maxVariantsPerTerm)
Creates a TSV output with:
- the occurrence list of each term and their in-text contexts
|
TermSuitePipeline |
haeXmiCasExporter(java.lang.String toDirectoryPath)
Exports all CAS as XMI files to a given directory.
|
TermSuitePipeline |
linkMongoStore()
Configures the
JsonExporter to not embed the occurrences
in the json file, but to link the mongodb occurrence store instead. |
org.apache.uima.resource.ExternalResourceDescription |
resTermIndex() |
TermSuitePipeline |
run()
Runs the pipeline with
SimplePipeline on the CollectionReader that must have been defined. |
TermSuitePipeline |
run(org.apache.uima.jcas.JCas cas)
Runs the pipeline with
SimplePipeline without requiring a CollectionReader
to be defined. |
TermSuitePipeline |
setAddSpottedAnnoToTermIndex(boolean addToTermIndex)
Configures
RegexSpotter . |
TermSuitePipeline |
setCollection(TermSuiteCollection termSuiteCollection,
java.lang.String collectionPath,
java.lang.String collectionEncoding)
Creates a collection reader for this pipeline.
|
TermSuitePipeline |
setCollection(TermSuiteCollection termSuiteCollection,
java.lang.String collectionPath,
java.lang.String collectionEncoding,
java.lang.String droppedTags,
java.lang.String txtTags)
Creates a collection reader of type
GenericXMLToTxtCollectionReader for this pipeline. |
TermSuitePipeline |
setCompostCoeffs(float alpha,
float beta,
float gamma,
float delta) |
TermSuitePipeline |
setCompostMaxComponentNum(int compostMaxComponentNum) |
TermSuitePipeline |
setCompostMinComponentSize(int compostMinComponentSize) |
TermSuitePipeline |
setCompostScoreThreshold(float compostScoreThreshold) |
TermSuitePipeline |
setCompostSegmentSimilarityThreshold(java.lang.Object compostSegmentSimilarityThreshold) |
TermSuitePipeline |
setContextAssocRateMeasure(java.lang.String contextAssocRateMeasure) |
TermSuitePipeline |
setContextualizeCoTermsType(OccurrenceType contextualizeCoTermsType) |
TermSuitePipeline |
setContextualizeWithCoOccurrenceFrequencyThreshhold(int contextualizeWithCoOccurrenceFrequencyThreshhold) |
TermSuitePipeline |
setContextualizeWithTermClasses(boolean contextualizeWithTermClasses) |
TermSuitePipeline |
setExportFilteringRule(java.lang.String exportFilteringRule) |
TermSuitePipeline |
setExportFilteringThreshold(float exportFilteringThreshold) |
TermSuitePipeline |
setExportJsonWithContext(boolean b) |
TermSuitePipeline |
setExportJsonWithOccurrences(boolean exportJsonWithOccurrences) |
TermSuitePipeline |
setGraphicalVariantSimilarityThreshold(float th) |
TermSuitePipeline |
setInlineString(java.lang.String text) |
TermSuitePipeline |
setKeepVariantsWhileCleaning(boolean keepVariantsWhileCleaning) |
TermSuitePipeline |
setMateModelPath(java.lang.String path) |
TermSuitePipeline |
setMongoDBOccurrenceStore(java.lang.String mongoDBUri)
Stores occurrences to MongoDB
|
TermSuitePipeline |
setPostProcessingStrategy(java.lang.String postProcessingStrategy)
Sets the post processing strategy for
RegexSpotter analysis engine |
TermSuitePipeline |
setResourcePath(java.lang.String resourcePath) |
TermSuitePipeline |
setSpotWithOccurrences(boolean activate)
Deprecated.
Use TermSuitePipeline#setOccurrenceStoreMode instead.
|
TermSuitePipeline |
setSyntacticRegexesFilePath(java.lang.String syntacticRegexesFilePath)
Deprecated.
Overrides resources directly
|
TermSuitePipeline |
setTermIndex(TermIndex termIndex)
Sets the term index on which this pipeline will run.
|
TermSuitePipeline |
setTreeTaggerHome(java.lang.String treeTaggerPath) |
TermSuitePipeline |
setTsvExportProperties(TermProperty... properties)
Defines the term properties that appear in tsv export file
|
TermSuitePipeline |
setTsvShowHeaders(boolean tsvWithHeaders)
Configures tsvExporter to (not) show headers on the
first line.
|
TermSuitePipeline |
setTsvShowScores(boolean tsvWithVariantScores)
Configures tsvExporter to (not) show variant scores with the
"V" label
|
TermSuitePipeline |
setYamlVariantRulesFilePath(java.lang.String yamlVariantRulesFilePath)
Deprecated.
|
DocumentStream |
stream(CasConsumer consumer) |
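Almost every method in the table above returns TermSuitePipeline, which is what allows calls to be chained after create(...). The following is a minimal, self-contained sketch of that chaining style only. ChainingSketch and its stage(...) method are hypothetical stand-ins, not part of TermSuite; the stage names are taken from the table, and the order in which they are chained here is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the TermSuite class: it only records the order
// in which stages are declared, to illustrate the chaining style.
final class ChainingSketch {
    private final String lang;
    private final List<String> stages = new ArrayList<>();

    private ChainingSketch(String lang) {
        this.lang = lang;
    }

    // Plays the role of a static create(lang) factory.
    static ChainingSketch create(String lang) {
        return new ChainingSketch(lang);
    }

    // Each stage method appends itself and returns the same builder instance.
    ChainingSketch stage(String name) {
        stages.add(name);
        return this;
    }

    String lang() {
        return lang;
    }

    // Defensive copy so callers cannot alter the declared stages.
    List<String> stages() {
        return new ArrayList<>(stages);
    }

    public static void main(String[] args) {
        ChainingSketch p = ChainingSketch.create("en")
                .stage("aeWordTokenizer")
                .stage("aeTreeTagger")
                .stage("aeStemmer")
                .stage("aeRegexSpotter")
                .stage("haeTsvExporter");
        System.out.println(p.lang() + " " + p.stages());
    }
}
```

Returning the builder itself from every stage method keeps the declaration order explicit in the call chain.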
public static TermSuitePipeline create(java.lang.String lang)
public static TermSuitePipeline create(java.lang.String lang, java.lang.String urlPrefix)
Starts a chaining TermSuitePipeline builder and overrides the default URL prefix (file:).
Parameters:
lang - The
urlPrefix - The URL prefix to use for accessing TermSuite resources
See Also:
TermSuiteResourceHelper.TermSuiteResourceHelper(Lang, String)
public static TermSuitePipeline create(TermIndex termIndex, java.lang.String urlPrefix)
public TermSuitePipeline run()
Runs the pipeline with SimplePipeline on the CollectionReader that must have been defined.
Throws:
TermSuitePipelineException - if no CollectionReader has been declared on this pipeline
public DocumentStream stream(CasConsumer consumer)
public java.lang.Thread getStreamThread()
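run() is documented as throwing TermSuitePipelineException when no CollectionReader has been declared. The sketch below models only that precondition, under stated assumptions: RunGuardSketch is a hypothetical stand-in, its one-argument setCollection(String) is a simplification of the documented overloads, and IllegalStateException stands in for TermSuitePipelineException.

```java
// Hypothetical stand-in, not the TermSuite class: it models only the
// "a collection must be declared before run()" precondition.
final class RunGuardSketch {
    private String collectionPath; // stays null until a collection is declared

    // Simplified stand-in for the documented setCollection overloads.
    RunGuardSketch setCollection(String path) {
        this.collectionPath = path;
        return this;
    }

    // IllegalStateException stands in for TermSuitePipelineException.
    RunGuardSketch run() {
        if (collectionPath == null) {
            throw new IllegalStateException("No collection reader declared on this pipeline");
        }
        return this;
    }

    // Reports whether run() rejects the given builder.
    static boolean runFails(RunGuardSketch p) {
        try {
            p.run();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(runFails(new RunGuardSketch()));
        System.out.println(runFails(new RunGuardSketch().setCollection("corpus/")));
    }
}
```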
public TermSuitePipeline addPipelineListener(PipelineListener pipelineListener)
Registers a pipeline listener.
Parameters:
pipelineListener -
Returns:
TermSuitePipeline builder object
public TermSuitePipeline run(org.apache.uima.jcas.JCas cas)
Runs the pipeline with SimplePipeline without requiring a CollectionReader to be defined.
Parameters:
cas - the JCas on which the pipeline operates.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline setInlineString(java.lang.String text)
public TermSuitePipeline setCollection(TermSuiteCollection termSuiteCollection, java.lang.String collectionPath, java.lang.String collectionEncoding)
Creates a collection reader for this pipeline.
Parameters:
termSuiteCollection -
collectionPath -
collectionEncoding -
Returns:
TermSuitePipeline builder object
public TermSuitePipeline setCollection(TermSuiteCollection termSuiteCollection, java.lang.String collectionPath, java.lang.String collectionEncoding, java.lang.String droppedTags, java.lang.String txtTags)
Creates a collection reader of type GenericXMLToTxtCollectionReader for this pipeline.
Requires a list of dropped tags and txt tags for collection parsing.
Parameters:
termSuiteCollection -
collectionPath -
collectionEncoding -
droppedTags -
txtTags -
Returns:
TermSuitePipeline builder object
See Also:
AbstractToTxtSaxHandler
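The five-argument setCollection(...) takes a list of dropped tags and a list of txt tags for XML parsing, but this page does not say how the two lists interact or how they are encoded. The sketch below shows one plausible reading using the JDK SAX parser: text is kept only inside a txt tag and outside any dropped tag. The class XmlToTxtSketch, the comma-separated encoding of both lists, and the line break after each txt element are all assumptions, not TermSuite behaviour.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: extracts plain text from XML given two tag lists.
final class XmlToTxtSketch {
    // droppedTags and txtTags are assumed to be comma-separated tag names.
    static String extract(String xml, String droppedTags, String txtTags) {
        final Set<String> dropped = new HashSet<>(Arrays.asList(droppedTags.split(",")));
        final Set<String> kept = new HashSet<>(Arrays.asList(txtTags.split(",")));
        final StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private int droppedDepth = 0; // > 0 while inside a dropped element
            private int keptDepth = 0;    // > 0 while inside a txt element

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (dropped.contains(qName)) droppedDepth++;
                if (kept.contains(qName)) keptDepth++;
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (dropped.contains(qName)) droppedDepth--;
                if (kept.contains(qName)) {
                    keptDepth--;
                    if (droppedDepth == 0) out.append('\n'); // one line per txt element
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (keptDepth > 0 && droppedDepth == 0) out.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        String xml = "<doc><title>Wind energy</title>"
                + "<p>Rotor blades <ref>[1]</ref>turn.</p><meta>x</meta></doc>";
        System.out.println(extract(xml, "ref,meta", "title,p"));
    }
}
```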
public TermSuitePipeline setResourcePath(java.lang.String resourcePath)
public TermSuitePipeline setContextAssocRateMeasure(java.lang.String contextAssocRateMeasure)
public TermSuitePipeline emptyCollection()
public org.apache.uima.analysis_engine.AnalysisEngineDescription createDescription()
public TermSuitePipeline aeWordTokenizer()
public TermSuitePipeline aeTreeTagger()
public TermSuitePipeline setMateModelPath(java.lang.String path)
public TermSuitePipeline aeMateTaggerLemmatizer()
public TermSuitePipeline setTsvExportProperties(TermProperty... properties)
Defines the term properties that appear in the TSV export file.
Parameters:
properties -
Returns:
TermSuitePipeline builder object
See Also:
haeTsvExporter(String)
public TermSuitePipeline haeTsvExporter(java.lang.String toFilePath)
Exports the TermIndex in TSV format.
Parameters:
toFilePath -
Returns:
TermSuitePipeline builder object
See Also:
setTsvExportProperties(TermProperty...)
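setTsvExportProperties(...) selects which term properties appear in the TSV export, and setTsvShowHeaders(...) toggles a header line. A minimal sketch of that idea in plain Java follows. TsvExportSketch, the property names "term", "frequency" and "wr", and the map-based term representation are hypothetical; the real exporter's column layout is not described on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: renders selected properties of each term as TSV.
final class TsvExportSketch {
    static String toTsv(List<Map<String, Object>> terms, List<String> properties, boolean showHeaders) {
        List<String> lines = new ArrayList<>();
        if (showHeaders) {
            lines.add(String.join("\t", properties)); // header on the first line
        }
        for (Map<String, Object> term : terms) {
            List<String> cells = new ArrayList<>();
            for (String property : properties) {
                Object value = term.get(property);
                cells.add(value == null ? "" : value.toString());
            }
            lines.add(String.join("\t", cells));
        }
        return String.join("\n", lines);
    }

    // One sample term with three properties, of which only two are exported.
    static String demo(boolean showHeaders) {
        Map<String, Object> term = new LinkedHashMap<>();
        term.put("term", "wind turbine");
        term.put("frequency", 12);
        term.put("wr", 3.5);
        List<Map<String, Object>> terms = new ArrayList<>();
        terms.add(term);
        return toTsv(terms, Arrays.asList("term", "frequency"), showHeaders);
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
    }
}
```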
public TermSuitePipeline haeExportVariationRuleExamples(java.lang.String toFilePath)
Exports examples of matching pairs for each variation rule.
Parameters:
toFilePath - the file path where the examples for each variation rule are written
Returns:
TermSuitePipeline builder object
public TermSuitePipeline haeCompoundExporter(java.lang.String toFilePath)
Exports all compound words of the terminology to given file path.
Parameters:
toFilePath -
Returns:
TermSuitePipeline builder object
public TermSuitePipeline haeTbxExporter(java.lang.String toFilePath)
public TermSuitePipeline haeEvalExporter(java.lang.String toFilePath, boolean withVariants)
public TermSuitePipeline setExportJsonWithOccurrences(boolean exportJsonWithOccurrences)
public TermSuitePipeline setExportJsonWithContext(boolean b)
public TermSuitePipeline haeJsonExporter(java.lang.String toFilePath)
public TermSuitePipeline haeVariantEvalExporter(java.lang.String toFilePath, int topN, int maxVariantsPerTerm)
Creates a TSV output with:
- the occurrence list of each term and their in-text contexts
Parameters:
toFilePath - The output file path
topN - The number of variants to keep in the file
maxVariantsPerTerm - The maximum number of variants to evaluate for each term
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeStemmer()
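public TermSuitePipeline aeRegexSpotter()
The single-word and multi-word term spotter AE based on UIMA Tokens Regex.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeCompoundSplitter()
Deprecated. Use aeCompostSplitter() instead
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeNeoClassicalSplitter()
Deprecated. Use aeCompostSplitter() instead
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aePrefixSplitter()
Deprecated. Use aeCompostSplitter() instead
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeStopWordsFilter()
Removes from the term index any term having a stop word at its boundaries.
Returns:
TermSuitePipeline builder object
See Also:
TermIndexBlacklistWordFilterAE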
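aeStopWordsFilter() is described as removing any term that has a stop word at its boundaries. A minimal sketch of that rule on plain strings is given below. StopWordBoundarySketch is a hypothetical helper, and splitting terms on whitespace and lower-casing them are assumptions; the TermSuite filter works on the term index rather than on raw strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: drops terms whose first or last word is a stop word.
final class StopWordBoundarySketch {
    static boolean hasStopWordAtBoundary(String term, Set<String> stopWords) {
        String[] words = term.trim().toLowerCase().split("\\s+");
        return stopWords.contains(words[0]) || stopWords.contains(words[words.length - 1]);
    }

    // Stop words in the middle of a term do not cause removal.
    static List<String> filter(List<String> terms, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!hasStopWordAtBoundary(term, stopWords)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("of", "the"));
        List<String> terms = Arrays.asList(
                "the rotor blade", "blade of rotor", "rotor of", "wind turbine");
        System.out.println(filter(terms, stopWords));
    }
}
```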
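public TermSuitePipeline haeXmiCasExporter(java.lang.String toDirectoryPath)
Exports all CAS as XMI files to a given directory.
Parameters:
toDirectoryPath -
Returns:
TermSuitePipeline builder object
public TermSuitePipeline haeSpotterTSVWriter(java.lang.String toDirectoryPath)
Exports all CAS in TSV format to a given directory.
Parameters:
toDirectoryPath -
Returns:
TermSuitePipeline builder object
See Also:
SpotterTSVWriter
public TermSuitePipeline aeChineseTokenizer()
Tokenizer for Chinese collections.
Returns:
TermSuitePipeline builder object
See Also:
ChineseSegmenter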
public org.apache.uima.resource.ExternalResourceDescription resTermIndex()
public TermIndex getTermIndex()
public TermSuitePipeline setTermIndex(TermIndex termIndex)
Sets the term index on which this pipeline will run.
Parameters:
termIndex -
Returns:
TermSuitePipeline builder object
public TermSuitePipeline emptyTermIndex(java.lang.String name)
Creates a new in-memory TermIndex on which this pipeline will run.
Parameters:
name - the name of the new term index
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeSpecificityComputer()
Computes TermProperty.WR values (and additional term properties of type TermProperty in the future).
Returns:
TermSuitePipeline builder object
See Also:
TermSpecificityComputer, TermProperty
public TermSuitePipeline setContextualizeCoTermsType(OccurrenceType contextualizeCoTermsType)
public TermSuitePipeline setContextualizeWithTermClasses(boolean contextualizeWithTermClasses)
public TermSuitePipeline setContextualizeWithCoOccurrenceFrequencyThreshhold(int contextualizeWithCoOccurrenceFrequencyThreshhold)
public TermSuitePipeline aeContextualizer(int scope, boolean allTerms)
Computes the Contextualizer vector of all single-word terms in the term index.
Parameters:
scope -
allTerms -
Returns:
TermSuitePipeline builder object
See Also:
Contextualizer
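The scope and allTerms parameters of aeContextualizer(...) are left undocumented here. As a rough illustration of what a context vector can look like, the sketch below counts, for each token, the tokens that co-occur within a window of scope positions on either side. Treating scope as a window radius is an assumption, allTerms is not modeled, and ContextVectorSketch is a hypothetical helper rather than the Contextualizer itself, which may weight co-occurrences differently (see setContextAssocRateMeasure).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: raw co-occurrence counts within a sliding window.
final class ContextVectorSketch {
    // tokens: single-word terms in document order; scope: assumed window radius.
    static Map<String, Map<String, Integer>> contextVectors(List<String> tokens, int scope) {
        Map<String, Map<String, Integer>> vectors = new TreeMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            Map<String, Integer> vector =
                    vectors.computeIfAbsent(tokens.get(i), k -> new TreeMap<>());
            int from = Math.max(0, i - scope);
            int to = Math.min(tokens.size() - 1, i + scope);
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    vector.merge(tokens.get(j), 1, Integer::sum); // count one co-occurrence
                }
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("wind", "turbine", "blade", "wind", "turbine");
        System.out.println(contextVectors(tokens, 1));
    }
}
```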
public TermSuitePipeline aeMaxSizeThresholdCleaner(TermProperty property, int maxSize)
public TermSuitePipeline aeThresholdCleaner(TermProperty property, float threshold, boolean isPeriodic, int cleaningPeriod, int termIndexSizeTrigger)
public TermSuitePipeline aePrimaryOccurrenceDetector(int detectionStrategy)
public TermSuitePipeline aeThresholdCleanerPeriodic(TermProperty property, float threshold, int cleaningPeriod)
Parameters:
property -
threshold -
cleaningPeriod -
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeThresholdCleanerSizeTrigger(TermProperty property, float threshold, int termIndexSizeTrigger)
public TermSuitePipeline setKeepVariantsWhileCleaning(boolean keepVariantsWhileCleaning)
public TermSuitePipeline aeThresholdCleaner(TermProperty property, float threshold)
public TermSuitePipeline aeTopNCleaner(TermProperty property, int n)
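The threshold and top-N cleaners take a term property plus either a threshold or a count n, but their exact semantics are not spelled out on this page. The sketch below shows the two filtering ideas on a plain map from term to property value. CleanerSketch is hypothetical, and keeping values greater than or equal to the threshold and ranking in descending order are assumptions. The periodic and size-triggered variants are not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: two ways of pruning a term -> property value map.
final class CleanerSketch {
    // Keeps terms whose property value reaches the threshold (inclusive by assumption).
    static Map<String, Double> thresholdClean(Map<String, Double> index, double threshold) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : index.entrySet()) {
            if (e.getValue() >= threshold) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Keeps the n terms with the highest property value.
    static List<String> topN(Map<String, Double> index, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(index.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    // Small sample index used by main and by callers who want a quick check.
    static Map<String, Double> demoIndex() {
        Map<String, Double> index = new LinkedHashMap<>();
        index.put("a", 5.0);
        index.put("b", 1.0);
        index.put("c", 3.0);
        return index;
    }

    public static void main(String[] args) {
        System.out.println(thresholdClean(demoIndex(), 2.0).keySet());
        System.out.println(topN(demoIndex(), 2));
    }
}
```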
public TermSuitePipeline aeTopNCleanerPeriodic(TermProperty property, int n, boolean isPeriodic, int cleaningPeriod)
Parameters:
property -
n -
isPeriodic -
cleaningPeriod -
Returns:
TermSuitePipeline builder object
public TermSuitePipeline setGraphicalVariantSimilarityThreshold(float th)
public TermSuitePipeline aeGraphicalVariantGatherer()
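setGraphicalVariantSimilarityThreshold(float th) suggests that two terms are gathered as graphical variants when some string similarity reaches th. This page does not say which similarity TermSuite uses, so the sketch below substitutes a common choice, a Levenshtein distance normalized by the longer string's length. GraphicalSimilaritySketch and the inclusive comparison are assumptions.

```java
// Hypothetical helper: normalized edit-distance similarity between two strings.
final class GraphicalSimilaritySketch {
    // Classic two-row dynamic programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // 1.0 for identical strings, approaching 0.0 as they diverge.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    static boolean areGraphicalVariants(String a, String b, double threshold) {
        return similarity(a, b) >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(areGraphicalVariants("hydro-electric", "hydroelectric", 0.9));
        System.out.println(areGraphicalVariants("wind", "turbine", 0.9));
    }
}
```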
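public TermSuitePipeline aeUrlFilter()
Filters out URLs from CAS.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeSyntacticVariantGatherer()
Gathers terms according to their syntactic structures.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeExtensionDetector()
Detects all inclusion/extension relations between terms that have size >= 2.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeScorer()
Transforms the TermIndex into a flat one-n scored model.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeMerger()
Merges the variants (only those that are extensions of the base term) of a term by graphical variation.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeRanker(TermProperty property, boolean desc)
Parameters:
property -
desc -
public TermSuitePipeline setExportFilteringRule(java.lang.String exportFilteringRule)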
public TermSuitePipeline setExportFilteringThreshold(float exportFilteringThreshold)
public TermSuitePipeline setTreeTaggerHome(java.lang.String treeTaggerPath)
@Deprecated public TermSuitePipeline setSyntacticRegexesFilePath(java.lang.String syntacticRegexesFilePath)
Deprecated. Overrides resources directly
Parameters:
syntacticRegexesFilePath -
public TermSuitePipeline haeLogOverlappingRules()
public TermSuitePipeline enableSyntacticLabels()
@Deprecated public TermSuitePipeline setYamlVariantRulesFilePath(java.lang.String yamlVariantRulesFilePath)
Deprecated.
Parameters:
yamlVariantRulesFilePath -
public TermSuitePipeline setCompostCoeffs(float alpha, float beta, float gamma, float delta)
public TermSuitePipeline setCompostMaxComponentNum(int compostMaxComponentNum)
public TermSuitePipeline setCompostMinComponentSize(int compostMinComponentSize)
public TermSuitePipeline setCompostScoreThreshold(float compostScoreThreshold)
public TermSuitePipeline setCompostSegmentSimilarityThreshold(java.lang.Object compostSegmentSimilarityThreshold)
public TermSuitePipeline aeCompostSplitter()
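The setCompostMinComponentSize(...) and setCompostMaxComponentNum(...) setters above read as bounds on candidate compound segmentations. The sketch below only enumerates the segmentations of a word that respect such bounds. SegmentationSketch is hypothetical, and how aeCompostSplitter() scores or prunes candidates (setCompostCoeffs, setCompostScoreThreshold, setCompostSegmentSimilarityThreshold) is not described on this page and is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: enumerates splits of a word under size and count bounds.
final class SegmentationSketch {
    static List<List<String>> segmentations(String word, int minComponentSize, int maxComponentNum) {
        if (minComponentSize < 1) {
            throw new IllegalArgumentException("minComponentSize must be at least 1");
        }
        List<List<String>> out = new ArrayList<>();
        split(word, 0, minComponentSize, maxComponentNum, new ArrayList<>(), out);
        return out;
    }

    // Depth-first search over cut positions; backtracks after each candidate.
    private static void split(String word, int start, int minSize, int maxNum,
                              List<String> current, List<List<String>> out) {
        if (start == word.length()) {
            if (current.size() >= 2) { // a single component is not a split
                out.add(new ArrayList<>(current));
            }
            return;
        }
        if (current.size() == maxNum) {
            return; // component budget exhausted before the word was consumed
        }
        for (int end = start + minSize; end <= word.length(); end++) {
            current.add(word.substring(start, end));
            split(word, end, minSize, maxNum, current, out);
            current.remove(current.size() - 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(segmentations("windmill", 3, 2));
    }
}
```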
public TermSuitePipeline haeCasStatCounter(java.lang.String statName)
public TermSuitePipeline haeTraceTimePerf(java.lang.String toFile)
Exports time progress to TSV file.
WordAnnotation processed
Parameters:
toFile -
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeTermClassifier(TermProperty sortingProperty)
Parameters:
sortingProperty - the term property used to order terms before they are classified. The first term of a class appearing given this order will be considered as the head of the class.
Returns:
TermSuitePipeline builder object
See Also:
TermClassifier
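The sortingProperty description says that terms are ordered before classification and that the first term of a class in that order becomes the class head. The sketch below illustrates only the head selection. ClassHeadSketch is hypothetical, class membership is assumed to be given up front, and the descending order is an assumption since the page does not state the direction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: picks the head of each class from a sorted term list.
final class ClassHeadSketch {
    // sortingValue: term -> value of the sorting property; classOf: term -> class id.
    static Map<String, String> heads(Map<String, Double> sortingValue, Map<String, String> classOf) {
        List<String> ordered = new ArrayList<>(sortingValue.keySet());
        ordered.sort((a, b) -> Double.compare(sortingValue.get(b), sortingValue.get(a)));
        Map<String, String> headOfClass = new LinkedHashMap<>();
        for (String term : ordered) {
            // The first term seen for a class, in sorted order, becomes its head.
            headOfClass.putIfAbsent(classOf.get(term), term);
        }
        return headOfClass;
    }

    // Two classes of two terms each, with made-up property values.
    static Map<String, String> demo() {
        Map<String, Double> value = new HashMap<>();
        Map<String, String> classOf = new HashMap<>();
        value.put("wind turbine", 9.0);
        classOf.put("wind turbine", "c1");
        value.put("wind turbines", 4.0);
        classOf.put("wind turbines", "c1");
        value.put("rotor blade", 7.0);
        classOf.put("rotor blade", "c2");
        value.put("blade of rotor", 8.0);
        classOf.put("blade of rotor", "c2");
        return heads(value, classOf);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```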
public TermSuitePipeline haeEval(java.lang.String refFileURI, java.lang.String outputFile, java.lang.String customLogHeader, java.lang.String rFile, java.lang.String evalTraceName, boolean rtlWithVariants)
Parameters:
refFileURI - The path to the reference terminology
outputFile - The path to the output log file
customLogHeader - A custom string to add in the header of the output log file
rFile - The path to the output R file
evalTraceName - The name of the eval trace
rtlWithVariants - true if variants of the reference terminology should be kept during the evaluation
Returns:
TermSuitePipeline builder object
public TermSuitePipeline setMongoDBOccurrenceStore(java.lang.String mongoDBUri)
Stores occurrences to MongoDB.
Parameters:
mongoDBUri - the MongoDB connection URI
Returns:
TermSuitePipeline builder object
@Deprecated public TermSuitePipeline setSpotWithOccurrences(boolean activate)
Deprecated. Use TermSuitePipeline#setOccurrenceStoreMode instead.
Parameters:
activate -
Returns:
TermSuitePipeline builder object
public TermSuitePipeline setAddSpottedAnnoToTermIndex(boolean addToTermIndex)
Configures RegexSpotter.
Parameters:
addToTermIndex - the value of the parameter
Returns:
TermSuitePipeline builder object
See Also:
aeRegexSpotter()
public TermSuitePipeline setPostProcessingStrategy(java.lang.String postProcessingStrategy)
Sets the post processing strategy for RegexSpotter analysis engine
Parameters:
postProcessingStrategy -
Returns:
TermSuitePipeline builder object
See Also:
aeRegexSpotter(), OccurrenceBuffer.NO_CLEANING, OccurrenceBuffer.KEEP_PREFIXES, OccurrenceBuffer.KEEP_SUFFIXES
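public TermSuitePipeline setTsvShowHeaders(boolean tsvWithHeaders)
Configures tsvExporter to (not) show headers on the first line.
Parameters:
tsvWithHeaders - the flag
Returns:
TermSuitePipeline builder object
public TermSuitePipeline setTsvShowScores(boolean tsvWithVariantScores)
Configures tsvExporter to (not) show variant scores with the "V" label
Parameters:
tsvWithVariantScores - the flag
Returns:
TermSuitePipeline builder object
public TermSuitePipeline haeJsonCasExporter(java.lang.String toDirectoryPath)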
public TermSuitePipeline linkMongoStore()
Configures the JsonExporter to not embed the occurrences in the JSON file, but to link the MongoDB occurrence store instead.
Returns:
TermSuitePipeline builder object
See Also:
haeJsonExporter(String)