public class TermSuitePipeline
extends java.lang.Object
Modifier and Type | Method and Description |
---|---|
TermSuitePipeline | addPipelineListener(PipelineListener pipelineListener) Registers a pipeline listener. |
TermSuitePipeline | aeChineseTokenizer() Tokenizer for Chinese collections. |
TermSuitePipeline | aeCompostSplitter() |
TermSuitePipeline | aeContextualizer(int scope, boolean allTerms) Computes the Contextualizer vector of all single-word terms in the term index. |
TermSuitePipeline | aeDocumentLogger(long nbDocument) |
TermSuitePipeline | aeExtensionDetector() Detects all inclusion/extension relations between terms that have size >= 2. |
TermSuitePipeline | aeFixedExpressionSpotter() Spots fixed expressions in the CAS and creates a FixedExpression annotation whenever one is found. |
TermSuitePipeline | aeFixedExpressionTermMarker() Iterates over the TermIndex and marks terms as "fixed expressions" when their lemmas are found in the FixedExpressionResource. |
TermSuitePipeline | aeGraphicalVariantGatherer() |
TermSuitePipeline | aeMateTaggerLemmatizer() |
TermSuitePipeline | aeMaxSizeThresholdCleaner(TermProperty property, int maxSize) |
TermSuitePipeline | aeMerger() Merges the variants of a term (only those that are extensions of the base term) by graphical variation. |
TermSuitePipeline | aePrefixSplitter() Naive morphological analysis of prefix compounds based on a prefix dictionary resource. |
TermSuitePipeline | aePrimaryOccurrenceDetector(int detectionStrategy) |
TermSuitePipeline | aeRanker(TermProperty property, boolean desc) |
TermSuitePipeline | aeRegexSpotter() The single-word and multi-word term spotter AE based on UIMA Tokens Regex. |
TermSuitePipeline | aeScorer() Transforms the TermIndex into a flat one-n scored model. |
TermSuitePipeline | aeSpecificityComputer() Computes TermProperty.WR values (and additional term properties of type TermProperty in the future). |
TermSuitePipeline | aeStemmer() |
TermSuitePipeline | aeStopWordsFilter() Removes from the term index any term having a stop word at its boundaries. |
TermSuitePipeline | aeSuffixDerivationDetector() |
TermSuitePipeline | aeSyntacticVariantGatherer() Gathers terms according to their syntactic structures. |
TermSuitePipeline | aeTermClassifier(TermProperty sortingProperty) |
TermSuitePipeline | aeTermOccAnnotationImporter() An AE that imports all TermOccAnnotation in CAS to a TermIndex. |
TermSuitePipeline | aeThresholdCleaner(TermProperty property, float threshold) |
TermSuitePipeline | aeThresholdCleaner(TermProperty property, float threshold, boolean isPeriodic, int cleaningPeriod, int termIndexSizeTrigger) |
TermSuitePipeline | aeThresholdCleanerPeriodic(TermProperty property, float threshold, int cleaningPeriod) |
TermSuitePipeline | aeThresholdCleanerSizeTrigger(TermProperty property, float threshold, int termIndexSizeTrigger) |
TermSuitePipeline | aeTopNCleaner(TermProperty property, int n) |
TermSuitePipeline | aeTopNCleanerPeriodic(TermProperty property, int n, boolean isPeriodic, int cleaningPeriod) |
TermSuitePipeline | aeTreeTagger() |
TermSuitePipeline | aeUrlFilter() Filters out URLs from CAS. |
TermSuitePipeline | aeWordTokenizer() |
static TermSuitePipeline | create(java.lang.String lang) Starts a chaining TermSuitePipeline builder. |
static TermSuitePipeline | create(TermIndex termIndex) |
org.apache.uima.analysis_engine.AnalysisEngineDescription | createDescription() |
TermSuitePipeline | customAE(org.apache.uima.analysis_engine.AnalysisEngineDescription ae, java.lang.String taskName) Aggregates an AE to the TS pipeline. |
TermSuitePipeline | emptyCollection() |
TermSuitePipeline | emptyTermIndex(java.lang.String name) Creates a new in-memory TermIndex on which this pipeline will run. |
TermSuitePipeline | enableSyntacticLabels() |
java.lang.String | getHistoryResourceName() |
java.lang.Thread | getStreamThread() |
TermIndex | getTermIndex() Returns the term index produced (or last modified) by this pipeline. |
TermSuitePipeline | haeCasStatCounter(java.lang.String statName) |
TermSuitePipeline | haeCompoundExporter(java.lang.String toFilePath) Exports all compound words of the terminology to the given file path. |
TermSuitePipeline | haeEval(java.lang.String refFileURI, java.lang.String outputFile, java.lang.String customLogHeader, java.lang.String rFile, java.lang.String evalTraceName, boolean rtlWithVariants) |
TermSuitePipeline | haeEvalExporter(java.lang.String toFilePath, boolean withVariants) |
TermSuitePipeline | haeExportVariationRuleExamples(java.lang.String toFilePath) Exports examples of matching pairs for each variation rule. |
TermSuitePipeline | haeJsonCasExporter(java.lang.String toDirectoryPath) |
TermSuitePipeline | haeJsonExporter(java.lang.String toFilePath) |
TermSuitePipeline | haeLogOverlappingRules() |
TermSuitePipeline | haeSpotterTSVWriter(java.lang.String toDirectoryPath) Exports all CAS in TSV format to a given directory. |
TermSuitePipeline | haeTbxExporter(java.lang.String toFilePath) |
TermSuitePipeline | haeTermsuiteJsonCasExporter(java.lang.String toDirectoryPath) Exports all CAS as JSON files to a given directory. |
TermSuitePipeline | haeTraceTimePerf(java.lang.String toFile) Exports time progress to TSV file. |
TermSuitePipeline | haeTsvExporter(java.lang.String toFilePath) Exports the TermIndex in TSV format. |
TermSuitePipeline | haeVariantEvalExporter(java.lang.String toFilePath, int topN, int maxVariantsPerTerm) Creates a TSV output with the occurrence list of each term and their in-text contexts. |
TermSuitePipeline | haeVariationExporter(java.lang.String toFilePath, VariationType... vTypes) |
TermSuitePipeline | haeXmiCasExporter(java.lang.String toDirectoryPath) Exports all CAS as XMI files to a given directory. |
TermSuitePipeline | linkMongoStore() Configures the JsonExporterAE to not embed the occurrences in the JSON file, but to link the MongoDB occurrence store instead. |
org.apache.uima.resource.ExternalResourceDescription | resHistory() |
org.apache.uima.resource.ExternalResourceDescription | resObserver() |
org.apache.uima.resource.ExternalResourceDescription | resSyntacticVariantRules() |
org.apache.uima.resource.ExternalResourceDescription | resTermIndex() |
TermSuitePipeline | run() Runs the pipeline with SimplePipeline on the CollectionReader that must have been defined. |
TermSuitePipeline | run(org.apache.uima.jcas.JCas cas) Runs the pipeline with SimplePipeline without requiring a CollectionReader to be defined. |
TermSuitePipeline | setAddSpottedAnnoToTermIndex(boolean addToTermIndex) Configures RegexSpotter. |
TermSuitePipeline | setCollection(TermSuiteCollection termSuiteCollection, java.lang.String collectionPath, java.lang.String collectionEncoding) Creates a collection reader for this pipeline. |
TermSuitePipeline | setCollection(TermSuiteCollection termSuiteCollection, java.lang.String collectionPath, java.lang.String collectionEncoding, java.lang.String droppedTags, java.lang.String txtTags) Creates a collection reader of type GenericXMLToTxtCollectionReader for this pipeline. |
TermSuitePipeline | setCompostCoeffs(float alpha, float beta, float gamma, float delta) |
TermSuitePipeline | setCompostMaxComponentNum(int compostMaxComponentNum) |
TermSuitePipeline | setCompostMinComponentSize(int compostMinComponentSize) |
TermSuitePipeline | setCompostScoreThreshold(float compostScoreThreshold) |
TermSuitePipeline | setCompostSegmentSimilarityThreshold(float compostSegmentSimilarityThreshold) |
TermSuitePipeline | setContextAssocRateMeasure(java.lang.String contextAssocRateMeasure) |
TermSuitePipeline | setContextualizeCoTermsType(OccurrenceType contextualizeCoTermsType) |
TermSuitePipeline | setContextualizeWithCoOccurrenceFrequencyThreshhold(int contextualizeWithCoOccurrenceFrequencyThreshhold) |
TermSuitePipeline | setContextualizeWithTermClasses(boolean contextualizeWithTermClasses) |
TermSuitePipeline | setExportJsonWithContext(boolean b) |
TermSuitePipeline | setExportJsonWithOccurrences(boolean exportJsonWithOccurrences) |
TermSuitePipeline | setGraphicalVariantSimilarityThreshold(float th) |
TermSuitePipeline | setHistory(TermHistory history) |
TermSuitePipeline | setInlineString(java.lang.String text) |
TermSuitePipeline | setIstexCollection(java.lang.String apiURL, java.util.List<java.lang.String> documentsIds) |
TermSuitePipeline | setKeepVariantsWhileCleaning(boolean keepVariantsWhileCleaning) |
TermSuitePipeline | setMateModelPath(java.lang.String path) |
TermSuitePipeline | setMongoDBOccurrenceStore(java.lang.String mongoDBUri) Stores occurrences to MongoDB. |
TermSuitePipeline | setPostProcessingStrategy(java.lang.String postProcessingStrategy) Sets the post-processing strategy for the RegexSpotter analysis engine. |
TermSuitePipeline | setResourceDir(java.lang.String resourceDir) Invoke this method if TermSuite resources are accessible via a "file:/path/to/res/" url, i.e. they can be found locally. |
TermSuitePipeline | setResourceJar(java.lang.String resourceJar) |
TermSuitePipeline | setResourceUrlPrefix(java.lang.String urlPrefix) |
TermSuitePipeline | setSpotWithOccurrences(boolean activate) Deprecated. Use TermSuitePipeline#setOccurrenceStoreMode instead. |
TermSuitePipeline | setTermIndex(TermIndex termIndex) Sets the term index on which this pipeline will run. |
TermSuitePipeline | setTreeTaggerHome(java.lang.String treeTaggerPath) |
TermSuitePipeline | setTsvExportProperties(TermProperty... properties) Defines the term properties that appear in the TSV export file. |
TermSuitePipeline | setTsvShowHeaders(boolean tsvWithHeaders) Configures tsvExporter to (not) show headers on the first line. |
TermSuitePipeline | setTsvShowScores(boolean tsvWithVariantScores) Configures tsvExporter to (not) show variant scores with the "V" label. |
DocumentStream | stream(CasConsumer consumer) |
TermSuitePipeline | watch(java.lang.String... termKeys) |
public static TermSuitePipeline create(java.lang.String lang)
Starts a chaining TermSuitePipeline builder.
Parameters:
lang - The
public static TermSuitePipeline create(TermIndex termIndex)
public TermSuitePipeline run()
Runs the pipeline with SimplePipeline on the CollectionReader that must have been defined.
Throws:
TermSuitePipelineException - if no CollectionReader has been declared on this pipeline
public DocumentStream stream(CasConsumer consumer)
public java.lang.Thread getStreamThread()
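The create and run() entries above describe a chaining builder: a static factory starts the chain, the configuration methods hand back the builder, and run() fails when no CollectionReader has been declared. The following is a minimal, self-contained sketch of that pattern only. SketchPipeline, its setCollection(String) and stage(String) methods, and the use of IllegalStateException are hypothetical stand-ins and are not part of the TermSuite API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a chaining builder; NOT the TermSuitePipeline class.
class SketchPipeline {
    private final String lang;
    private String collectionPath;                 // stays null until a collection is declared
    private final List<String> stages = new ArrayList<>();

    private SketchPipeline(String lang) {
        this.lang = lang;
    }

    // Static factory that starts the chain (same idea as create(lang)).
    static SketchPipeline create(String lang) {
        return new SketchPipeline(lang);
    }

    // Every configuration method returns this, so calls can be chained.
    SketchPipeline setCollection(String path) {
        this.collectionPath = path;
        return this;
    }

    SketchPipeline stage(String name) {
        stages.add(name);
        return this;
    }

    // Refuses to run when no collection was declared, mirroring the documented
    // contract of run(); the exception type here is an assumption.
    SketchPipeline run() {
        if (collectionPath == null) {
            throw new IllegalStateException("no collection declared on this pipeline");
        }
        return this;
    }

    String lang() {
        return lang;
    }

    List<String> stages() {
        return stages;
    }
}
```

With this sketch, `SketchPipeline.create("en").setCollection("/tmp/corpus").stage("tokenize").run()` chains in one expression, while calling `run()` straight after `create("en")` throws.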
public TermSuitePipeline addPipelineListener(PipelineListener pipelineListener)
Registers a pipeline listener.
Parameters:
pipelineListener
Returns:
TermSuitePipeline builder object
public TermSuitePipeline run(org.apache.uima.jcas.JCas cas)
Runs the pipeline with SimplePipeline without requiring a CollectionReader to be defined.
Parameters:
cas - the JCas on which the pipeline operates.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline setInlineString(java.lang.String text)
public TermSuitePipeline setIstexCollection(java.lang.String apiURL, java.util.List<java.lang.String> documentsIds)
public TermSuitePipeline setCollection(TermSuiteCollection termSuiteCollection, java.lang.String collectionPath, java.lang.String collectionEncoding)
Creates a collection reader for this pipeline.
Parameters:
termSuiteCollection
collectionPath
collectionEncoding
Returns:
TermSuitePipeline builder object
public TermSuitePipeline setCollection(TermSuiteCollection termSuiteCollection, java.lang.String collectionPath, java.lang.String collectionEncoding, java.lang.String droppedTags, java.lang.String txtTags)
Creates a collection reader of type GenericXMLToTxtCollectionReader for this pipeline. Requires a list of dropped tags and txt tags for collection parsing.
Parameters:
termSuiteCollection
collectionPath
collectionEncoding
droppedTags
txtTags
Returns:
TermSuitePipeline builder object
See Also:
AbstractToTxtSaxHandler
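public TermSuitePipeline setResourceDir(java.lang.String resourceDir)
Invoke this method if TermSuite resources are accessible via a "file:/path/to/res/" url, i.e. they can be found locally.
Parameters:
resourceDir
public TermSuitePipeline setResourceJar(java.lang.String resourceJar)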
public TermSuitePipeline setResourceUrlPrefix(java.lang.String urlPrefix)
public TermSuitePipeline setContextAssocRateMeasure(java.lang.String contextAssocRateMeasure)
public TermSuitePipeline emptyCollection()
public org.apache.uima.analysis_engine.AnalysisEngineDescription createDescription()
public TermSuitePipeline setHistory(TermHistory history)
public TermSuitePipeline watch(java.lang.String... termKeys)
public java.lang.String getHistoryResourceName()
public TermSuitePipeline aeWordTokenizer()
public TermSuitePipeline aeTreeTagger()
public TermSuitePipeline setMateModelPath(java.lang.String path)
public TermSuitePipeline aeMateTaggerLemmatizer()
public TermSuitePipeline setTsvExportProperties(TermProperty... properties)
Defines the term properties that appear in the TSV export file.
Parameters:
properties
Returns:
TermSuitePipeline builder object
See Also:
haeTsvExporter(String)
public TermSuitePipeline haeTsvExporter(java.lang.String toFilePath)
Exports the TermIndex in TSV format.
Parameters:
toFilePath
Returns:
TermSuitePipeline builder object
See Also:
setTsvExportProperties(TermProperty...)
public TermSuitePipeline haeExportVariationRuleExamples(java.lang.String toFilePath)
Exports examples of matching pairs for each variation rule.
Parameters:
toFilePath - the file path where the examples for each variation rule are written
Returns:
TermSuitePipeline builder object
public TermSuitePipeline haeCompoundExporter(java.lang.String toFilePath)
Exports all compound words of the terminology to the given file path.
Parameters:
toFilePath
Returns:
TermSuitePipeline builder object
public TermSuitePipeline haeVariationExporter(java.lang.String toFilePath, VariationType... vTypes)
public TermSuitePipeline haeTbxExporter(java.lang.String toFilePath)
public TermSuitePipeline haeEvalExporter(java.lang.String toFilePath, boolean withVariants)
public TermSuitePipeline setExportJsonWithOccurrences(boolean exportJsonWithOccurrences)
public TermSuitePipeline setExportJsonWithContext(boolean b)
public TermSuitePipeline haeJsonExporter(java.lang.String toFilePath)
public TermSuitePipeline haeVariantEvalExporter(java.lang.String toFilePath, int topN, int maxVariantsPerTerm)
Creates a TSV output with the occurrence list of each term and their in-text contexts.
Parameters:
toFilePath - The output file path
topN - The number of variants to keep in the file
maxVariantsPerTerm - The maximum number of variants to evaluate for each term
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeStemmer()
public TermSuitePipeline aeFixedExpressionTermMarker()
Iterates over the TermIndex and marks terms as "fixed expressions" when their lemmas are found in the FixedExpressionResource.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeFixedExpressionSpotter()
Spots fixed expressions in the CAS and creates a FixedExpression annotation whenever one is found.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeRegexSpotter()
The single-word and multi-word term spotter AE based on UIMA Tokens Regex.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeTermOccAnnotationImporter()
An AE that imports all TermOccAnnotation in CAS to a TermIndex.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aePrefixSplitter()
Naive morphological analysis of prefix compounds based on a prefix dictionary resource.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeSuffixDerivationDetector()
public TermSuitePipeline aeStopWordsFilter()
Removes from the term index any term having a stop word at its boundaries.
Returns:
TermSuitePipeline builder object
See Also:
TermIndexBlacklistWordFilterAE
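aeStopWordsFilter() is summarized as removing any term that has a stop word at its boundaries. The sketch below illustrates one plausible reading of that rule, assuming "boundaries" means the first and last word of a whitespace-separated term. BoundaryStopWordFilter and its filter method are hypothetical names, terms are modelled as plain strings, and none of this reflects how TermIndexBlacklistWordFilterAE is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of boundary stop-word filtering; not TermSuite code.
class BoundaryStopWordFilter {

    // Returns the terms whose first and last words are both absent from stopWords.
    // Assumption: stopWords holds lower-case entries.
    static List<String> filter(List<String> terms, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            String[] words = term.trim().split("\\s+");
            String first = words[0].toLowerCase(Locale.ROOT);
            String last = words[words.length - 1].toLowerCase(Locale.ROOT);
            // A stop word inside the term is tolerated; only the boundaries count.
            if (!stopWords.contains(first) && !stopWords.contains(last)) {
                kept.add(term);
            }
        }
        return kept;
    }
}
```

Under this reading, "blade of the rotor" survives because its stop words are internal, whereas "of the rotor" and "rotor of" are dropped.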
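public TermSuitePipeline haeXmiCasExporter(java.lang.String toDirectoryPath)
Exports all CAS as XMI files to a given directory.
Parameters:
toDirectoryPath
Returns:
TermSuitePipeline builder object
public TermSuitePipeline haeTermsuiteJsonCasExporter(java.lang.String toDirectoryPath)
Exports all CAS as JSON files to a given directory.
Parameters:
toDirectoryPath
Returns:
TermSuitePipeline builder object
public TermSuitePipeline haeSpotterTSVWriter(java.lang.String toDirectoryPath)
Exports all CAS in TSV format to a given directory.
Parameters:
toDirectoryPath
Returns:
TermSuitePipeline builder object
See Also:
SpotterTSVWriter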
public TermSuitePipeline aeDocumentLogger(long nbDocument)
public TermSuitePipeline aeChineseTokenizer()
Tokenizer for Chinese collections.
Returns:
TermSuitePipeline builder object
See Also:
ChineseSegmenter
public org.apache.uima.resource.ExternalResourceDescription resTermIndex()
public org.apache.uima.resource.ExternalResourceDescription resObserver()
public org.apache.uima.resource.ExternalResourceDescription resHistory()
public org.apache.uima.resource.ExternalResourceDescription resSyntacticVariantRules()
public TermIndex getTermIndex()
public TermSuitePipeline setTermIndex(TermIndex termIndex)
Sets the term index on which this pipeline will run.
Parameters:
termIndex
Returns:
TermSuitePipeline builder object
public TermSuitePipeline emptyTermIndex(java.lang.String name)
Creates a new in-memory TermIndex on which this pipeline will run.
Parameters:
name - the name of the new term index
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeSpecificityComputer()
Computes TermProperty.WR values (and additional term properties of type TermProperty in the future).
Returns:
TermSuitePipeline builder object
See Also:
TermSpecificityComputer, TermProperty
public TermSuitePipeline setContextualizeCoTermsType(OccurrenceType contextualizeCoTermsType)
public TermSuitePipeline setContextualizeWithTermClasses(boolean contextualizeWithTermClasses)
public TermSuitePipeline setContextualizeWithCoOccurrenceFrequencyThreshhold(int contextualizeWithCoOccurrenceFrequencyThreshhold)
public TermSuitePipeline aeContextualizer(int scope, boolean allTerms)
Computes the Contextualizer vector of all single-word terms in the term index.
Parameters:
scope
allTerms
Returns:
TermSuitePipeline builder object
See Also:
Contextualizer
public TermSuitePipeline aeMaxSizeThresholdCleaner(TermProperty property, int maxSize)
public TermSuitePipeline aeThresholdCleaner(TermProperty property, float threshold, boolean isPeriodic, int cleaningPeriod, int termIndexSizeTrigger)
public TermSuitePipeline aePrimaryOccurrenceDetector(int detectionStrategy)
public TermSuitePipeline aeThresholdCleanerPeriodic(TermProperty property, float threshold, int cleaningPeriod)
Parameters:
property
threshold
cleaningPeriod
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeThresholdCleanerSizeTrigger(TermProperty property, float threshold, int termIndexSizeTrigger)
public TermSuitePipeline setKeepVariantsWhileCleaning(boolean keepVariantsWhileCleaning)
public TermSuitePipeline aeThresholdCleaner(TermProperty property, float threshold)
public TermSuitePipeline aeTopNCleaner(TermProperty property, int n)
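The cleaner methods above carry no description in this page, so the following sketch rests entirely on assumptions drawn from their names: that a threshold cleaner keeps terms whose property value is at least the threshold, and that a top-N cleaner keeps the n terms with the highest property value. CleanerSketch, thresholdClean and topN are hypothetical names, and a plain map from term to score stands in for TermIndex and TermProperty.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of threshold and top-N cleaning; not TermSuite code.
class CleanerSketch {

    // Assumed semantics: keep entries whose score is >= threshold, preserving order.
    static Map<String, Double> thresholdClean(Map<String, Double> scores, double threshold) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() >= threshold) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Assumed semantics: keep the n highest-scoring terms, highest first.
    static List<String> topN(Map<String, Double> scores, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

Whether the real cleaners use an inclusive or exclusive comparison, and how they treat variants (see setKeepVariantsWhileCleaning), is not stated here and should be checked against the TermSuite sources.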
public TermSuitePipeline aeTopNCleanerPeriodic(TermProperty property, int n, boolean isPeriodic, int cleaningPeriod)
Parameters:
property
n
isPeriodic
cleaningPeriod
Returns:
TermSuitePipeline builder object
public TermSuitePipeline setGraphicalVariantSimilarityThreshold(float th)
public TermSuitePipeline aeGraphicalVariantGatherer()
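public TermSuitePipeline aeUrlFilter()
Filters out URLs from CAS.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeSyntacticVariantGatherer()
Gathers terms according to their syntactic structures.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeExtensionDetector()
Detects all inclusion/extension relations between terms that have size >= 2.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeScorer()
Transforms the TermIndex into a flat one-n scored model.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeMerger()
Merges the variants of a term (only those that are extensions of the base term) by graphical variation.
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeRanker(TermProperty property, boolean desc)
Parameters:
property
desc
public TermSuitePipeline setTreeTaggerHome(java.lang.String treeTaggerPath)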
public TermSuitePipeline haeLogOverlappingRules()
public TermSuitePipeline enableSyntacticLabels()
public TermSuitePipeline setCompostCoeffs(float alpha, float beta, float gamma, float delta)
public TermSuitePipeline setCompostMaxComponentNum(int compostMaxComponentNum)
public TermSuitePipeline setCompostMinComponentSize(int compostMinComponentSize)
public TermSuitePipeline setCompostScoreThreshold(float compostScoreThreshold)
public TermSuitePipeline setCompostSegmentSimilarityThreshold(float compostSegmentSimilarityThreshold)
public TermSuitePipeline aeCompostSplitter()
public TermSuitePipeline haeCasStatCounter(java.lang.String statName)
public TermSuitePipeline haeTraceTimePerf(java.lang.String toFile)
Exports time progress to TSV file. [...] WordAnnotation processed
Parameters:
toFile
Returns:
TermSuitePipeline builder object
public TermSuitePipeline aeTermClassifier(TermProperty sortingProperty)
Parameters:
sortingProperty - the term property used to order terms before they are classified. The first term of a class in this order is considered the head of the class.
Returns:
TermSuitePipeline builder object
See Also:
TermClassifier
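The sortingProperty parameter of aeTermClassifier determines which term becomes the head of each class: terms are ordered by that property, and the first term of a class met in that order is its head. The sketch below shows that selection rule in isolation. ClassHeadSketch, its nested Term type and the heads method are hypothetical, class membership is supplied up front rather than computed, and descending order is an assumption because the page does not state the sort direction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of head selection by sorting property; not TermSuite code.
class ClassHeadSketch {

    // Minimal stand-in for a term: a label, a class id and one numeric property value.
    static final class Term {
        final String label;
        final String classId;
        final double sortingValue;

        Term(String label, String classId, double sortingValue) {
            this.label = label;
            this.classId = classId;
            this.sortingValue = sortingValue;
        }
    }

    // Orders terms by sortingValue (descending, an assumption) and records, for each
    // class, the first term encountered in that order as the head of the class.
    static Map<String, String> heads(List<Term> terms) {
        List<Term> ordered = new ArrayList<>(terms);
        ordered.sort(Comparator.comparingDouble((Term t) -> t.sortingValue).reversed());
        Map<String, String> headByClass = new LinkedHashMap<>();
        for (Term t : ordered) {
            headByClass.putIfAbsent(t.classId, t.label);
        }
        return headByClass;
    }
}
```

Changing the sorting property changes the order in which terms are visited, and therefore which member of a class ends up as its head.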
public TermSuitePipeline haeEval(java.lang.String refFileURI, java.lang.String outputFile, java.lang.String customLogHeader, java.lang.String rFile, java.lang.String evalTraceName, boolean rtlWithVariants)
Parameters:
refFileURI - The path to the reference terminology
outputFile - The path to the output log file
customLogHeader - A custom string to add in the header of the output log file
rFile - The path to the output R file
evalTraceName - The name of the eval trace
rtlWithVariants - true if variants of the reference terminology should be kept during the evaluation
Returns:
TermSuitePipeline builder object
public TermSuitePipeline setMongoDBOccurrenceStore(java.lang.String mongoDBUri)
Stores occurrences to MongoDB.
Parameters:
mongoDBUri - the MongoDB connection URI
Returns:
TermSuitePipeline builder object
@Deprecated public TermSuitePipeline setSpotWithOccurrences(boolean activate)
Deprecated. Use TermSuitePipeline#setOccurrenceStoreMode instead.
Parameters:
activate
Returns:
TermSuitePipeline builder object
public TermSuitePipeline setAddSpottedAnnoToTermIndex(boolean addToTermIndex)
Configures RegexSpotter.
Parameters:
addToTermIndex - the value of the parameter
Returns:
TermSuitePipeline builder object
See Also:
aeRegexSpotter()
public TermSuitePipeline setPostProcessingStrategy(java.lang.String postProcessingStrategy)
Sets the post-processing strategy for the RegexSpotter analysis engine.
Parameters:
postProcessingStrategy
Returns:
TermSuitePipeline builder object
See Also:
aeRegexSpotter(), OccurrenceBuffer.NO_CLEANING, OccurrenceBuffer.KEEP_PREFIXES, OccurrenceBuffer.KEEP_SUFFIXES
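public TermSuitePipeline setTsvShowHeaders(boolean tsvWithHeaders)
Configures tsvExporter to (not) show headers on the first line.
Parameters:
tsvWithHeaders - the flag
Returns:
TermSuitePipeline builder object
public TermSuitePipeline setTsvShowScores(boolean tsvWithVariantScores)
Configures tsvExporter to (not) show variant scores with the "V" label.
Parameters:
tsvWithVariantScores - the flag
Returns:
TermSuitePipeline builder object
public TermSuitePipeline haeJsonCasExporter(java.lang.String toDirectoryPath)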
public TermSuitePipeline linkMongoStore()
Configures the JsonExporterAE to not embed the occurrences in the JSON file, but to link the MongoDB occurrence store instead.
Returns:
TermSuitePipeline builder object
See Also:
haeJsonExporter(String)
public TermSuitePipeline customAE(org.apache.uima.analysis_engine.AnalysisEngineDescription ae, java.lang.String taskName)
Aggregates an AE to the TS pipeline.
Parameters:
ae - the AE description of the added pipeline.
taskName - a user-readable name for the AE task (intended to be displayed in progress views)
Returns:
TermSuitePipeline builder object