Interface | Description |
---|---|
CollectionDocument | An interface for input documents sent to collection readers. |

Class | Description |
---|---|
AbstractTermSuiteCollectionReader | An abstract CollectionReader implementation for TermSuite that recursively loads all selected files from an input directory, with a customizable file filter and document text parser. |
AbstractToTxtSaxHandler | Parses XML files from an XML corpus based on two lists of tag names: the dropped tags (tags of no interest), whose contents are skipped, and the txt tags, whose content is kept in the output txt file. |
EmptyCollectionReader | |
GenericXMLToTxtCollectionReader | |
IstexCollectionReader | Reads collections from an ISTEX |
IstexDocument | |
JsonCasConstants | Created by smeoni on 27/05/16. |
JsonCollectionReader | Created by smeoni on 26/05/16. |
QueueRegistry | A registry for queues of CollectionDocument, used in StreamingCollectionReader. |
StreamingCollectionReader | |
StringCollectionReader | |
StringPreparator | |
TeiCollectionReader | |
TeiToTxtSaxHandler | Parses TEI input files into a String in which character offsets are preserved but all tags are replaced with whitespace. |
TermSuiteJsonCasDeserializer | Created by smeoni on 27/05/16. |
TermSuiteJsonCasSerializer | Created by smeoni on 27/05/16. |
TxtCollectionReader | |
XmiCollectionReader |
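
The two-list tag filtering described for AbstractToTxtSaxHandler can be illustrated with a small standalone SAX handler. This is only a sketch under stated assumptions, not TermSuite's actual implementation: the class name `TagFilterSketch`, its `extract` helper, and the tag names in the usage note are hypothetical. It relies only on the standard JDK SAX API and assumes that text is kept when it sits inside at least one txt tag and inside no dropped tag.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: keeps character data found inside "txt" tags
// and skips everything found inside "dropped" tags.
final class TagFilterSketch extends DefaultHandler {
    private final Set<String> droppedTags;
    private final Set<String> txtTags;
    private final StringBuilder out = new StringBuilder();
    private int droppedDepth = 0; // > 0 while inside at least one dropped tag
    private int txtDepth = 0;     // > 0 while inside at least one txt tag

    TagFilterSketch(Set<String> droppedTags, Set<String> txtTags) {
        this.droppedTags = droppedTags;
        this.txtTags = txtTags;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (droppedTags.contains(qName)) {
            droppedDepth++;
        } else if (txtTags.contains(qName)) {
            txtDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (droppedTags.contains(qName)) {
            droppedDepth--;
        } else if (txtTags.contains(qName)) {
            txtDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Assumption: a dropped tag always wins over an enclosing txt tag.
        if (droppedDepth == 0 && txtDepth > 0) {
            out.append(ch, start, length);
        }
    }

    // Parses an XML string and returns only the text selected by the two lists.
    static String extract(String xml, Set<String> dropped, Set<String> txt) throws Exception {
        TagFilterSketch handler = new TagFilterSketch(dropped, txt);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }
}
```

For example, calling `TagFilterSketch.extract(xml, Set.of("note"), Set.of("p"))` on a document whose `p` elements contain nested `note` elements returns the paragraph text with the notes removed. Text outside any txt tag is ignored as well.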