Package org.opencms.search.documents

Handles indexing of the different document and resource types from the OpenCms VFS for the full-text search.


Interface Summary
I_CmsDocumentFactory Creates Lucene Documents for indexing OpenCms resources and controls the text extraction algorithm used for a specific OpenCms resource type / MIME type combination.
I_CmsSearchExtractor Defines a text extractor for the integrated search engine.
I_CmsTermHighlighter Highlights arbitrary terms; used to generate search excerpts.
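
To illustrate what a term highlighter for search excerpts does, here is a minimal stand-alone sketch. It is not the OpenCms API: the class ExcerptHighlighterSketch and its highlight method are hypothetical names, and the actual I_CmsTermHighlighter signature is not shown on this page. The sketch assumes terms are matched as whole words, case-insensitively, and wrapped in <b> tags; it does not escape HTML in the input text.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not an OpenCms class. */
final class ExcerptHighlighterSketch {

    private ExcerptHighlighterSketch() {
        // static helper, no instances
    }

    /**
     * Wraps every whole-word, case-insensitive occurrence of the given
     * terms in <b>...</b>. Returns the text unchanged if there is nothing to do.
     */
    static String highlight(String text, List<String> terms) {
        if (text == null || terms == null) {
            return text;
        }
        // Build an alternation of literal (quoted) terms: \Qfoo\E|\Qbar\E
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (term == null || term.isEmpty()) {
                continue;
            }
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        if (alternation.length() == 0) {
            return text;
        }
        Pattern pattern = Pattern.compile(
            "\\b(" + alternation + ")\\b",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // $1 re-inserts the matched text, so the original casing is kept
        return pattern.matcher(text).replaceAll("<b>$1</b>");
    }
}
```

Calling highlight with an excerpt and the list of query terms returns the excerpt with the matches marked up, keeping the casing of the original text.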

Class Summary
A_CmsVfsDocument Base document factory class for a VFS CmsResource; subclasses only need to provide a specialized implementation of I_CmsSearchExtractor.extractContent(CmsObject, CmsResource, CmsSearchIndex) for text extraction from the binary document content.
CmsDocumentContainerPage Lucene document factory class to extract index data from a resource of type CmsResourceTypeContainerPage.
CmsDocumentGeneric Lucene document factory class for indexing data from a generic CmsResource.
CmsDocumentHtml Lucene document factory class to extract index data from a cms resource containing plain HTML data.
CmsDocumentMsOfficeOLE2 Lucene document factory class to extract text data from a VFS resource that is an OLE 2 MS Office document.
CmsDocumentMsOfficeOOXML Lucene document factory class to extract text data from a VFS resource that is an OOXML MS Office document.
CmsDocumentOpenOffice Lucene document factory class to extract index data from a cms resource containing Open Document Format data.
CmsDocumentPdf Lucene document factory class to extract index data from a cms resource containing Adobe PDF data.
CmsDocumentPlainText Lucene document factory class to extract index data from a cms resource containing plain text data.
CmsDocumentRtf Lucene document factory class to extract index data from a cms resource containing RTF data.
CmsDocumentXmlContent Lucene document factory class to extract index data from an OpenCms VFS resource of type CmsResourceTypeXmlContent.
CmsDocumentXmlPage Lucene document factory class to extract index data from a cms resource of type CmsResourceTypeXmlPage.
CmsExtractionResultCache Implements a disk cache that stores text extraction results in the RFS.
CmsTermHighlighterHtml Default highlighter implementation used for generation of search excerpts.
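
The relationship between the base factory and the format-specific classes listed above can be read as a template-method arrangement: the base class assembles the index document, and each subclass supplies only the text extraction. The following is a rough sketch of that idea under stated assumptions. All names in it (DocumentFactorySketch, BaseDocumentFactory, PlainTextFactory, createDocument) are hypothetical, a plain Map stands in for a Lucene Document, and extractContent here takes a byte array rather than the CmsObject, CmsResource and CmsSearchIndex arguments of the real method.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; none of these types are OpenCms classes. */
final class DocumentFactorySketch {

    private DocumentFactorySketch() {
        // container for the nested sketch classes
    }

    /** Base factory: builds the index fields, leaves text extraction to subclasses. */
    abstract static class BaseDocumentFactory {

        /** Format-specific step: turn the raw resource bytes into plain text. */
        protected abstract String extractContent(byte[] rawContent);

        /** Shared step: assemble a field map standing in for a Lucene Document. */
        public final Map<String, String> createDocument(String path, byte[] rawContent) {
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("path", path);
            fields.put("content", extractContent(rawContent));
            return fields;
        }
    }

    /** Plain-text variant: the bytes already are the text (UTF-8 assumed). */
    static final class PlainTextFactory extends BaseDocumentFactory {
        @Override
        protected String extractContent(byte[] rawContent) {
            return new String(rawContent, StandardCharsets.UTF_8).trim();
        }
    }
}
```

In this sketch, supporting a further format would mean adding one more subclass that overrides extractContent, while the field assembly in createDocument stays untouched.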
 

Exception Summary
CmsIndexNoContentException Signals an error during content extraction of an empty document.
 

Package org.opencms.search.documents Description

Handles indexing of the different document and resource types from the OpenCms VFS for the full-text search.

Since:
6.0.0