Class DocumentProcessor

  • All Implemented Interfaces:
    com.yahoo.component.Component, com.yahoo.component.Deconstructable, Comparable<com.yahoo.component.Component>
    Direct Known Subclasses:
    JoinerDocumentProcessor, SimpleDocumentProcessor, SplitterDocumentProcessor

    public abstract class DocumentProcessor
    extends com.yahoo.component.chain.ChainedComponent

    A document processor is a component that performs some operation on a document or document update. Document processors are asynchronous: they may request some data and then return. The processing framework is responsible for calling processors again at unspecified times until they are done processing the document or document update.
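
    The call-again behaviour described above can be illustrated with a small, self-contained sketch. It does not use the real Vespa classes: SketchProgress, SketchProcessing, EnrichingProcessor and SketchDriver are hypothetical stand-ins invented for this example, and the real framework's scheduling is not specified here beyond "unspecified times".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for DocumentProcessor.Progress; only two outcomes are modeled here.
enum SketchProgress { DONE, LATER }

// Hypothetical stand-in for Processing: some documents plus per-processing scratch state.
class SketchProcessing {
    final List<String> documents = new ArrayList<>();
    final Map<String, Object> variables = new HashMap<>();
}

// A processor that needs two calls: one to request data, one to use it.
class EnrichingProcessor {
    SketchProgress process(SketchProcessing processing) {
        if (!processing.variables.containsKey("requested")) {
            // First call: note that the request was issued, then return without blocking.
            processing.variables.put("requested", Boolean.TRUE);
            return SketchProgress.LATER;
        }
        // Later call: the data is assumed to have arrived, so finish the work.
        processing.documents.replaceAll(doc -> doc + " [enriched]");
        return SketchProgress.DONE;
    }
}

// Stand-in for the framework: keeps calling until the processor reports DONE.
class SketchDriver {
    static int runUntilDone(EnrichingProcessor processor, SketchProcessing processing) {
        int calls = 0;
        SketchProgress progress;
        do {
            progress = processor.process(processing);
            calls++;
        } while (progress != SketchProgress.DONE);
        return calls;
    }
}
```

    The in-progress marker is kept on the processing rather than in a field of the processor, so the processor itself stays stateless and can be shared between threads.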

    Document processor instances are chained together by the framework to realize a complete processing pipeline. The processing chain is represented by the processor instances themselves; see getNext/setNext. Document processors may optionally control the routing through the chain by setting the next processor on ongoing processings.

    A processing may contain one or multiple documents or document updates. Document processors may optionally handle collections of documents in some way other than just processing each one in order.

    A document processor must have an empty constructor. When instantiated from Vespa config (as opposed to being instantiated programmatically in a stand-alone Docproc system), the framework is responsible for configuring the processor using setConfig(). If a document processor wants to do some initial setup after configuration has been set, but before it has begun processing documents or document updates, it should override initialize().

    Document processors must be thread safe. To ensure this, make sure that access to any mutable, thread-unsafe state held in a field by the processor is synchronized.
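
    As a rough illustration of the thread-safety rule, the sketch below keeps two kinds of shared state: a counter that is thread safe by construction, and a plain HashMap whose every access is synchronized. CountingProcessor and ThreadSafetyDemo are hypothetical names for this example only; the class deliberately does not extend the real DocumentProcessor, so that it compiles on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical processor-like class holding mutable state in fields.
class CountingProcessor {
    // AtomicLong is thread safe on its own, so no explicit locking is needed.
    private final AtomicLong processed = new AtomicLong();

    // HashMap is not thread safe, so every read and write below is synchronized.
    private final Map<String, Integer> perType = new HashMap<>();

    void record(String documentType) {
        processed.incrementAndGet();
        synchronized (perType) {
            perType.merge(documentType, 1, Integer::sum);
        }
    }

    long total() {
        return processed.get();
    }

    int countFor(String documentType) {
        synchronized (perType) {
            return perType.getOrDefault(documentType, 0);
        }
    }
}

// Calls record() from several threads at once and waits for all of them to finish.
class ThreadSafetyDemo {
    static CountingProcessor runConcurrently(int threads, int callsPerThread) {
        CountingProcessor processor = new CountingProcessor();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    processor.record("music");
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return processor;
    }
}
```

    Without the synchronized blocks, concurrent merge calls on the HashMap could lose updates or corrupt the map, which is the kind of failure the rule above is meant to prevent.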

    Author:
    bratseth
    • Constructor Detail

      • DocumentProcessor

        public DocumentProcessor()
    • Method Detail

      • process

        public abstract DocumentProcessor.Progress process(Processing processing)
        Processes a processing, which can contain zero or more document bases. The implementing document processor is free to modify, replace or delete elements in the list inside processing.
        Parameters:
        processing - the processing to process
        Returns:
        the outcome of this processing
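
        The three kinds of change mentioned above (modify, replace, delete) might look like the following when applied to a list of operations. This is a sketch under stated assumptions: Op and CleanupProcessor are hypothetical stand-ins, a plain List takes the place of the list inside the real Processing, and a boolean takes the place of the returned DocumentProcessor.Progress.

```java
import java.util.List;
import java.util.ListIterator;

// Hypothetical stand-in for one element of the list held by a processing.
class Op {
    final String id;
    String title;

    Op(String id, String title) {
        this.id = id;
        this.title = title;
    }
}

// Hypothetical processor-like class; the rules it applies are invented for illustration.
class CleanupProcessor {
    static final String DRAFT_PREFIX = "draft:";

    // Returns true when finished (stand-in for returning a Progress value).
    boolean process(List<Op> operations) {
        ListIterator<Op> it = operations.listIterator();
        while (it.hasNext()) {
            Op op = it.next();
            String trimmed = op.title == null ? "" : op.title.trim();
            if (trimmed.isEmpty()) {
                // Delete: drop elements that carry no title.
                it.remove();
            } else if (trimmed.startsWith(DRAFT_PREFIX)) {
                // Replace: swap in a new element without the draft prefix.
                it.set(new Op(op.id, trimmed.substring(DRAFT_PREFIX.length()).trim()));
            } else {
                // Modify: normalize the title of the existing element in place.
                op.title = trimmed;
            }
        }
        return true;
    }
}
```

        A ListIterator is used so that removal and replacement are safe while the list is being traversed; removing through the list itself inside a for-each loop would throw ConcurrentModificationException.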
      • setFieldMap

        public void setFieldMap(Map<com.yahoo.collections.Pair<String,String>,String> fieldMap)
        Sets the schema map for field names, keyed by (doctype, from) and mapping to the target field name.
      • getFieldMap

        public Map<com.yahoo.collections.Pair<String,String>,String> getFieldMap()
        Returns the schema map for field names: (doctype, from) → to.
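
        To make the (doctype, from) → to shape concrete, here is a minimal sketch of building and querying such a map. It uses java.util.Map.Entry as a stand-in for com.yahoo.collections.Pair so that it compiles without Vespa on the classpath. FieldMapSketch, its method names, the sample field names, and the fall-back to the original name when no mapping exists are all assumptions made for this example, not documented behaviour.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating a map keyed by (doctype, from).
class FieldMapSketch {

    // Builds a composite key; Map.Entry stands in for Pair<String,String>.
    static Map.Entry<String, String> key(String docType, String from) {
        return new SimpleImmutableEntry<>(docType, from);
    }

    // Sample data: in document type "music", the field "title" maps to "name".
    static Map<Map.Entry<String, String>, String> example() {
        Map<Map.Entry<String, String>, String> fieldMap = new HashMap<>();
        fieldMap.put(key("music", "title"), "name");
        return fieldMap;
    }

    // Looks up the target name; assumes unmapped fields keep their original name.
    static String resolve(Map<Map.Entry<String, String>, String> fieldMap,
                          String docType, String from) {
        return fieldMap.getOrDefault(key(docType, from), from);
    }
}
```

        Because the key includes the document type, the same source field name can map to different targets in different document types, or be left unmapped in some of them.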
      • toString

        public String toString()
        Overrides:
        toString in class com.yahoo.component.AbstractComponent