Class DocumentProcessor

  • All Implemented Interfaces:
    com.yahoo.component.Component, java.lang.Comparable<com.yahoo.component.Component>
    Direct Known Subclasses:
    JoinerDocumentProcessor, SimpleDocumentProcessor, SplitterDocumentProcessor

    public abstract class DocumentProcessor
    extends com.yahoo.component.chain.ChainedComponent

    A document processor is a component which performs some operation on a document or document update. Document processors are asynchronous: they may request some data and then return. The processing framework is responsible for calling processors again, at unspecified times, until they are done processing the document or document update.
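
    The call-again contract described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: SketchProgress, SketchProcessing, SketchProcessor, LookupProcessor and SketchFramework are hypothetical stand-ins invented for this example, and the "data arrives between calls" step is simulated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the real framework classes.
enum SketchProgress { DONE, LATER }

class SketchProcessing {
    final List<String> documents = new ArrayList<>();
    String pendingData;   // data the processor asked for; null until it arrives
}

abstract class SketchProcessor {
    abstract SketchProgress process(SketchProcessing processing);
}

// Asks to be called again while the data is missing; finishes once it is available.
class LookupProcessor extends SketchProcessor {
    @Override
    SketchProgress process(SketchProcessing processing) {
        if (processing.pendingData == null) {
            return SketchProgress.LATER;   // not ready yet, return without blocking
        }
        processing.documents.replaceAll(d -> d + ":" + processing.pendingData);
        return SketchProgress.DONE;
    }
}

class SketchFramework {
    // Calls the processor repeatedly until it reports DONE; returns the number of calls made.
    static int run(SketchProcessor processor, SketchProcessing processing) {
        int calls = 0;
        while (true) {
            calls++;
            if (processor.process(processing) == SketchProgress.DONE) {
                return calls;
            }
            // Simulate the requested data arriving before the next call.
            processing.pendingData = "enriched";
        }
    }
}
```

    The point of the sketch is that the processor never waits: it returns immediately and relies on the caller to invoke it again later.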

    Document processor instances are chained together by the framework to realize a complete processing pipeline. The processing chain is represented by the processor instances themselves (see getNext/setNext). Document processors may optionally control the routing through the chain by setting the next processor on ongoing processings.

    A processing may contain one or more documents or document updates. Document processors may optionally handle such collections in some other way than just processing each item in order.

    A document processor must have an empty constructor. When instantiated from Vespa config (as opposed to being instantiated programmatically in a stand-alone Docproc system), the framework is responsible for configuring the processor using setConfig(). If a document processor wants to do some initial setup after configuration has been set, but before it has begun processing documents or document updates, it should override initialize().

    Document processors must be thread safe. To ensure this, make sure that access to any mutable, thread-unsafe state held in a field by the processor is synchronized.
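
    One way to follow this rule is to put the mutable state behind synchronized methods. The sketch below is an assumption-laden illustration: CountingState and ThreadSafetyDemo are hypothetical names, and in a real processor the map would be a field of the processor class itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the documented API.
class CountingState {
    // HashMap is not thread safe, so every access goes through a synchronized method.
    private final Map<String, Integer> seenPerType = new HashMap<>();

    synchronized void record(String docType) {
        seenPerType.merge(docType, 1, Integer::sum);
    }

    synchronized int count(String docType) {
        return seenPerType.getOrDefault(docType, 0);
    }
}

class ThreadSafetyDemo {
    // Starts `threads` workers that each record `perThread` documents of the same type,
    // then returns the total that was counted.
    static int hammer(int threads, int perThread) {
        CountingState state = new CountingState();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    state.record("music");
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return state.count("music");
    }
}
```

    Without the synchronized keyword, concurrent calls to record could lose updates or corrupt the map. State that is immutable or confined to a single call needs no such guarding.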

    Author:
    bratseth
    • Field Summary

      • Fields inherited from class com.yahoo.component.AbstractComponent

        isDeconstructable
    • Method Summary

      Modifier and Type Method Description
      java.util.Map<java.lang.String,java.lang.String> getDocMap(java.lang.String docType)
      java.util.Map<com.yahoo.collections.Pair<java.lang.String,java.lang.String>,java.lang.String> getFieldMap()
      Schema map for field names (doctype,from)→to
      abstract DocumentProcessor.Progress process(Processing processing)
      Processes a processing, which can contain zero or more document bases.
      void setFieldMap(java.util.Map<com.yahoo.collections.Pair<java.lang.String,java.lang.String>,java.lang.String> fieldMap)
      Sets the schema map for field names
      java.lang.String toString()
      • Methods inherited from class com.yahoo.component.chain.ChainedComponent

        getAnnotatedDependencies, getDefaultAnnotatedDependencies, getDependencies, initDependencies
      • Methods inherited from class com.yahoo.component.AbstractComponent

        clone, compareTo, deconstruct, getClassName, getId, getIdString, hasInitializedId, initId, isDeconstructable, setIsDeconstructable
      • Methods inherited from class java.lang.Object

        equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • DocumentProcessor

        public DocumentProcessor()
    • Method Detail

      • process

        public abstract DocumentProcessor.Progress process(Processing processing)
        Processes a processing, which can contain zero or more document bases. The implementing document processor is free to modify, replace or delete elements in the list inside processing.
        Parameters:
        processing - the processing to process
        Returns:
        the outcome of this processing
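
        As an illustration of modifying, replacing and deleting elements in place, here is a minimal sketch. ListProcessing and CleanupProcessor are hypothetical stand-ins: the list holds plain strings instead of document bases, and a boolean return value stands in for a Progress outcome.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical stand-in; the real Processing class is not reproduced here.
class ListProcessing {
    final List<String> operations = new ArrayList<>();
}

class CleanupProcessor {
    // Deletes empty entries and replaces the "legacy:" prefix with "current:".
    // Returns true to signal completion (standing in for a Progress value).
    boolean process(ListProcessing processing) {
        ListIterator<String> it = processing.operations.listIterator();
        while (it.hasNext()) {
            String op = it.next();
            if (op.isEmpty()) {
                it.remove();   // delete the element
            } else if (op.startsWith("legacy:")) {
                it.set("current:" + op.substring("legacy:".length()));   // replace the element
            }
        }
        return true;
    }
}
```

        A ListIterator is used so that elements can be removed or replaced during the traversal. An empty list is handled naturally: the loop body never runs and the method still reports completion.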
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class com.yahoo.component.AbstractComponent
      • setFieldMap

        public void setFieldMap(java.util.Map<com.yahoo.collections.Pair<java.lang.String,java.lang.String>,java.lang.String> fieldMap)
        Sets the schema map for field names
      • getFieldMap

        public java.util.Map<com.yahoo.collections.Pair<java.lang.String,java.lang.String>,java.lang.String> getFieldMap()
        Schema map for field names (doctype,from)→to
      • getDocMap

        public java.util.Map<java.lang.String,java.lang.String> getDocMap(java.lang.String docType)
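
        The shape of the (doctype,from)→to field map can be sketched with standard collections. The class FieldMapSketch and its methods are hypothetical, SimpleImmutableEntry stands in for com.yahoo.collections.Pair, and the per-doctype projection is only a guess at what getDocMap might return, since this page does not describe that method.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the class documented on this page.
class FieldMapSketch {
    // The key is (doctype, from) and the value is the target field name.
    static Map<SimpleImmutableEntry<String, String>, String> exampleFieldMap() {
        Map<SimpleImmutableEntry<String, String>, String> fieldMap = new HashMap<>();
        fieldMap.put(new SimpleImmutableEntry<>("music", "title"), "name");
        fieldMap.put(new SimpleImmutableEntry<>("music", "artist"), "performer");
        fieldMap.put(new SimpleImmutableEntry<>("book", "title"), "heading");
        return fieldMap;
    }

    // Assumed behavior: project the full map down to from→to for a single document type.
    static Map<String, String> docMap(Map<SimpleImmutableEntry<String, String>, String> fieldMap,
                                      String docType) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<SimpleImmutableEntry<String, String>, String> e : fieldMap.entrySet()) {
            if (docType.equals(e.getKey().getKey())) {
                result.put(e.getKey().getValue(), e.getValue());
            }
        }
        return result;
    }
}
```

        Keying on the pair lets the same source field name map to different targets in different document types, as with "title" in the example data.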