Class DocumentProcessor

java.lang.Object
com.yahoo.component.AbstractComponent
com.yahoo.component.chain.ChainedComponent
com.yahoo.docproc.DocumentProcessor
All Implemented Interfaces:
com.yahoo.component.Component, com.yahoo.component.Deconstructable, Comparable<com.yahoo.component.Component>
Direct Known Subclasses:
SimpleDocumentProcessor

public abstract class DocumentProcessor extends com.yahoo.component.chain.ChainedComponent

A document processor is a component that performs some operation on a document or document update. Document processors are asynchronous: they may request some data and then return. The processing framework is responsible for calling processors again, at unspecified times, until they are done processing the document or document update.
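The call-again contract described above can be illustrated with a small sketch. Everything in it is a stand-in written so the example compiles on its own: the nested `Progress` enum with `DONE` and `LATER`, the minimal `Processing` context, the `Processor` base class and the `runToCompletion` driver are assumptions for illustration, not the Vespa classes. In a real deployment the framework decides when to call a processor again.

```java
import java.util.HashMap;
import java.util.Map;

class AsyncSketch {
    // Stand-in for the outcome of a process call (hypothetical, not the Vespa type).
    enum Progress { DONE, LATER }

    // Stand-in for a processing: a simple bag of per-processing state.
    static class Processing {
        private final Map<String, Object> context = new HashMap<>();
        Object get(String key) { return context.get(key); }
        void set(String key, Object value) { context.put(key, value); }
    }

    // Stand-in base class mirroring the abstract process method.
    static abstract class Processor {
        abstract Progress process(Processing processing);
    }

    // Needs external data: the first call requests it and returns LATER,
    // a later call finds the data and finishes.
    static class LookupProcessor extends Processor {
        @Override
        Progress process(Processing processing) {
            Object result = processing.get("lookupResult");
            if (result == null) {
                // A real processor would start an asynchronous request here.
                processing.set("lookupRequested", true);
                return Progress.LATER;
            }
            processing.set("enriched", "doc+" + result);
            return Progress.DONE;
        }
    }

    // Toy driver: calls the processor until it reports DONE and returns the call count.
    static int runToCompletion(Processor processor, Processing processing, int maxCalls) {
        for (int calls = 1; calls <= maxCalls; calls++) {
            if (processor.process(processing) == Progress.DONE) return calls;
            // Simulate the requested data arriving between calls.
            if (processing.get("lookupRequested") != null) processing.set("lookupResult", "data");
        }
        throw new IllegalStateException("not done after " + maxCalls + " calls");
    }
}
```

The point of the sketch is that the processor keeps its in-flight state in the processing rather than in its own fields, so returning early and being called again is safe.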

Document processor instances are chained together by the framework to realize a complete processing pipeline. The processing chain is represented by the processor instances themselves; see getNext/setNext. Document processors may optionally control the routing through the chain by setting the next processor on ongoing processings.

A processing may contain one or more documents or document updates. Document processors may optionally handle collections of documents or document updates in some other way than just processing each one in order.
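As a sketch of handling a processing that holds several documents, the example below walks the list once, drops empty entries and replaces the rest. The `Doc` and `Processing` types are hypothetical stand-ins so the code compiles without Vespa on the classpath; a real processor would work on the document operations held by the actual `Processing` object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

class BatchSketch {
    // Stand-in for a document or document update (hypothetical).
    static class Doc {
        final String id;
        final String body;
        Doc(String id, String body) { this.id = id; this.body = body; }
    }

    // Stand-in for a processing holding a mutable list of documents.
    static class Processing {
        final List<Doc> docs = new ArrayList<>();
    }

    // Visits each document in order: removes documents with an empty body
    // and replaces the others with a copy whose body is trimmed.
    static void process(Processing processing) {
        ListIterator<Doc> it = processing.docs.listIterator();
        while (it.hasNext()) {
            Doc doc = it.next();
            if (doc.body.trim().isEmpty()) {
                it.remove();                              // delete the element
            } else {
                it.set(new Doc(doc.id, doc.body.trim())); // replace the element
            }
        }
    }
}
```

Using a `ListIterator` keeps removal and replacement safe while iterating; a processor that wants to treat the collection as a whole (for example, batching a lookup for all documents) could inspect the full list first instead.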

A document processor must have an empty constructor. When instantiated from Vespa config (as opposed to being instantiated programmatically in a stand-alone Docproc system), the framework is responsible for configuring the processor using setConfig(). If a document processor wants to do some initial setup after configuration has been set, but before it has begun processing documents or document updates, it should override initialize().

Document processors must be thread safe. To ensure this, make sure that access to any mutable, thread-unsafe state held in a field by the processor is synchronized.
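A minimal sketch of the synchronization advice above, assuming a hypothetical processor-like class that keeps a plain `HashMap` in a field. The class and method names are invented for illustration and do not extend the real `DocumentProcessor`; only the locking pattern is the point.

```java
import java.util.HashMap;
import java.util.Map;

class ThreadSafetySketch {
    // Hypothetical processor-like class with mutable, thread-unsafe state in a field.
    static class CountingProcessor {
        // HashMap is not thread safe; every access below holds the lock on "this".
        private final Map<String, Integer> seenPerType = new HashMap<>();

        void process(String docType) {
            synchronized (this) {
                seenPerType.merge(docType, 1, Integer::sum);
            }
        }

        synchronized int count(String docType) {
            return seenPerType.getOrDefault(docType, 0);
        }
    }

    // Runs several threads against one shared instance and returns the final count.
    static int hammer(int threads, int callsPerThread) {
        CountingProcessor processor = new CountingProcessor();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) processor.process("music");
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return processor.count("music");
    }
}
```

Because both the write and the read take the same lock, no increments are lost when several threads call `process` concurrently. State that is immutable, or held in the processing rather than in a field, needs no such guarding.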

Author:
bratseth
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final class 
     
    static class 
    DocumentProcessor.Progress
    An enumeration of possible results of calling a process method
  • Field Summary

    Fields inherited from class com.yahoo.component.AbstractComponent

    isDeconstructable
  • Constructor Summary

    Constructors
    Constructor
    Description
    DocumentProcessor()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    Map<String,String>
    getDocMap(String docType)
     
    Map<com.yahoo.collections.Pair<String,String>,String>
    getFieldMap()
    Schema map for field names (doctype,from)→to
    abstract DocumentProcessor.Progress
    process(Processing processing)
    Processes a processing, which can contain zero or more document bases.
    void
    setFieldMap(Map<com.yahoo.collections.Pair<String,String>,String> fieldMap)
    Sets the schema map for field names
    String
    toString()
     

    Methods inherited from class com.yahoo.component.chain.ChainedComponent

    getAnnotatedDependencies, getDefaultAnnotatedDependencies, getDependencies, initDependencies

    Methods inherited from class com.yahoo.component.AbstractComponent

    clone, compareTo, deconstruct, getClassName, getId, getIdString, hasInitializedId, initId, isDeconstructable, setIsDeconstructable

    Methods inherited from class java.lang.Object

    equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • DocumentProcessor

      public DocumentProcessor()
  • Method Details

    • process

      public abstract DocumentProcessor.Progress process(Processing processing)
      Processes a processing, which can contain zero or more document bases. The implementing document processor is free to modify, replace or delete elements in the list inside processing.
      Parameters:
      processing - the processing to process
      Returns:
      the outcome of this processing
    • setFieldMap

      public void setFieldMap(Map<com.yahoo.collections.Pair<String,String>,String> fieldMap)
      Sets the schema map for field names
    • getFieldMap

      public Map<com.yahoo.collections.Pair<String,String>,String> getFieldMap()
      Schema map for field names (doctype,from)→to
    • getDocMap

      public Map<String,String> getDocMap(String docType)
    • toString

      public String toString()
      Overrides:
      toString in class com.yahoo.component.AbstractComponent
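
The (doctype,from)→to shape of the field map returned by getFieldMap can be sketched as follows. This is an illustration under assumptions: `Map.Entry<String,String>` stands in for `com.yahoo.collections.Pair<String,String>` so the code compiles without Vespa, and the `docMap` helper shows one plausible reading of what a per-doctype view such as getDocMap(String docType) could return. The page above does not describe that method, so treat the behavior as a guess.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

class FieldMapSketch {
    // Stand-in for Pair<String,String>: a (doctype, from) key with value equality.
    static Map.Entry<String, String> key(String docType, String from) {
        return new SimpleImmutableEntry<>(docType, from);
    }

    // Assumed per-doctype view: from -> to for all entries whose doctype matches.
    static Map<String, String> docMap(Map<Map.Entry<String, String>, String> fieldMap,
                                      String docType) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<Map.Entry<String, String>, String> e : fieldMap.entrySet()) {
            if (e.getKey().getKey().equals(docType)) {
                result.put(e.getKey().getValue(), e.getValue());
            }
        }
        return result;
    }
}
```

For example, with entries `("music","title") → "name"` and `("book","title") → "heading"` (invented field names), `docMap(fieldMap, "music")` would contain only `title → name`.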