Class NGramSearcher

  • All Implemented Interfaces:
    com.yahoo.component.Component, java.lang.Comparable<com.yahoo.component.Component>

    public class NGramSearcher
    extends Searcher
    Handles NGram indexes by splitting query terms aimed at such indexes into grams, and by recombining summary field values from such fields into the original text.

    This searcher declares that it must be placed after Juniper searchers, because it assumes that the Juniper token separators (which are returned on bolding) have not yet been replaced by highlight tags when it runs. Note that "after" in the chain means "before" from the point of view of the result.
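
    The recombination of grams into the original text can be illustrated with a minimal sketch. This is not the Vespa implementation: the class GramRecombineSketch and its method are hypothetical, and the sketch assumes the grams of one token arrive as a plain list of overlapping strings, with no Juniper separators or highlighting to handle.

```java
import java.util.List;

/** Hypothetical illustration only; this is not the Vespa implementation. */
final class GramRecombineSketch {

    /**
     * Rebuilds a token from its overlapping grams. Assumes every gram is
     * non-empty and that each gram after the first repeats all but the last
     * character of its predecessor, as a sliding window would produce.
     */
    static String recombine(List<String> grams) {
        if (grams.isEmpty()) return "";
        StringBuilder text = new StringBuilder(grams.get(0));
        for (String gram : grams.subList(1, grams.size())) {
            // Only the final character of each later gram is new text.
            text.append(gram.charAt(gram.length() - 1));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(recombine(List.of("eng", "ngi", "gin", "ine")));
    }
}
```

    Under these assumptions, the grams "eng", "ngi", "gin" and "ine" recombine to "engine".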

    Author:
    bratseth
    • Field Summary

      • Fields inherited from class com.yahoo.component.AbstractComponent

        isDeconstructable
    • Constructor Summary

      Constructors 
      Constructor Description
      NGramSearcher(com.yahoo.language.Linguistics linguistics)
    • Method Summary

      Modifier and Type Method Description
      protected CompositeItem createGramRoot(Query query)
      Creates the root of the query subtree which will contain the grams to match, called by splitToGrams(com.yahoo.prelude.query.Item, java.lang.String, int, com.yahoo.search.Query).
      void fill(Result result, java.lang.String summaryClass, Execution execution)
      Fill hit properties with data using the given summary class.
      protected com.yahoo.language.process.GramSplitter getGramSplitter()
      Returns the (thread-safe) object to use to split the query text into grams.
      Result search(Query query, Execution execution)
      Override this to implement your searcher.
      protected Item splitToGrams(Item term, java.lang.String text, int gramSize, Query query)
      Splits the given item into n-grams and adds them as a CompositeItem containing WordItems searching the index of the input term.
      • Methods inherited from class com.yahoo.component.chain.ChainedComponent

        getAnnotatedDependencies, getDefaultAnnotatedDependencies, getDependencies, initDependencies
      • Methods inherited from class com.yahoo.component.AbstractComponent

        clone, compareTo, deconstruct, getClassName, getId, getIdString, hasInitializedId, initId, isDeconstructable, setIsDeconstructable
      • Methods inherited from class java.lang.Object

        equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • NGramSearcher

        public NGramSearcher(com.yahoo.language.Linguistics linguistics)
    • Method Detail

      • search

        public Result search(Query query,
                             Execution execution)
        Description copied from class: Searcher
        Override this to implement your searcher.

        Searcher implementation subclasses will, depending on their type of logic, do one of the following:

        • Query processors: Access the query, then call execution.search and return the result
        • Result processors: Call execution.search to get the result, access it and return
        • Sources (which produce results): Create a result, add the desired hits and return it.
        • Federators (which forward the search to multiple subchains): Call search on the desired subchains in parallel and get the results. Combine the results into one and return it.
        • Workflows: Call execution.search as many times as desired, using different queries. Eventually return a result.
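
        The first three roles in the list above can be sketched with a small stand-alone model. None of the types below are Vespa classes: Next, Step and ChainSketch are hypothetical stand-ins, with Next playing the part of execution.search, and queries and hits reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical model of a search chain; none of these types are Vespa classes. */
final class ChainSketch {

    /** Stand-in for the rest of the chain, playing the role of execution.search. */
    interface Next { List<String> search(String query); }

    /** Stand-in for a searcher. */
    interface Step { List<String> search(String query, Next next); }

    /** Query processor: changes the query, then passes it on. */
    static final Step LOWERCASE_QUERY =
            (query, next) -> next.search(query.toLowerCase(Locale.ROOT));

    /** Result processor: passes the query on, then changes the result. */
    static final Step SORT_HITS = (query, next) -> {
        List<String> hits = new ArrayList<>(next.search(query));
        Collections.sort(hits);
        return hits;
    };

    /** Source: produces hits itself and never calls the rest of the chain. */
    static final Step SOURCE =
            (query, next) -> List.of(query + "-2", query + "-1");

    /** Runs the steps in order, handing each one a callback to the remainder. */
    static List<String> run(List<Step> steps, String query) {
        return run(steps, 0, query);
    }

    private static List<String> run(List<Step> steps, int index, String query) {
        if (index == steps.size()) return new ArrayList<>(); // end of chain: no hits
        return steps.get(index).search(query, q -> run(steps, index + 1, q));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(LOWERCASE_QUERY, SORT_HITS, SOURCE), "Foo"));
    }
}
```

        In this model the query "Foo" is lowercased on the way down, the source produces two hits, and the result processor sorts them on the way back up, giving [foo-1, foo-2].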

        Hits come in two kinds: concrete hits are actual content of the kind requested by the user, while meta hits provide information about the collection of hits, the query, the service and so on.

        The query specifies a window into a larger result list that must be returned from the searcher through hits and offset. Searchers which return a list of hits at the top level of the result must return at least hits number of hits (or, if that is impossible, all that are available), starting at the given offset. In addition, searchers are allowed to return any number of meta hits (although this number is expected to be low). For hits contained in nested hit groups, the concept of a window defined by hits and offset is not well defined and does not apply.
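
        The window arithmetic can be illustrated with a minimal sketch over a plain list. WindowSketch and its window method are hypothetical and not part of the Vespa API. The sketch returns exactly the requested window, which satisfies the "at least" requirement whenever enough hits exist, and it ignores meta hits and nested hit groups.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the hits/offset window; not part of the Vespa API. */
final class WindowSketch {

    /**
     * Returns up to {@code hits} entries of the ranked list, starting at
     * {@code offset}. If the list is too short, all available entries from
     * the offset onwards are returned.
     */
    static <T> List<T> window(List<T> ranked, int offset, int hits) {
        if (offset < 0 || hits < 0)
            throw new IllegalArgumentException("offset and hits must be non-negative");
        int from = Math.min(offset, ranked.size());
        // Widen to long so that a very large hits value cannot overflow.
        int to = (int) Math.min((long) from + hits, ranked.size());
        return new ArrayList<>(ranked.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("a", "b", "c", "d", "e");
        System.out.println(window(ranked, 1, 2));
        System.out.println(window(ranked, 3, 10));
    }
}
```

        With five ranked hits, offset 1 and hits 2 select [b, c], while offset 3 and hits 10 select the two remaining hits [d, e].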

        Error handling in searchers:

        • Unexpected events: Throw any RuntimeException. This query will fail with the exception message, and the error will be logged.
        • Expected events: Create (new Result(Query, ErrorMessage)) or add (result.setErrorIfNoOtherErrors(ErrorMessage)) an error message to the Result.
        • Recoverable user errors: Add a FeedbackHit explaining the condition and how to correct it.
        Specified by:
        search in class Searcher
        Parameters:
        query - the query
        Returns:
        the result of making this query
      • fill

        public void fill(Result result,
                         java.lang.String summaryClass,
                         Execution execution)
        Description copied from class: Searcher
        Fill hit properties with data using the given summary class. Calling this on already filled results has no cost.

        This needs to be overridden by federating searchers, which must contact the search sources again by propagating the fill call down through the search chain, and by source searchers which talk to fill-capable backends, which must request the data to be filled. Other searchers do not need to override this.

        Overrides:
        fill in class Searcher
        Parameters:
        result - the result to fill
        summaryClass - the name of the collection of fields to fetch the values of, or null to use the default
      • splitToGrams

        protected Item splitToGrams(Item term,
                                    java.lang.String text,
                                    int gramSize,
                                    Query query)
        Splits the given item into n-grams and adds them as a CompositeItem containing WordItems searching the index of the input term. If the result is a single gram, that single WordItem is returned rather than the AndItem.
        Parameters:
        term - the term to split, must be an item which implements the IndexedItem and BlockItem "mixins"
        text - the text of the item, just stringValue() if the item is a TermItem
        gramSize - the gram size to split to
        query - the query in which this rewriting is done
        Returns:
        the root of the query subtree produced by this, containing the split items
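
        The splitting described above can be sketched on plain strings. This is an illustration under stated assumptions, not the Vespa GramSplitter: GramSplitSketch is hypothetical, the text is assumed to be a single non-empty token, a text no longer than one gram is assumed to yield itself as the only gram, and the textual AND(...) notation merely stands in for the composite item.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone sketch; this is not the Vespa GramSplitter. */
final class GramSplitSketch {

    /**
     * Returns the overlapping grams of the given size. Assumption: a text no
     * longer than one gram is returned whole as a single gram. Works on UTF-16
     * chars, so surrogate pairs are not treated specially.
     */
    static List<String> grams(String text, int gramSize) {
        if (gramSize < 1) throw new IllegalArgumentException("gramSize must be at least 1");
        List<String> grams = new ArrayList<>();
        if (text.isEmpty()) return grams;
        if (text.length() <= gramSize) {
            grams.add(text);
            return grams;
        }
        for (int start = 0; start + gramSize <= text.length(); start++)
            grams.add(text.substring(start, start + gramSize));
        return grams;
    }

    /** Renders the rewritten term; a single gram is returned without a wrapper. */
    static String rewrite(String index, String text, int gramSize) {
        List<String> terms = new ArrayList<>();
        for (String gram : grams(text, gramSize)) terms.add(index + ":" + gram);
        if (terms.size() == 1) return terms.get(0); // single gram: no composite
        return "AND(" + String.join(" ", terms) + ")";
    }

    public static void main(String[] args) {
        System.out.println(grams("engine", 3));
        System.out.println(rewrite("title", "abcd", 3));
    }
}
```

        Here "engine" with gram size 3 yields [eng, ngi, gin, ine], and rewriting "abcd" in a hypothetical index named title yields AND(title:abc title:bcd), whereas "abc" yields the single term title:abc.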
      • getGramSplitter

        protected final com.yahoo.language.process.GramSplitter getGramSplitter()
        Returns the (thread-safe) object to use to split the query text into grams.