Class FieldMatchMetrics

java.lang.Object
com.yahoo.searchlib.ranking.features.fieldmatch.FieldMatchMetrics
All Implemented Interfaces:
Cloneable

public final class FieldMatchMetrics extends Object implements Cloneable
The collection of metrics calculated by the string match metric calculator.
Author:
bratseth
  • Constructor Summary

    Constructors
    Constructor
    Description
     
  • Method Summary

    Modifier and Type
    Method
    Description
    FieldMatchMetrics
    clone()
    float
    get(String name)
    Returns a metric by name
    float
    getAbsoluteOccurrence()
    Returns a normalized measure of the number of occurrences of the terms of the query: sum over all query terms(min(number of occurrences of the term,maxOccurrences))/(query term count*100)
    float
    getAbsoluteProximity()
    Returns the normalized proximity of the matched terms, weighted by the connectedness of the query terms.
    float
    getCompleteness()
    Total completeness, where field completeness is more important: queryCompleteness * ( 1 - fieldCompletenessImportance) + fieldCompletenessImportance * fieldCompleteness
    float
    getEarliness()
    A normalized measure of how early the first segment occurs in this field: 1-head/(max(6,field.length)-1)
    float
    getExactness()
    Returns the degree to which the query terms submitted matched exactly terms contained in the document.
    float
    getFieldCompleteness()
    The ratio of field tokens which were matched by the query: matches/fieldLength
    int
    getGapLength()
    Returns the summed size of all gaps within segments
    int
    getGaps()
    Returns the total number of position jumps (backward or forward) within document segments
    int
    getHead()
    Returns the number of tokens in the field preceding the start of the first matched segment
    float
    getImportance()
    Returns the average of significance and weight.
    int
    getLongestSequence()
    Returns the size of the longest matched continuous, in-order sequence in the document
    float
    getLongestSequenceRatio()
    Returns longestSequence/matches
    float
    getMatch()
    A ready-to-use aggregate match score.
    int
    getMatches()
    Returns the number of query terms which were matched in this field
    float
    getOccurrence()
    Returns a normalized measure of the number of occurrences of the terms of the query.
    float
    getOrderness()
    Returns how well the order of the terms agreed in segments: 1-outOfOrder/pairs
    int
    getOutOfOrder()
    Returns the total number of out of order token sequences within field segments
    int
    getPairs()
    Returns the number of in-segment token pairs
    float
    getProximity()
    Returns a value which is close to 1 when matched terms are close and close to zero when they are far apart in the segment.
    float
    getQueryCompleteness()
    The ratio of query tokens which were matched in the field: matches/queryLength
    float
    getRelatedness()
    Returns the degree to which different terms are related (occurring in the same segment): 1-segments/(matches-1)
    float
    getSegmentDistance()
    Returns the sum of the distance between all segments making up a match to the query, measured as the sum of the number of token positions separating the start of each field adjacent segment.
    float
    getSegmentProximity()
    Returns the closeness of the segments in the field: 1-segmentDistance/fieldLength
    int
    getSegments()
    Returns the number of field text segments which are needed to match the query as completely as possible
    List<Integer>
    getSegmentStarts()
    Returns the segment start points
    float
    getSignificance()
    Returns the normalized term significance (1-frequency) of the terms of this match relative to the whole query: the sum of the significance of all matched terms/the sum of the significance of all query terms. If all the query terms were matched, this is 1.
    float
    getSignificantOccurrence()
    Returns a normalized measure of the number of occurrences of the terms of the query in absolute terms, or relative to the total content of the field, weighted by term significance.
    int
    getTail()
    Returns the number of tokens in the field following the end of the last matched segment
    float
    getUnweightedProximity()
    Returns the normalized proximity of the matched terms, not taking term connectedness into account.
    float
    getWeight()
    Returns the normalized weight of this match relative to the whole query: the sum of the weights of all matched terms/the sum of the weights of all query terms. If all the query terms were matched, this is 1.
    float
    getWeightedAbsoluteOccurrence()
    Returns a normalized measure of the number of occurrences of the terms of the query, taking weights into account so that occurrences of higher weighted query terms have more impact than lower weighted terms.
    float
    getWeightedOccurrence()
    Returns a normalized measure of the number of occurrences of the terms of the query, weighted by term weight.
    boolean
    isComplete()
    Returns whether these metrics represent a complete match
    void
    setComplete(boolean complete)
    String
    toString()
    String
    toStringDump()
    Trace
    trace()
    Returns the trace of this computation.

    Methods inherited from class java.lang.Object

    equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

  • Method Details

    • isComplete

      public boolean isComplete()
      Returns whether these metrics represent a complete match
    • setComplete

      public void setComplete(boolean complete)
    • getSegmentStarts

      public List<Integer> getSegmentStarts()
      Returns the segment start points
    • get

      public float get(String name)
      Returns a metric by name
      Throws:
      IllegalArgumentException - if the metric name (case sensitive) is not present
    • getOutOfOrder

      public int getOutOfOrder()
      Returns the total number of out of order token sequences within field segments
    • getSegments

      public int getSegments()
      Returns the number of field text segments which are needed to match the query as completely as possible
    • getGaps

      public int getGaps()
      Returns the total number of position jumps (backward or forward) within document segments
    • getGapLength

      public int getGapLength()
      Returns the summed size of all gaps within segments
    • getLongestSequence

      public int getLongestSequence()
      Returns the size of the longest matched continuous, in-order sequence in the document
    • getHead

      public int getHead()
      Returns the number of tokens in the field preceding the start of the first matched segment
    • getTail

      public int getTail()
      Returns the number of tokens in the field following the end of the last matched segment
    • getMatches

      public int getMatches()
      Returns the number of query terms which were matched in this field
    • getPairs

      public int getPairs()
      Returns the number of in-segment token pairs
    • getAbsoluteProximity

      public float getAbsoluteProximity()
      Returns the normalized proximity of the matched terms, weighted by the connectedness of the query terms. This number is 0.1 if all the matched terms are following each other in sequence and have default or lower connectedness, close to 1 if they are following in sequence and have a high connectedness, and close to 0 if they are far from each other in the segment or out of order.
    • getUnweightedProximity

      public float getUnweightedProximity()
      Returns the normalized proximity of the matched terms, not taking term connectedness into account. This number is close to 1 if all the matched terms are following each other in sequence, and close to 0 if they are far from each other or out of order
    • getSegmentDistance

      public float getSegmentDistance()
      Returns the sum of the distance between all segments making up a match to the query, measured as the sum of the number of token positions separating the start of each field adjacent segment.
    • getWeight

      public float getWeight()

      Returns the normalized weight of this match relative to the whole query: the sum of the weights of all matched terms/the sum of the weights of all query terms. If all the query terms were matched, this is 1. If no terms were matched, or these matches have weight zero, this is 0.

      As the sum of this number over all the terms of the query is always 1, sums over all fields of normalized rank features for each field multiplied by this number for the same field will produce a normalized number.

      Note that this scales with the number of matched query terms in the field. If you want a component which does not, divide by matches.
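
      The ratio described for getWeight can be sketched as follows. This is a minimal illustration, not the library's implementation: the class name, the method name and the parallel-array representation of the query terms are assumptions made for the example.

```java
final class NormalizedWeightSketch {

    /**
     * Sum of the weights of the matched terms divided by the sum of the weights
     * of all query terms. Hypothetical inputs: termWeights[i] is the weight of
     * query term i, matched[i] tells whether that term was matched in the field.
     */
    static float normalizedWeight(int[] termWeights, boolean[] matched) {
        long matchedSum = 0;
        long totalSum = 0;
        for (int i = 0; i < termWeights.length; i++) {
            totalSum += termWeights[i];
            if (matched[i]) matchedSum += termWeights[i];
        }
        if (totalSum == 0) return 0f; // assumption: report 0 rather than divide by zero
        return (float) matchedSum / totalSum;
    }

    public static void main(String[] args) {
        // Terms weighted 100, 200 and 100; the first and last are matched: (100 + 100) / 400
        System.out.println(normalizedWeight(new int[] {100, 200, 100},
                                            new boolean[] {true, false, true}));
    }
}
```

      Taking the same ratio over term significance instead of term weight gives the shape described for getSignificance.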

    • getSignificance

      public float getSignificance()

      Returns the normalized term significance (1-frequency) of the terms of this match relative to the whole query: the sum of the significance of all matched terms/the sum of the significance of all query terms. If all the query terms were matched, this is 1. If no terms were matched, or if the significance of all the matched terms is zero (they are present in all (possible) documents), this number is zero.

      As the sum of this number over all the terms of the query is always 1, sums over all fields of normalized rank features for each field multiplied by this number for the same field will produce a normalized number.

      Note that this scales with the number of matched query terms in the field. If you want a component which does not, divide by matches.

    • getOccurrence

      public float getOccurrence()

      Returns a normalized measure of the number of occurrences of the terms of the query. This number is 1 if there are many occurrences of the query terms in absolute terms, or relative to the total content of the field, and 0 if there are none.

      This is suitable for occurrence in fields containing regular text.

    • getAbsoluteOccurrence

      public float getAbsoluteOccurrence()

      Returns a normalized measure of the number of occurrences of the terms of the query: sum over all query terms(min(number of occurrences of the term,maxOccurrences))/(query term count*100)

      This number is 1 if there are many occurrences of the query terms, and 0 if there are none. This number does not take the actual length of the field into account, so it is suitable for uses of occurrence to denote importance across multiple terms.
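
      The formula for getAbsoluteOccurrence can be sketched as follows. This is a hedged illustration rather than the library's code: the class and method names are hypothetical, and maxOccurrences is passed in as a parameter because this page does not say where its value comes from. The divisor keeps the literal 100 from the documented formula, so the result reaches 1 only when maxOccurrences is 100.

```java
import java.util.List;

final class AbsoluteOccurrenceSketch {

    /**
     * Applies the documented formula to a list holding, for each query term,
     * the number of times that term occurs in the field.
     */
    static float absoluteOccurrence(List<Integer> occurrencesPerQueryTerm, int maxOccurrences) {
        if (occurrencesPerQueryTerm.isEmpty()) return 0f; // assumption: no query terms gives 0
        int sum = 0;
        for (int occurrences : occurrencesPerQueryTerm) {
            sum += Math.min(occurrences, maxOccurrences); // cap each term's contribution
        }
        // query term count * 100, as written in the documented formula
        return (float) sum / (occurrencesPerQueryTerm.size() * 100);
    }

    public static void main(String[] args) {
        // Three query terms occurring 100, 50 and 0 times: (100 + 50 + 0) / (3 * 100)
        System.out.println(absoluteOccurrence(List.of(100, 50, 0), 100));
    }
}
```

      Because of the cap, a single term that occurs very often cannot push the value above its share of the total.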

    • getWeightedOccurrence

      public float getWeightedOccurrence()

      Returns a normalized measure of the number of occurrences of the terms of the query, weighted by term weight. This number is close to 1 if there are many occurrences of highly weighted query terms, in absolute terms, or relative to the total content of the field, and 0 if there are none.

    • getWeightedAbsoluteOccurrence

      public float getWeightedAbsoluteOccurrence()

      Returns a normalized measure of the number of occurrences of the terms of the query, taking weights into account so that occurrences of higher weighted query terms have more impact than lower weighted terms.

      This number is 1 if there are many occurrences of the highly weighted terms, and 0 if there are none. This number does not take the actual length of the field into account, so it is suitable for uses of occurrence to denote importance across multiple terms.

    • getSignificantOccurrence

      public float getSignificantOccurrence()

      Returns a normalized measure of the number of occurrences of the terms of the query in absolute terms, or relative to the total content of the field, weighted by term significance.

      This number is 1 if there are many occurrences of the highly significant terms, and 0 if there are none.

    • getExactness

      public float getExactness()

      Returns the degree to which the query terms submitted matched exactly terms contained in the document. This is 1 if all the terms matched exactly, and closer to 0 as more of the terms were matched only as stem forms.

      This is the query term weighted average of the exactness of each match, where the exactness of a match is the product of the exactness of the matching query term and the matching field term: sum over matching query terms(query term weight * query term exactness * field term exactness) / sum over matching query terms(query term weight)
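
      The weighted average for getExactness can be sketched as follows. The class name, method name and the three parallel arrays (one entry per matching query term) are assumptions made for the illustration; the library's own data structures are not described on this page.

```java
final class ExactnessSketch {

    /**
     * sum(weight * queryTermExactness * fieldTermExactness) / sum(weight),
     * taken over the matching query terms only.
     */
    static float exactness(float[] weights, float[] queryTermExactness, float[] fieldTermExactness) {
        float weightedSum = 0f;
        float weightSum = 0f;
        for (int i = 0; i < weights.length; i++) {
            weightedSum += weights[i] * queryTermExactness[i] * fieldTermExactness[i];
            weightSum += weights[i];
        }
        if (weightSum == 0f) return 0f; // assumption: no weighted matches gives 0
        return weightedSum / weightSum;
    }

    public static void main(String[] args) {
        // Two matches: one exact with weight 3, one matched with field exactness 0.5 and weight 1.
        // (3 * 1 * 1 + 1 * 1 * 0.5) / (3 + 1)
        System.out.println(exactness(new float[] {3f, 1f},
                                     new float[] {1f, 1f},
                                     new float[] {1f, 0.5f}));
    }
}
```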

    • getQueryCompleteness

      public float getQueryCompleteness()
      The ratio of query tokens which were matched in the field: matches/queryLength
    • getFieldCompleteness

      public float getFieldCompleteness()
      The ratio of field tokens which were matched by the query: matches/fieldLength
    • getCompleteness

      public float getCompleteness()
      Total completeness, where field completeness is more important: queryCompleteness * ( 1 - fieldCompletenessImportance) + fieldCompletenessImportance * fieldCompleteness
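
      The three completeness formulas can be combined in a short sketch. The class and method names are hypothetical, and the value used for fieldCompletenessImportance in the example is an arbitrary illustration, not a documented default.

```java
final class CompletenessSketch {

    /** matches/queryLength; returns 0 for an empty query (assumption). */
    static float queryCompleteness(int matches, int queryLength) {
        return queryLength == 0 ? 0f : (float) matches / queryLength;
    }

    /** matches/fieldLength; returns 0 for an empty field (assumption). */
    static float fieldCompleteness(int matches, int fieldLength) {
        return fieldLength == 0 ? 0f : (float) matches / fieldLength;
    }

    /** queryCompleteness * (1 - fieldCompletenessImportance) + fieldCompletenessImportance * fieldCompleteness */
    static float completeness(float queryCompleteness, float fieldCompleteness,
                              float fieldCompletenessImportance) {
        return queryCompleteness * (1 - fieldCompletenessImportance)
               + fieldCompletenessImportance * fieldCompleteness;
    }

    public static void main(String[] args) {
        float qc = queryCompleteness(2, 4);   // 2 of 4 query tokens matched
        float fc = fieldCompleteness(2, 10);  // 2 matches in a 10-token field
        System.out.println(completeness(qc, fc, 0.25f)); // 0.25 is a hypothetical importance
    }
}
```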
    • getOrderness

      public float getOrderness()
      Returns how well the order of the terms agreed in segments: 1-outOfOrder/pairs
    • getRelatedness

      public float getRelatedness()
      Returns the degree to which different terms are related (occurring in the same segment): 1-segments/(matches-1)
    • getLongestSequenceRatio

      public float getLongestSequenceRatio()
      Returns longestSequence/matches
    • getSegmentProximity

      public float getSegmentProximity()
      Returns the closeness of the segments in the field: 1-segmentDistance/fieldLength
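
      The ratio formulas given for getOrderness, getLongestSequenceRatio and getSegmentProximity can be sketched as plain functions of the counters they reference. The class name, the method signatures and the handling of zero denominators are assumptions; this page does not state what the library returns in those edge cases.

```java
final class RatioMetricsSketch {

    /** 1 - outOfOrder/pairs; with no pairs, nothing can be out of order (assumption: 1). */
    static float orderness(int outOfOrder, int pairs) {
        return pairs == 0 ? 1f : 1 - (float) outOfOrder / pairs;
    }

    /** longestSequence/matches; with no matches there is no sequence (assumption: 0). */
    static float longestSequenceRatio(int longestSequence, int matches) {
        return matches == 0 ? 0f : (float) longestSequence / matches;
    }

    /** 1 - segmentDistance/fieldLength; an empty field is treated as 0 (assumption). */
    static float segmentProximity(float segmentDistance, int fieldLength) {
        return fieldLength == 0 ? 0f : 1 - segmentDistance / fieldLength;
    }

    public static void main(String[] args) {
        System.out.println(orderness(1, 4));             // one of four pairs is out of order
        System.out.println(longestSequenceRatio(2, 4));  // longest run covers half of the matches
        System.out.println(segmentProximity(5f, 20));    // segments start 5 positions apart in a 20-token field
    }
}
```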
    • getProximity

      public float getProximity()
      Returns a value which is close to 1 when matched terms are close and close to zero when they are far apart in the segment. Relatively more connected terms influence this value more. This is absoluteProximity/average connectedness.
    • getImportance

      public float getImportance()

      Returns the average of significance and weight.

      As the sum of this number over all the terms of the query is always 1, sums over all fields of normalized rank features for each field multiplied by this number for the same field will produce a normalized number.

      Note that this scales with the number of matched query terms in the field. If you want a component which does not, divide by matches.

    • getEarliness

      public float getEarliness()
      A normalized measure of how early the first segment occurs in this field: 1-head/(max(6,field.length)-1)
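
      A minimal sketch of the earliness formula, with a hypothetical class and method name. The max(6, fieldLength) term means that fields shorter than six tokens are scored as if they had six.

```java
final class EarlinessSketch {

    /** 1 - head/(max(6, fieldLength) - 1), where head is the token count before the first matched segment. */
    static float earliness(int head, int fieldLength) {
        return 1 - (float) head / (Math.max(6, fieldLength) - 1);
    }

    public static void main(String[] args) {
        System.out.println(earliness(0, 11)); // match starts at the first token
        System.out.println(earliness(2, 11)); // two tokens precede the first segment: 1 - 2/10
        System.out.println(earliness(1, 3));  // short field, divisor is max(6, 3) - 1 = 5
    }
}
```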
    • getMatch

      public float getMatch()

      A ready-to-use aggregate match score. Use this if you don't have time to find a better application specific aggregate score of the fine grained match metrics.

      The current formula is ( proximityCompletenessImportance * (1-relatednessImportance + relatednessImportance*relatedness) * proximity * exactness * completeness^2 + earlinessImportance * earliness + segmentProximityImportance * segmentProximity ) / (proximityCompletenessImportance + earlinessImportance + relatednessImportance) but this is subject to change (i.e. improvement) at any time.

      Weight and significance are not taken into account because this is meant to capture the quality of the match in this field, while those measures relate this match to matches in other fields. This number can be multiplied by those values when combining with other field match scores.
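
      One way to read the last sentence is to scale each field's match score by that field's normalized weight and sum the products over the fields. The sketch below assumes this reading; the class name, the method name and the array inputs are hypothetical, and the values would in practice come from getMatch() and getWeight() on each field's metrics.

```java
final class CombinedFieldScoreSketch {

    /**
     * Sums match * weight over fields. matchPerField[i] and weightPerField[i]
     * are the match score and the normalized weight computed for field i.
     */
    static float combine(float[] matchPerField, float[] weightPerField) {
        float combined = 0f;
        for (int i = 0; i < matchPerField.length; i++) {
            combined += matchPerField[i] * weightPerField[i];
        }
        return combined;
    }

    public static void main(String[] args) {
        // Two fields: the first holds most of the query weight and matches well.
        // 0.8 * 0.75 + 0.4 * 0.25
        System.out.println(combine(new float[] {0.8f, 0.4f}, new float[] {0.75f, 0.25f}));
    }
}
```

      If the per-field weights sum to 1 and every match score lies between 0 and 1, the combined value also lies between 0 and 1.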

    • clone

      public FieldMatchMetrics clone()
      Overrides:
      clone in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • toStringDump

      public String toStringDump()
    • trace

      public Trace trace()
      Returns the trace of this computation. This is empty (never null) if tracing is off.