Class FieldMatchMetrics

  • All Implemented Interfaces:
    java.lang.Cloneable

    public final class FieldMatchMetrics
    extends java.lang.Object
    implements java.lang.Cloneable
    The collection of metrics calculated by the string match metric calculator.
    Author:
    bratseth
    • Method Summary

      Modifier and Type Method Description
      FieldMatchMetrics clone()  
      float get(java.lang.String name)
      Returns a metric by name
      float getAbsoluteOccurrence()
      Returns a normalized measure of the number of occurrences of the terms of the query: sum over all query terms(min(number of occurrences of the term,maxOccurrences))/(query term count*100)
      float getAbsoluteProximity()
      Returns the normalized proximity of the matched terms, weighted by the connectedness of the query terms.
      float getCompleteness()
      Total completeness, where field completeness is more important: queryCompleteness * ( 1 - fieldCompletenessImportance) + fieldCompletenessImportance * fieldCompleteness
      float getEarliness()
      A normalized measure of how early the first segment occurs in this field: 1-head/(max(6,field.length)-1)
      float getExactness()
      Returns the degree to which the submitted query terms exactly matched terms contained in the document.
      float getFieldCompleteness()
      The ratio of field tokens that were matched by the query: matches/fieldLength
      int getGapLength()
      Returns the summed size of all gaps within segments
      int getGaps()
      Returns the total number of position jumps (backward or forward) within document segments
      int getHead()
      Returns the number of tokens in the field preceding the start of the first matched segment
      float getImportance()
      Returns the average of significance and weight.
      int getLongestSequence()
      Returns the size of the longest matched continuous, in-order sequence in the document
      float getLongestSequenceRatio()
      Returns longestSequence/matches
      float getMatch()
      A ready-to-use aggregate match score.
      int getMatches()
      Returns the number of query terms that were matched in this field
      float getOccurrence()
      Returns a normalized measure of the number of occurrences of the terms of the query.
      float getOrderness()
      Returns how well the order of the terms agreed in segments: 1-outOfOrder/pairs
      int getOutOfOrder()
      Returns the total number of out-of-order token sequences within field segments
      int getPairs()
      Returns the number of in-segment token pairs
      float getProximity()
      Returns a value which is close to 1 when matched terms are close and close to zero when they are far apart in the segment.
      float getQueryCompleteness()
      The ratio of query tokens that were matched in the field: matches/queryLength
      float getRelatedness()
      Returns the degree to which different terms are related (occurring in the same segment): 1-(segments-1)/(matches-1)
      float getSegmentDistance()
      Returns the sum of the distances between all segments making up a match to the query, where the distance between two segments that are adjacent in the field is the number of token positions separating their starts.
      float getSegmentProximity()
      Returns the closeness of the segments in the field: 1-segmentDistance/fieldLength
      int getSegments()
      Returns the number of field text segments which are needed to match the query as completely as possible
      java.util.List<java.lang.Integer> getSegmentStarts()
      Returns the segment start points
      float getSignificance()
      Returns the normalized term significance (1-frequency) of the terms of this match relative to the whole query: the sum of the significance of all matched terms/the sum of the significance of all query terms. If all the query terms were matched, this is 1.
      float getSignificantOccurrence()
      Returns a normalized measure of the number of occurrences of the terms of the query in absolute terms, or relative to the total content of the field, weighted by term significance.
      int getTail()
      Returns the number of tokens in the field following the end of the last matched segment
      float getUnweightedProximity()
      Returns the normalized proximity of the matched terms, not taking term connectedness into account.
      float getWeight()
      Returns the normalized weight of this match relative to the whole query: the sum of the weights of all matched terms/the sum of the weights of all query terms. If all the query terms were matched, this is 1.
      float getWeightedAbsoluteOccurrence()
      Returns a normalized measure of the number of occurrences of the terms of the query, taking weights into account so that occurrences of higher weighted query terms have more impact than lower weighted terms.
      float getWeightedOccurrence()
      Returns a normalized measure of the number of occurrences of the terms of the query, weighted by term weight.
      boolean isComplete()
      Returns whether these metrics represent a complete match
      void setComplete(boolean complete)  
      java.lang.String toString()  
      java.lang.String toStringDump()  
      Trace trace()
      Returns the trace of this computation.
      • Methods inherited from class java.lang.Object

        equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Method Detail

      • isComplete

        public boolean isComplete()
        Returns whether these metrics represent a complete match
      • setComplete

        public void setComplete(boolean complete)
      • getSegmentStarts

        public java.util.List<java.lang.Integer> getSegmentStarts()
        Returns the segment start points
      • get

        public float get(java.lang.String name)
        Returns a metric by name
        Throws:
        java.lang.IllegalArgumentException - if the metric name (case sensitive) is not present
      • getOutOfOrder

        public int getOutOfOrder()
        Returns the total number of out-of-order token sequences within field segments
      • getSegments

        public int getSegments()
        Returns the number of field text segments which are needed to match the query as completely as possible
      • getGaps

        public int getGaps()
        Returns the total number of position jumps (backward or forward) within document segments
      • getGapLength

        public int getGapLength()
        Returns the summed size of all gaps within segments
      • getLongestSequence

        public int getLongestSequence()
        Returns the size of the longest matched continuous, in-order sequence in the document
      • getHead

        public int getHead()
        Returns the number of tokens in the field preceding the start of the first matched segment
      • getTail

        public int getTail()
        Returns the number of tokens in the field following the end of the last matched segment
      • getMatches

        public int getMatches()
        Returns the number of query terms that were matched in this field
      • getPairs

        public int getPairs()
        Returns the number of in-segment token pairs
      • getAbsoluteProximity

        public float getAbsoluteProximity()
        Returns the normalized proximity of the matched terms, weighted by the connectedness of the query terms. This number is 0.1 if all the matched terms are in sequence and have default or lower connectedness, close to 1 if they follow each other in sequence and have high connectedness, and close to 0 if they are far from each other in the segment or out of order.
      • getUnweightedProximity

        public float getUnweightedProximity()
        Returns the normalized proximity of the matched terms, not taking term connectedness into account. This number is close to 1 if all the matched terms are following each other in sequence, and close to 0 if they are far from each other or out of order
      • getSegmentDistance

        public float getSegmentDistance()
        Returns the sum of the distances between all segments making up a match to the query, where the distance between two segments that are adjacent in the field is the number of token positions separating their starts.
      • getWeight

        public float getWeight()

        Returns the normalized weight of this match relative to the whole query: the sum of the weights of all matched terms/the sum of the weights of all query terms. If all the query terms were matched, this is 1. If no terms were matched, or these matches have weight zero, this is 0.

        As the sum of this number over all the terms of the query is always 1, sums over all fields of normalized rank features for each field multiplied by this number for the same field will produce a normalized number.

        Note that this scales with the number of matched query terms in the field. If you want a component which does not, divide by matches.
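        The ratio above can be sketched in plain Java as follows. This is not the Vespa implementation: the class and method names are hypothetical, and the inputs (one weight per query term plus a flag saying whether that term matched in this field) are an assumed representation.

```java
/** Hypothetical sketch of the normalized weight described for getWeight(). */
final class WeightSketch {
    private WeightSketch() {}

    /**
     * Sum of the weights of the matched query terms divided by the sum of the
     * weights of all query terms; 0 if nothing matched or all weights are zero.
     */
    static float normalizedWeight(float[] termWeights, boolean[] matched) {
        float total = 0f;
        float matchedTotal = 0f;
        for (int i = 0; i < termWeights.length; i++) {
            total += termWeights[i];
            if (matched[i]) {
                matchedTotal += termWeights[i];
            }
        }
        return total == 0f ? 0f : matchedTotal / total;
    }

    /** Per-match variant: the normalized weight divided by the number of matches. */
    static float weightPerMatch(float[] termWeights, boolean[] matched) {
        int matches = 0;
        for (boolean m : matched) {
            if (m) matches++;
        }
        return matches == 0 ? 0f : normalizedWeight(termWeights, matched) / matches;
    }
}
```

        For example, with term weights {100, 100, 200} where the first and third terms matched, normalizedWeight returns 300/400 = 0.75 and weightPerMatch returns 0.375.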

      • getSignificance

        public float getSignificance()

        Returns the normalized term significance (1-frequency) of the terms of this match relative to the whole query: the sum of the significance of all matched terms/the sum of the significance of all query terms. If all the query terms were matched, this is 1. If no terms were matched, or if the significance of all the matched terms is zero (they are present in all (possible) documents), this number is zero.

        As the sum of this number over all the terms of the query is always 1, sums over all fields of normalized rank features for each field multiplied by this number for the same field will produce a normalized number.

        Note that this scales with the number of matched query terms in the field. If you want a component which does not, divide by matches.
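        A comparable sketch for significance, assuming (as the "1-frequency" above suggests) that a term's frequency is the fraction of documents containing it, a value in [0, 1]. The names and the input representation are hypothetical.

```java
/** Hypothetical sketch of the normalized significance described for getSignificance(). */
final class SignificanceSketch {
    private SignificanceSketch() {}

    /** Significance of one term: 1 - frequency, where frequency is in [0, 1]. */
    static float termSignificance(float frequency) {
        return 1f - frequency;
    }

    /**
     * Sum of the significance of the matched terms divided by the sum of the
     * significance of all query terms; 0 when that total is zero.
     */
    static float normalizedSignificance(float[] frequencies, boolean[] matched) {
        float total = 0f;
        float matchedTotal = 0f;
        for (int i = 0; i < frequencies.length; i++) {
            float s = termSignificance(frequencies[i]);
            total += s;
            if (matched[i]) {
                matchedTotal += s;
            }
        }
        return total == 0f ? 0f : matchedTotal / total;
    }
}
```

        A term present in every document (frequency 1) contributes nothing, so matching only such terms yields 0.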

      • getOccurrence

        public float getOccurrence()

        Returns a normalized measure of the number of occurrences of the terms of the query. This number is 1 if there are many occurrences of the query terms in absolute terms, or relative to the total content of the field, and 0 if there are none.

        This is suitable for occurrence in fields containing regular text.

      • getAbsoluteOccurrence

        public float getAbsoluteOccurrence()

        Returns a normalized measure of the number of occurrences of the terms of the query: sum over all query terms(min(number of occurrences of the term,maxOccurrences))/(query term count*100)

        This number is 1 if there are many occurrences of the query terms, and 0 if there are none. This number does not take the actual length of the field into account, so it is suitable for uses of occurrence to denote importance across multiple terms.
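        The formula can be sketched as below. The documentation divides by a fixed 100; the sketch instead divides by termCount * maxOccurrences, which agrees with the documented formula when maxOccurrences is 100 and keeps the result in [0, 1] otherwise. Treating maxOccurrences as 100 is an assumption, and the names are hypothetical.

```java
/** Hypothetical sketch of the absolute occurrence measure. */
final class AbsoluteOccurrenceSketch {
    private AbsoluteOccurrenceSketch() {}

    /**
     * @param occurrencesPerTerm number of occurrences of each query term in the field
     * @param maxOccurrences     cap on the occurrences counted per term
     */
    static float absoluteOccurrence(int[] occurrencesPerTerm, int maxOccurrences) {
        if (occurrencesPerTerm.length == 0 || maxOccurrences <= 0) {
            return 0f;
        }
        long capped = 0;
        for (int occurrences : occurrencesPerTerm) {
            // Occurrences beyond the cap add nothing.
            capped += Math.min(occurrences, maxOccurrences);
        }
        return (float) capped / ((float) occurrencesPerTerm.length * maxOccurrences);
    }
}
```

        With maxOccurrences = 100, a two-term query whose terms occur 100 and 50 times gives 150/200 = 0.75.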

      • getWeightedOccurrence

        public float getWeightedOccurrence()

        Returns a normalized measure of the number of occurrences of the terms of the query, weighted by term weight. This number is close to 1 if there are many occurrences of highly weighted query terms, in absolute terms, or relative to the total content of the field, and 0 if there are none.

      • getWeightedAbsoluteOccurrence

        public float getWeightedAbsoluteOccurrence()

        Returns a normalized measure of the number of occurrences of the terms of the query, taking weights into account so that occurrences of higher weighted query terms have more impact than lower weighted terms.

        This number is 1 if there are many occurrences of the highly weighted terms, and 0 if there are none. This number does not take the actual length of the field into account, so it is suitable for uses of occurrence to denote importance across multiple terms.

      • getSignificantOccurrence

        public float getSignificantOccurrence()

        Returns a normalized measure of the number of occurrences of the terms of the query in absolute terms, or relative to the total content of the field, weighted by term significance.

        This number is 1 if there are many occurrences of the highly significant terms, and 0 if there are none.

      • getExactness

        public float getExactness()

        Returns the degree to which the submitted query terms exactly matched terms contained in the document. This is 1 if all the terms matched exactly, and closer to 0 as more of the terms were matched only as stem forms.

        This is the query term weighted average of the exactness of each match, where the exactness of a match is the product of the exactness of the matching query term and the matching field term: sum over matching query terms(query term weight * query term exactness * field term exactness) / sum over matching query terms(query term weight)
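        The weighted average above translates directly into code. In this sketch each array holds one entry per matching query term; the names and the choice to return 0 when the weights sum to zero are assumptions.

```java
/** Hypothetical sketch of the exactness weighted average. */
final class ExactnessSketch {
    private ExactnessSketch() {}

    static float exactness(float[] queryTermWeights,
                           float[] queryTermExactness,
                           float[] fieldTermExactness) {
        float weightedSum = 0f;
        float weightSum = 0f;
        for (int i = 0; i < queryTermWeights.length; i++) {
            // Exactness of one match: query term exactness * field term exactness.
            weightedSum += queryTermWeights[i] * queryTermExactness[i] * fieldTermExactness[i];
            weightSum += queryTermWeights[i];
        }
        return weightSum == 0f ? 0f : weightedSum / weightSum;
    }
}
```

        For two equally weighted terms where one field term matched exactly (1.0) and the other only as a stem (exactness 0.5, an illustrative value), the result is (1.0 + 0.5)/2 = 0.75.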

      • getQueryCompleteness

        public float getQueryCompleteness()
        The ratio of query tokens that were matched in the field: matches/queryLength
      • getFieldCompleteness

        public float getFieldCompleteness()
        The ratio of field tokens that were matched by the query: matches/fieldLength
      • getCompleteness

        public float getCompleteness()
        Total completeness, where field completeness is more important: queryCompleteness * ( 1 - fieldCompletenessImportance) + fieldCompletenessImportance * fieldCompleteness
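        The three completeness measures can be sketched together. The names are hypothetical, fieldCompletenessImportance is passed in because its configured value is not given here, and returning 0 for an empty query or field is an assumption.

```java
/** Hypothetical sketch of queryCompleteness, fieldCompleteness and completeness. */
final class CompletenessSketch {
    private CompletenessSketch() {}

    static float queryCompleteness(int matches, int queryLength) {
        return queryLength == 0 ? 0f : (float) matches / queryLength;
    }

    static float fieldCompleteness(int matches, int fieldLength) {
        return fieldLength == 0 ? 0f : (float) matches / fieldLength;
    }

    static float completeness(int matches, int queryLength, int fieldLength,
                              float fieldCompletenessImportance) {
        float query = queryCompleteness(matches, queryLength);
        float field = fieldCompleteness(matches, fieldLength);
        return query * (1f - fieldCompletenessImportance)
                + fieldCompletenessImportance * field;
    }
}
```

        For example, 2 matches of a 4-term query in a 10-token field give queryCompleteness 0.5 and fieldCompleteness 0.2; with fieldCompletenessImportance = 0.5 the total is 0.5*0.5 + 0.5*0.2 = 0.35.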
      • getOrderness

        public float getOrderness()
        Returns how well the order of the terms agreed in segments: 1-outOfOrder/pairs
      • getRelatedness

        public float getRelatedness()
        Returns the degree to which different terms are related (occurring in the same segment): 1-(segments-1)/(matches-1)
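        A sketch of relatedness. It uses 1-(segments-1)/(matches-1), so the value is 1 when all matches fall in a single segment and 0 when every match forms its own segment; reading the formula with this grouping, and the values returned for fewer than two matches, are assumptions. The names are hypothetical.

```java
/** Hypothetical sketch of relatedness computed from segment and match counts. */
final class RelatednessSketch {
    private RelatednessSketch() {}

    static float relatedness(int segments, int matches) {
        if (matches == 0) return 0f; // nothing matched: treated as unrelated (assumption)
        if (matches == 1) return 1f; // a single match is trivially related (assumption)
        return 1f - (float) (segments - 1) / (matches - 1);
    }
}
```

        Three matches in one segment give 1, in two segments 0.5, and in three segments 0.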
      • getLongestSequenceRatio

        public float getLongestSequenceRatio()
        Returns longestSequence/matches
      • getSegmentProximity

        public float getSegmentProximity()
        Returns the closeness of the segments in the field: 1-segmentDistance/fieldLength
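        Orderness, longestSequenceRatio and segmentProximity are simple ratios of the raw values returned by getOutOfOrder(), getPairs(), getLongestSequence(), getMatches() and getSegmentDistance(). A sketch, where the names and the values returned for a zero denominator are assumptions:

```java
/** Hypothetical sketch of ratio metrics derived from raw match counts. */
final class RatioMetricsSketch {
    private RatioMetricsSketch() {}

    /** 1 - outOfOrder/pairs; no pairs is treated as perfectly ordered (assumption). */
    static float orderness(int outOfOrder, int pairs) {
        return pairs == 0 ? 1f : 1f - (float) outOfOrder / pairs;
    }

    /** longestSequence/matches; 0 when nothing matched (assumption). */
    static float longestSequenceRatio(int longestSequence, int matches) {
        return matches == 0 ? 0f : (float) longestSequence / matches;
    }

    /** 1 - segmentDistance/fieldLength; 0 for an empty field (assumption). */
    static float segmentProximity(float segmentDistance, int fieldLength) {
        return fieldLength == 0 ? 0f : 1f - segmentDistance / fieldLength;
    }
}
```

        For example, 1 out-of-order sequence among 4 pairs gives orderness 0.75, and a segment distance of 5 in a 20-token field gives segmentProximity 0.75.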
      • getProximity

        public float getProximity()
        Returns a value which is close to 1 when matched terms are close and close to zero when they are far apart in the segment. Relatively more connected terms influence this value more. This is absoluteProximity/average connectedness.
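        Given absoluteProximity and the connectedness values of the query terms, proximity is a single division. This sketch assumes the connectedness values arrive as an array (for example one per adjacent pair of query terms) and returns 0 when there are none or they average to zero; both choices, and the names, are assumptions.

```java
/** Hypothetical sketch: proximity = absoluteProximity / average connectedness. */
final class ProximitySketch {
    private ProximitySketch() {}

    static float proximity(float absoluteProximity, float[] connectedness) {
        if (connectedness.length == 0) return 0f;
        float sum = 0f;
        for (float c : connectedness) {
            sum += c;
        }
        float average = sum / connectedness.length;
        return average == 0f ? 0f : absoluteProximity / average;
    }
}
```

        For example, an absoluteProximity of 0.1 with every connectedness value at 0.1 gives a proximity of 1.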
      • getImportance

        public float getImportance()

        Returns the average of significance and weight.

        As the sum of this number over all the terms of the query is always 1, sums over all fields of normalized rank features for each field multiplied by this number for the same field will produce a normalized number.

        Note that this scales with the number of matched query terms in the field. If you want a component which does not, divide by matches.
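        A sketch of importance and of the cross-field combination described above. The combine method is one possible reading of that guidance (multiply each field's normalized feature by that field's importance and sum), not a documented API; all names are hypothetical.

```java
/** Hypothetical sketch of importance and a cross-field combination. */
final class ImportanceSketch {
    private ImportanceSketch() {}

    /** Average of significance and weight. */
    static float importance(float significance, float weight) {
        return (significance + weight) / 2f;
    }

    /** Sum over fields of (normalized feature value * importance of that field). */
    static float combine(float[] normalizedFeaturePerField, float[] importancePerField) {
        float total = 0f;
        for (int i = 0; i < normalizedFeaturePerField.length; i++) {
            total += normalizedFeaturePerField[i] * importancePerField[i];
        }
        return total;
    }
}
```

        With two fields of importance 0.5 each and normalized features 1.0 and 0.5, combine returns 0.75.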

      • getEarliness

        public float getEarliness()
        A normalized measure of how early the first segment occurs in this field: 1-head/(max(6,field.length)-1)
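        The earliness formula, sketched with hypothetical names; fieldLength stands for field.length above.

```java
/** Hypothetical sketch of earliness: 1 - head / (max(6, fieldLength) - 1). */
final class EarlinessSketch {
    private EarlinessSketch() {}

    static float earliness(int head, int fieldLength) {
        // max(6, ...) keeps the denominator at least 5, so in very short fields
        // a small head lowers earliness less than it otherwise would.
        return 1f - (float) head / (Math.max(6, fieldLength) - 1);
    }
}
```

        A match starting at the first token (head 0) gives 1; in an 11-token field, head 2 gives 1 - 2/10 = 0.8.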
      • getMatch

        public float getMatch()

        A ready-to-use aggregate match score. Use this if you don't have time to find a better application-specific aggregate score of the fine-grained match metrics.

        The current formula is ( proximityCompletenessImportance * (1-relatednessImportance + relatednessImportance*relatedness) * proximity * exactness * completeness^2 + earlinessImportance * earliness + segmentProximityImportance * segmentProximity ) / (proximityCompletenessImportance + earlinessImportance + segmentProximityImportance) but this is subject to change (i.e. improvement) at any time.

        Weight and significance are not taken into account because this is meant to capture the quality of the match in this field, while those measures relate this match to matches in other fields. This number can be multiplied with those values when combining with other field match scores.
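        The aggregate formula above can be evaluated as in the sketch below. The denominator is taken to be the sum of the three importances that weight the terms in the numerator (proximityCompletenessImportance, earlinessImportance, segmentProximityImportance), which makes the score a weighted average; this reading, passing the importances as method arguments, and returning 0 for a zero denominator are assumptions. No default values for the importances are implied.

```java
/** Hypothetical sketch of the aggregate match score. */
final class MatchScoreSketch {
    private MatchScoreSketch() {}

    static float match(float proximity, float exactness, float completeness,
                       float relatedness, float earliness, float segmentProximity,
                       float proximityCompletenessImportance,
                       float relatednessImportance,
                       float earlinessImportance,
                       float segmentProximityImportance) {
        // Blend relatedness in: 1 when relatednessImportance is 0.
        float scaledRelatedness = 1f - relatednessImportance + relatednessImportance * relatedness;
        float numerator = proximityCompletenessImportance * scaledRelatedness
                * proximity * exactness * completeness * completeness
                + earlinessImportance * earliness
                + segmentProximityImportance * segmentProximity;
        float denominator = proximityCompletenessImportance + earlinessImportance
                + segmentProximityImportance;
        return denominator == 0f ? 0f : numerator / denominator;
    }
}
```

        Because completeness is squared, halving completeness (with all else perfect and only proximityCompletenessImportance non-zero) quarters the score.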

      • clone

        public FieldMatchMetrics clone()
        Overrides:
        clone in class java.lang.Object
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • toStringDump

        public java.lang.String toStringDump()
      • trace

        public Trace trace()
        Returns the trace of this computation. This is empty (never null) if tracing is off.