java.lang.Object

com.yahoo.searchlib.ranking.features.fieldmatch.FieldMatchMetrics

All Implemented Interfaces: Cloneable

public final class FieldMatchMetrics extends Object implements Cloneable

The collection of metrics calculated by the string match metric calculator.

Author: bratseth

Constructor Summary

Constructors

Constructor

Description

FieldMatchMetrics(FieldMatchMetricsComputer source)

Method Summary

Modifier and Type

Method

Description

FieldMatchMetrics

clone()

float

get(String name)

Returns a metric by name

float

getAbsoluteOccurrence()

Returns a normalized measure of the number of occurrences of the query terms: sum over all query terms(min(number of occurrences of the term, maxOccurrences))/(query term count*100)

float

getAbsoluteProximity()

Returns the normalized proximity of the matched terms, weighted by the connectedness of the query terms.

float

getCompleteness()

Total completeness, where field completeness is more important: queryCompleteness * ( 1 - fieldCompletenessImportance) + fieldCompletenessImportance * fieldCompleteness

float

getEarliness()

A normalized measure of how early the first segment occurs in this field: 1-head/(max(6,field.length)-1)

float

getExactness()

Returns the degree to which the submitted query terms exactly matched terms contained in the document.

float

getFieldCompleteness()

The ratio of field tokens which were matched: matches/fieldLength

int

getGapLength()

Returns the summed size of all gaps within segments

int

getGaps()

Returns the total number of position jumps (backward or forward) within document segments

int

getHead()

Returns the number of tokens in the field preceding the start of the first matched segment

float

getImportance()

Returns the average of significance and weight.

int

getLongestSequence()

Returns the size of the longest matched continuous, in-order sequence in the document

float

getLongestSequenceRatio()

Returns longestSequence/matches

float

getMatch()

A ready-to-use aggregate match score.

int

getMatches()

Returns the number of query terms which were matched in this field

float

getOccurrence()

Returns a normalized measure of the number of occurrences of the query terms.

float

getOrderness()

Returns how well the order of the terms agreed in segments: 1-outOfOrder/pairs

int

getOutOfOrder()

Returns the total number of out of order token sequences within field segments

int

getPairs()

Returns the number of in-segment token pairs

float

getProximity()

Returns a value which is close to 1 when matched terms are close and close to zero when they are far apart in the segment.

float

getQueryCompleteness()

The ratio of query tokens which were matched in the field: matches/queryLength

float

getRelatedness()

Returns the degree to which different terms are related (occurring in the same segment): 1-(segments-1)/(matches-1)

float

getSegmentDistance()

Returns the sum of the distances between all segments making up a match to the query, measured as the sum of the number of token positions separating the starts of each pair of adjacent segments in the field.

float

getSegmentProximity()

Returns the closeness of the segments in the field: 1-segmentDistance/fieldLength

int

getSegments()

Returns the number of field text segments which are needed to match the query as completely as possible

List<Integer>

getSegmentStarts()

Returns the segment start points

float

getSignificance()

Returns the normalized term significance (1-frequency) of the terms of this match relative to the whole query: the sum of the significance of all matched terms/the sum of the significance of all query terms. If all the query terms were matched, this is 1.

float

getSignificantOccurrence()

Returns a normalized measure of the number of occurrences of the query terms, in absolute terms or relative to the total content of the field, weighted by term significance.

int

getTail()

Returns the number of tokens in the field following the end of the last matched segment

float

getUnweightedProximity()

Returns the normalized proximity of the matched terms, not taking term connectedness into account.

float

getWeight()

Returns the normalized weight of this match relative to the whole query: the sum of the weights of all matched terms/the sum of the weights of all query terms. If all the query terms were matched, this is 1.

float

getWeightedAbsoluteOccurrence()

Returns a normalized measure of the number of occurrences of the query terms, taking weights into account so that occurrences of higher-weighted query terms have more impact than occurrences of lower-weighted terms.

float

getWeightedOccurrence()

Returns a normalized measure of the number of occurrences of the query terms, weighted by term weight.

boolean

isComplete()

Returns whether these metrics represent a complete match

void

setComplete(boolean complete)

String

toString()

String

toStringDump()

Trace

trace()

Returns the trace of this computation.

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Details
- FieldMatchMetrics
  
  public FieldMatchMetrics(FieldMatchMetricsComputer source)
Method Details
- isComplete
  
  public boolean isComplete()
  
  Returns whether these metrics represent a complete match
- setComplete
  
  public void setComplete(boolean complete)
- getSegmentStarts
  
  public List<Integer> getSegmentStarts()
  
  Returns the segment start points
- get
  
  public float get(String name)
  
  Returns a metric by name
  
  Throws:
  
  IllegalArgumentException - if the metric name (case sensitive) is not present
- getOutOfOrder
  
  public int getOutOfOrder()
  
  Returns the total number of out of order token sequences within field segments
- getSegments
  
  public int getSegments()
  
  Returns the number of field text segments which are needed to match the query as completely as possible
- getGaps
  
  public int getGaps()
  
  Returns the total number of position jumps (backward or forward) within document segments
- getGapLength
  
  public int getGapLength()
  
  Returns the summed size of all gaps within segments
- getLongestSequence
  
  public int getLongestSequence()
  
  Returns the size of the longest matched continuous, in-order sequence in the document
- getHead
  
  public int getHead()
  
  Returns the number of tokens in the field preceding the start of the first matched segment
- getTail
  
  public int getTail()
  
  Returns the number of tokens in the field following the end of the last matched segment
- getMatches
  
  public int getMatches()
  
  Returns the number of query terms which were matched in this field
- getPairs
  
  public int getPairs()
  
  Returns the number of in-segment token pairs
- getAbsoluteProximity
  
  public float getAbsoluteProximity()
  
  Returns the normalized proximity of the matched terms, weighted by the connectedness of the query terms. This number is 0.1 if all the matched terms are and have default or lower connectedness, close to 1 if they are following in sequence and have a high connectedness, and close to 0 if they are far from each other in the segment or out of order
- getUnweightedProximity
  
  public float getUnweightedProximity()
  
  Returns the normalized proximity of the matched terms, not taking term connectedness into account. This number is close to 1 if all the matched terms are following each other in sequence, and close to 0 if they are far from each other or out of order
- getSegmentDistance
  
  public float getSegmentDistance()
  
  Returns the sum of the distances between all segments making up a match to the query, measured as the sum of the number of token positions separating the starts of each pair of adjacent segments in the field.
- getWeight
  
  public float getWeight()
  
  Returns the normalized weight of this match relative to the whole query: the sum of the weights of all matched terms/the sum of the weights of all query terms. If all the query terms were matched, this is 1. If no terms were matched, or these matches have weight zero, this is 0.
  
  As the sum of this number over all the terms of the query is always 1, sums over all fields of normalized rank features for each field multiplied by this number for the same field will produce a normalized number.
  
  Note that this scales with the number of matched query terms in the field. If you want a component which does not, divide by matches.
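
  The weight ratio and the "divide by matches" variant described above can be sketched as follows. This is a minimal illustration that does not call the Vespa API: the class name `WeightSketch`, the integer term weights and the per-term matched flags are hypothetical inputs, and returning 0 when all weights are zero is an assumption.

  ```java
  /** Illustrative only: recomputes the documented weight ratio from hypothetical inputs. */
  final class WeightSketch {
      /**
       * Sum of the weights of the query terms matched in the field divided by the
       * sum of the weights of all query terms. Gives 0 when nothing (or only
       * zero-weight terms) matched; 0 for an all-zero weight vector is an assumption.
       */
      static float weight(int[] termWeights, boolean[] matchedInField) {
          long matched = 0;
          long total = 0;
          for (int i = 0; i < termWeights.length; i++) {
              total += termWeights[i];
              if (matchedInField[i]) matched += termWeights[i];
          }
          return total == 0 ? 0f : (float) matched / total;
      }

      /** Divides by the number of matches so the value no longer scales with the match count. */
      static float weightPerMatch(int[] termWeights, boolean[] matchedInField) {
          int matches = 0;
          for (boolean m : matchedInField) if (m) matches++;
          return matches == 0 ? 0f : weight(termWeights, matchedInField) / matches;
      }

      public static void main(String[] args) {
          int[] weights = {100, 200, 100};              // hypothetical query term weights
          boolean[] field = {true, true, false};        // terms 0 and 1 matched in this field
          System.out.println(weight(weights, field));         // 0.75
          System.out.println(weightPerMatch(weights, field)); // 0.375
      }
  }
  ```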
- getSignificance
  
  public float getSignificance()
  
  Returns the normalized term significance (1-frequency) of the terms of this match relative to the whole query: the sum of the significance of all matched terms/the sum of the significance of all query terms. If all the query terms were matched, this is 1. If no terms were matched, or if the significance of all the matched terms is zero (they are present in all (possible) documents), this number is zero.
  
  As the sum of this number over all the terms of the query is always 1, sums over all fields of normalized rank features for each field multiplied by this number for the same field will produce a normalized number.
  
  Note that this scales with the number of matched query terms in the field. If you want a component which does not, divide by matches.
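
  A minimal sketch of the significance ratio, taking each term's significance as 1-frequency as documented above. The per-term document frequencies (fractions in [0, 1]) and the class name `SignificanceSketch` are hypothetical; this does not reflect how Vespa obtains term frequencies.

  ```java
  /** Illustrative only: recomputes the documented significance ratio from hypothetical inputs. */
  final class SignificanceSketch {
      /**
       * termFrequencies[i] is the fraction of documents containing query term i;
       * the term's significance is 1 - frequency.
       */
      static float significance(double[] termFrequencies, boolean[] matchedInField) {
          double matched = 0;
          double total = 0;
          for (int i = 0; i < termFrequencies.length; i++) {
              double termSignificance = 1 - termFrequencies[i];
              total += termSignificance;
              if (matchedInField[i]) matched += termSignificance;
          }
          return total == 0 ? 0f : (float) (matched / total);
      }

      public static void main(String[] args) {
          double[] frequencies = {0.25, 0.75};   // a rare term and a common term
          System.out.println(significance(frequencies, new boolean[] {true, false})); // 0.75
          System.out.println(significance(frequencies, new boolean[] {false, true})); // 0.25
          // A term present in every document has significance 0, so matching only it gives 0
          System.out.println(significance(new double[] {1.0, 0.5}, new boolean[] {true, false})); // 0.0
      }
  }
  ```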
- getOccurrence
  
  public float getOccurrence()
  
  Returns a normalized measure of the number of occurrences of the query terms. This number is 1 if there are many occurrences of the query terms, in absolute terms or relative to the total content of the field, and 0 if there are none.
  
  This is suitable for occurrence in fields containing regular text.
- getAbsoluteOccurrence
  
  public float getAbsoluteOccurrence()
  
  Returns a normalized measure of the number of occurrences of the query terms: sum over all query terms(min(number of occurrences of the term, maxOccurrences))/(query term count*100)
  This number is 1 if there are many occurrences of the query terms, and 0 if there are none. This number does not take the actual length of the field into account, so it is suitable for uses of occurrence to denote importance across multiple terms.
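
  The absolute occurrence formula above can be sketched as follows, assuming that the 100 in the documented denominator is the maxOccurrences cap (so the result stays within [0, 1]). The class name and the per-term occurrence counts are hypothetical inputs, not the Vespa API.

  ```java
  /** Illustrative only: the documented absoluteOccurrence formula with hypothetical counts. */
  final class AbsoluteOccurrenceSketch {
      /**
       * occurrences[i] is the number of times query term i occurs in the field.
       * maxOccurrences caps each term's count; it is assumed to equal the 100 used
       * in the documented denominator.
       */
      static float absoluteOccurrence(int[] occurrences, int maxOccurrences) {
          if (occurrences.length == 0) return 0f;   // no query terms: nothing can occur
          long capped = 0;
          for (int n : occurrences) capped += Math.min(n, maxOccurrences);
          return (float) capped / ((long) occurrences.length * maxOccurrences);
      }

      public static void main(String[] args) {
          // Two query terms: one occurs 3 times, the other 250 times (capped at 100)
          System.out.println(absoluteOccurrence(new int[] {3, 250}, 100)); // 0.515
      }
  }
  ```

  Because the field length never enters the computation, a long field and a short field with the same counts get the same value, which is what makes this variant suitable for comparing occurrence across terms.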
- getWeightedOccurrence
  
  public float getWeightedOccurrence()
  
  Returns a normalized measure of the number of occurrences of the query terms, weighted by term weight. This number is close to 1 if there are many occurrences of highly weighted query terms, in absolute terms or relative to the total content of the field, and 0 if there are none.
- getWeightedAbsoluteOccurrence
  
  public float getWeightedAbsoluteOccurrence()
  
  Returns a normalized measure of the number of occurrences of the query terms, taking weights into account so that occurrences of higher-weighted query terms have more impact than occurrences of lower-weighted terms.
  
  This number is 1 if there are many occurrences of the highly weighted terms, and 0 if there are none. This number does not take the actual length of the field into account, so it is suitable for uses of occurrence to denote importance across multiple terms.
- getSignificantOccurrence
  
  public float getSignificantOccurrence()
  
  Returns a normalized measure of the number of occurrences of the query terms, in absolute terms or relative to the total content of the field, weighted by term significance.
  This number is 1 if there are many occurrences of the highly significant terms, and 0 if there are none.
- getExactness
  
  public float getExactness()
  
  Returns the degree to which the submitted query terms exactly matched terms contained in the document. This is 1 if all the terms matched exactly, and closer to 0 as more of the terms were matched only as stem forms.
  
  This is the query term weighted average of the exactness of each match, where the exactness of a match is the product of the exactness of the matching query term and the matching field term: sum over matching query terms(query term weight * query term exactness * field term exactness) / sum over matching query terms(query term weight)
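
  The weighted average above can be sketched as follows. The `Match` holder, the class name and the 0.5 exactness for a stem-form match are hypothetical; returning 0 when there is no matched weight is an assumption, since the documentation does not cover that case.

  ```java
  import java.util.List;

  /** Illustrative only: recomputes the documented exactness average from hypothetical inputs. */
  final class ExactnessSketch {
      /** One matched query term (hypothetical representation). */
      static final class Match {
          final int queryTermWeight;
          final float queryTermExactness;
          final float fieldTermExactness;

          Match(int queryTermWeight, float queryTermExactness, float fieldTermExactness) {
              this.queryTermWeight = queryTermWeight;
              this.queryTermExactness = queryTermExactness;
              this.fieldTermExactness = fieldTermExactness;
          }
      }

      /** sum(weight * queryExactness * fieldExactness) / sum(weight) over the matched query terms. */
      static float exactness(List<Match> matches) {
          double numerator = 0;
          double denominator = 0;
          for (Match m : matches) {
              numerator += (double) m.queryTermWeight * m.queryTermExactness * m.fieldTermExactness;
              denominator += m.queryTermWeight;
          }
          return denominator == 0 ? 0f : (float) (numerator / denominator); // 0 here is an assumption
      }

      public static void main(String[] args) {
          // Two equally weighted terms: one exact match, one stem-form match whose
          // field term exactness is taken to be 0.5 (a hypothetical value)
          List<Match> matches = List.of(new Match(100, 1f, 1f), new Match(100, 1f, 0.5f));
          System.out.println(exactness(matches)); // 0.75
      }
  }
  ```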
- getQueryCompleteness
  
  public float getQueryCompleteness()
  
  The ratio of query tokens which were matched in the field: matches/queryLength
- getFieldCompleteness
  
  public float getFieldCompleteness()
  
  The ratio of field tokens which were matched: matches/fieldLength
- getCompleteness
  
  public float getCompleteness()
  
  Total completeness, where field completeness is more important: queryCompleteness * ( 1 - fieldCompletenessImportance) + fieldCompletenessImportance * fieldCompleteness
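
  The three completeness formulas can be sketched together as follows. The class name is hypothetical, query and field lengths are assumed to be positive, and 0.25 is an arbitrary illustrative fieldCompletenessImportance rather than a Vespa default.

  ```java
  /** Illustrative only: the documented completeness formulas with hypothetical inputs. */
  final class CompletenessSketch {
      // queryLength is assumed to be positive
      static float queryCompleteness(int matches, int queryLength) {
          return (float) matches / queryLength;
      }

      // fieldLength is assumed to be positive
      static float fieldCompleteness(int matches, int fieldLength) {
          return (float) matches / fieldLength;
      }

      static float completeness(float queryCompleteness, float fieldCompleteness,
                                float fieldCompletenessImportance) {
          return queryCompleteness * (1 - fieldCompletenessImportance)
                  + fieldCompletenessImportance * fieldCompleteness;
      }

      public static void main(String[] args) {
          float qc = queryCompleteness(2, 4);   // 2 matches, 4 query terms: 0.5
          float fc = fieldCompleteness(2, 8);   // 2 matches, 8 field tokens: 0.25
          System.out.println(completeness(qc, fc, 0.25f)); // 0.4375
      }
  }
  ```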
- getOrderness
  
  public float getOrderness()
  
  Returns how well the order of the terms agreed in segments: 1-outOfOrder/pairs
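
  A minimal sketch of the orderness ratio above, computed from hypothetical counts. Returning 1 when there are no in-segment pairs is an assumption (nothing can then be out of order); the documentation does not state this case.

  ```java
  /** Illustrative only: orderness = 1 - outOfOrder / pairs. */
  final class OrdernessSketch {
      static float orderness(int outOfOrder, int pairs) {
          // With no in-segment pairs nothing can be out of order; returning 1 is an assumption
          if (pairs == 0) return 1f;
          return 1 - (float) outOfOrder / pairs;
      }

      public static void main(String[] args) {
          System.out.println(orderness(0, 3)); // 1.0: every pair in query order
          System.out.println(orderness(1, 4)); // 0.75
          System.out.println(orderness(4, 4)); // 0.0: every pair out of order
      }
  }
  ```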
- getRelatedness
  
  public float getRelatedness()
  
  Returns the degree to which different terms are related (occurring in the same segment): 1-(segments-1)/(matches-1)
- getLongestSequenceRatio
  
  public float getLongestSequenceRatio()
  
  Returns longestSequence/matches
- getSegmentProximity
  
  public float getSegmentProximity()
  
  Returns the closeness of the segments in the field: 1-segmentDistance/fieldLength
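
  The longestSequenceRatio and segmentProximity formulas above are simple ratios; a sketch with hypothetical counts follows. The class name is illustrative, and matches and fieldLength are assumed to be positive.

  ```java
  /** Illustrative only: two documented ratio metrics computed from hypothetical counts. */
  final class SegmentRatioSketch {
      // matches is assumed to be positive
      static float longestSequenceRatio(int longestSequence, int matches) {
          return (float) longestSequence / matches;
      }

      // fieldLength is assumed to be positive
      static float segmentProximity(float segmentDistance, int fieldLength) {
          return 1 - segmentDistance / fieldLength;
      }

      public static void main(String[] args) {
          System.out.println(longestSequenceRatio(3, 4)); // 0.75: 3 of 4 matches in one run
          // Two segments whose starts are 4 tokens apart in a 16-token field
          System.out.println(segmentProximity(4f, 16));   // 0.75
      }
  }
  ```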
- getProximity
  
  public float getProximity()
  
  Returns a value which is close to 1 when matched terms are close and close to zero when they are far apart in the segment. Relatively more connected terms influence this value more. This is absoluteProximity/average connectedness.
- getImportance
  
  public float getImportance()
  
  Returns the average of significance and weight.
  
  As the sum of this number over all the terms of the query is always 1, sums over all fields of normalized rank features for each field multiplied by this number for the same field will produce a normalized number.
  
  Note that this scales with the number of matched query terms in the field. If you want a component which does not, divide by matches.
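
  The sketch below averages significance and weight into an importance value and then forms the cross-field sum described above (per-field feature value multiplied by that field's importance). The field names, feature values and the class name are hypothetical; it does not use the Vespa API.

  ```java
  /** Illustrative only: importance and a cross-field combination with hypothetical values. */
  final class ImportanceSketch {
      static float importance(float significance, float weight) {
          return (significance + weight) / 2;
      }

      /** Sum over fields of (normalized per-field feature value * importance of that field). */
      static float combine(float[] fieldFeatures, float[] fieldImportances) {
          float sum = 0;
          for (int i = 0; i < fieldFeatures.length; i++) {
              sum += fieldFeatures[i] * fieldImportances[i];
          }
          return sum;
      }

      public static void main(String[] args) {
          // Hypothetical "title" and "body" fields, each matching half of the query
          float titleImportance = importance(0.5f, 0.5f); // 0.5
          float bodyImportance = importance(0.5f, 0.5f);  // 0.5
          float[] features = {0.75f, 0.25f};              // e.g. per-field match scores in [0, 1]
          System.out.println(combine(features, new float[] {titleImportance, bodyImportance})); // 0.5
      }
  }
  ```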
- getEarliness
  
  public float getEarliness()
  
  A normalized measure of how early the first segment occurs in this field: 1-head/(max(6,field.length)-1)
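
  A direct sketch of the earliness formula above, with hypothetical head and field-length values. The max(6, ...) term fixes the denominator at 5 for fields shorter than six tokens, so in a short field a late first segment lowers earliness less than the field length alone would imply.

  ```java
  /** Illustrative only: earliness = 1 - head / (max(6, fieldLength) - 1). */
  final class EarlinessSketch {
      static float earliness(int head, int fieldLength) {
          return 1 - (float) head / (Math.max(6, fieldLength) - 1);
      }

      public static void main(String[] args) {
          System.out.println(earliness(0, 20)); // 1.0: first segment starts at the first token
          System.out.println(earliness(2, 9));  // 0.75 = 1 - 2/8
          System.out.println(earliness(1, 3));  // 0.8 = 1 - 1/5 (short field: denominator fixed at 5)
      }
  }
  ```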
- getMatch
  
  public float getMatch()
  
  A ready-to-use aggregate match score. Use this if you don't have time to find a better application specific aggregate score of the fine grained match metrics.
  
  The current formula is ( proximityCompletenessImportance * (1-relatednessImportance + relatednessImportance*relatedness) * proximity * exactness * completeness^2 + earlinessImportance * earliness + segmentProximityImportance * segmentProximity ) / (proximityCompletenessImportance + earlinessImportance + relatednessImportance), but this is subject to change (i.e. improvement) at any time.
  
  Weight and significance are not taken into account because this is meant to capture the quality of the match in this field, while those measures relate this match to matches in other fields. This number can be multiplied by those values when combining with other field match scores.
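
  The aggregate score can be sketched as a literal transcription of the formula as written in the documentation above, which notes that the formula may change. The class name and the importance values are placeholders, not Vespa defaults, and the component metrics are passed in directly rather than read from a FieldMatchMetrics instance.

  ```java
  /** Illustrative only: a literal transcription of the documented match formula. */
  final class MatchSketch {
      static float match(float proximity, float exactness, float completeness, float relatedness,
                         float earliness, float segmentProximity,
                         float proximityCompletenessImportance, float relatednessImportance,
                         float earlinessImportance, float segmentProximityImportance) {
          // Relatedness scales the proximity/completeness term between (1 - relatednessImportance) and 1
          float scaledRelatedness = 1 - relatednessImportance + relatednessImportance * relatedness;
          float numerator = proximityCompletenessImportance * scaledRelatedness
                  * proximity * exactness * completeness * completeness
                  + earlinessImportance * earliness
                  + segmentProximityImportance * segmentProximity;
          float denominator = proximityCompletenessImportance + earlinessImportance + relatednessImportance;
          return numerator / denominator;
      }

      public static void main(String[] args) {
          // Placeholder importance values
          float pci = 0.5f, ri = 0.25f, ei = 0.25f, spi = 0.25f;
          // Every component metric at 1
          System.out.println(match(1, 1, 1, 1, 1, 1, pci, ri, ei, spi));    // 1.0
          // Completeness enters squared: halving it quarters the first term
          System.out.println(match(1, 1, 0.5f, 1, 1, 1, pci, ri, ei, spi)); // 0.625
      }
  }
  ```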
- clone
  
  public FieldMatchMetrics clone()
  
  Overrides:
  
  clone in class Object
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
- toStringDump
  
  public String toStringDump()
- trace
  
  public Trace trace()
  
  Returns the trace of this computation. This is empty (never null) if tracing is off.
