org.opencms.search
Class CmsSearchSimilarity

java.lang.Object
  extended by org.apache.lucene.search.Similarity
      extended by org.apache.lucene.search.DefaultSimilarity
          extended by org.opencms.search.CmsSearchSimilarity
All Implemented Interfaces:
Serializable

public class CmsSearchSimilarity
extends org.apache.lucene.search.DefaultSimilarity

Reduces the importance of the computeNorm(String, FieldInvertState) factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.

This implementation was added because the default length norm appears to be heavily biased toward short documents. With the default, even if a term occurs the same number of times in two documents, the shorter document (the one containing fewer terms) can easily score 3x as high as the longer one. This implementation reduces the influence of the number of terms in the field on the score.
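
The size of that bias can be illustrated with a little arithmetic. The sketch below assumes the Lucene default length norm is 1 / sqrt(numTerms) and leaves out the field boost, overlap discounting, and the lossy one-byte encoding of norms. The class name LengthNormBias and the document lengths are made up for the illustration.

```java
import java.util.Locale;

public class LengthNormBias {

    /** Default-style length norm: 1 / sqrt(numTerms); boost and overlaps are ignored (assumption). */
    public static double defaultLengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        int shortDoc = 100;   // hypothetical number of terms in the shorter document
        int longDoc = 1000;   // hypothetical number of terms in the longer document

        double shortNorm = defaultLengthNorm(shortDoc);
        double longNorm = defaultLengthNorm(longDoc);

        // With equal term frequency, the norm is the only factor that differs,
        // so the ratio of the norms is the ratio of the scores.
        System.out.printf(Locale.ROOT, "short=%.4f long=%.4f ratio=%.2f%n",
                shortNorm, longNorm, shortNorm / longNorm);
        // prints: short=0.1000 long=0.0316 ratio=3.16
    }
}
```

A tenfold difference in length is therefore enough to produce the roughly 3x score gap described above.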

Inspired by Chuck Williams' WikipediaSimilarity.

Since:
6.0.0
See Also:
Serialized Form

Field Summary
 
Fields inherited from class org.apache.lucene.search.DefaultSimilarity
discountOverlaps
 
Fields inherited from class org.apache.lucene.search.Similarity
NO_DOC_ID_PROVIDED
 
Constructor Summary
CmsSearchSimilarity()
          Creates a new instance of the OpenCms search similarity.
 
Method Summary
 float computeNorm(String fieldName, org.apache.lucene.index.FieldInvertState state)
          Special implementation for "compute norm" to reduce the significance of this factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.
 
Methods inherited from class org.apache.lucene.search.DefaultSimilarity
coord, getDiscountOverlaps, idf, queryNorm, setDiscountOverlaps, sloppyFreq, tf
 
Methods inherited from class org.apache.lucene.search.Similarity
decodeNorm, decodeNormValue, encodeNorm, encodeNormValue, getDefault, getNormDecoder, idfExplain, idfExplain, idfExplain, lengthNorm, scorePayload, setDefault, tf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CmsSearchSimilarity

public CmsSearchSimilarity()
Creates a new instance of the OpenCms search similarity.

Method Detail

computeNorm

public float computeNorm(String fieldName,
                         org.apache.lucene.index.FieldInvertState state)
Special implementation for "compute norm" to reduce the significance of this factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.

Overrides:
computeNorm in class org.apache.lucene.search.DefaultSimilarity
See Also:
DefaultSimilarity.computeNorm(java.lang.String, org.apache.lucene.index.FieldInvertState)
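
The per-field behaviour described above can be sketched without a Lucene dependency. Everything in the following block is an assumption made for illustration: the field name "content" stands in for the value of CmsSearchField.FIELD_CONTENT, the plain numTerms argument stands in for the length read from FieldInvertState, boost is ignored, and the logarithmic damping formula is a hypothetical example of a flatter norm, not necessarily the formula this class uses.

```java
import java.util.Locale;

public class FieldAwareNormSketch {

    /** Hypothetical stand-in for the value of CmsSearchField.FIELD_CONTENT. */
    static final String FIELD_CONTENT = "content";

    /** Default-style norm: 1 / sqrt(numTerms); boost is ignored (assumption). */
    public static float defaultNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Hypothetical damped norm that changes only slowly with document length. */
    public static float dampedNorm(int numTerms) {
        return (float) (3.0 / Math.log10(1000.0 + numTerms));
    }

    /** Uses the damped norm for the content field and the default for every other field. */
    public static float computeNorm(String fieldName, int numTerms) {
        if (FIELD_CONTENT.equals(fieldName)) {
            return dampedNorm(numTerms);
        }
        return defaultNorm(numTerms);
    }

    public static void main(String[] args) {
        // Compare a 100-term and a 1000-term field under both code paths.
        for (String field : new String[] {"content", "title"}) {
            float shortNorm = computeNorm(field, 100);
            float longNorm = computeNorm(field, 1000);
            System.out.printf(Locale.ROOT, "%s: ratio=%.2f%n", field, shortNorm / longNorm);
        }
    }
}
```

Under these assumptions the short-to-long ratio for the content field drops to about 1.09, while any other field keeps the default ratio of about 3.16. In a real subclass the same branch would sit inside an override of computeNorm(String, FieldInvertState) that falls back to the superclass for other fields.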