Class JaroWinkler

  • All Implemented Interfaces:
    NormalizedStringDistance, NormalizedStringSimilarity, StringDistance, StringSimilarity, Serializable

    @Immutable
    public class JaroWinkler
    extends Object
    implements NormalizedStringSimilarity, NormalizedStringDistance
    The Jaro-Winkler distance metric is designed for, and best suited to, short strings such as person names, and for detecting typos. It is (roughly) a variation of Damerau-Levenshtein in which the substitution of 2 close characters is considered less important than the substitution of 2 characters that are far from each other. Jaro-Winkler was developed in the area of record linkage (duplicate detection) (Winkler, 1990). It returns a value in the interval [0.0, 1.0]. The distance is computed as 1 - Jaro-Winkler similarity.
    Author:
    Thibault Debatty
    See Also:
    Serialized Form
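To make the description above concrete, here is a minimal, self-contained sketch of the textbook Jaro-Winkler similarity. It is not this class's source code: the class name JaroWinklerSketch, the method names, the fixed prefix scale of 0.1, and the prefix cap of 4 are assumptions taken from the common formulation, and the library's own implementation may differ in details such as the scaling factor or rounding of transpositions.

```java
// Illustrative sketch only; names and constants are assumptions, not the library's code.
final class JaroWinklerSketch {

    // Textbook Jaro similarity in [0, 1].
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) {
            return 1.0;
        }
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 || len2 == 0) {
            return 0.0;
        }
        // Characters count as matching only if they are equal and
        // no further apart than this window.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Walk both matched sequences in order and count positions that disagree.
        int k = 0;
        int outOfOrder = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) {
                continue;
            }
            while (!matched2[k]) {
                k++;
            }
            if (s1.charAt(i) != s2.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }
        double m = matches;
        double t = outOfOrder / 2.0; // transpositions are half the out-of-order matches
        return (m / len1 + m / len2 + (m - t) / m) / 3.0;
    }

    // Adds the Winkler prefix bonus only when the Jaro score exceeds the threshold
    // (assumed rule for this sketch).
    static double jaroWinkler(String s1, String s2, double threshold) {
        double j = jaro(s1, s2);
        if (j <= threshold) {
            return j;
        }
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return j + prefix * 0.1 * (1.0 - j);
    }

    public static void main(String[] args) {
        double similarity = jaroWinkler("MARTHA", "MARHTA", 0.7);
        System.out.println("similarity = " + similarity);
        System.out.println("distance   = " + (1.0 - similarity));
    }
}
```

In this sketch a swap of two adjacent characters (MARTHA vs. MARHTA) still counts every character as a match and costs only one transposition, which is why the score stays high for typo-like differences.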
    • Constructor Detail

      • JaroWinkler

        public JaroWinkler()
        Instantiate with default threshold (0.7).
      • JaroWinkler

        public JaroWinkler(double threshold)
        Instantiate with the given threshold, which determines when the Winkler bonus is used. Set threshold to a negative value to get the Jaro distance.
        Parameters:
        threshold - The threshold that determines when the Winkler bonus is applied.
    • Method Detail

      • getThreshold

        public final double getThreshold()
        Returns the current value of the threshold used for adding the Winkler bonus. The default value is 0.7.
        Returns:
        the current value of the threshold
      • similarity

        public final double similarity(String s1,
                                       String s2)
        Compute Jaro-Winkler similarity.
        Specified by:
        similarity in interface StringSimilarity
        Parameters:
        s1 - The first string to compare.
        s2 - The second string to compare.
        Returns:
        The Jaro-Winkler similarity in the range [0, 1]
        Throws:
        NullPointerException - if s1 or s2 is null.
      • distance

        public final double distance(String s1,
                                     String s2)
        Return 1 - similarity.
        Specified by:
        distance in interface StringDistance
        Parameters:
        s1 - The first string to compare.
        s2 - The second string to compare.
        Returns:
        1 - similarity.
        Throws:
        NullPointerException - if s1 or s2 is null.
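The relationship between the two methods (distance is 1 - similarity, and both reject null arguments) can be sketched without the library. The example below is an assumption-laden illustration: DistanceSketch and prefixSimilarity are hypothetical names, and prefixSimilarity is only a simple stand-in for a normalized similarity, not the Jaro-Winkler computation.

```java
import java.util.Objects;

// Illustrative sketch only; not part of the documented API.
final class DistanceSketch {

    // Hypothetical stand-in for a normalized similarity in [0, 1]:
    // length of the shared prefix divided by the length of the longer string.
    static double prefixSimilarity(String s1, String s2) {
        Objects.requireNonNull(s1, "s1 must not be null");
        Objects.requireNonNull(s2, "s2 must not be null");
        int longer = Math.max(s1.length(), s2.length());
        if (longer == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        int shorter = Math.min(s1.length(), s2.length());
        int i = 0;
        while (i < shorter && s1.charAt(i) == s2.charAt(i)) {
            i++;
        }
        return (double) i / longer;
    }

    // A normalized distance derived from the similarity, as described above.
    static double distance(String s1, String s2) {
        return 1.0 - prefixSimilarity(s1, s2);
    }

    public static void main(String[] args) {
        System.out.println(distance("abcd", "abxy"));
    }
}
```

Because the similarity stays within [0, 1], the derived distance does too, with 0 meaning identical strings under the chosen similarity.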