Class SorensenDice

    • Constructor Detail

      • SorensenDice

        public SorensenDice(int k)
        Sorensen-Dice coefficient, also known as the Sørensen index, Dice's coefficient, or Czekanowski's binary (non-quantitative) index. The strings are first converted to sets of k-shingles (sequences of k characters, each counted at most once); the similarity is then computed as 2 * |A ∩ B| / (|A| + |B|), where A and B are the shingle sets of the two strings. Note: the Sorensen-Dice distance does not satisfy the triangle inequality, so it is not a metric.
        Parameters:
        k - The shingle size (number of characters per shingle).
      • SorensenDice

        public SorensenDice()
        Sorensen-Dice coefficient, also known as the Sørensen index, Dice's coefficient, or Czekanowski's binary (non-quantitative) index. The strings are first converted to sets of k-shingles (sequences of k characters, each counted at most once); the similarity is then computed as 2 * |A ∩ B| / (|A| + |B|), where A and B are the shingle sets of the two strings. Note: the Sorensen-Dice distance does not satisfy the triangle inequality, so it is not a metric. The default k is 3.
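
        The shingling step and the formula described above can be sketched in a few lines of plain Java. This is an illustrative sketch only, not this class's implementation: the class name DiceSketch, its static methods, and the handling of identical strings and of strings shorter than k are assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of the documented library.
final class DiceSketch {

    // Collect the distinct k-character substrings (k-shingles) of s.
    static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // Compute 2 * |A ∩ B| / (|A| + |B|) over the two shingle sets.
    static double similarity(String s1, String s2, int k) {
        if (s1.equals(s2)) {
            return 1.0; // Assumption: identical strings are treated as fully similar.
        }
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        int total = a.size() + b.size();
        if (total == 0) {
            return 0.0; // Assumption: both strings shorter than k, so nothing is shared.
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b); // Keep only shingles present in both sets.
        return 2.0 * common.size() / total;
    }

    public static void main(String[] args) {
        // "night" -> {ni, ig, gh, ht}, "nacht" -> {na, ac, ch, ht}; one shared shingle.
        System.out.println(similarity("night", "nacht", 2)); // prints 0.25
    }
}
```

        With k = 2, "night" and "nacht" each yield four shingles and share only "ht", so the sketch returns 2 * 1 / (4 + 4) = 0.25.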
    • Method Detail

      • similarity

        public final double similarity(String s1,
                                       String s2)
        Similarity is computed as 2 * |A ∩ B| / (|A| + |B|), where A and B are the k-shingle sets of s1 and s2.
        Specified by:
        similarity in interface StringSimilarity
        Parameters:
        s1 - The first string to compare.
        s2 - The second string to compare.
        Returns:
        The computed Sorensen-Dice similarity.
        Throws:
        NullPointerException - if s1 or s2 is null.
      • distance

        public final double distance(String s1,
                                     String s2)
        Returns 1 - similarity.
        Specified by:
        distance in interface StringDistance
        Parameters:
        s1 - The first string to compare.
        s2 - The second string to compare.
        Returns:
        1.0 minus the computed similarity.
        Throws:
        NullPointerException - if s1 or s2 is null.
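
        Because the distance is defined as 1 - similarity, a small counterexample is enough to see why the triangle inequality fails. The sketch below is a standalone illustration, not this class's implementation: the class name DiceDistanceSketch and its methods are hypothetical, and it assumes every input has at least k characters.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of the documented library.
final class DiceDistanceSketch {

    // Collect the distinct k-character substrings (k-shingles) of s.
    static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // Distance as 1 - 2 * |A ∩ B| / (|A| + |B|).
    // Assumption: both strings have at least k characters, so the sets are non-empty.
    static double distance(String s1, String s2, int k) {
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return 1.0 - 2.0 * common.size() / (a.size() + b.size());
    }

    public static void main(String[] args) {
        // With k = 1 the shingle sets are {a}, {b} and {a, b}.
        double direct = distance("a", "b", 1);   // no shared shingle: 1.0
        double viaAb = distance("a", "ab", 1)    // 1 - 2/3
                     + distance("ab", "b", 1);   // 1 - 2/3
        // The detour through "ab" is shorter than the direct distance.
        System.out.println(direct > viaAb); // prints true
    }
}
```

        Here d("a", "b") = 1, while d("a", "ab") + d("ab", "b") = 1/3 + 1/3 = 2/3, so the direct distance exceeds the sum of the two legs.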