Class SorensenDice
- java.lang.Object
  - info.debatty.java.stringsimilarity.ShingleBased
    - info.debatty.java.stringsimilarity.SorensenDice
- All Implemented Interfaces:
NormalizedStringDistance, NormalizedStringSimilarity, StringDistance, StringSimilarity, Serializable
@Immutable public class SorensenDice extends ShingleBased implements NormalizedStringDistance, NormalizedStringSimilarity
Similar to the Jaccard index, but here the similarity is computed as 2 * |V1 inter V2| / (|V1| + |V2|). Distance is computed as 1 - similarity.
- Author:
- Thibault Debatty
- See Also:
- Serialized Form
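The relationship to the Jaccard index can be made explicit with standard set identities. In the sketch below, the symbols J (Jaccard index) and D (Sorensen-Dice similarity) are introduced here for illustration only; V1 and V2 are the shingle sets from the class description.

```latex
\begin{aligned}
J &= \frac{|V_1 \cap V_2|}{|V_1 \cup V_2|}
   = \frac{|V_1 \cap V_2|}{|V_1| + |V_2| - |V_1 \cap V_2|}, \\
D &= \frac{2\,|V_1 \cap V_2|}{|V_1| + |V_2|}
   = \frac{2J}{1 + J}.
\end{aligned}
```

Since J lies between 0 and 1, D is never smaller than J, and the two agree only at 0 and 1.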
Constructor Summary
Constructors
Constructor: Description
- SorensenDice()
  Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index.
- SorensenDice(int k)
  Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index.
-
Method Summary
Modifier and Type, Method: Description
- double distance(String s1, String s2)
  Returns 1 - similarity.
- double similarity(String s1, String s2)
  Similarity is computed as 2 * |A inter B| / (|A| + |B|).
Methods inherited from class info.debatty.java.stringsimilarity.ShingleBased
getK, getProfile
Constructor Detail
-
SorensenDice
public SorensenDice(int k)
Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index. The strings are first converted to boolean sets of k-shingles (sequences of k characters), then the similarity is computed as 2 * |A inter B| / (|A| + |B|). Attention: Sorensen-Dice distance (and similarity) does not satisfy the triangle inequality.
- Parameters:
k - the shingle length, i.e. the number of characters in each shingle.
-
SorensenDice
public SorensenDice()
Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index. The strings are first converted to boolean sets of k-shingles (sequences of k characters), then the similarity is computed as 2 * |A inter B| / (|A| + |B|). Attention: Sorensen-Dice distance (and similarity) does not satisfy the triangle inequality. The default k is 3.
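The two steps described above (building sets of k-shingles, then applying the formula) can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the class name DiceSketch and its helper shingles are hypothetical, and any preprocessing or special-case handling the library may perform is not reproduced.

```java
import java.util.HashSet;
import java.util.Set;

public class DiceSketch {

    // Collect the distinct k-character substrings (k-shingles) of s.
    // Hypothetical helper; the input is used as-is, with no normalization.
    public static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // 2 * |A inter B| / (|A| + |B|) over the two shingle sets.
    // If both sets are empty, this plain formula yields NaN (0.0 / 0).
    public static double similarity(String s1, String s2, int k) {
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        Set<String> common = new HashSet<>(a);
        common.retainAll(b); // keep only shingles present in both sets
        return 2.0 * common.size() / (a.size() + b.size());
    }

    public static void main(String[] args) {
        // "ABCDE"  -> {AB, BC, CD, DE}      (4 shingles)
        // "ABCDFG" -> {AB, BC, CD, DF, FG}  (5 shingles)
        // shared   -> {AB, BC, CD}          (3 shingles)
        System.out.println(similarity("ABCDE", "ABCDFG", 2));
    }
}
```

With k = 2 the example evaluates 2 * 3 / (4 + 5), roughly 0.667. Using sets rather than counts matches the "boolean sets" wording: a shingle that occurs several times in a string is counted once.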
-
-
Method Detail
-
similarity
public final double similarity(String s1, String s2)
Similarity is computed as 2 * |A inter B| / (|A| + |B|).
- Specified by:
similarity in interface StringSimilarity
- Parameters:
s1 - The first string to compare.
s2 - The second string to compare.
- Returns:
- The computed Sorensen-Dice similarity.
- Throws:
NullPointerException - if s1 or s2 is null.
-
distance
public final double distance(String s1, String s2)
Returns 1 - similarity.
- Specified by:
distance in interface StringDistance
- Parameters:
s1 - The first string to compare.
s2 - The second string to compare.
- Returns:
- 1.0 - the computed similarity
- Throws:
NullPointerException - if s1 or s2 is null.
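The warning that this distance does not satisfy the triangle inequality can be checked with a small counterexample. The sketch below recomputes 1 - similarity from the formula using only the standard library; the class name TriangleCheck and its helpers are hypothetical and stand in for the library class, whose internal handling of inputs is not reproduced here.

```java
import java.util.HashSet;
import java.util.Set;

public class TriangleCheck {

    // Distinct k-character substrings of s (hypothetical helper).
    public static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // 1 - 2 * |A inter B| / (|A| + |B|) over the two shingle sets.
    public static double distance(String s1, String s2, int k) {
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return 1.0 - 2.0 * common.size() / (a.size() + b.size());
    }

    public static void main(String[] args) {
        // With k = 1: "a" -> {a}, "b" -> {b}, "ab" -> {a, b}.
        double direct = distance("a", "b", 1);                         // 1 - 0/2 = 1
        double detour = distance("a", "ab", 1) + distance("ab", "b", 1); // 1/3 + 1/3
        // A metric would require direct <= detour; here it is larger.
        System.out.println(direct > detour); // prints "true"
    }
}
```

Because the direct distance (1) exceeds the sum of the two legs through "ab" (about 0.667), the value should not be fed to algorithms that assume a true metric, such as some nearest-neighbour index structures.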
-
-