Class Jaccard
- java.lang.Object
  - info.debatty.java.stringsimilarity.ShingleBased
    - info.debatty.java.stringsimilarity.Jaccard
- All Implemented Interfaces:
  MetricStringDistance, NormalizedStringDistance, NormalizedStringSimilarity, StringDistance, StringSimilarity, Serializable
@Immutable public class Jaccard extends ShingleBased implements MetricStringDistance, NormalizedStringDistance, NormalizedStringSimilarity
Each input string is converted into a set of n-grams, and the Jaccard index is then computed as |V1 inter V2| / |V1 union V2|. As with Q-Gram distance, the input strings are first converted into n-grams (sequences of n characters, also called k-shingles), but here only the set of distinct n-grams is used: how many times each n-gram occurs is not taken into account. Distance is computed as 1 - Jaccard index. The Jaccard distance is a metric on these n-gram sets (it satisfies the triangle inequality).
- Author:
- Thibault Debatty
- See Also:
- Serialized Form
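The class description above can be sketched in a few lines of self-contained Java. This is a minimal illustration, not the library's implementation: the names `ShingleJaccard`, `shingles` and `jaccardIndex` are hypothetical, and returning 0 for two different strings that are both shorter than k is an assumption of this sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of a Jaccard index over k-shingle sets; illustrative, not the library's source.
final class ShingleJaccard {

    // Collects every distinct substring of length k; repeated shingles collapse into one entry.
    static Set<String> shingles(String s, int k) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            out.add(s.substring(i, i + k));
        }
        return out;
    }

    // |A inter B| / |A union B| over the two shingle sets.
    static double jaccardIndex(String s1, String s2, int k) {
        if (s1.equals(s2)) {
            return 1.0;
        }
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            // Assumption: two different strings that are both shorter than k share nothing.
            return 0.0;
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        // {AB, BC, CD} vs {BC, CD, DE}: 2 shared shingles out of 4 distinct ones.
        System.out.println(jaccardIndex("ABCD", "BCDE", 2)); // prints 0.5
        // Occurrence counts are ignored: both strings reduce to the set {aa}.
        System.out.println(jaccardIndex("aaaa", "aa", 2));   // prints 1.0
    }
}
```

The second call shows the set semantics described above: "aaaa" and "aa" are different strings, yet both reduce to the single shingle "aa", so their Jaccard index is 1.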
Constructor Summary
- Jaccard()
  The strings are first transformed into sets of k-shingles (sequences of k characters), then the Jaccard index is computed as |A inter B| / |A union B|. Uses the default value k = 3.
- Jaccard(int k)
  The strings are first transformed into sets of k-shingles (sequences of k characters), then the Jaccard index is computed as |A inter B| / |A union B|.
Method Summary
- double distance(String s1, String s2)
  Distance is computed as 1 - similarity.
- double similarity(String s1, String s2)
  Compute the Jaccard index: |A inter B| / |A union B|.
- Methods inherited from class info.debatty.java.stringsimilarity.ShingleBased
  getK, getProfile
Constructor Detail
Jaccard
public Jaccard(int k)
The strings are first transformed into sets of k-shingles (sequences of k characters), then the Jaccard index is computed as |A inter B| / |A union B|.
- Parameters:
  k - the length of the shingles (number of characters per n-gram).
Jaccard
public Jaccard()
The strings are first transformed into sets of k-shingles (sequences of k characters), then the Jaccard index is computed as |A inter B| / |A union B|. This constructor uses the default value k = 3.
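To illustrate how the shingle length chosen through the two constructors changes the result, here is a hypothetical stand-in class, `KShingleJaccard`, whose no-argument constructor delegates to k = 3 as documented above. Rejecting k <= 0 and the short-string convention are assumptions of this sketch, not documented behavior of the library.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in mirroring the two documented constructors; not the library class.
final class KShingleJaccard {
    private final int k;

    // Mirrors Jaccard(): falls back to the documented default shingle length of 3.
    KShingleJaccard() {
        this(3);
    }

    // Mirrors Jaccard(int k): k is the number of characters per shingle.
    KShingleJaccard(int k) {
        if (k <= 0) {
            // Assumption for this sketch: non-positive k is rejected.
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
    }

    int getK() {
        return k;
    }

    double similarity(String s1, String s2) {
        if (s1.equals(s2)) {
            return 1.0;
        }
        Set<String> a = shingles(s1);
        Set<String> b = shingles(s2);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0; // assumption: different strings shorter than k share nothing
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return (double) inter.size() / union.size();
    }

    // Distinct substrings of length k.
    private Set<String> shingles(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            out.add(s.substring(i, i + k));
        }
        return out;
    }

    public static void main(String[] args) {
        // Longer shingles need longer exact matches, so the score drops as k grows.
        System.out.println(new KShingleJaccard(1).similarity("night", "nacht")); // 3/7
        System.out.println(new KShingleJaccard(2).similarity("night", "nacht")); // 1/7
        System.out.println(new KShingleJaccard().similarity("night", "nacht"));  // k = 3: 0.0
    }
}
```

With k = 1 the comparison degenerates to character sets ({n, h, t} shared out of 7 letters); with the default k = 3, "night" and "nacht" share no trigram at all.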
Method Detail
similarity
public final double similarity(String s1, String s2)
Compute the Jaccard index: |A inter B| / |A union B|.
- Specified by:
  similarity in interface StringSimilarity
- Parameters:
  s1 - The first string to compare.
  s2 - The second string to compare.
- Returns:
  The Jaccard index in the range [0, 1].
- Throws:
  NullPointerException - if s1 or s2 is null.
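A sketch of the documented `similarity` contract (NullPointerException on null input, result in [0, 1]), assuming k = 3. `JaccardSimilaritySketch` is a hypothetical name; the intersection size is obtained by inclusion-exclusion, |A inter B| = |A| + |B| - |A union B|, which is one way to avoid building the intersection set and not necessarily how the library does it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Sketch of the documented similarity(s1, s2) contract; illustrative, not the library's source.
final class JaccardSimilaritySketch {
    static final int K = 3; // assumed shingle length (the documented default)

    // Distinct substrings of length K.
    static Set<String> shingles(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + K <= s.length(); i++) {
            out.add(s.substring(i, i + K));
        }
        return out;
    }

    static double similarity(String s1, String s2) {
        // Documented contract: a null argument raises NullPointerException.
        Objects.requireNonNull(s1, "s1 must not be null");
        Objects.requireNonNull(s2, "s2 must not be null");
        if (s1.equals(s2)) {
            return 1.0;
        }
        Set<String> a = shingles(s1);
        Set<String> b = shingles(s2);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0; // assumption: different strings shorter than K share nothing
        }
        // Inclusion-exclusion: |A inter B| = |A| + |B| - |A union B|.
        int inter = a.size() + b.size() - union.size();
        return (double) inter / union.size();
    }

    public static void main(String[] args) {
        // {ABC, BCD, CDE} vs {ABC, BCD, CDF}: 2 shared out of 4 distinct.
        System.out.println(similarity("ABCDE", "ABCDF")); // prints 0.5
        try {
            similarity(null, "ABC");
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage()); // prints NPE: s1 must not be null
        }
    }
}
```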
distance
public final double distance(String s1, String s2)
Distance is computed as 1 - similarity.
- Specified by:
  distance in interface MetricStringDistance
- Specified by:
  distance in interface StringDistance
- Parameters:
  s1 - The first string to compare.
  s2 - The second string to compare.
- Returns:
  1 - the Jaccard similarity.
- Throws:
  NullPointerException - if s1 or s2 is null.
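The distance follows directly from the similarity. The hypothetical `JaccardDistanceSketch` below, which assumes k = 2, computes 1 - Jaccard index and also spot-checks the triangle inequality on a small sample of words; a passing check illustrates the metric property but is not a proof.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: distance = 1 - Jaccard index, plus a brute-force triangle-inequality spot check.
final class JaccardDistanceSketch {
    static final int K = 2; // assumed shingle length for this illustration

    // Distinct substrings of length K.
    static Set<String> shingles(String s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + K <= s.length(); i++) {
            out.add(s.substring(i, i + K));
        }
        return out;
    }

    static double similarity(String s1, String s2) {
        if (s1.equals(s2)) {
            return 1.0;
        }
        Set<String> a = shingles(s1);
        Set<String> b = shingles(s2);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0; // assumption: different strings shorter than K share nothing
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return (double) inter.size() / union.size();
    }

    // Documented definition: distance = 1 - similarity.
    static double distance(String s1, String s2) {
        return 1.0 - similarity(s1, s2);
    }

    // Checks d(x, z) <= d(x, y) + d(y, z) on every ordered triple of the sample.
    static boolean triangleInequalityHolds(List<String> sample) {
        for (String x : sample) {
            for (String y : sample) {
                for (String z : sample) {
                    // Small tolerance absorbs floating-point rounding.
                    if (distance(x, z) > distance(x, y) + distance(y, z) + 1e-12) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("night", "nacht", "natch", "knight", "naught");
        System.out.println(distance("ABCD", "BCDE"));       // prints 0.5
        System.out.println(triangleInequalityHolds(sample)); // prints true
    }
}
```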