class |
Cosine |
The similarity between the two strings is the cosine of the angle between
the vector representations of the two strings.
|
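As a minimal sketch of the cosine computation, the example below assumes each string is represented as a vector of character n-gram counts; the description above does not say how the vectors are built, so this is an assumption. The class `CosineSketch`, its methods, and the parameter `n` are hypothetical names for illustration, not the library's API.

```java
import java.util.HashMap;
import java.util.Map;

class CosineSketch {
    // Hypothetical helper: count how often each n-gram occurs in s.
    static Map<String, Integer> profile(String s, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= s.length(); i++) {
            counts.merge(s.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    // Euclidean norm of a count vector.
    static double norm(Map<String, Integer> p) {
        double sum = 0;
        for (int c : p.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between the two n-gram count vectors.
    static double similarity(String s1, String s2, int n) {
        Map<String, Integer> p1 = profile(s1, n);
        Map<String, Integer> p2 = profile(s2, n);
        double dot = 0;
        for (Map.Entry<String, Integer> e : p1.entrySet()) {
            dot += e.getValue() * p2.getOrDefault(e.getKey(), 0);
        }
        double n1 = norm(p1);
        double n2 = norm(p2);
        // Assumption: a string shorter than n has no n-grams; report 0 similarity.
        if (n1 == 0 || n2 == 0) {
            return 0.0;
        }
        return dot / (n1 * n2);
    }
}
```

With `n = 2`, two strings that share no bigram score 0, and identical strings score 1 up to floating-point rounding.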
class |
Jaccard |
Each input string is converted into a set of n-grams; the Jaccard index is
then computed as |V1 ∩ V2| / |V1 ∪ V2|.
|
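The Jaccard formula can be sketched in a few lines. The class `JaccardSketch`, its methods, and the n-gram size parameter `n` are hypothetical names chosen for illustration, and the handling of two empty n-gram sets is an assumption; this is not the library's implementation.

```java
import java.util.HashSet;
import java.util.Set;

class JaccardSketch {
    // Hypothetical helper: the set of distinct n-grams of s.
    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // Size of the intersection divided by size of the union.
    static double similarity(String s1, String s2, int n) {
        Set<String> v1 = ngrams(s1, n);
        Set<String> v2 = ngrams(s2, n);
        Set<String> union = new HashSet<>(v1);
        union.addAll(v2);
        // Assumption: two strings with no n-grams at all are treated as identical.
        if (union.isEmpty()) {
            return 1.0;
        }
        Set<String> inter = new HashSet<>(v1);
        inter.retainAll(v2);
        return (double) inter.size() / union.size();
    }
}
```

For example, with `n = 2` the strings "abcd" and "bcde" yield the sets {ab, bc, cd} and {bc, cd, de}, so the index is 2 / 4 = 0.5.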
class |
JaroWinkler |
The Jaro–Winkler distance metric is designed for, and best suited to, short
strings such as person names, and for detecting typos; it is (roughly) a
variation of Damerau-Levenshtein, where the substitution of 2 close
characters is considered less important than the substitution of 2 characters
that are far from each other.
|
class |
NormalizedLevenshtein |
This distance is computed as the Levenshtein distance divided by the length of
the longer string.
|
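A minimal sketch of the normalization, using the standard dynamic-programming Levenshtein distance with unit costs. The class `NormalizedLevenshteinSketch` and its methods are hypothetical names, and returning 0 for two empty strings is an assumption; the library's own code may differ.

```java
class NormalizedLevenshteinSketch {
    // Classic two-row dynamic-programming Levenshtein distance (unit costs).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Levenshtein distance divided by the length of the longer string.
    static double distance(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        // Assumption: two empty strings are at distance 0.
        if (maxLen == 0) {
            return 0.0;
        }
        return (double) levenshtein(a, b) / maxLen;
    }
}
```

For "kitten" and "sitting" the Levenshtein distance is 3 and the longer string has 7 characters, so the normalized distance is 3 / 7. The result always lies in [0, 1].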
class |
RatcliffObershelp |
Ratcliff/Obershelp pattern recognition.
The Ratcliff/Obershelp algorithm computes the similarity of two strings as
twice the number of matching characters divided by the total number of
characters in the two strings.
|
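The sketch below illustrates the formula, taking "matching characters" in the usual Ratcliff/Obershelp sense: the length of the longest common substring, plus the matches found recursively to its left and to its right. The class `RatcliffObershelpSketch` and its methods are hypothetical names, and treating two empty strings as fully similar is an assumption.

```java
class RatcliffObershelpSketch {
    // Number of matching characters: longest common substring,
    // then recurse on the parts to its left and to its right.
    static int matches(String a, String b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0;
        }
        int bestLen = 0;
        int bestA = 0;
        int bestB = 0;
        // len[i][j] = length of the common suffix of a[0..i) and b[0..j)
        int[][] len = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                    if (len[i][j] > bestLen) {
                        bestLen = len[i][j];
                        bestA = i - bestLen;
                        bestB = j - bestLen;
                    }
                }
            }
        }
        if (bestLen == 0) {
            return 0;
        }
        return bestLen
                + matches(a.substring(0, bestA), b.substring(0, bestB))
                + matches(a.substring(bestA + bestLen), b.substring(bestB + bestLen));
    }

    // Twice the matching characters over the combined length.
    static double similarity(String a, String b) {
        int total = a.length() + b.length();
        // Assumption: two empty strings are treated as identical.
        if (total == 0) {
            return 1.0;
        }
        return 2.0 * matches(a, b) / total;
    }
}
```

For "WIKIMEDIA" and "WIKIMANIA", the longest common substring is "WIKIM" (5 characters) and the remainders "EDIA" and "ANIA" share "IA" (2 characters), giving 2 * 7 / 18.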
class |
SorensenDice |
Similar to the Jaccard index, but here the similarity is computed as
2 * |V1 ∩ V2| / (|V1| + |V2|).
|
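A minimal sketch of the Sørensen-Dice formula over n-gram sets follows. The class `SorensenDiceSketch`, its methods, and the n-gram size parameter `n` are hypothetical names for illustration, and the result for two empty n-gram sets is an assumption; this is not the library's implementation.

```java
import java.util.HashSet;
import java.util.Set;

class SorensenDiceSketch {
    // Hypothetical helper: the set of distinct n-grams of s.
    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // Twice the intersection size over the sum of the two set sizes.
    static double similarity(String s1, String s2, int n) {
        Set<String> v1 = ngrams(s1, n);
        Set<String> v2 = ngrams(s2, n);
        int total = v1.size() + v2.size();
        // Assumption: two strings with no n-grams at all are treated as identical.
        if (total == 0) {
            return 1.0;
        }
        Set<String> inter = new HashSet<>(v1);
        inter.retainAll(v2);
        return 2.0 * inter.size() / total;
    }
}
```

With `n = 2`, "abcd" and "bcde" share 2 of their 3 bigrams each, so the coefficient is 2 * 2 / (3 + 3) = 2/3, compared with a Jaccard index of 0.5 for the same pair.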