Interface | Description |
---|---|
StringSimilarityInterface | |

Class | Description |
---|---|
Cosine | Implements Cosine Similarity. |
Damerau | Implementation of Damerau-Levenshtein distance, computed as the minimum number of operations needed to transform one string into the other, where an operation is defined as an insertion, deletion, or substitution of a single character, or a transposition of two adjacent characters. |
Jaccard | |
JaroWinkler | |
KShingling | A k-shingling is a set of unique k-grams, used to measure the similarity of two documents. |
Levenshtein | The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. |
LongestCommonSubsequence | The longest common subsequence (LCS) problem consists in finding the longest subsequence common to two (or more) sequences. |
Main | |
NGram | N-Gram Similarity as defined by Kondrak, "N-Gram Similarity and Distance", String Processing and Information Retrieval, Lecture Notes in Computer Science Volume 3772, 2005, pp 115-126. |
QGram | Q-gram similarity and distance. |
SorensenDice | Sorensen-Dice coefficient, also known as Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index. |
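
The edit-distance and k-shingling definitions in the table above can be illustrated with a minimal, self-contained sketch. It does not use or reproduce the classes listed here: the class name `StringMetricsSketch` and its method names are hypothetical, chosen only for illustration. The transposition-aware method implements the restricted "optimal string alignment" variant, which is simpler than the unrestricted Damerau-Levenshtein distance and can return a larger value on some inputs (for example "CA" and "ABC").

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not part of the package documented above.
final class StringMetricsSketch {

    // Levenshtein distance: minimum number of single-character
    // insertions, deletions or substitutions turning a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Optimal string alignment distance (assumption: restricted variant).
    // Adds transposition of two adjacent characters to the three
    // Levenshtein operations, but never edits a substring twice.
    static int osaDistance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                boolean swapped = i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1);
                if (swapped) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }

    // k-shingling: the set of unique substrings of length k (k-grams).
    static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }
}
```

With this sketch, swapping two adjacent characters ("ab" to "ba") counts as two edits under `levenshtein` but as a single edit under `osaDistance`, which is the practical difference between the Levenshtein and Damerau rows of the table.
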
Copyright © 2015. All rights reserved.