Interface | Description |
---|---|
CharacterSubstitutionInterface | Used to indicate the cost of character substitution. |
StringSimilarityInterface | |
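
As a rough illustration of what a "cost of character substitution" can mean, the sketch below plugs a cost callback into an edit-distance computation. The `SubstitutionCost` interface, its `cost(char, char)` method, and the `weightedDistance` helper are hypothetical names chosen for this example; they are not taken from this package and may differ from its actual declarations. Insertions and deletions are assumed to cost 1.

```java
final class SubstitutionCostSketch {

    /** Hypothetical callback: the price of replacing c1 with c2 (an assumption, not the package's interface). */
    interface SubstitutionCost {
        double cost(char c1, char c2);
    }

    /**
     * Edit distance in which insertions and deletions cost 1.0 and a
     * substitution costs whatever the callback returns.
     */
    static double weightedDistance(String s, String t, SubstitutionCost sub) {
        double[] prev = new double[t.length() + 1];
        double[] curr = new double[t.length() + 1];

        // Turning the empty prefix of s into the first j characters of t takes j insertions.
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= s.length(); i++) {
            // Turning the first i characters of s into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                char a = s.charAt(i - 1);
                char b = t.charAt(j - 1);
                double substitution = prev[j - 1] + (a == b ? 0.0 : sub.cost(a, b));
                double deletion = prev[j] + 1.0;
                double insertion = curr[j - 1] + 1.0;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            // Reuse the two rows instead of allocating a full matrix.
            double[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```

With a callback that always returns 1.0 this reduces to the ordinary unit-cost edit distance; returning a smaller value for visually or phonetically similar characters makes those substitutions cheaper.
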

Class | Description |
---|---|
Cosine | |
Damerau | Implementation of Damerau-Levenshtein distance, computed as the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character, or a transposition of two adjacent characters. |
Jaccard | |
JaroWinkler | |
KShingling | k-shingling transforms a string (or text document) into a set of n-grams, which can be used to measure the similarity between two strings or documents. |
Levenshtein | The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other. |
LongestCommonSubsequence | The longest common subsequence (LCS) problem consists of finding the longest subsequence common to two (or more) sequences. |
Main | |
NGram | N-Gram Similarity as defined by Kondrak, "N-Gram Similarity and Distance", String Processing and Information Retrieval, Lecture Notes in Computer Science, Volume 3772, 2005, pp. 115-126. |
QGram | |
SetBasedStringSimilarity | |
SorensenDice | |
WeightedLevenshtein | Implementation of Levenshtein that allows different weights to be defined for different character substitutions. |
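
The k-shingling idea described in the table can be sketched in a few lines: cut each string into its set of contiguous length-k substrings, then compare the sets. The `ShingleSketch` class and its `shingles` and `jaccard` methods below are hypothetical helpers written for illustration, not this package's classes. The example assumes character-level shingles and uses the Jaccard index (intersection size divided by union size) as the set comparison.

```java
import java.util.HashSet;
import java.util.Set;

final class ShingleSketch {

    /** Returns the set of all contiguous substrings of length k (the k-shingles) of s. */
    static Set<String> shingles(String s, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    /**
     * Jaccard index: size of the intersection divided by size of the union.
     * Two empty sets are treated as identical (1.0); that is a choice made for this sketch.
     */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }
}
```

For example, with k = 2 the strings "abcd" and "bcde" yield the shingle sets {ab, bc, cd} and {bc, cd, de}, which share two of four distinct shingles, so `jaccard` returns 0.5.
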
Copyright © 2015. All rights reserved.