Class QGram
- java.lang.Object
-
- info.debatty.java.stringsimilarity.ShingleBased
-
- info.debatty.java.stringsimilarity.QGram
-
- All Implemented Interfaces:
StringDistance
,Serializable
@Immutable public class QGram extends ShingleBased implements StringDistance
Q-gram distance, as defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches". The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurences of each n-gram): SUM( |V1_i - V2_i| ). Q-gram distance is a lower bound on Levenshtein distance, but can be computed in O(m + n), where Levenshtein requires O(m.n).- Author:
- Thibault Debatty
- See Also:
- Serialized Form
-
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description double
distance(String s1, String s2)
The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurence of each k-shingle).double
distance(Map<String,Integer> profile1, Map<String,Integer> profile2)
Compute QGram distance using precomputed profiles.-
Methods inherited from class info.debatty.java.stringsimilarity.ShingleBased
getK, getProfile
-
-
-
-
Constructor Detail
-
QGram
public QGram(int k)
Q-gram similarity and distance. Defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches", http://www.sciencedirect.com/science/article/pii/0304397592901434 The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurences of each k-shingle). Q-gram distance is a lower bound on Levenshtein distance, but can be computed in O(|A| + |B|), where Levenshtein requires O(|A|.|B|)- Parameters:
k
-
-
QGram
public QGram()
Q-gram similarity and distance. Defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches", http://www.sciencedirect.com/science/article/pii/0304397592901434 The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurence of each k-shingle). Q-gram distance is a lower bound on Levenshtein distance, but can be computed in O(|A| + |B|), where Levenshtein requires O(|A|.|B|) Default k is 3.
-
-
Method Detail
-
distance
public final double distance(String s1, String s2)
The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurence of each k-shingle).- Specified by:
distance
in interfaceStringDistance
- Parameters:
s1
- The first string to compare.s2
- The second string to compare.- Returns:
- The computed Q-gram distance.
- Throws:
NullPointerException
- if s1 or s2 is null.
-
-