Class QGram

  • All Implemented Interfaces:
    StringDistance, Serializable

    @Immutable
    public class QGram
    extends ShingleBased
    implements StringDistance
    Q-gram distance, as defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches". The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurences of each n-gram): SUM( |V1_i - V2_i| ). Q-gram distance is a lower bound on Levenshtein distance, but can be computed in O(m + n), where Levenshtein requires O(m.n).
    Author:
    Thibault Debatty
    See Also:
    Serialized Form
    • Constructor Detail

      • QGram

        public QGram​(int k)
        Q-gram similarity and distance. Defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches", http://www.sciencedirect.com/science/article/pii/0304397592901434 The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurences of each k-shingle). Q-gram distance is a lower bound on Levenshtein distance, but can be computed in O(|A| + |B|), where Levenshtein requires O(|A|.|B|)
        Parameters:
        k -
      • QGram

        public QGram()
        Q-gram similarity and distance. Defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches", http://www.sciencedirect.com/science/article/pii/0304397592901434 The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurence of each k-shingle). Q-gram distance is a lower bound on Levenshtein distance, but can be computed in O(|A| + |B|), where Levenshtein requires O(|A|.|B|) Default k is 3.
    • Method Detail

      • distance

        public final double distance​(String s1,
                                     String s2)
        The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurence of each k-shingle).
        Specified by:
        distance in interface StringDistance
        Parameters:
        s1 - The first string to compare.
        s2 - The second string to compare.
        Returns:
        The computed Q-gram distance.
        Throws:
        NullPointerException - if s1 or s2 is null.
      • distance

        public final double distance​(Map<String,​Integer> profile1,
                                     Map<String,​Integer> profile2)
        Compute QGram distance using precomputed profiles.
        Parameters:
        profile1 -
        profile2 -
        Returns: