- Damerau - Class in info.debatty.java.stringsimilarity
-
Implementation of Damerau-Levenshtein distance with transposition (also
sometimes called unrestricted Damerau-Levenshtein distance).
- Damerau() - Constructor for class info.debatty.java.stringsimilarity.Damerau
-
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.Cosine
-
Return 1.0 - similarity.
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.Damerau
-
Compute the distance between strings: the minimum number of operations
needed to transform one string into the other (insertion, deletion,
substitution of a single character, or a transposition of two adjacent
characters).
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.experimental.Sift4
-
Sift4 - a general purpose string distance algorithm inspired by
JaroWinkler and Longest Common Subsequence.
- distance(String, String) - Method in interface info.debatty.java.stringsimilarity.interfaces.MetricStringDistance
-
- distance(String, String) - Method in interface info.debatty.java.stringsimilarity.interfaces.StringDistance
-
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.Jaccard
-
Distance is computed as 1 - similarity.
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.JaroWinkler
-
Return 1 - similarity.
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.Levenshtein
-
The Levenshtein distance, or edit distance, between two words is the
minimum number of single-character edits (insertions, deletions or
substitutions) required to change one word into the other.
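
The definition above can be illustrated with a minimal dynamic-programming sketch. The class name `LevenshteinSketch` is hypothetical and this is not the library's own code; it keeps only two rows of the edit matrix and assumes unit cost for every edit.

```java
// Hypothetical sketch class, not the library's Levenshtein implementation.
final class LevenshteinSketch {

    // Minimum number of single-character insertions, deletions or
    // substitutions needed to turn s1 into s2.
    static int distance(String s1, String s2) {
        int[] prev = new int[s2.length() + 1];
        int[] curr = new int[s2.length() + 1];

        // Turning the empty prefix of s1 into s2[0..j) takes j insertions.
        for (int j = 0; j <= s2.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= s1.length(); i++) {
            // Turning s1[0..i) into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= s2.length(); j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[s2.length()];
    }
}
```

With this sketch, `LevenshteinSketch.distance("kitten", "sitting")` evaluates to 3 (two substitutions and one insertion).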
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.LongestCommonSubsequence
-
Return the LCS distance between strings s1 and s2, computed as |s1| +
|s2| - 2 * |LCS(s1, s2)|.
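
As a rough illustration of the formula above, the following sketch computes |LCS(s1, s2)| with the textbook dynamic program and then applies |s1| + |s2| - 2 * |LCS(s1, s2)|. The class and method names are hypothetical and are not taken from the library.

```java
// Hypothetical sketch class, not the library's LongestCommonSubsequence.
final class LcsDistanceSketch {

    // Length of the longest common subsequence of s1 and s2.
    static int lcsLength(String s1, String s2) {
        int[][] c = new int[s1.length() + 1][s2.length() + 1];
        for (int i = 1; i <= s1.length(); i++) {
            for (int j = 1; j <= s2.length(); j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    c[i][j] = c[i - 1][j - 1] + 1;
                } else {
                    c[i][j] = Math.max(c[i - 1][j], c[i][j - 1]);
                }
            }
        }
        return c[s1.length()][s2.length()];
    }

    // LCS distance: |s1| + |s2| - 2 * |LCS(s1, s2)|.
    static int distance(String s1, String s2) {
        return s1.length() + s2.length() - 2 * lcsLength(s1, s2);
    }
}
```

For "AGCAT" and "GAC" the longest common subsequence has length 2, so the sketch returns 5 + 3 - 4 = 4.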
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.MetricLCS
-
Distance metric based on Longest Common Subsequence, computed as
1 - |LCS(s1, s2)| / max(|s1|, |s2|).
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.NGram
-
Compute n-gram distance.
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.NormalizedLevenshtein
-
Compute distance as Levenshtein(s1, s2) / max(|s1|, |s2|).
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.OptimalStringAlignment
-
Compute the distance between strings: the minimum number of operations
needed to transform one string into the other (insertion, deletion,
substitution of a single character, or a transposition of two adjacent
characters) while no substring is edited more than once.
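
The restriction "no substring is edited more than once" is what separates this distance from the unrestricted Damerau-Levenshtein distance. A minimal sketch of the usual restricted recurrence follows; the class name `OsaSketch` is hypothetical and unit costs are assumed for all four operations.

```java
// Hypothetical sketch class, not the library's OptimalStringAlignment.
final class OsaSketch {

    static int distance(String s1, String s2) {
        int n = s1.length();
        int m = s2.length();
        int[][] d = new int[n + 1][m + 1];

        for (int i = 0; i <= n; i++) {
            d[i][0] = i; // i deletions
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j; // j insertions
        }

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);

                // Transposition of two adjacent characters. The swapped pair
                // is consumed as a whole, so it cannot be edited again.
                if (i > 1 && j > 1
                        && s1.charAt(i - 1) == s2.charAt(j - 2)
                        && s1.charAt(i - 2) == s2.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }
}
```

Under this recurrence "CA" to "ABC" costs 3, whereas the unrestricted distance is 2 (transpose to "AC", then insert "B" between the swapped characters).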
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.QGram
-
The distance between two strings is defined as the L1 norm of the
difference of their profiles (the number of occurrences of each k-shingle).
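
A minimal sketch of this definition, assuming a profile is simply a map from each k-shingle to its count and that no preprocessing is applied to the input. The class name `QGramSketch` and the explicit `k` parameter are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch class, not the library's QGram implementation.
final class QGramSketch {

    // Count how often each k-shingle (substring of length k) occurs in s.
    static Map<String, Integer> profile(String s, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + k <= s.length(); i++) {
            counts.merge(s.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    // L1 norm of the difference between the two profiles.
    static int distance(String s1, String s2, int k) {
        Map<String, Integer> p1 = profile(s1, k);
        Map<String, Integer> p2 = profile(s2, k);

        Set<String> shingles = new HashSet<>(p1.keySet());
        shingles.addAll(p2.keySet());

        int sum = 0;
        for (String shingle : shingles) {
            sum += Math.abs(p1.getOrDefault(shingle, 0) - p2.getOrDefault(shingle, 0));
        }
        return sum;
    }
}
```

With k = 2, "ABCD" and "ABCE" share the shingles AB and BC and differ in CD and CE, so the sketch returns 2.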
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.SorensenDice
-
Returns 1 - similarity.
- distance(String, String) - Method in class info.debatty.java.stringsimilarity.WeightedLevenshtein
-
Compute Levenshtein distance using provided weights for substitution.
- dotProduct(SparseDoubleVector) - Method in class info.debatty.java.utils.SparseDoubleVector
-
- dotProduct(double[]) - Method in class info.debatty.java.utils.SparseDoubleVector
-
- dotProduct(SparseIntegerVector) - Method in class info.debatty.java.utils.SparseIntegerVector
-
Compute and return the dot product.
- dotProduct(double[]) - Method in class info.debatty.java.utils.SparseIntegerVector
-
Compute and return the dot product.
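
This index describes the sparse vectors as "implemented using two arrays". Assuming those are a sorted array of indices and a parallel array of values (an assumption, not something the index states), a dot product can be sketched as a two-pointer merge. The class name and the method signature below are hypothetical.

```java
// Hypothetical sketch class, not the library's SparseIntegerVector.
final class SparseDotSketch {

    // keys1 and keys2 must be sorted in ascending order; values1[i] is the
    // value stored at index keys1[i], and likewise for the second vector.
    static long dotProduct(int[] keys1, int[] values1, int[] keys2, int[] values2) {
        long sum = 0;
        int i = 0;
        int j = 0;
        while (i < keys1.length && j < keys2.length) {
            if (keys1[i] < keys2[j]) {
                i++; // index present only in the first vector
            } else if (keys1[i] > keys2[j]) {
                j++; // index present only in the second vector
            } else {
                // Index present in both vectors: it contributes to the sum.
                sum += (long) values1[i] * values2[j];
                i++;
                j++;
            }
        }
        return sum;
    }
}
```

Only indices stored in both vectors contribute, so the cost is linear in the number of stored elements rather than in the nominal dimension.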
- Jaccard - Class in info.debatty.java.stringsimilarity
-
Each input string is converted into a set of n-grams; the Jaccard index is
then computed as |V1 inter V2| / |V1 union V2|.
- Jaccard(int) - Constructor for class info.debatty.java.stringsimilarity.Jaccard
-
The strings are first transformed into sets of k-shingles (sequences of k
characters), then Jaccard index is computed as |A inter B| / |A union B|.
- Jaccard() - Constructor for class info.debatty.java.stringsimilarity.Jaccard
-
The strings are first transformed into sets of k-shingles (sequences of k
characters), then Jaccard index is computed as |A inter B| / |A union B|.
- jaccard(SparseBooleanVector) - Method in class info.debatty.java.utils.SparseBooleanVector
-
Compute and return the Jaccard index with another SparseVector.
- jaccard(SparseDoubleVector) - Method in class info.debatty.java.utils.SparseDoubleVector
-
Compute and return the Jaccard index with another SparseVector.
- jaccard(SparseIntegerVector) - Method in class info.debatty.java.utils.SparseIntegerVector
-
Compute and return the Jaccard index with another SparseVector.
- JaroWinkler - Class in info.debatty.java.stringsimilarity
-
The Jaro–Winkler distance metric is designed and best suited for short
strings such as person names, and to detect typos; it is (roughly) a
variation of Damerau-Levenshtein, where the substitution of 2 close
characters is considered less important than the substitution of 2 characters
that are far from each other.
- JaroWinkler() - Constructor for class info.debatty.java.stringsimilarity.JaroWinkler
-
Instantiate with default threshold (0.7).
- JaroWinkler(double) - Constructor for class info.debatty.java.stringsimilarity.JaroWinkler
-
Instantiate with given threshold to determine when Winkler bonus should
be used.
- sampleDIMSUM(double, int, int) - Method in class info.debatty.java.utils.SparseDoubleVector
-
- setMaxOffset(int) - Method in class info.debatty.java.stringsimilarity.experimental.Sift4
-
Set the maximum distance to search for character transposition.
- Sift4 - Class in info.debatty.java.stringsimilarity.experimental
-
Sift4 - a general purpose string distance algorithm inspired by JaroWinkler
and Longest Common Subsequence.
- Sift4() - Constructor for class info.debatty.java.stringsimilarity.experimental.Sift4
-
- similarity(String, String) - Method in class info.debatty.java.stringsimilarity.Cosine
-
Compute the cosine similarity between strings.
- similarity(Map<String, Integer>, Map<String, Integer>) - Method in class info.debatty.java.stringsimilarity.Cosine
-
- similarity(String, String) - Method in interface info.debatty.java.stringsimilarity.interfaces.StringSimilarity
-
Compute and return a measure of similarity between 2 strings.
- similarity(String, String) - Method in class info.debatty.java.stringsimilarity.Jaccard
-
Compute Jaccard index: |A inter B| / |A union B|.
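
A minimal sketch of the formula, assuming A and B are the sets of k-shingles of the two strings. The class name, the explicit `k` parameter, and the choice to return 1.0 when both sets are empty are assumptions of this sketch, not documented library behaviour.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch class, not the library's Jaccard implementation.
final class JaccardSketch {

    // Set of all substrings of length k that occur in s.
    static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // |A inter B| / |A union B| over the two shingle sets.
    static double similarity(String s1, String s2, int k) {
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // assumption of this sketch: avoids dividing by zero
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return (double) inter.size() / union;
    }
}
```

With k = 2, "ABCDE" and "ABCDF" share 3 of 5 distinct shingles, giving a similarity of 0.6.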
- similarity(String, String) - Method in class info.debatty.java.stringsimilarity.JaroWinkler
-
Compute Jaro-Winkler similarity.
- similarity(String, String) - Method in class info.debatty.java.stringsimilarity.NormalizedLevenshtein
-
Return 1 - distance.
- similarity(String, String) - Method in class info.debatty.java.stringsimilarity.SorensenDice
-
Similarity is computed as 2 * |A inter B| / (|A| + |B|).
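
The same shingle sets can be reused to sketch this coefficient. As before, the class name, the `k` parameter, and the handling of two empty sets are assumptions of this sketch and not taken from the library.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch class, not the library's SorensenDice implementation.
final class SorensenDiceSketch {

    // Set of all substrings of length k that occur in s.
    static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // 2 * |A inter B| / (|A| + |B|) over the two shingle sets.
    static double similarity(String s1, String s2, int k) {
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // assumption of this sketch: avoids dividing by zero
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return 2.0 * inter.size() / (a.size() + b.size());
    }
}
```

With k = 2, "ABCDE" and "ABCDF" each have 4 shingles and share 3, giving 2 * 3 / 8 = 0.75, compared with a Jaccard index of 0.6 for the same pair.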
- size() - Method in class info.debatty.java.utils.SparseBooleanVector
-
Return the number of (non-zero) elements in this vector.
- size - Variable in class info.debatty.java.utils.SparseDoubleVector
-
- size() - Method in class info.debatty.java.utils.SparseDoubleVector
-
Return the number of non-zero elements in this vector.
- size() - Method in class info.debatty.java.utils.SparseIntegerVector
-
Return the number of (non-zero) elements in this vector.
- SorensenDice - Class in info.debatty.java.stringsimilarity
-
Similar to Jaccard index, but this time the similarity is computed as 2 * |V1
inter V2| / (|V1| + |V2|).
- SorensenDice(int) - Constructor for class info.debatty.java.stringsimilarity.SorensenDice
-
Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or
Czekanowski's binary (non-quantitative) index.
- SorensenDice() - Constructor for class info.debatty.java.stringsimilarity.SorensenDice
-
Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or
Czekanowski's binary (non-quantitative) index.
- SparseBooleanVector - Class in info.debatty.java.utils
-
- SparseBooleanVector(int) - Constructor for class info.debatty.java.utils.SparseBooleanVector
-
- SparseBooleanVector() - Constructor for class info.debatty.java.utils.SparseBooleanVector
-
- SparseBooleanVector(HashMap<Integer, Integer>) - Constructor for class info.debatty.java.utils.SparseBooleanVector
-
- SparseBooleanVector(boolean[]) - Constructor for class info.debatty.java.utils.SparseBooleanVector
-
- SparseDoubleVector - Class in info.debatty.java.utils
-
Sparse vector of double, implemented using two arrays.
- SparseDoubleVector(int) - Constructor for class info.debatty.java.utils.SparseDoubleVector
-
- SparseDoubleVector() - Constructor for class info.debatty.java.utils.SparseDoubleVector
-
- SparseDoubleVector(HashMap<Integer, Double>) - Constructor for class info.debatty.java.utils.SparseDoubleVector
-
- SparseDoubleVector(double[]) - Constructor for class info.debatty.java.utils.SparseDoubleVector
-
- SparseDoubleVectorExample - Class in info.debatty.java.stringsimilarity.examples
-
- SparseDoubleVectorExample() - Constructor for class info.debatty.java.stringsimilarity.examples.SparseDoubleVectorExample
-
- SparseIntegerVector - Class in info.debatty.java.utils
-
Sparse vector of int, implemented using two arrays.
- SparseIntegerVector(int) - Constructor for class info.debatty.java.utils.SparseIntegerVector
-
Sparse vector of int, implemented using two arrays.
- SparseIntegerVector() - Constructor for class info.debatty.java.utils.SparseIntegerVector
-
Sparse vector of int, implemented using two arrays.
- SparseIntegerVector(HashMap<Integer, Integer>) - Constructor for class info.debatty.java.utils.SparseIntegerVector
-
Sparse vector of int, implemented using two arrays.
- SparseIntegerVector(int[]) - Constructor for class info.debatty.java.utils.SparseIntegerVector
-
Sparse vector of int, implemented using two arrays.
- StringDistance - Interface in info.debatty.java.stringsimilarity.interfaces
-
- StringSimilarity - Interface in info.debatty.java.stringsimilarity.interfaces
-