A marker interface for the cosine similarity algorithm.
A marker interface for the Damerau-Levenshtein distance algorithm.
A marker interface for the Dice coefficient algorithm.
A type class to extend a distance method to StringMetricAlgorithm.
A marker interface for the Hamming distance algorithm.
A marker interface for the Jaccard similarity algorithm.
A marker interface for the Jaro similarity algorithm.
A marker interface for the Jaro-Winkler algorithm.
A marker interface for the Levenshtein distance algorithm.
A marker interface for the longest common subsequence algorithm.
A marker interface for the Metaphone algorithm.
Java wrapper for Metaphone similarity.
A marker interface for the n-gram similarity algorithm.
A marker interface for the Needleman-Wunsch similarity algorithm.
A marker interface for the overlap similarity algorithm.
A mix-in trait to extend a score method using the distance method to StringMetricAlgorithm.
A type class to extend a score method to StringMetricAlgorithm.
A marker interface for the Smith-Waterman similarity algorithm.
A marker interface for the Smith-Waterman-Gotoh similarity algorithm.
A type class to extend a sound score method to StringMetricAlgorithm.
A marker interface for the Soundex similarity algorithm.
Java wrapper for Soundex similarity.
Defines the implementation for StringMetricAlgorithm by adding implicit definitions from DistanceAlgorithm, ScoringAlgorithm, WeightedDistanceAlgorithm, or WeightedScoringAlgorithm.
A marker interface for the string metric algorithm.
A marker interface for the Tversky similarity algorithm.
A type class to extend a distance method with a 2nd typed parameter to StringMetricAlgorithm.
A type class to extend a score method with a 2nd typed parameter to StringMetricAlgorithm.
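The relationship between the marker interfaces, the type classes, and the mix-in trait listed above can be illustrated with a small self-contained sketch. Every name in it (`MetricAlgorithm`, `HammingLike`, `Distance`, `Score`, `ScoreFromDistance`) is hypothetical and chosen for illustration; it is not the library's source, and the score normalisation `1 - distance / max(length)` is an assumption.

```scala
// Hypothetical names throughout; an illustration of the pattern, not the library's code.
object TypeClassSketch {
  // Marker interfaces: carry no behaviour, they only identify an algorithm.
  trait MetricAlgorithm
  trait HammingLike extends MetricAlgorithm

  // Type class that supplies a distance method for a given marker type.
  trait Distance[A <: MetricAlgorithm] {
    def distance(s1: String, s2: String): Int
  }

  // Type class that supplies a score method for a given marker type.
  trait Score[A <: MetricAlgorithm] {
    def score(s1: String, s2: String): Double
  }

  // Mix-in that derives a normalised score from an existing distance method.
  trait ScoreFromDistance[A <: MetricAlgorithm] extends Score[A] { self: Distance[A] =>
    def score(s1: String, s2: String): Double = {
      val maxLen = math.max(s1.length, s2.length)
      if (maxLen == 0) 1.0 else 1.0 - distance(s1, s2).toDouble / maxLen
    }
  }

  // Implicit instance that ties the marker type to a concrete implementation.
  implicit object HammingInstance
      extends Distance[HammingLike]
      with ScoreFromDistance[HammingLike] {
    def distance(s1: String, s2: String): Int = {
      require(s1.length == s2.length, "Hamming distance needs equal lengths")
      s1.zip(s2).count { case (a, b) => a != b }
    }
  }

  // Front end: each call resolves the implicit instance for the marker type.
  object Hamming extends HammingLike {
    def distance(s1: String, s2: String)(implicit d: Distance[HammingLike]): Int =
      d.distance(s1, s2)
    def score(s1: String, s2: String)(implicit s: Score[HammingLike]): Double =
      s.score(s1, s2)
  }
}
```

With `import TypeClassSketch._` in scope, `Hamming.distance("karolin", "kathrin")` counts the three positions at which the two strings differ.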
Main class to work with generic arrays, Array[T], analogous to StringDistance.
    import com.github.vickumar1981.stringdistance.ArrayDistance._

    // Example Levenshtein Distance and Score
    val levenshteinDist = Levenshtein.distance(Array("m", "a", "r", "t", "h", "a"), Array("m", "a", "r", "h", "t", "a")) // 2
    val levenshtein = Levenshtein.score(Array("m", "a", "r", "t", "h", "a"), Array("m", "a", "r", "h", "t", "a")) // 0.667
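To show where the values 2 and 0.667 in the example come from, here is a minimal sketch of the textbook Levenshtein recurrence over generic arrays. The object name `ArrayLevenshteinSketch` is hypothetical, and the score normalisation `1 - distance / max(length)` is an assumption that happens to reproduce the figure quoted above; the library's own implementation may differ.

```scala
// Hypothetical sketch of the textbook algorithm, not the library's implementation.
object ArrayLevenshteinSketch {
  // Two-row dynamic programming edit distance over arbitrary element types.
  def distance[T](a: Array[T], b: Array[T]): Int = {
    // Row 0: turning an empty prefix of `a` into b's prefix of length j costs j inserts.
    var prev = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i // deleting the first i elements of `a`
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(
          math.min(prev(j) + 1, curr(j - 1) + 1), // deletion, insertion
          prev(j - 1) + cost                      // match or substitution
        )
      }
      prev = curr
    }
    prev(b.length)
  }

  // Assumed normalisation: 1 - distance / length of the longer array.
  def score[T](a: Array[T], b: Array[T]): Double = {
    val maxLen = math.max(a.length, b.length)
    if (maxLen == 0) 1.0 else 1.0 - distance(a, b).toDouble / maxLen
  }
}
```

For the two six-element arrays in the example, the swapped "t" and "h" cost two substitutions, so the distance is 2 and the score is 1 - 2/6.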
Implicit definition of cosine similarity score for CosineAlgorithm.
Implicit definition of Damerau-Levenshtein distance for DamerauLevenshteinAlgorithm.
Implicit definition of Dice coefficient score for DiceCoefficientAlgorithm.
Implicit definition of Hamming distance for HammingAlgorithm.
Implicit definition of Jaccard score for JaccardAlgorithm.
Implicit definition of Jaro score for JaroAlgorithm.
Implicit definition of Jaro-Winkler score for JaroWinklerAlgorithm.
Implicit definition of Levenshtein distance for LevenshteinAlgorithm.
Implicit definition of longest common subsequence distance for the longest common subsequence algorithm.
Implicit definition of Metaphone score for MetaphoneAlgorithm.
Implicit definition of n-gram distance for NGramAlgorithm.
Implicit definition of n-gram score for NGramAlgorithm.
Implicit definition of Needleman-Wunsch score for NeedlemanWunschAlgorithm.
Implicit definition of overlap score for OverlapAlgorithm.
Implicit definition of Smith-Waterman-Gotoh score for SmithWatermanGotohAlgorithm.
Implicit definition of Smith-Waterman score for SmithWatermanAlgorithm.
Implicit definition of Soundex score for SoundexAlgorithm.
Object to extend operations to the String class.
    import com.github.vickumar1981.stringdistance.StringConverter._

    // Scores between two strings
    val cosSimilarity: Double = "hello".cosine("chello")
    val damerau: Double = "martha".damerau("marhta")
    val diceCoefficient: Double = "martha".diceCoefficient("marhta")
    val hamming: Double = "martha".hamming("marhta")
    val jaccard: Double = "karolin".jaccard("kathrin")
    val jaro: Double = "martha".jaro("marhta")
    val jaroWinkler: Double = "martha".jaroWinkler("marhta")
    val levenshtein: Double = "martha".levenshtein("marhta")
    val needlemanWunsch: Double = "martha".needlemanWunsch("marhta")
    val ngramSimilarity: Double = "karolin".nGram("kathrin")
    val bigramSimilarity: Double = "karolin".nGram("kathrin", 2)
    val overlap: Double = "karolin".overlap("kathrin")
    val smithWaterman: Double = "martha".smithWaterman("marhta")
    val smithWatermanGotoh: Double = "martha".smithWatermanGotoh("marhta")
    val tversky: Double = "karolin".tversky("kathrin", 0.5)

    // Return a List[String] of n-gram tokens
    val tokens = "martha".tokens(2) // List("ma", "ar", "rt", "th", "ha")

    // Distances between two strings
    val damerauDist: Int = "martha".damerauDist("marhta")
    val hammingDist: Int = "martha".hammingDist("marhta")
    val levenshteinDist: Int = "martha".levenshteinDist("marhta")
    val longestCommonSeq: Int = "martha".longestCommonSeq("marhta")
    val ngramDist: Int = "karolin".nGramDist("kathrin")
    val bigramDist: Int = "karolin".nGramDist("kathrin", 2)

    // Phonetic similarity of two strings
    val metaphone: Boolean = "merci".metaphone("mercy")
    val soundex: Boolean = "merci".soundex("mercy")
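Method-style calls on a `String`, as in the example above, are typically made available in Scala through an implicit class. The sketch below shows that mechanism with one hypothetical method, `lcsLengthSketch`, which computes the length of the longest common subsequence with the textbook dynamic programming table. The names `StringOpsSketch`, `LcsOps`, and `lcsLengthSketch` are assumptions for illustration and are not part of the library.

```scala
// Hypothetical enrichment of String; illustrates the mechanism only.
object StringOpsSketch {
  implicit class LcsOps(private val s1: String) {
    // Length of the longest common subsequence of s1 and s2.
    def lcsLengthSketch(s2: String): Int = {
      // table(i)(j) holds the LCS length of s1.take(i) and s2.take(j).
      val table = Array.ofDim[Int](s1.length + 1, s2.length + 1)
      for (i <- 1 to s1.length; j <- 1 to s2.length) {
        table(i)(j) =
          if (s1(i - 1) == s2(j - 1)) table(i - 1)(j - 1) + 1
          else math.max(table(i - 1)(j), table(i)(j - 1))
      }
      table(s1.length)(s2.length)
    }
  }
}
```

After `import StringOpsSketch._`, the compiler wraps the left-hand string in `LcsOps` automatically, so `"martha".lcsLengthSketch("marhta")` reads like a built-in method.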
Main class to organize functionality of different string distance algorithms
    import com.github.vickumar1981.stringdistance.StringDistance._
    import com.github.vickumar1981.stringdistance.impl.{ConstantGap, LinearGap}

    // Scores between strings
    val cosSimilarity: Double = Cosine.score("hello", "chello")
    val damerau: Double = Damerau.score("martha", "marhta")
    val diceCoefficient: Double = DiceCoefficient.score("martha", "marhta")
    val hamming: Double = Hamming.score("martha", "marhta")
    val jaccard: Double = Jaccard.score("karolin", "kathrin", 1)
    val jaro: Double = Jaro.score("martha", "marhta")
    val jaroWinkler: Double = JaroWinkler.score("martha", "marhta", 0.1)
    val levenshtein: Double = Levenshtein.score("martha", "marhta")
    val needlemanWunsch: Double = NeedlemanWunsch.score("martha", "marhta", ConstantGap())
    val ngramSimilarity: Double = NGram.score("karolin", "kathrin", 1)
    val bigramSimilarity: Double = NGram.score("karolin", "kathrin", 2)
    val overlap: Double = Overlap.score("karolin", "kathrin", 1)
    val smithWaterman: Double = SmithWaterman.score("martha", "marhta", (LinearGap(gapValue = -1), Integer.MAX_VALUE))
    val smithWatermanGotoh: Double = SmithWatermanGotoh.score("martha", "marhta", ConstantGap())
    val tversky: Double = Tversky.score("karolin", "kathrin", 0.5)

    // Distances between strings
    val damerauDist: Int = Damerau.distance("martha", "marhta")
    val hammingDist: Int = Hamming.distance("martha", "marhta")
    val levenshteinDist: Int = Levenshtein.distance("martha", "marhta")
    val longestCommonSubSeq: Int = LongestCommonSeq.distance("martha", "marhta")
    val ngramDist: Int = NGram.distance("karolin", "kathrin", 1)
    val bigramDist: Int = NGram.distance("karolin", "kathrin", 2)

    // Return a List[String] of n-gram tokens
    val tokens = NGram.tokens("martha", 2) // List("ma", "ar", "rt", "th", "ha")
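Several of the scores above take an n-gram size as their last argument. As a rough illustration of what that parameter controls, the sketch below tokenises a string into overlapping n-grams and computes the textbook set-based Jaccard index over them. The object `NGramSketch` and its methods are hypothetical, and the library's own scores may be defined differently and return different values.

```scala
// Hypothetical sketch of n-gram tokenisation and set-based Jaccard similarity.
object NGramSketch {
  // Overlapping substrings of length n; empty when the string is shorter than n.
  def tokens(s: String, n: Int): List[String] =
    if (n <= 0 || s.length < n) Nil else s.sliding(n).toList

  // Jaccard index: size of the intersection over size of the union of the token sets.
  def jaccard(s1: String, s2: String, n: Int): Double = {
    val a = tokens(s1, n).toSet
    val b = tokens(s2, n).toSet
    val unionSize = (a union b).size
    // With no tokens on either side there is nothing to compare; return 0.0 by convention here.
    if (unionSize == 0) 0.0 else (a intersect b).size.toDouble / unionSize
  }
}
```

For "karolin" and "kathrin" with n = 2, the token sets share "ka" and "in" out of ten distinct bigrams, so this sketch yields 0.2.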
Main class to organize functionality of different phonetic/sound string algorithms
    import com.github.vickumar1981.stringdistance.StringSound._
    import com.github.vickumar1981.stringdistance.implicits._

    // Phonetic similarity between strings
    val metaphone: Boolean = Metaphone.score("merci", "mercy")
    val soundex: Boolean = Soundex.score("merci", "mercy")
Implicit definition of Tversky score for TverskyAlgorithm.
Provides classes for calculating distances and fuzzy match similarities between two strings. Also provides implicits for using distance and fuzzy match scores as operators on strings, for example "martha".jaro("marhta").
Includes functionality for phonetic comparisons between strings.
Overview
The main class to use is com.github.vickumar1981.stringdistance.StringDistance.
If you import com.github.vickumar1981.stringdistance.StringConverter, you can use the string distance and score functions as operators between two strings.
To compare two strings phonetically, that is, to check whether they sound alike, use the com.github.vickumar1981.stringdistance.StringSound class.
To use in Java, please use the corresponding classes in the com.github.vickumar1981.stringdistance.util package.
| Class | Description |
| :--- | :--- |
| com.github.vickumar1981.stringdistance.StringDistance | Singleton class with fuzzy match scores and distances |
| com.github.vickumar1981.stringdistance.StringConverter | Implicit conversions between strings s1 and s2 |
| com.github.vickumar1981.stringdistance.StringSound | Phonetic comparison between strings s1 and s2 |
| com.github.vickumar1981.stringdistance.util.StringDistance | Java class for fuzzy match scores and distances |
| com.github.vickumar1981.stringdistance.util.StringSound | Java class for phonetic comparison between strings s1 and s2 |