package com.github.vickumar1981.stringdistance

Provides classes for calculating distances and fuzzy match similarities between two strings. Also provides implicits for using distance and fuzzy match scores as an operator, like:

    val result = "abc" levenshtein "abc"

Includes functionality for phonetic comparisons between strings.

Overview

The main class to use is com.github.vickumar1981.stringdistance.StringDistance.

If you import com.github.vickumar1981.stringdistance.StringConverter._, you can use the string distance and score functions as operators between two strings.
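To illustrate how importing a set of implicits can turn a method into an operator between two strings, here is a minimal, self-contained sketch. It does not use the library: `OperatorSketch`, `DistanceOps`, and `hammingDistance` are hypothetical names, and the equal-length Hamming count is only a stand-in for a real distance function.

```scala
object OperatorSketch {
  // Hypothetical stand-in for a distance function; not the library's implementation.
  // Counts the positions at which two equal-length strings differ.
  def hammingDistance(s1: String, s2: String): Int = {
    require(s1.length == s2.length, "this sketch only handles equal-length strings")
    s1.zip(s2).count { case (a, b) => a != b }
  }

  // An implicit class adds a method to String, so it can be written infix.
  implicit class DistanceOps(private val s1: String) extends AnyVal {
    def hammingDist(s2: String): Int = hammingDistance(s1, s2)
  }

  def main(args: Array[String]): Unit = {
    println("martha" hammingDist "marhta")  // operator style, prints 2
    println("martha".hammingDist("marhta")) // same call in dot syntax
  }
}
```

Bringing `OperatorSketch._` into scope is what makes the extra method visible on `String`; without the import, the call does not compile.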

To compare two strings phonetically, that is, to check whether they sound alike, use the com.github.vickumar1981.stringdistance.StringSound class.
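As a rough illustration of what a phonetic comparison does, the sketch below encodes words with the classic American Soundex rules and treats two words as sounding alike when their codes match. It is a self-contained approximation, not the library's code: `SoundexSketch`, `encode`, and `soundsAlike` are hypothetical names, and the library's own Soundex and Metaphone results may differ in edge cases.

```scala
object SoundexSketch {
  // Consonant groups and their Soundex digits.
  private val codes: Map[Char, Char] =
    Seq("bfpv" -> '1', "cgjkqsxz" -> '2', "dt" -> '3', "l" -> '4', "mn" -> '5', "r" -> '6')
      .flatMap { case (letters, digit) => letters.toList.map(c => c -> digit) }
      .toMap

  // Encodes a word as a letter followed by three digits, e.g. "Robert" -> "R163".
  def encode(word: String): String = {
    val letters = word.toLowerCase.filter(c => c >= 'a' && c <= 'z')
    if (letters.isEmpty) ""
    else {
      val sb = new StringBuilder
      sb += letters.head.toUpper
      // Code of the previous letter, used to collapse adjacent duplicates.
      var prev: Option[Char] = codes.get(letters.head)
      letters.tail.foreach { c =>
        if (sb.length < 4) {
          codes.get(c) match {
            case Some(d) =>
              if (!prev.contains(d)) sb += d
              prev = Some(d)
            case None =>
              // Vowels (and y) separate equal codes; h and w do not.
              if (c != 'h' && c != 'w') prev = None
          }
        }
      }
      sb.toString.padTo(4, '0')
    }
  }

  // Two non-empty words sound alike when their codes are equal.
  def soundsAlike(a: String, b: String): Boolean = {
    val ea = encode(a)
    ea.nonEmpty && ea == encode(b)
  }

  def main(args: Array[String]): Unit = {
    println(encode("merci"))               // M620
    println(soundsAlike("merci", "mercy")) // true
  }
}
```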

To use in Java, please use the corresponding classes in the com.github.vickumar1981.stringdistance.util package.

| Class | Description |
| :--- | :--- |
| com.github.vickumar1981.stringdistance.StringDistance | Singleton class with fuzzy match scores and distances |
| com.github.vickumar1981.stringdistance.StringConverter | Implicit conversions between strings s1 and s2 |
| com.github.vickumar1981.stringdistance.StringSound | Phonetic comparison between strings s1 and s2 |
| com.github.vickumar1981.stringdistance.util.StringDistance | Java class for fuzzy match scores and distances |
| com.github.vickumar1981.stringdistance.util.StringSound | Java class for phonetic comparison between strings s1 and s2 |

Linear Supertypes
SoundDefinitions, ScoreDefinitions, DistanceDefinitions, AnyRef, Any

Type Members

  1. trait CosineAlgorithm extends StringMetricAlgorithm

    A marker interface for the cosine similarity algorithm.

  2. trait DamerauLevenshteinAlgorithm extends StringMetricAlgorithm

    A marker interface for the Damerau-Levenshtein distance algorithm.

  3. trait DiceCoefficientAlgorithm extends StringMetricAlgorithm

    A marker interface for the Dice coefficient algorithm.

  4. trait DistanceAlgorithm[+T <: StringMetricAlgorithm] extends AnyRef

    A type class to extend a distance method to StringMetricAlgorithm.

  5. trait HammingAlgorithm extends StringMetricAlgorithm

    A marker interface for the Hamming distance algorithm.

  6. trait JaccardAlgorithm extends StringMetricAlgorithm

    A marker interface for the Jaccard similarity algorithm.

  7. trait JaroAlgorithm extends StringMetricAlgorithm

    A marker interface for the Jaro similarity algorithm.

  8. trait JaroWinklerAlgorithm extends StringMetricAlgorithm

    A marker interface for the Jaro-Winkler algorithm.

  9. trait LevenshteinAlgorithm extends StringMetricAlgorithm

    A marker interface for the Levenshtein distance algorithm.

  10. trait LongestCommonSeqAlorithm extends StringMetricAlgorithm

    A marker interface for the longest common subsequence algorithm.

  11. trait MetaphoneAlgorithm extends StringMetricAlgorithm

    A marker interface for the Metaphone algorithm.

  12. class MetaphoneImplWrapper extends MetaphoneImpl

    Java wrapper for Metaphone similarity.

  13. trait NGramAlgorithm extends StringMetricAlgorithm

    A marker interface for the n-gram similarity algorithm.

  14. trait NeedlemanWunschAlgorithm extends StringMetricAlgorithm

    A marker interface for the Needleman-Wunsch similarity algorithm.

  15. trait OverlapAlgorithm extends StringMetricAlgorithm

    A marker interface for the overlap similarity algorithm.

  16. trait ScorableFromDistance[+T <: StringMetricAlgorithm] extends ScoringAlgorithm[T]

    A mix-in trait to extend a score method using the distance method to StringMetricAlgorithm.

  17. trait ScoringAlgorithm[+T <: StringMetricAlgorithm] extends AnyRef

    A type class to extend a score method to StringMetricAlgorithm.

  18. trait SmithWatermanAlgorithm extends StringMetricAlgorithm

    A marker interface for the Smith-Waterman similarity algorithm.

  19. trait SmithWatermanGotohAlgorithm extends StringMetricAlgorithm

    A marker interface for the Smith-Waterman-Gotoh similarity algorithm.

  20. trait SoundScoringAlgorithm[+T <: StringMetricAlgorithm] extends AnyRef

    A type class to extend a sound score method to StringMetricAlgorithm.

  21. trait SoundexAlgorithm extends StringMetricAlgorithm

    A marker interface for the Soundex similarity algorithm.

  22. class SoundexImplWrapper extends SoundexImpl

    Java wrapper for Soundex similarity.

  23. trait StringMetric[A <: StringMetricAlgorithm] extends AnyRef

    Defines implementation for StringMetricAlgorithm by adding implicit definitions from DistanceAlgorithm, ScoringAlgorithm, WeightedDistanceAlgorithm, or WeightedScoringAlgorithm.

  24. trait StringMetricAlgorithm extends AnyRef

    A marker interface for the string metric algorithm.

  25. trait StringSoundMetric[A <: StringMetricAlgorithm] extends AnyRef

  26. trait TverskyAlgorithm extends StringMetricAlgorithm

    A marker interface for the Tversky similarity algorithm.

  27. trait WeightedDistanceAlgorithm[+A <: StringMetricAlgorithm, B] extends AnyRef

    A type class to extend a distance method with a 2nd typed parameter to StringMetricAlgorithm.

  28. trait WeightedScoringAlgorithm[+A <: StringMetricAlgorithm, B] extends AnyRef

    A type class to extend a score method with a 2nd typed parameter to StringMetricAlgorithm.

  29. trait WeightedStringMetric[A <: StringMetricAlgorithm, B] extends AnyRef


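The type members above combine marker traits, type classes, and implicit instances. The sketch below shows how such pieces could fit together for a single algorithm. It is a simplified illustration under stated assumptions, not the library's source: every name (`MetricAlgorithm`, `HammingAlg`, `Distance`, `ScoreFromDistance`, `Metric`) and every method signature is hypothetical, and normalizing the score by the longer string's length is an assumption.

```scala
object TypeClassSketch {
  // Marker traits carry no behaviour; they only name an algorithm at the type level.
  trait MetricAlgorithm
  trait HammingAlg extends MetricAlgorithm

  // Type class: provides a distance for the algorithm T (assumed signature).
  trait Distance[+T <: MetricAlgorithm] {
    def distance(s1: String, s2: String): Int
  }

  // Mix-in that derives a similarity score in [0, 1] from the distance.
  trait ScoreFromDistance[+T <: MetricAlgorithm] extends Distance[T] {
    def score(s1: String, s2: String): Double = {
      val maxLen = math.max(s1.length, s2.length)
      if (maxLen == 0) 1.0 else 1.0 - distance(s1, s2).toDouble / maxLen
    }
  }

  // Implicit instance that ties the Hamming marker to an implementation.
  implicit object HammingInstance extends ScoreFromDistance[HammingAlg] {
    def distance(s1: String, s2: String): Int = {
      require(s1.length == s2.length, "this sketch only handles equal-length strings")
      s1.zip(s2).count { case (a, b) => a != b }
    }
  }

  // Front end: resolves the implicit instance for the chosen algorithm A.
  class Metric[A <: MetricAlgorithm] {
    def distance(s1: String, s2: String)(implicit d: Distance[A]): Int =
      d.distance(s1, s2)
    def score(s1: String, s2: String)(implicit s: ScoreFromDistance[A]): Double =
      s.score(s1, s2)
  }

  object Hamming extends Metric[HammingAlg]

  def main(args: Array[String]): Unit = {
    println(Hamming.distance("martha", "marhta")) // 2
    println(Hamming.score("martha", "marhta"))    // 1 - 2/6, about 0.667
  }
}
```

The marker trait keeps the front end (`Hamming`) decoupled from the implementation: adding an algorithm means adding a marker, an implicit instance, and a one-line front-end object.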
Value Members

  1. object ArrayDistance

    Main class to work with generic arrays, Array[T], analogous to StringDistance.

    import com.github.vickumar1981.stringdistance.ArrayDistance._
    
    // Example Levenshtein Distance and Score
    val levenshteinDist = Levenshtein.distance(Array("m", "a", "r", "t", "h", "a"), Array("m", "a", "r", "h", "t", "a")) // 2
    val levenshtein = Levenshtein.score(Array("m", "a", "r", "t", "h", "a"), Array("m", "a", "r", "h", "t", "a")) // 0.667
  2. implicit object CosSimilarityScore extends CosSimilarityImpl with ScoringAlgorithm[CosineAlgorithm]

    Implicit definition of cosine similarity score for CosineAlgorithm.

    Definition Classes
    ScoreDefinitions
  3. implicit object DamerauLevenshteinDistance extends LevenshteinDistanceImpl with DistanceAlgorithm[DamerauLevenshteinAlgorithm] with ScorableFromDistance[DamerauLevenshteinAlgorithm]

    Implicit definition of Damerau-Levenshtein distance for DamerauLevenshteinAlgorithm.

    Definition Classes
    DistanceDefinitions
  4. implicit object DiceCoefficientScore extends DiceCoefficientImpl with ScoringAlgorithm[DiceCoefficientAlgorithm]

    Implicit definition of Dice coefficient score for DiceCoefficientAlgorithm.

    Definition Classes
    ScoreDefinitions
  5. implicit object HammingDistance extends HammingImpl with DistanceAlgorithm[HammingAlgorithm] with ScorableFromDistance[HammingAlgorithm]

    Implicit definition of Hamming distance for HammingAlgorithm.

    Definition Classes
    DistanceDefinitions
  6. implicit object JaccardScore extends JaccardImpl with WeightedScoringAlgorithm[JaccardAlgorithm, Int]

    Implicit definition of Jaccard score for JaccardAlgorithm.

    Definition Classes
    ScoreDefinitions
  7. implicit object JaroScore extends JaroImpl with ScoringAlgorithm[JaroAlgorithm]

    Implicit definition of Jaro score for JaroAlgorithm.

    Definition Classes
    ScoreDefinitions
  8. implicit object JaroWinklerScore extends JaroImpl with WeightedScoringAlgorithm[JaroWinklerAlgorithm, Double]

    Implicit definition of Jaro-Winkler score for JaroWinklerAlgorithm.

    Definition Classes
    ScoreDefinitions
  9. implicit object LevenshteinDistance extends LevenshteinDistanceImpl with DistanceAlgorithm[LevenshteinAlgorithm] with ScorableFromDistance[LevenshteinAlgorithm]

    Implicit definition of Levenshtein distance for LevenshteinAlgorithm.

    Definition Classes
    DistanceDefinitions
  10. implicit object LongestCommonSeqDistance extends LongestCommonSeqImpl with DistanceAlgorithm[LongestCommonSeqAlorithm]

    Implicit definition of longest common subsequence distance for LongestCommonSeqAlorithm.

    Definition Classes
    DistanceDefinitions
  11. implicit object MetaphoneScore extends MetaphoneImpl with SoundScoringAlgorithm[MetaphoneAlgorithm]

    Implicit definition of Metaphone score for MetaphoneAlgorithm.

    Definition Classes
    SoundDefinitions
  12. implicit object NGramDistance extends NGramImpl with WeightedDistanceAlgorithm[NGramAlgorithm, Int]

    Implicit definition of n-gram distance for NGramAlgorithm.

    Definition Classes
    DistanceDefinitions
  13. implicit object NGramScore extends NGramImpl with WeightedScoringAlgorithm[NGramAlgorithm, Int]

    Implicit definition of n-gram score for NGramAlgorithm.

    Definition Classes
    ScoreDefinitions
  14. implicit object NeedlemanWunschScore extends NeedlemanWunschImpl with WeightedScoringAlgorithm[NeedlemanWunschAlgorithm, ConstantGap]

    Implicit definition of Needleman-Wunsch score for NeedlemanWunschAlgorithm.

    Definition Classes
    ScoreDefinitions
  15. implicit object OverlapScore extends OverlapImpl with WeightedScoringAlgorithm[OverlapAlgorithm, Int]

    Implicit definition of overlap score for OverlapAlgorithm.

    Definition Classes
    ScoreDefinitions
  16. implicit object SmithWatermanGotohScore extends SmithWatermanImpl with WeightedScoringAlgorithm[SmithWatermanGotohAlgorithm, ConstantGap]

    Implicit definition of Smith-Waterman-Gotoh score for SmithWatermanGotohAlgorithm.

    Definition Classes
    ScoreDefinitions
  17. implicit object SmithWatermanScore extends SmithWatermanImpl with WeightedScoringAlgorithm[SmithWatermanAlgorithm, (Gap, Int)]

    Implicit definition of Smith-Waterman score for SmithWatermanAlgorithm.

    Definition Classes
    ScoreDefinitions
  18. implicit object SoundexScore extends SoundexImpl with SoundScoringAlgorithm[SoundexAlgorithm]

    Implicit definition of Soundex score for SoundexAlgorithm.

    Definition Classes
    SoundDefinitions
  19. object StringConverter

    Object to extend operations to the String class.

    import com.github.vickumar1981.stringdistance.StringConverter._
    
    // Scores between two strings
    val cosSimilarity: Double = "hello".cosine("chello")
    val damerau: Double = "martha".damerau("marhta")
    val diceCoefficient: Double = "martha".diceCoefficient("marhta")
    val hamming: Double = "martha".hamming("marhta")
    val jaccard: Double = "karolin".jaccard("kathrin")
    val jaro: Double = "martha".jaro("marhta")
    val jaroWinkler: Double = "martha".jaroWinkler("marhta")
    val levenshtein: Double = "martha".levenshtein("marhta")
    val needlemanWunsch: Double = "martha".needlemanWunsch("marhta")
    val ngramSimilarity: Double = "karolin".nGram("kathrin")
    val bigramSimilarity: Double = "karolin".nGram("kathrin", 2)
    val overlap: Double = "karolin".overlap("kathrin")
    val smithWaterman: Double = "martha".smithWaterman("marhta")
    val smithWatermanGotoh: Double = "martha".smithWatermanGotoh("marhta")
    val tversky: Double = "karolin".tversky("kathrin", 0.5)
    
    // return a List[String] of ngram tokens
    val tokens = "martha".tokens(2) // List("ma", "ar", "rt", "th", "ha")
    
    // Distances between two strings
    val damerauDist: Int = "martha".damerauDist("marhta")
    val hammingDist: Int = "martha".hammingDist("marhta")
    val levenshteinDist: Int = "martha".levenshteinDist("marhta")
    val longestCommonSeq: Int = "martha".longestCommonSeq("marhta")
    val ngramDist: Int = "karolin".nGramDist("kathrin")
    val bigramDist: Int = "karolin".nGramDist("kathrin", 2)
    
    // Phonetic similarity of two strings
    val metaphone: Boolean = "merci".metaphone("mercy")
    val soundex: Boolean = "merci".soundex("mercy")
  20. object StringDistance

    Main class to organize functionality of different string distance algorithms.

    import com.github.vickumar1981.stringdistance.StringDistance._
    import com.github.vickumar1981.stringdistance.impl.{ConstantGap, LinearGap}
    
    // Scores between strings
    val cosSimilarity: Double = Cosine.score("hello", "chello")
    val damerau: Double = Damerau.score("martha", "marhta")
    val diceCoefficient: Double = DiceCoefficient.score("martha", "marhta")
    val hamming: Double = Hamming.score("martha", "marhta")
    val jaccard: Double = Jaccard.score("karolin", "kathrin", 1)
    val jaro: Double = Jaro.score("martha", "marhta")
    val jaroWinkler: Double = JaroWinkler.score("martha", "marhta", 0.1)
    val levenshtein: Double = Levenshtein.score("martha", "marhta")
    val needlemanWunsch: Double = NeedlemanWunsch.score("martha", "marhta", ConstantGap())
    val ngramSimilarity: Double = NGram.score("karolin", "kathrin", 1)
    val bigramSimilarity: Double = NGram.score("karolin", "kathrin", 2)
    val overlap: Double = Overlap.score("karolin", "kathrin", 1)
    val smithWaterman: Double = SmithWaterman.score("martha", "marhta", (LinearGap(gapValue = -1), Integer.MAX_VALUE))
    val smithWatermanGotoh: Double = SmithWatermanGotoh.score("martha", "marhta", ConstantGap())
    val tversky: Double = Tversky.score("karolin", "kathrin", 0.5)
    
    // Distances between strings
    val damerauDist: Int = Damerau.distance("martha", "marhta")
    val hammingDist: Int = Hamming.distance("martha", "marhta")
    val levenshteinDist: Int = Levenshtein.distance("martha", "marhta")
    val longestCommonSubSeq: Int = LongestCommonSeq.distance("martha", "marhta")
    val ngramDist: Int = NGram.distance("karolin", "kathrin", 1)
    val bigramDist: Int = NGram.distance("karolin", "kathrin", 2)
    
    // return a List[String] of ngram tokens
    val tokens = NGram.tokens("martha", 2) // List("ma", "ar", "rt", "th", "ha")
  21. object StringSound

    Main class to organize functionality of different phonetic/sound string algorithms.

    import com.github.vickumar1981.stringdistance.StringSound._
    import com.github.vickumar1981.stringdistance.implicits._
    
    // Phonetic similarity between strings
    val metaphone: Boolean = Metaphone.score("merci", "mercy")
    val soundex: Boolean = Soundex.score("merci", "mercy")
  22. implicit object TverskyScore extends JaccardImpl with WeightedScoringAlgorithm[TverskyAlgorithm, Double]

    Implicit definition of Tversky score for TverskyAlgorithm.

    Definition Classes
    ScoreDefinitions
  23. implicit def gapToGapAndWindow(g: Gap): (Gap, Int)

  24. package impl

  25. package implicits

  26. package util


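The ArrayDistance and StringDistance examples above report a Levenshtein distance of 2 and a score of 0.667 for "martha" and "marhta". The following self-contained sketch shows one way to reproduce those numbers for any sequence of elements. It is not the library's implementation: `LevenshteinSketch` is a hypothetical name, and computing the score as one minus the distance divided by the longer length is an assumption that happens to agree with the documented values.

```scala
object LevenshteinSketch {
  // Classic dynamic-programming edit distance, keeping only two rows of the table.
  def distance[T](a: Seq[T], b: Seq[T]): Int = {
    val s = a.toIndexedSeq
    val t = b.toIndexedSeq
    // prev(j) = distance between the processed prefix of s and the first j items of t.
    var prev: Array[Int] = Array.range(0, t.length + 1)
    for (i <- 1 to s.length) {
      val curr = new Array[Int](t.length + 1)
      curr(0) = i // deleting the first i items of s
      for (j <- 1 to t.length) {
        val cost = if (s(i - 1) == t(j - 1)) 0 else 1
        curr(j) = math.min(
          math.min(curr(j - 1) + 1, prev(j) + 1), // insertion, deletion
          prev(j - 1) + cost                      // match or substitution
        )
      }
      prev = curr
    }
    prev(t.length)
  }

  // Assumed normalization: 1.0 for identical input, 0.0 when nothing matches.
  def score[T](a: Seq[T], b: Seq[T]): Double = {
    val maxLen = math.max(a.length, b.length)
    if (maxLen == 0) 1.0 else 1.0 - distance(a, b).toDouble / maxLen
  }

  def main(args: Array[String]): Unit = {
    val x = Seq("m", "a", "r", "t", "h", "a")
    val y = Seq("m", "a", "r", "h", "t", "a")
    println(distance(x, y))                // 2
    println(f"${score(x, y)}%.3f")         // 0.667
    println(distance("kitten".toSeq, "sitting".toSeq)) // 3
  }
}
```

Because the function is generic in the element type, the same code covers both the string case (via `toSeq`) and arbitrary arrays of tokens.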