Package

com.github.vickumar1981

stringdistance

package stringdistance

Provides classes for calculating distances and fuzzy match similarities between two strings. Also provides implicits for using distance and fuzzy match scores as an operator, like:

val result = "abc" levenshtein "abc"
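
The operator syntax above relies on Scala's implicit enrichment of String. The following is a minimal, self-contained sketch of that mechanism, not the library's own code: OperatorSketch, editDistance, and DistanceOps are hypothetical names introduced for illustration, and the distance is a textbook dynamic-programming Levenshtein.

```scala
object OperatorSketch {
  // Textbook Levenshtein distance using two rolling rows of the DP table.
  def editDistance(a: String, b: String): Int = {
    var prev = Array.range(0, b.length + 1) // distance from "" to each prefix of b
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i // distance from a prefix of a to ""
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(
          math.min(curr(j - 1) + 1, prev(j) + 1), // insertion, deletion
          prev(j - 1) + cost                      // substitution or match
        )
      }
      prev = curr
    }
    prev(b.length)
  }

  // Hypothetical enrichment class (an assumption for this sketch, not StringConverter):
  // once imported, every String gains a levenshteinDist method.
  implicit class DistanceOps(private val s1: String) extends AnyVal {
    def levenshteinDist(s2: String): Int = editDistance(s1, s2)
  }
}
```

With `import OperatorSketch._` in scope, `"martha" levenshteinDist "marhta"` and `"martha".levenshteinDist("marhta")` are equivalent calls; the infix form is only syntactic sugar for the method call on the enriched string.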

Includes functionality for phonetic comparisons between strings.

Overview

The main class to use is com.github.vickumar1981.stringdistance.StringDistance.

If you import com.github.vickumar1981.stringdistance.StringConverter._, you can use the string distance and score functions as operators between two strings.

To compare two strings phonetically, i.e. to check whether they sound alike, use the com.github.vickumar1981.stringdistance.StringSound class.

To use the library from Java, use the corresponding classes in the com.github.vickumar1981.stringdistance.util package.

| Class | Description |
| :--- | :--- |
| com.github.vickumar1981.stringdistance.StringDistance | Singleton class with fuzzy match scores and distances |
| com.github.vickumar1981.stringdistance.StringConverter | Implicit conversions between strings s1 and s2 |
| com.github.vickumar1981.stringdistance.StringSound | Phonetic comparison between strings s1 and s2 |
| com.github.vickumar1981.stringdistance.util.StringDistance | Java class for fuzzy match scores and distances |
| com.github.vickumar1981.stringdistance.util.StringSound | Java class for phonetic comparison between strings s1 and s2 |
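
To make the idea of a phonetic comparison concrete, here is a small sketch of the classic American Soundex encoding. It is an illustration under stated assumptions: SoundexSketch, soundex, and soundsAlike are hypothetical names, and the library's own Soundex and Metaphone implementations may encode or compare strings differently.

```scala
object SoundexSketch {
  // Maps an upper-case letter to its Soundex digit.
  // '0' marks vowels and Y (they separate runs); '-' marks H and W (they are skipped).
  private def code(c: Char): Char = c match {
    case 'B' | 'F' | 'P' | 'V'                         => '1'
    case 'C' | 'G' | 'J' | 'K' | 'Q' | 'S' | 'X' | 'Z' => '2'
    case 'D' | 'T'                                     => '3'
    case 'L'                                           => '4'
    case 'M' | 'N'                                     => '5'
    case 'R'                                           => '6'
    case 'H' | 'W'                                     => '-'
    case _                                             => '0'
  }

  // Encodes a word as one letter followed by three digits, e.g. "Robert" becomes "R163".
  def soundex(s: String): String = {
    val letters = s.toUpperCase.filter(c => c >= 'A' && c <= 'Z')
    if (letters.isEmpty) ""
    else {
      val sb = new StringBuilder
      sb.append(letters.head)
      var last = code(letters.head)
      for (c <- letters.tail if sb.length < 4) {
        val k = code(c)
        if (k != '-') {
          // Emit a digit only when it differs from the previous code.
          if (k != '0' && k != last) sb.append(k)
          last = k
        }
      }
      sb.toString.padTo(4, '0')
    }
  }

  // Two strings "sound alike" in this sketch when their codes are equal.
  def soundsAlike(s1: String, s2: String): Boolean = {
    val c1 = soundex(s1)
    c1.nonEmpty && c1 == soundex(s2)
  }
}
```

In this sketch, `SoundexSketch.soundsAlike("merci", "mercy")` compares the two four-character codes, which mirrors the Boolean result type of the phonetic scores documented on this page.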

Linear Supertypes
  1. SoundDefinitions
  2. ScoreDefinitions
  3. DistanceDefinitions
  4. AnyRef
  5. Any

Type Members

  1. trait CosineAlgorithm extends StringMetricAlgorithm

    A marker interface for the cosine similarity algorithm.

  2. class CosineSimilarityImplWrapper extends CosSimilarityImpl

    Java wrapper for cosine similarity.

  3. trait DamerauLevenshteinAlgorithm extends StringMetricAlgorithm

    A marker interface for the damerau levenshtein distance algorithm.

  4. trait DiceCoefficientAlgorithm extends StringMetricAlgorithm

    A marker interface for the dice coefficient algorithm.

  5. class DiceCoefficientImplWrapper extends DiceCoefficientImpl

    Java wrapper for dice coefficient similarity.

  6. trait DistanceAlgorithm[+T <: StringMetricAlgorithm] extends AnyRef

    A type class to extend a distance method to StringMetricAlgorithm.

  7. trait HammingAlgorithm extends StringMetricAlgorithm

    A marker interface for the hamming distance algorithm.

  8. class HammingImplWrapper extends HammingImpl

    Java wrapper for hamming distance.

  9. trait JaccardAlgorithm extends StringMetricAlgorithm

    A marker interface for the jaccard similarity algorithm.

  10. class JaccardImplWrapper extends JaccardImpl

    Java wrapper for jaccard similarity.

  11. trait JaroAlgorithm extends StringMetricAlgorithm

    A marker interface for the jaro similarity algorithm.

  12. class JaroImplWrapper extends JaroImpl

    Java wrapper for jaro and jaro winkler similarity.

  13. trait JaroWinklerAlgorithm extends StringMetricAlgorithm

    A marker interface for the jaro winkler algorithm.

  14. trait LevenshteinAlgorithm extends StringMetricAlgorithm

    A marker interface for the levenshtein distance algorithm.

  15. class LevenshteinDistanceImplWrapper extends LevenshteinDistanceImpl

    Java wrapper for levenshtein distance.

  16. trait LongestCommonSeqAlorithm extends StringMetricAlgorithm

    A marker interface for the longest common subsequence algorithm.

  17. class LongestCommonSeqWrapper extends LongestCommonSeqImpl

    Java wrapper for longest common subsequence.

  18. trait MetaphoneAlgorithm extends StringMetricAlgorithm

    A marker interface for the metaphone algorithm.

  19. class MetaphoneImplWrapper extends MetaphoneImpl

    Java wrapper for metaphone similarity.

  20. trait NGramAlgorithm extends StringMetricAlgorithm

    A marker interface for the n-gram similarity algorithm.

  21. class NGramImplWrapper extends NGramImpl

    Java wrapper for n-gram similarity.

  22. trait NeedlemanWunschAlgorithm extends StringMetricAlgorithm

    A marker interface for the needleman wunsch similarity algorithm.

  23. class NeedlemanWunschImplWrapper extends NeedlemanWunschImpl

    Java wrapper for needleman wunsch similarity.

  24. trait OverlapAlgorithm extends StringMetricAlgorithm

    A marker interface for the overlap similarity algorithm.

  25. class OverlapImplWrapper extends OverlapImpl

    Java wrapper for overlap similarity.

  26. trait ScorableFromDistance[+T <: StringMetricAlgorithm] extends ScoringAlgorithm[T]

    A mix-in trait that derives a score method from the distance method of a StringMetricAlgorithm.

  27. trait ScoringAlgorithm[+T <: StringMetricAlgorithm] extends AnyRef

    A type class to extend a score method to StringMetricAlgorithm.

  28. trait SmithWatermanAlgorithm extends StringMetricAlgorithm

    A marker interface for the smith waterman similarity algorithm.

  29. trait SmithWatermanGotohAlgorithm extends StringMetricAlgorithm

    A marker interface for the smith waterman gotoh similarity algorithm.

  30. class SmithWatermanImplWrapper extends SmithWatermanImpl

    Java wrapper for smith waterman similarity.

  31. trait SoundScoringAlgorithm[+T <: StringMetricAlgorithm] extends AnyRef

    A type class to extend a sound score method to StringMetricAlgorithm.

  32. trait SoundexAlgorithm extends StringMetricAlgorithm

    A marker interface for the soundex similarity algorithm.

  33. class SoundexImplWrapper extends SoundexImpl

    Java wrapper for soundex similarity.

  34. trait StringMetric[A <: StringMetricAlgorithm] extends AnyRef

    Defines the implementation for a StringMetricAlgorithm by adding implicit definitions from DistanceAlgorithm, ScoringAlgorithm, WeightedDistanceAlgorithm, or WeightedScoringAlgorithm.

  35. trait StringMetricAlgorithm extends AnyRef

    A marker interface for the string metric algorithm.

  36. trait StringSoundMetric[A <: StringMetricAlgorithm] extends AnyRef

  37. trait TverskyAlgorithm extends StringMetricAlgorithm

    A marker interface for the tversky similarity algorithm.

  38. trait WeightedDistanceAlgorithm[+A <: StringMetricAlgorithm, B] extends AnyRef

    A type class to extend a distance method with a 2nd typed parameter to StringMetricAlgorithm.

  39. trait WeightedScoringAlgorithm[+A <: StringMetricAlgorithm, B] extends AnyRef

    A type class to extend a score method with a 2nd typed parameter to StringMetricAlgorithm.

  40. trait WeightedStringMetric[A <: StringMetricAlgorithm, B] extends AnyRef
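
The type members above follow a common Scala pattern: empty marker traits tag an algorithm, type classes indexed by the marker supply the behaviour, and a mix-in derives a score from a distance. The sketch below shows one way this can fit together. All names in it (MetricAlgorithm, HammingTag, Distance, Scoring, ScoreFromDistance, Metric) are hypothetical stand-ins, and the normalization `1 - distance / maxLength` is an assumption of this sketch rather than a statement about how the library computes its scores.

```scala
object TypeClassSketch {
  // Marker traits carry no behaviour; they only tag an algorithm at the type level.
  trait MetricAlgorithm
  trait HammingTag extends MetricAlgorithm

  // Type class: a distance implementation selected by the marker type T.
  trait Distance[+T <: MetricAlgorithm] {
    def distance(s1: String, s2: String): Int
  }

  // Type class: a similarity score selected by the marker type T.
  trait Scoring[+T <: MetricAlgorithm] {
    def score(s1: String, s2: String): Double
  }

  // Mix-in: derives a score in [0, 1] from whatever distance the instance defines.
  trait ScoreFromDistance[+T <: MetricAlgorithm] extends Scoring[T] {
    def distance(s1: String, s2: String): Int
    def score(s1: String, s2: String): Double = {
      val maxLen = math.max(s1.length, s2.length)
      if (maxLen == 0) 1.0 else 1.0 - distance(s1, s2).toDouble / maxLen
    }
  }

  // One implicit instance provides both the distance and the derived score.
  implicit object HammingInstance
      extends Distance[HammingTag]
      with ScoreFromDistance[HammingTag] {
    // Counts positions at which the two strings differ; equal lengths are required here.
    def distance(s1: String, s2: String): Int = {
      require(s1.length == s2.length, "Hamming distance needs equal-length strings")
      s1.zip(s2).count { case (a, b) => a != b }
    }
  }

  // Front end that resolves the matching instance from implicit scope.
  class Metric[A <: MetricAlgorithm] {
    def distance(s1: String, s2: String)(implicit d: Distance[A]): Int =
      d.distance(s1, s2)
    def score(s1: String, s2: String)(implicit s: Scoring[A]): Double =
      s.score(s1, s2)
  }

  val Hamming = new Metric[HammingTag]
}
```

After `import TypeClassSketch._`, a call such as `Hamming.distance("karolin", "kathrin")` compiles only because an implicit `Distance[HammingTag]` exists; asking for a metric with no matching instance is rejected at compile time, which is the main benefit of this design.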

Value Members

  1. implicit object CosSimilarityScore extends CosSimilarityImpl with WeightedScoringAlgorithm[CosineAlgorithm, String]

    Implicit definition of cosine similarity score for CosineAlgorithm.

    Definition Classes
    ScoreDefinitions
  2. implicit object DamerauLevenshteinDistance extends LevenshteinDistanceImpl with DistanceAlgorithm[DamerauLevenshteinAlgorithm] with ScorableFromDistance[DamerauLevenshteinAlgorithm]

    Implicit definition of damerau levenshtein distance for DamerauLevenshteinAlgorithm.

    Definition Classes
    DistanceDefinitions
  3. implicit object DiceCoefficientScore extends DiceCoefficientImpl with ScoringAlgorithm[DiceCoefficientAlgorithm]

    Implicit definition of dice coefficient score for DiceCoefficientAlgorithm.

    Definition Classes
    ScoreDefinitions
  4. implicit object HammingDistance extends HammingImpl with DistanceAlgorithm[HammingAlgorithm] with ScorableFromDistance[HammingAlgorithm]

    Implicit definition of hamming distance for HammingAlgorithm.

    Definition Classes
    DistanceDefinitions
  5. implicit object JaccardScore extends JaccardImpl with WeightedScoringAlgorithm[JaccardAlgorithm, Int]

    Implicit definition of jaccard score for JaccardAlgorithm.

    Definition Classes
    ScoreDefinitions
  6. implicit object JaroScore extends JaroImpl with ScoringAlgorithm[JaroAlgorithm]

    Implicit definition of jaro score for JaroAlgorithm.

    Definition Classes
    ScoreDefinitions
  7. implicit object JaroWinklerScore extends JaroImpl with WeightedScoringAlgorithm[JaroWinklerAlgorithm, Double]

    Implicit definition of jaro winkler score for JaroWinklerAlgorithm.

    Definition Classes
    ScoreDefinitions
  8. implicit object LevenshteinDistance extends LevenshteinDistanceImpl with DistanceAlgorithm[LevenshteinAlgorithm] with ScorableFromDistance[LevenshteinAlgorithm]

    Implicit definition of levenshtein distance for LevenshteinAlgorithm.

    Definition Classes
    DistanceDefinitions
  9. implicit object LongestCommonSeqDistance extends LongestCommonSeqImpl with DistanceAlgorithm[LongestCommonSeqAlorithm]

    Implicit definition of longest common subsequence distance for LongestCommonSeqAlorithm.

    Definition Classes
    DistanceDefinitions
  10. implicit object MetaphoneScore extends MetaphoneImpl with SoundScoringAlgorithm[MetaphoneAlgorithm]

    Implicit definition of metaphone score for MetaphoneAlgorithm.

    Definition Classes
    SoundDefinitions
  11. implicit object NGramDistance extends NGramImpl with WeightedDistanceAlgorithm[NGramAlgorithm, Int]

    Implicit definition of n-gram distance for NGramAlgorithm.

    Definition Classes
    DistanceDefinitions
  12. implicit object NGramScore extends NGramImpl with WeightedScoringAlgorithm[NGramAlgorithm, Int]

    Implicit definition of n-gram score for NGramAlgorithm.

    Definition Classes
    ScoreDefinitions
  13. implicit object NeedlemanWunschScore extends NeedlemanWunschImpl with WeightedScoringAlgorithm[NeedlemanWunschAlgorithm, ConstantGap]

    Implicit definition of needleman wunsch score for NeedlemanWunschAlgorithm.

    Definition Classes
    ScoreDefinitions
  14. implicit object OverlapScore extends OverlapImpl with WeightedScoringAlgorithm[OverlapAlgorithm, Int]

    Implicit definition of overlap score for OverlapAlgorithm.

    Definition Classes
    ScoreDefinitions
  15. implicit object SmithWatermanGotohScore extends SmithWatermanImpl with WeightedScoringAlgorithm[SmithWatermanGotohAlgorithm, ConstantGap]

    Implicit definition of smith waterman gotoh score for SmithWatermanGotohAlgorithm.

    Definition Classes
    ScoreDefinitions
  16. implicit object SmithWatermanScore extends SmithWatermanImpl with WeightedScoringAlgorithm[SmithWatermanAlgorithm, (Gap, Int)]

    Implicit definition of smith waterman score for SmithWatermanAlgorithm.

    Definition Classes
    ScoreDefinitions
  17. implicit object SoundexScore extends SoundexImpl with SoundScoringAlgorithm[SoundexAlgorithm]

    Implicit definition of soundex score for SoundexAlgorithm.

    Definition Classes
    SoundDefinitions
  18. object Strategy

    The Strategy object provides two strategies (regular expressions) for splitting input. Strategy.splitWord splits a word into a sequence of characters. Strategy.splitSentence splits a sentence into a sequence of words.

  19. object StringConverter

    Object to extend operations to the String class.

    import com.github.vickumar1981.stringdistance.StringConverter._
    
    // Scores between two strings
    val cosSimilarity: Double = "hello".cosine("chello")
    val damerau: Double = "martha".damerau("marhta")
    val diceCoefficient: Double = "martha".diceCoefficient("marhta")
    val hamming: Double = "martha".hamming("marhta")
    val jaccard: Double = "karolin".jaccard("kathrin")
    val jaro: Double = "martha".jaro("marhta")
    val jaroWinkler: Double = "martha".jaroWinkler("marhta")
    val levenshtein: Double = "martha".levenshtein("marhta")
    val needlemanWunsch: Double = "martha".needlemanWunsch("marhta")
    val ngramSimilarity: Double = "karolin".nGram("kathrin")
    val bigramSimilarity: Double = "karolin".nGram("kathrin", 2)
    val overlap: Double = "karolin".overlap("kathrin")
    val smithWaterman: Double = "martha".smithWaterman("marhta")
    val smithWatermanGotoh: Double = "martha".smithWatermanGotoh("marhta")
    val tversky: Double = "karolin".tversky("kathrin", 0.5)
    
    // Distances between two strings
    val damerauDist: Int = "martha".damerauDist("marhta")
    val hammingDist: Int = "martha".hammingDist("marhta")
    val levenshteinDist: Int = "martha".levenshteinDist("marhta")
    val longestCommonSeq: Int = "martha".longestCommonSeq("marhta")
    val ngramDist: Int = "karolin".nGramDist("kathrin")
    val bigramDist: Int = "karolin".nGramDist("kathrin", 2)
    
    // Phonetic similarity of two strings
    val metaphone: Boolean = "merci".metaphone("mercy")
    val soundex: Boolean = "merci".soundex("mercy")
  20. object StringDistance

    Main class to organize functionality of different string distance algorithms

    import com.github.vickumar1981.stringdistance.Strategy
    import com.github.vickumar1981.stringdistance.StringDistance._
    import com.github.vickumar1981.stringdistance.impl.{ConstantGap, LinearGap}
    
    // Scores between strings
    val cosSimilarity: Double = Cosine.score("hello", "chello", Strategy.splitWord)
    val damerau: Double = Damerau.score("martha", "marhta")
    val diceCoefficient: Double = DiceCoefficient.score("martha", "marhta")
    val hamming: Double = Hamming.score("martha", "marhta")
    val jaccard: Double = Jaccard.score("karolin", "kathrin", 1)
    val jaro: Double = Jaro.score("martha", "marhta")
    val jaroWinkler: Double = JaroWinkler.score("martha", "marhta", 0.1)
    val levenshtein: Double = Levenshtein.score("martha", "marhta")
    val needlemanWunsch: Double = NeedlemanWunsch.score("martha", "marhta", ConstantGap())
    val ngramSimilarity: Double = NGram.score("karolin", "kathrin", 1)
    val bigramSimilarity: Double = NGram.score("karolin", "kathrin", 2)
    val overlap: Double = Overlap.score("karolin", "kathrin", 1)
    val smithWaterman: Double = SmithWaterman.score("martha", "marhta", (LinearGap(gapValue = -1), Integer.MAX_VALUE))
    val smithWatermanGotoh: Double = SmithWatermanGotoh.score("martha", "marhta", ConstantGap())
    val tversky: Double = Tversky.score("karolin", "kathrin", 0.5)
    
    // Distances between strings
    val damerauDist: Int = Damerau.distance("martha", "marhta")
    val hammingDist: Int = Hamming.distance("martha", "marhta")
    val levenshteinDist: Int = Levenshtein.distance("martha", "marhta")
    val longestCommonSubSeq: Int = LongestCommonSeq.distance("martha", "marhta")
    val ngramDist: Int = NGram.distance("karolin", "kathrin", 1)
    val bigramDist: Int = NGram.distance("karolin", "kathrin", 2)
  21. object StringSound

    Main class to organize functionality of different phonetic/sound string algorithms

    import com.github.vickumar1981.stringdistance.StringSound._
    import com.github.vickumar1981.stringdistance.implicits._
    
    // Phonetic similarity between strings
    val metaphone: Boolean = Metaphone.score("merci", "mercy")
    val soundex: Boolean = Soundex.score("merci", "mercy")
  22. implicit object TverskyScore extends JaccardImpl with WeightedScoringAlgorithm[TverskyAlgorithm, Double]

    Implicit definition of tversky score for TverskyAlgorithm.

    Definition Classes
    ScoreDefinitions
  23. package impl
  24. package implicits
  25. package interfaces
  26. package util
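
The Strategy object and the Cosine.score example listed among the value members suggest that the split strategy decides which tokens a similarity is computed over. The sketch below illustrates that idea with a token-count cosine similarity. The two patterns and the names StrategySketch, tokens, and cosine are assumptions made for illustration; the library's actual regular expressions and scoring code are not shown on this page.

```scala
object StrategySketch {
  // Hypothetical split patterns (assumptions, not the library's values).
  val splitWord: String = ""          // split between every character
  val splitSentence: String = "\\s+"  // split on runs of whitespace

  // Splits the input with the given pattern and drops empty tokens.
  def tokens(s: String, pattern: String): Seq[String] =
    s.split(pattern).toSeq.filter(_.nonEmpty)

  // Cosine similarity between the token-count vectors of the two inputs.
  def cosine(s1: String, s2: String, pattern: String): Double = {
    def counts(s: String): Map[String, Int] =
      tokens(s, pattern).groupBy(identity).map { case (t, ts) => t -> ts.size }

    val c1 = counts(s1)
    val c2 = counts(s2)
    // Dot product over the tokens of the first input; missing tokens count as 0.
    val dot = c1.iterator.map { case (t, n) => n.toDouble * c2.getOrElse(t, 0) }.sum
    def norm(c: Map[String, Int]): Double =
      math.sqrt(c.values.map(n => n.toDouble * n).sum)

    val denom = norm(c1) * norm(c2)
    if (denom == 0.0) 0.0 else dot / denom
  }
}
```

Calling `StrategySketch.cosine("hello", "chello", StrategySketch.splitWord)` compares character counts, while passing `splitSentence` compares word counts, so the same function serves both single words and whole sentences.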
