Smooth interpolation between containment Jaccard and plain Jaccard, based on character n-grams.
Short strings must match exactly, but longer strings are considered a match if one is a substring of the other.
The final score is (J + F * JC) / (1 + F), in which
    J is the plain Jaccard
    JC is the containment Jaccard
    F = s ** (m - l)
    m is the minimum length of the two strings
    s, l are parameters
String to compare
Other string to compare
Character n-gram length; larger values give a larger penalty to single-character typos
Determines how rapidly F rises with string length
The string length (in characters) for which the two Jaccard scores have equal weights
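A minimal sketch of how such a score could be computed, for illustration only. It takes F = s ** (m - l), so that F equals 1 (equal weights) when the shorter string has length l, as the parameter description states. The function names `char_ngrams` and `blended_jaccard`, the default values for `n`, `s` and `l`, the unpadded n-gram extraction, and the definition of containment Jaccard as the shared n-grams divided by the smaller n-gram set are all assumptions, not taken from the pipeline code.

```python
def char_ngrams(text: str, n: int) -> set:
    """Return the set of character n-grams of text (hypothetical helper).

    Assumption: a string no longer than n contributes itself as its only gram,
    so the returned set is never empty.
    """
    if len(text) <= n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def blended_jaccard(a: str, b: str, n: int = 3, s: float = 1.2, l: float = 10) -> float:
    """Blend plain and containment Jaccard as (J + F * JC) / (1 + F).

    The defaults for n, s and l are illustrative assumptions.
    """
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    shared = len(grams_a & grams_b)

    # Plain Jaccard: shared n-grams over all distinct n-grams.
    j = shared / len(grams_a | grams_b)
    # Containment Jaccard (assumed definition): shared n-grams over the smaller set.
    jc = shared / min(len(grams_a), len(grams_b))

    # F grows with the length of the shorter string and equals 1 at length l.
    m = min(len(a), len(b))
    f = s ** (m - l)
    return (j + f * jc) / (1 + f)
```

With these assumed defaults, a short pair such as "cat" and "cats" is dominated by the plain Jaccard and scores well below 1, while a long title that is a prefix of another long title is dominated by the containment Jaccard and scores close to 1.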
This contains a bunch of helper functions stolen from the pipeline code. We need it here to anticipate how well the pipeline will work with the output from science-parse.