Trains a Symmetric Delete spelling correction algorithm.
Symmetric Delete spelling correction algorithm.
The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is reported to be six orders of magnitude faster than the standard approach, which generates deletes, transposes, replaces, and inserts, and it is language independent.
Inspired by SymSpell.
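The core idea can be illustrated with a minimal, self-contained sketch in plain Scala. It is not the library's implementation: the object and method names (SymmetricDeleteSketch, deletes, buildIndex, candidates) are hypothetical, and the sketch assumes that only deletions are generated, both for dictionary words at indexing time and for the input word at lookup time. It omits the final steps a real spell checker would need, such as verifying candidates against the true Damerau-Levenshtein distance and ranking them by word frequency.

```scala
object SymmetricDeleteSketch {
  // All strings reachable from `word` by deleting up to `maxDistance`
  // characters, including the word itself.
  def deletes(word: String, maxDistance: Int): Set[String] =
    if (maxDistance == 0 || word.isEmpty) Set(word)
    else {
      val oneDelete = word.indices
        .map(i => word.substring(0, i) + word.substring(i + 1))
        .toSet
      oneDelete.flatMap(deletes(_, maxDistance - 1)) + word
    }

  // Precompute an index from each delete variant to the dictionary
  // words that produce it. This is done once, ahead of lookup.
  def buildIndex(dictionary: Seq[String], maxDistance: Int): Map[String, Set[String]] =
    dictionary
      .flatMap(w => deletes(w, maxDistance).map(d => d -> w))
      .groupBy(_._1)
      .map { case (variant, pairs) => variant -> pairs.map(_._2).toSet }

  // Look up an input word: generate its delete variants and collect every
  // dictionary word that shares at least one variant with it.
  // Candidates are NOT verified against the true edit distance here.
  def candidates(index: Map[String, Set[String]], input: String, maxDistance: Int): Set[String] =
    deletes(input, maxDistance).flatMap(d => index.getOrElse(d, Set.empty[String]))
}
```

With a dictionary of "write", "words", and "wrong" and a maximum distance of 1, looking up "wrrite" matches "write" because deleting one "r" from the input yields a dictionary word directly, while "erong" matches "wrong" because both reduce to "rong". No inserts, replaces, or transposes are ever generated, which is where the reduction in candidate generation comes from.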
Pretrained models can be loaded with pretrained of the companion object:
val spell = SymmetricDeleteModel.pretrained()
  .setInputCols("token")
  .setOutputCol("spell")
The default model is "spellcheck_sd", if no name is provided.
For available pretrained models please see the Models Hub.
See SymmetricDeleteModelTestSpec for further reference.
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.symmetric.SymmetricDeleteModel
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val spellChecker = SymmetricDeleteModel.pretrained()
  .setInputCols("token")
  .setOutputCol("spell")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  spellChecker
))

val data = Seq("spmetimes i wrrite wordz erong.").toDF("text")
val result = pipeline.fit(data).transform(data)
result.select("spell.result").show(false)
+--------------------------------------+
|result                                |
+--------------------------------------+
|[sometimes, i, write, words, wrong, .]|
+--------------------------------------+
ContextSpellCheckerModel for a DL based approach
NorvigSweetingModel for an alternative approach to spell checking
This is the companion object of SymmetricDeleteApproach. Please refer to that class for the documentation.
This is the companion object of SymmetricDeleteModel. Please refer to that class for the documentation.
Trains a Symmetric Delete spelling correction algorithm. Retrieves tokens and utilizes distance metrics to compute possible derived words.
Inspired by SymSpell.
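As an illustration of the kind of distance metric involved, here is a minimal sketch of the restricted Damerau-Levenshtein distance (also known as optimal string alignment) in plain Scala. The object name EditDistanceSketch is hypothetical, and whether the library uses exactly this variant is an assumption, not something stated in this documentation.

```scala
object EditDistanceSketch {
  // Restricted Damerau-Levenshtein (optimal string alignment) distance:
  // the minimum number of inserts, deletes, replaces, and transpositions
  // of adjacent characters needed to turn `a` into `b`.
  def distance(a: String, b: String): Int = {
    val d = Array.ofDim[Int](a.length + 1, b.length + 1)
    for (i <- 0 to a.length) d(i)(0) = i // delete all of a's prefix
    for (j <- 0 to b.length) d(0)(j) = j // insert all of b's prefix
    for (i <- 1 to a.length; j <- 1 to b.length) {
      val cost = if (a(i - 1) == b(j - 1)) 0 else 1
      d(i)(j) = math.min(
        math.min(d(i - 1)(j) + 1, d(i)(j - 1) + 1), // delete, insert
        d(i - 1)(j - 1) + cost                      // match or replace
      )
      // Transposition of two adjacent characters.
      if (i > 1 && j > 1 && a(i - 1) == b(j - 2) && a(i - 2) == b(j - 1))
        d(i)(j) = math.min(d(i)(j), d(i - 2)(j - 2) + 1)
    }
    d(a.length)(b.length)
  }
}
```

A metric like this is what lets a spell checker decide which derived word is closest to a misspelled token, for example treating a swapped pair of letters as a single edit rather than two.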
For instantiated/pretrained models, see SymmetricDeleteModel.
See SymmetricDeleteModelTestSpec for further reference.
Example
In this example, the dictionary "words.txt" is set to be the basis of the spell checker.
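A training pipeline along these lines might look like the following sketch. It mirrors the imports and stages of the pretrained-model example, swapping in SymmetricDeleteApproach with setDictionary. The dictionary path, the helper name train, and the assumption that the training data is a DataFrame with a "text" column are illustrative and not taken from this documentation.

```scala
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.symmetric.SymmetricDeleteApproach
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.sql.DataFrame

// Hypothetical location of the dictionary file; adjust to your setup.
val dictionaryPath = "words.txt"

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// The dictionary is registered here and read when the pipeline is fitted.
val spellChecker = new SymmetricDeleteApproach()
  .setInputCols("token")
  .setOutputCol("spell")
  .setDictionary(dictionaryPath)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  spellChecker
))

// Assumption: trainingData has a "text" column containing the training corpus.
def train(trainingData: DataFrame): PipelineModel = pipeline.fit(trainingData)
```

Fitting the pipeline produces a PipelineModel whose last stage is a trained SymmetricDeleteModel, which can then be applied with transform in the same way as the pretrained model.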
ContextSpellCheckerApproach for a DL based approach
NorvigSweetingApproach for an alternative approach to spell checking