Implements a deep-learning-based Noisy Channel Model Spell Algorithm. Correction candidates are extracted by combining context information and word information.
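Spell checking is a sequence-to-sequence mapping problem. Given an input sequence, potentially containing a certain number of errors, ContextSpellChecker will rank correction sequences according to three things:
1. Different correction candidates for each word (word level).
2. The surrounding text of each word, i.e. its context (sentence level).
3. The relative cost of different correction candidates according to the character-level edit operations they require (subword level).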
For an in-depth explanation of the module see the article Applying Context Aware Spell Checking in Spark NLP.
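The ranking idea can be made concrete with a toy example. The following plain-Scala sketch is not Spark NLP's implementation: it assumes a tiny hand-written bigram table in place of the trained neural language model, plain Levenshtein distance in place of the character-level edit cost, and a hypothetical tradeOff weight that scales edit cost against language-model cost. Every combination of per-word candidates is scored and the cheapest sequence is returned.

```scala
// Toy noisy-channel ranker: NOT the Spark NLP implementation.
// Hypothetical assumptions: a hand-written bigram table stands in for the neural
// language model, unit-cost Levenshtein distance stands in for the character-level
// edit cost, and tradeOff weights the edit cost against the language-model cost.
object ToyNoisyChannel {

  // Dynamic-programming Levenshtein distance (unit-cost insert, delete, substitute).
  def levenshtein(a: String, b: String): Int = {
    val dp = Array.tabulate(a.length + 1, b.length + 1) { (i, j) =>
      if (i == 0) j else if (j == 0) i else 0
    }
    for (i <- 1 to a.length; j <- 1 to b.length) {
      val subst = if (a(i - 1) == b(j - 1)) 0 else 1
      dp(i)(j) = math.min(math.min(dp(i - 1)(j) + 1, dp(i)(j - 1) + 1), dp(i - 1)(j - 1) + subst)
    }
    dp(a.length)(b.length)
  }

  // Hypothetical bigram probabilities P(word | previous word); unseen pairs get a small floor.
  val bigram: Map[(String, String), Double] = Map(
    ("with", "snow") -> 0.2,
    ("with", "show") -> 0.01,
    ("with", "slow") -> 0.01
  )
  val floor = 1e-4

  // Context (sentence-level) cost: sum of negative log bigram probabilities.
  def lmCost(words: Seq[String]): Double =
    words.sliding(2).collect { case Seq(prev, w) => -math.log(bigram.getOrElse((prev, w), floor)) }.sum

  // Total cost = context cost + tradeOff * summed edit distance to the observed tokens.
  def cost(observed: Seq[String], candidate: Seq[String], tradeOff: Double): Double =
    lmCost(candidate) + tradeOff * observed.zip(candidate).map { case (o, c) => levenshtein(o, c) }.sum

  // Enumerate every combination of per-word candidates and return the cheapest sequence.
  def correct(observed: Seq[String], candidates: Seq[Seq[String]], tradeOff: Double): Seq[String] = {
    val sequences = candidates.foldLeft(Seq(Seq.empty[String])) { (acc, options) =>
      for (prefix <- acc; w <- options) yield prefix :+ w
    }
    sequences.minBy(cost(observed, _, tradeOff))
  }
}
```

With tradeOff = 1.0, the input "white with smow" is corrected to "white with snow", because the bigram evidence for "with snow" outweighs one substitution; with tradeOff = 10.0 the edit cost dominates and "smow" is kept. Exhaustive enumeration grows exponentially with sentence length, so a practical decoder would typically use dynamic programming (for example Viterbi) instead.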
This is the instantiated model of the ContextSpellCheckerApproach. For training your own model, please see the documentation of that class.
Pretrained models can be loaded with the pretrained method of the companion object:
val spellChecker = ContextSpellCheckerModel.pretrained()
  .setInputCols("token")
  .setOutputCol("checked")
The default model is "spellcheck_dl" if no name is provided.
For available pretrained models please see the Models Hub.
For extended examples of usage, see the Spark NLP Workshop and the ContextSpellCheckerTestSpec.
import spark.implicits._
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerModel
import org.apache.spark.ml.Pipeline

// Wrap the raw "text" column into document annotations.
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("doc")

// Split each document into tokens; the spell checker works on tokens.
val tokenizer = new Tokenizer()
  .setInputCols(Array("doc"))
  .setOutputCol("token")

// Load the default pretrained model and set its trade-off parameter.
val spellChecker = ContextSpellCheckerModel
  .pretrained()
  .setTradeOff(12.0f)
  .setInputCols("token")
  .setOutputCol("checked")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  spellChecker
))

// "smow" is a deliberate misspelling of "snow".
val data = Seq("It was a cold , dreary day and the country was white with smow .").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("checked.result").show(false)
+--------------------------------------------------------------------------------+
|result                                                                          |
+--------------------------------------------------------------------------------+
|[It, was, a, cold, ,, dreary, day, and, the, country, was, white, with, snow, .]|
+--------------------------------------------------------------------------------+
See also NorvigSweetingModel and SymmetricDeleteModel for alternative approaches to spell checking.
This is the companion object of ContextSpellCheckerModel. Please refer to that class for the documentation.
Trains a deep-learning-based Noisy Channel Model Spell Algorithm. Correction candidates are extracted by combining context information and word information.
For instantiated/pretrained models, see ContextSpellCheckerModel.
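Spell checking is a sequence-to-sequence mapping problem. Given an input sequence, potentially containing a certain number of errors, ContextSpellChecker will rank correction sequences according to three things: the correction candidates for each word (word level), the surrounding context of each word (sentence level), and the character-level edit cost of each candidate (subword level).
For an in-depth explanation of the module, see the article Applying Context Aware Spell Checking in Spark NLP.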
For extended examples of usage, see the article Training a Contextual Spell Checker for Italian Language, the Spark NLP Workshop and the ContextSpellCheckerTestSpec.
Example
For this example, we use the first Sherlock Holmes book as the training dataset.
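As a rough illustration of the word information a training corpus can supply, the toy sketch below builds a word-frequency vocabulary from raw text and proposes correction candidates within a maximum edit distance, ordered by distance and then by frequency. It is plain Scala and not the ContextSpellCheckerApproach API: train, candidates and maxDistance are hypothetical names, and simple frequency counting stands in for the deep-learning training the approach performs.

```scala
// Toy vocabulary and candidate generator: NOT the ContextSpellCheckerApproach API.
// Hypothetical assumptions: "training" only counts lowercase word frequencies, and
// candidates are vocabulary words within maxDistance Levenshtein edits of the input.
object ToyCandidates {

  // Count how often each lowercase alphabetic word occurs in the corpus.
  def train(corpus: String): Map[String, Int] =
    "[a-z]+".r.findAllIn(corpus.toLowerCase).toList
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  // Dynamic-programming Levenshtein distance (unit-cost insert, delete, substitute).
  def levenshtein(a: String, b: String): Int = {
    val dp = Array.tabulate(a.length + 1, b.length + 1) { (i, j) =>
      if (i == 0) j else if (j == 0) i else 0
    }
    for (i <- 1 to a.length; j <- 1 to b.length) {
      val subst = if (a(i - 1) == b(j - 1)) 0 else 1
      dp(i)(j) = math.min(math.min(dp(i - 1)(j) + 1, dp(i)(j - 1) + 1), dp(i - 1)(j - 1) + subst)
    }
    dp(a.length)(b.length)
  }

  // Vocabulary words within maxDistance edits: closest first, then most frequent,
  // then alphabetical so that the order is deterministic.
  def candidates(word: String, vocab: Map[String, Int], maxDistance: Int): Seq[String] =
    vocab.toSeq
      .map { case (w, freq) => (w, levenshtein(word.toLowerCase, w), freq) }
      .filter { case (_, dist, _) => dist <= maxDistance }
      .sortBy { case (w, dist, freq) => (dist, -freq, w) }
      .map { case (w, _, _) => w }
}
```

For a corpus such as "The snow fell. Snow and show, slow snow.", the misspelling "smow" with maxDistance = 1 yields snow, show and slow, with the most frequent word first; a context model would then choose among them.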
See also NorvigSweetingApproach and SymmetricDeleteApproach for alternative approaches to spell checking.