context

Type Members

class ContextSpellCheckerApproach extends AnnotatorApproach[ContextSpellCheckerModel] with HasFeatures with WeightedLevenshtein

Trains a deep-learning based Noisy Channel Model Spell Algorithm. Correction candidates are extracted combining context information and word information.
For instantiated/pretrained models, see ContextSpellCheckerModel.
Spell checking is a sequence-to-sequence mapping problem. Given an input sequence that may contain errors, ContextSpellChecker ranks correction sequences according to three things:
1. Different correction candidates for each word (word level).
2. The surrounding text of each word, i.e. its context (sentence level).
3. The relative cost of each correction candidate, according to the character-level edit operations it requires (subword level).
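The subword-level cost in item 3 can be illustrated with a weighted edit distance. The following is a minimal, self-contained sketch and not the library's WeightedLevenshtein implementation: the object name, the weight table and its values are hypothetical and chosen only for illustration.

```scala
object WeightedEditDistanceSketch {
  // Hypothetical weights: substitutions listed here are treated as likely
  // typos and cost less than the default of 1.0.
  val substitutionCost: Map[(Char, Char), Double] = Map(
    ('m', 'n') -> 0.4,
    ('n', 'm') -> 0.4
  )

  // Cost of replacing character a with character b.
  def subCost(a: Char, b: Char): Double =
    if (a == b) 0.0 else substitutionCost.getOrElse((a, b), 1.0)

  // Classic dynamic-programming edit distance where insertions and deletions
  // cost 1.0 and substitutions use the weight table above.
  def distance(source: String, target: String): Double = {
    val rows = source.length + 1
    val cols = target.length + 1
    val d = Array.ofDim[Double](rows, cols)
    for (i <- 0 until rows) d(i)(0) = i.toDouble
    for (j <- 0 until cols) d(0)(j) = j.toDouble
    for (i <- 1 until rows; j <- 1 until cols) {
      val deletion = d(i - 1)(j) + 1.0
      val insertion = d(i)(j - 1) + 1.0
      val substitution = d(i - 1)(j - 1) + subCost(source(i - 1), target(j - 1))
      d(i)(j) = math.min(math.min(deletion, insertion), substitution)
    }
    d(source.length)(target.length)
  }
}
```

With these assumed weights, "snow" is a cheaper correction for "smow" than "show" is, because the m/n confusion is weighted below a generic substitution.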
For an in-depth explanation of the module see the article Applying Context Aware Spell Checking in Spark NLP.
For extended examples of usage, see the article Training a Contextual Spell Checker for Italian Language, the Spark NLP Workshop and the ContextSpellCheckerTestSpec.
Example
For this example, we use the first Sherlock Holmes book as the training dataset.
```
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerApproach

import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val spellChecker = new ContextSpellCheckerApproach()
  .setInputCols("token")
  .setOutputCol("corrected")
  .setWordMaxDistance(3)
  .setBatchSize(24)
  .setEpochs(8)
  .setLanguageModelClasses(1650)  // dependent on vocabulary size
  // .addVocabClass("_NAME_", names) // Extra classes for correction could be added like this

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  spellChecker
))

val path = "src/test/resources/spell/sherlockholmes.txt"
val dataset = spark.sparkContext.textFile(path)
  .toDF("text")
val pipelineModel = pipeline.fit(dataset)
```
See also
NorvigSweetingApproach and SymmetricDeleteApproach for alternative approaches to spell checking
class ContextSpellCheckerModel extends AnnotatorModel[ContextSpellCheckerModel] with HasSimpleAnnotate[ContextSpellCheckerModel] with WeightedLevenshtein with WriteTensorflowModel with ParamsAndFeaturesWritable with HasTransducerFeatures

Implements a deep-learning based Noisy Channel Model Spell Algorithm. Correction candidates are extracted combining context information and word information.
Spell checking is a sequence-to-sequence mapping problem. Given an input sequence that may contain errors, ContextSpellChecker ranks correction sequences according to three things:
1. Different correction candidates for each word (word level).
2. The surrounding text of each word, i.e. its context (sentence level).
3. The relative cost of each correction candidate, according to the character-level edit operations it requires (subword level).
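To make the interplay of these three levels concrete, the sketch below ranks candidates by a combined cost. It is an illustration under stated assumptions, not the model's actual scoring: the Candidate class, the cost values and the linear formula are hypothetical, and the tradeOff argument only approximates the idea of weighing a word error against the language model's context score.

```scala
object CandidateRankingSketch {
  // contextCost: hypothetical language-model cost of the word in its sentence
  //              (lower means more plausible in context).
  // editCost:    hypothetical character-level edit cost from the observed token.
  final case class Candidate(word: String, contextCost: Double, editCost: Double)

  // Assumed linear combination: a larger tradeOff penalizes edits more heavily.
  def totalCost(c: Candidate, tradeOff: Double): Double =
    c.contextCost + tradeOff * c.editCost

  // Candidates ordered from best (lowest total cost) to worst.
  def rank(candidates: Seq[Candidate], tradeOff: Double): Seq[Candidate] =
    candidates.sortBy(totalCost(_, tradeOff))

  // Made-up candidates for the token "smow" in "white with smow".
  val candidates: Seq[Candidate] = Seq(
    Candidate("smow", contextCost = 12.0, editCost = 0.0),
    Candidate("show", contextCost = 6.0, editCost = 1.0),
    Candidate("snow", contextCost = 2.0, editCost = 0.4)
  )
}
```

With these invented numbers, a trade-off of 12.0 ranks "snow" first, while a much larger value such as 30.0 ranks the unchanged token "smow" first. In this sketch, raising the trade-off therefore makes correction more conservative.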
For an in-depth explanation of the module see the article Applying Context Aware Spell Checking in Spark NLP.
This is the instantiated model of the ContextSpellCheckerApproach. For training your own model, please see the documentation of that class.
Pretrained models can be loaded with the pretrained method of the companion object:
```
val spellChecker = ContextSpellCheckerModel.pretrained()
  .setInputCols("token")
  .setOutputCol("checked")
```
If no name is provided, the default model is "spellcheck_dl". For available pretrained models, please see the Models Hub.
For extended examples of usage, see the Spark NLP Workshop and the ContextSpellCheckerTestSpec.
Example
```
import spark.implicits._
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerModel
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("doc")

val tokenizer = new Tokenizer()
  .setInputCols(Array("doc"))
  .setOutputCol("token")

val spellChecker = ContextSpellCheckerModel
  .pretrained()
  .setTradeOff(12.0f)
  .setInputCols("token")
  .setOutputCol("checked")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  spellChecker
))

val data = Seq("It was a cold , dreary day and the country was white with smow .").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("checked.result").show(false)
+--------------------------------------------------------------------------------+
|result                                                                          |
+--------------------------------------------------------------------------------+
|[It, was, a, cold, ,, dreary, day, and, the, country, was, white, with, snow, .]|
+--------------------------------------------------------------------------------+
```
See also
NorvigSweetingModel and SymmetricDeleteModel for alternative approaches to spell checking
trait HasTransducerFeatures extends HasFeatures
case class LangModelSentence(ids: Array[Int], cids: Array[Int], cwids: Array[Int], len: Int) extends Product with Serializable
trait ReadablePretrainedContextSpell extends ReadsLanguageModelGraph with HasPretrained[ContextSpellCheckerModel]
trait ReadsLanguageModelGraph extends ParamsAndFeaturesReadable[ContextSpellCheckerModel] with ReadTensorflowModel
trait WeightedLevenshtein extends AnyRef

Value Members

object CandidateStrategy
object ContextSpellCheckerModel extends ReadablePretrainedContextSpell with Serializable

This is the companion object of ContextSpellCheckerModel. Please refer to that class for the documentation.
package parser
