perceptron

Type Members

case class AveragedPerceptron(tags: Array[String], taggedWordBook: Map[String, String], featuresWeight: Map[String, Map[String, Double]]) extends Serializable with Product

tags
Holds all unique tags based on training
taggedWordBook
Contains unambiguous words and their tags
featuresWeight
Contains prediction information based on context frequencies
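
As a rough illustration of how these three fields could work together at prediction time, the sketch below first looks the word up in taggedWordBook and otherwise sums the featuresWeight entries of the active features for each tag. SketchPerceptron, SketchTagger, predict and the feature names used in the usage note are hypothetical stand-ins, and the lookup order is an assumption based on the usual averaged perceptron tagger design, not a description of the library's internals.

```scala
// Hypothetical stand-in that mirrors the three fields of AveragedPerceptron.
final case class SketchPerceptron(
    tags: Array[String],
    taggedWordBook: Map[String, String],
    featuresWeight: Map[String, Map[String, Double]])

object SketchTagger {
  // Assumed order: unambiguous words are answered from the word book,
  // everything else is scored from the feature weights.
  def predict(model: SketchPerceptron, word: String, features: Seq[String]): String =
    model.taggedWordBook.get(word) match {
      case Some(tag) => tag
      case None =>
        // Sum, per tag, the weights of every active feature.
        val scores: Map[String, Double] = features
          .flatMap(f => model.featuresWeight.getOrElse(f, Map.empty[String, Double]).toSeq)
          .groupBy(_._1)
          .map { case (tag, weighted) => tag -> weighted.map(_._2).sum }
        // Tags without any weight count as 0.0; the highest score wins.
        model.tags.maxBy(tag => scores.getOrElse(tag, 0.0))
    }
}
```

With a toy model whose word book contains only "the" -> "DT", a call such as SketchTagger.predict(model, "the", Nil) is answered from the word book, while any other word is scored from whatever features the caller passes in.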

class PerceptronApproach extends AnnotatorApproach[PerceptronModel] with PerceptronTrainingUtils

Trains an averaged Perceptron model for part-of-speech tagging. Assigns a POS tag to each word within a sentence.

For pretrained models please see the PerceptronModel.

The training data needs to be in a Spark DataFrame with a column of Annotations of type POS. Each Annotation must have its result member set to the POS tag, and its metadata member must contain a "word" entry that maps to the word itself. This training DataFrame can easily be created with the helper class POS.

POS().readDataset(spark, datasetPath).selectExpr("explode(tags) as tags").show(false)
+---------------------------------------------+
|tags                                         |
+---------------------------------------------+
|[pos, 0, 5, NNP, [word -> Pierre], []]       |
|[pos, 7, 12, NNP, [word -> Vinken], []]      |
|[pos, 14, 14, ,, [word -> ,], []]            |
|[pos, 31, 34, MD, [word -> will], []]        |
|[pos, 36, 39, VB, [word -> join], []]        |
|[pos, 41, 43, DT, [word -> the], []]         |
|[pos, 45, 49, NN, [word -> board], []]       |
                      ...
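
The row shape in the table above can be sketched in plain Scala. This is only an illustration of what each row carries (inclusive begin and end offsets, the tag as result, the word under the "word" metadata key). PosRow, PosRowSketch and parseLine are hypothetical names, and the space-separated word|TAG input layout is an assumption made for the sketch; the POS helper remains the way to build the real DataFrame.

```scala
// Hypothetical, simplified stand-in for one POS annotation row.
final case class PosRow(begin: Int, end: Int, result: String, metadata: Map[String, String])

object PosRowSketch {
  // Assumes space-separated "word|TAG" tokens. Offsets are inclusive and are
  // computed as if the words were joined by single spaces.
  def parseLine(line: String, delimiter: Char = '|'): Seq[PosRow] = {
    val tokens = line.trim.split("\\s+").toSeq.filter(_.indexOf(delimiter) >= 0)
    var cursor = 0
    tokens.map { token =>
      // Split on the last delimiter so a word may itself contain one.
      val cut = token.lastIndexOf(delimiter)
      val word = token.substring(0, cut)
      val tag = token.substring(cut + 1)
      val row = PosRow(cursor, cursor + word.length - 1, tag, Map("word" -> word))
      cursor += word.length + 1 // skip the word plus one separating space
      row
    }
  }
}
```

Calling PosRowSketch.parseLine("Pierre|NNP Vinken|NNP ,|,") yields three rows whose offsets, tags and words match the first three rows of the table.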

For extended examples of usage, see the Spark NLP Workshop and PerceptronApproach tests.

Example

import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.SentenceDetector
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.training.POS
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronApproach
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val datasetPath = "src/test/resources/anc-pos-corpus-small/test-training.txt"
val trainingPerceptronDF = POS().readDataset(spark, datasetPath)

val trainedPos = new PerceptronApproach()
  .setInputCols("document", "token")
  .setOutputCol("pos")
  .setPosColumn("tags")
  .fit(trainingPerceptronDF)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentence,
  tokenizer,
  trainedPos
))

val data = Seq("To be or not to be, is this the question?").toDF("text")
val result = pipeline.fit(data).transform(data)

result.selectExpr("pos.result").show(false)
+--------------------------------------------------+
|result                                            |
+--------------------------------------------------+
|[NNP, NNP, CD, JJ, NNP, NNP, ,, MD, VB, DT, CD, .]|
+--------------------------------------------------+
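
For readers unfamiliar with the "averaged" part of the name, the following is a minimal sketch of the textbook averaged perceptron update, written in plain Scala. It is not taken from the library: AveragingSketch, update and averaged are hypothetical names, and whether PerceptronApproach keeps its totals in exactly this way is an assumption. On a wrong guess, each active feature's weight moves toward the true tag and away from the guessed one. Every weight is credited for the training steps during which it was in effect, and the averaged value is that running total divided by the number of steps.

```scala
import scala.collection.mutable

// Textbook averaged perceptron bookkeeping; an illustrative sketch only.
final class AveragingSketch {
  // All three maps are keyed by (feature, tag).
  private val weights = mutable.Map.empty[(String, String), Double].withDefaultValue(0.0)
  private val totals = mutable.Map.empty[(String, String), Double].withDefaultValue(0.0)
  private val stamps = mutable.Map.empty[(String, String), Long].withDefaultValue(0L)
  private var step = 0L

  private def bump(key: (String, String), delta: Double): Unit = {
    // Credit the old weight for every step it stayed unchanged, then change it.
    totals(key) += (step - stamps(key)) * weights(key)
    stamps(key) = step
    weights(key) += delta
  }

  // One training step for a single token.
  def update(truth: String, guess: String, features: Seq[String]): Unit = {
    step += 1
    if (truth != guess) features.foreach { feature =>
      bump((feature, truth), 1.0)
      bump((feature, guess), -1.0)
    }
  }

  // Averaged weights, in the same nested shape as featuresWeight.
  def averaged: Map[String, Map[String, Double]] =
    weights.keys.toSeq
      .map { case key @ (feature, tag) =>
        val total = totals(key) + (step - stamps(key)) * weights(key)
        (feature, tag, total / step.toDouble)
      }
      .groupBy(_._1)
      .map { case (feature, rows) => feature -> rows.map(r => r._2 -> r._3).toMap }
}
```

Averaging damps the effect of late updates, which is the usual motivation for preferring it over the raw final weights.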

class PerceptronApproachDistributed extends AnnotatorApproach[PerceptronModel] with PerceptronTrainingUtils

Distributed averaged Perceptron model for part-of-speech tagging. Assigns a POS tag to each word within a sentence. Its training data (train_pos) is a Spark Dataset of POS-format values with Annotation columns.

See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/pos/perceptron/DistributedPos.scala for further reference on how to use this API.

class PerceptronModel extends AnnotatorModel[PerceptronModel] with HasSimpleAnnotate[PerceptronModel] with PerceptronPredictionUtils

Averaged Perceptron model for part-of-speech tagging. Assigns a POS tag to each word within a sentence.

This is the instantiated model of the PerceptronApproach. For training your own model, please see the documentation of that class.

Pretrained models can be loaded with the pretrained method of the companion object:

val posTagger = PerceptronModel.pretrained()
  .setInputCols("document", "token")
  .setOutputCol("pos")

If no name is provided, the default model is "pos_anc".

For available pretrained models, please see the Models Hub. Pretrained pipelines are also available for this module; see Pipelines.

For extended examples of usage, see the Spark NLP Workshop.

Example

import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val posTagger = PerceptronModel.pretrained()
  .setInputCols("document", "token")
  .setOutputCol("pos")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  posTagger
))

val data = Seq("Peter Pipers employees are picking pecks of pickled peppers").toDF("text")
val result = pipeline.fit(data).transform(data)

result.selectExpr("explode(pos) as pos").show(false)
+-------------------------------------------+
|pos                                        |
+-------------------------------------------+
|[pos, 0, 4, NNP, [word -> Peter], []]      |
|[pos, 6, 11, NNP, [word -> Pipers], []]    |
|[pos, 13, 21, NNS, [word -> employees], []]|
|[pos, 23, 25, VBP, [word -> are], []]      |
|[pos, 27, 33, VBG, [word -> picking], []]  |
|[pos, 35, 39, NNS, [word -> pecks], []]    |
|[pos, 41, 42, IN, [word -> of], []]        |
|[pos, 44, 50, JJ, [word -> pickled], []]   |
|[pos, 52, 58, NNS, [word -> peppers], []]  |
+-------------------------------------------+

trait PerceptronPredictionUtils extends PerceptronUtils
trait PerceptronTrainingUtils extends PerceptronUtils
trait PerceptronUtils extends AnyRef
trait ReadablePretrainedPerceptron extends ParamsAndFeaturesReadable[PerceptronModel] with HasPretrained[PerceptronModel]
class StringMapStringDoubleAccumulator extends AccumulatorV2[(String, Map[String, Double]), Map[String, Map[String, Double]]]
class TrainingPerceptronLegacy extends Serializable
class TupleKeyLongDoubleMapAccumulator extends AccumulatorV2[((String, String), (Long, Double)), Map[(String, String), (Long, Double)]]

Value Members

object PerceptronApproach extends DefaultParamsReadable[PerceptronApproach] with Serializable

This is the companion object of PerceptronApproach. Please refer to that class for the documentation.

object PerceptronApproachDistributed extends DefaultParamsReadable[PerceptronApproachDistributed] with Serializable

This is the companion object of PerceptronApproachDistributed. Please refer to that class for the documentation.

object PerceptronModel extends ReadablePretrainedPerceptron with Serializable

This is the companion object of PerceptronModel. Please refer to that class for the documentation.
