com.johnsnowlabs.nlp.annotators.parser.dep
DependencyParserApproach
Companion object DependencyParserApproach
class DependencyParserApproach extends AnnotatorApproach[DependencyParserModel]
Trains an unlabeled parser that finds grammatical relations between two words in a sentence.
For instantiated/pretrained models, see DependencyParserModel.
The dependency parser provides information about word relationships. For example, dependency parsing can tell you what the subjects and objects of a verb are, as well as which words are modifying (describing) the subject. This can help you find precise answers to specific questions.
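An unlabeled dependency parse can be pictured as one head index per token. The Scala sketch below uses a hand-written parse of "John eats red apples" (an assumption for illustration, not output of DependencyParserModel). `UnlabeledParse` and `dependentsOf` are hypothetical helpers, not Spark NLP API.

```scala
// Hypothetical representation of an unlabeled dependency parse:
// heads(i) is the 1-based index of token i's head, and 0 marks the root.
final case class UnlabeledParse(tokens: Vector[String], heads: Vector[Int]) {
  require(tokens.length == heads.length, "one head per token")

  // 1-based index of the root token, i.e. the word whose head is 0.
  def root: Int = heads.indexOf(0) + 1

  // Words whose head is the token at the given 1-based position.
  def dependentsOf(position: Int): Vector[String] =
    tokens.zip(heads).collect { case (word, head) if head == position => word }
}

object DependencyExample {
  // "John eats red apples": "eats" is the root, "John" and "apples" attach
  // to it, and "red" modifies "apples". Hand-written, not parser output.
  val parse: UnlabeledParse = UnlabeledParse(
    tokens = Vector("John", "eats", "red", "apples"),
    heads = Vector(2, 0, 4, 2)
  )
}
```

Because the parse is unlabeled, it shows that "John" and "apples" both depend on "eats" but not which one is the subject; that distinction needs dependency labels, such as those produced by TypedDependencyParserApproach.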
The required training data can be set in two different ways (only one can be chosen for a particular model):
- Dependency treebank in the Penn Treebank format, set with setDependencyTreeBank
- Dataset in the CoNLL-U format, set with setConllU
Apart from that, no additional training data is needed.
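The "only one source" rule can be expressed as a small check. This is a minimal sketch, assuming each source is modeled as an optional path; `TrainingSource`, `TreeBankSource`, `ConllUSource` and `resolveTrainingSource` are hypothetical names, not part of Spark NLP, and the error messages are illustrative only.

```scala
// Hypothetical model of the two mutually exclusive training-data sources.
sealed trait TrainingSource
final case class TreeBankSource(path: String) extends TrainingSource
final case class ConllUSource(path: String) extends TrainingSource

object TrainingSourceCheck {
  // Returns the single configured source, or an error message when
  // neither or both of the paths are set.
  def resolveTrainingSource(
      treeBankPath: Option[String],
      conllUPath: Option[String]
  ): Either[String, TrainingSource] =
    (treeBankPath, conllUPath) match {
      case (Some(tb), None)   => Right(TreeBankSource(tb))
      case (None, Some(cu))   => Right(ConllUSource(cu))
      case (Some(_), Some(_)) => Left("set either a dependency treebank or a CoNLL-U file, not both")
      case (None, None)       => Left("a dependency treebank or a CoNLL-U file is required")
    }
}
```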
See DependencyParserApproachTestSpec for further reference on how to use this API.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel
import com.johnsnowlabs.nlp.annotators.parser.dep.DependencyParserApproach
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val posTagger = PerceptronModel.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("pos")

val dependencyParserApproach = new DependencyParserApproach()
  .setInputCols("sentence", "pos", "token")
  .setOutputCol("dependency")
  .setDependencyTreeBank("src/test/resources/parser/unlabeled/dependency_treebank")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentence,
  tokenizer,
  posTagger,
  dependencyParserApproach
))

// Additional training data is not needed, the dependency parser relies on the dependency tree bank / CoNLL-U only.
val emptyDataSet = Seq.empty[String].toDF("text")
val pipelineModel = pipeline.fit(emptyDataSet)
- See also
TypedDependencyParserApproach to extract labels for the dependencies
Type Members
- type AnnotatorType = String
- Definition Classes
- HasOutputAnnotatorType
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def $[T](param: Param[T]): T
- Attributes
- protected
- Definition Classes
- Params
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def _fit(dataset: Dataset[_], recursiveStages: Option[PipelineModel]): DependencyParserModel
- Attributes
- protected
- Definition Classes
- AnnotatorApproach
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def beforeTraining(spark: SparkSession): Unit
- Definition Classes
- AnnotatorApproach
- final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
- final def clear(param: Param[_]): DependencyParserApproach.this.type
- Definition Classes
- Params
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @HotSpotIntrinsicCandidate() @native()
- val conllU: ExternalResourceParam
Universal Dependencies source files
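Each CoNLL-U (Universal Dependencies) file lists one word per line with ten tab-separated columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC. As a minimal sketch of what an unlabeled parser can take from such a line, the hypothetical `ConllULine.parseWord` keeps only ID, FORM, UPOS and HEAD. It illustrates the format only; it is not Spark NLP's reader, which may treat multiword tokens and empty nodes differently.

```scala
// Fields of one CoNLL-U word line that an unlabeled parser needs.
final case class ConllUWord(id: Int, form: String, upos: String, head: Int)

object ConllULine {
  // Column positions defined by the CoNLL-U format:
  // ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC (tab-separated).
  private val IdCol = 0
  private val FormCol = 1
  private val UposCol = 3
  private val HeadCol = 6

  // Parses a basic word line (assumes a numeric HEAD). Returns None for
  // multiword-token ranges ("1-2") and empty nodes ("1.1"), whose ID is
  // not a plain integer.
  def parseWord(line: String): Option[ConllUWord] = {
    val cols = line.split("\t", -1)
    require(cols.length == 10, s"expected 10 tab-separated columns, got ${cols.length}")
    cols(IdCol).toIntOption.map { id =>
      ConllUWord(id, cols(FormCol), cols(UposCol), cols(HeadCol).toInt)
    }
  }
}
```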
- final def copy(extra: ParamMap): Estimator[DependencyParserModel]
- Definition Classes
- AnnotatorApproach → Estimator → PipelineStage → Params
- def copyValues[T <: Params](to: T, extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
- final def defaultCopy[T <: Params](extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
- val dependencyTreeBank: ExternalResourceParam
Dependency treebank source files
- val description: String
- Definition Classes
- DependencyParserApproach → AnnotatorApproach
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def explainParam(param: Param[_]): String
- Definition Classes
- Params
- def explainParams(): String
- Definition Classes
- Params
- final def extractParamMap(): ParamMap
- Definition Classes
- Params
- final def extractParamMap(extra: ParamMap): ParamMap
- Definition Classes
- Params
- final def fit(dataset: Dataset[_]): DependencyParserModel
- Definition Classes
- AnnotatorApproach → Estimator
- def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[DependencyParserModel]
- Definition Classes
- Estimator
- Annotations
- @Since("2.0.0")
- def fit(dataset: Dataset[_], paramMap: ParamMap): DependencyParserModel
- Definition Classes
- Estimator
- Annotations
- @Since("2.0.0")
- def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DependencyParserModel
- Definition Classes
- Estimator
- Annotations
- @varargs() @Since("2.0.0")
- final def get[T](param: Param[T]): Option[T]
- Definition Classes
- Params
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- final def getDefault[T](param: Param[T]): Option[T]
- Definition Classes
- Params
- def getFilesContentTreeBank: Seq[Iterator[String]]
Gets an iterable TreeBank
- def getInputCols: Array[String]
- returns
input annotation columns currently used
- Definition Classes
- HasInputAnnotationCols
- def getLazyAnnotator: Boolean
- Definition Classes
- CanBeLazy
- def getNumberOfIterations: Int
Number of training iterations; more iterations can converge to better accuracy
- final def getOrDefault[T](param: Param[T]): T
- Definition Classes
- Params
- final def getOutputCol: String
Gets the name of the annotation column to be generated
- Definition Classes
- HasOutputAnnotationCol
- def getParam(paramName: String): Param[Any]
- Definition Classes
- Params
- def getTrainingSentences: List[Sentence]
Gets a list of CoNLL-U training sentences
- def getTrainingSentencesFromConllU(conllUAsArray: Array[String]): List[Sentence]
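In CoNLL-U, sentences are separated by blank lines and lines starting with `#` are comments. The sketch below, assuming input shaped like the `conllUAsArray` argument (one raw line per array element), groups word lines into sentences and drops comments. `ConllUSentences`, `isComment` and `splitSentences` are hypothetical helpers, not the implementation of getTrainingSentencesFromConllU or lineIsComment.

```scala
object ConllUSentences {
  // CoNLL-U comment lines start with '#'.
  def isComment(line: String): Boolean = line.startsWith("#")

  // Groups raw lines into sentences (a blank line ends a sentence) and
  // drops comment lines; each sentence is the list of its word lines.
  def splitSentences(lines: Array[String]): List[List[String]] = {
    val (done, current) =
      lines.foldLeft((List.empty[List[String]], List.empty[String])) {
        case ((acc, cur), line) if line.trim.isEmpty =>
          // Close the current sentence unless it is empty (repeated blanks).
          if (cur.isEmpty) (acc, cur) else (cur.reverse :: acc, List.empty[String])
        case ((acc, cur), line) if isComment(line) =>
          (acc, cur) // skip comments
        case ((acc, cur), line) =>
          (acc, line :: cur)
      }
    // Flush the last sentence when the input does not end with a blank line.
    val all = if (current.isEmpty) done else current.reverse :: done
    all.reverse
  }
}
```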
- final def hasDefault[T](param: Param[T]): Boolean
- Definition Classes
- Params
- def hasParam(paramName: String): Boolean
- Definition Classes
- Params
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- val inputAnnotatorTypes: Array[String]
Input annotation types: DOCUMENT, POS, TOKEN
- Definition Classes
- DependencyParserApproach → HasInputAnnotationCols
- final val inputCols: StringArrayParam
Columns that contain annotations necessary to run this annotator. AnnotatorType is used as both the input and output column name if not specified.
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
- final def isDefined(param: Param[_]): Boolean
- Definition Classes
- Params
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def isSet(param: Param[_]): Boolean
- Definition Classes
- Params
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- val lazyAnnotator: BooleanParam
- Definition Classes
- CanBeLazy
- def lineIsComment(line: String): Boolean
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def msgHelper(schema: StructType): String
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @HotSpotIntrinsicCandidate() @native()
- val numberOfIterations: IntParam
Number of training iterations; more iterations can converge to better accuracy (Default: 10)
- def onTrained(model: DependencyParserModel, spark: SparkSession): Unit
- Definition Classes
- AnnotatorApproach
- val optionalInputAnnotatorTypes: Array[String]
- Definition Classes
- HasInputAnnotationCols
- val outputAnnotatorType: String
Output annotation type: DEPENDENCY
- Definition Classes
- DependencyParserApproach → HasOutputAnnotatorType
- final val outputCol: Param[String]
- Attributes
- protected
- Definition Classes
- HasOutputAnnotationCol
- lazy val params: Array[Param[_]]
- Definition Classes
- Params
- def readCONLL(filesContent: Seq[Iterator[String]]): List[Sentence]
- def save(path: String): Unit
- Definition Classes
- MLWritable
- Annotations
- @throws("If the input path already exists but overwrite is not enabled.") @Since("1.6.0")
- final def set(paramPair: ParamPair[_]): DependencyParserApproach.this.type
- Attributes
- protected
- Definition Classes
- Params
- final def set(param: String, value: Any): DependencyParserApproach.this.type
- Attributes
- protected
- Definition Classes
- Params
- final def set[T](param: Param[T], value: T): DependencyParserApproach.this.type
- Definition Classes
- Params
- def setConllU(path: String, readAs: Format = ReadAs.TEXT, options: Map[String, String] = Map.empty[String, String]): DependencyParserApproach.this.type
Path to a file in CoNLL-U format
- final def setDefault(paramPairs: ParamPair[_]*): DependencyParserApproach.this.type
- Attributes
- protected
- Definition Classes
- Params
- final def setDefault[T](param: Param[T], value: T): DependencyParserApproach.this.type
- Attributes
- protected[org.apache.spark.ml]
- Definition Classes
- Params
- def setDependencyTreeBank(path: String, readAs: Format = ReadAs.TEXT, options: Map[String, String] = Map.empty[String, String]): DependencyParserApproach.this.type
Dependency treebank folder with files in Penn Treebank format
- final def setInputCols(value: String*): DependencyParserApproach.this.type
- Definition Classes
- HasInputAnnotationCols
- def setInputCols(value: Array[String]): DependencyParserApproach.this.type
Overrides the required annotator columns if different from the default
- Definition Classes
- HasInputAnnotationCols
- def setLazyAnnotator(value: Boolean): DependencyParserApproach.this.type
- Definition Classes
- CanBeLazy
- def setNumberOfIterations(value: Int): DependencyParserApproach.this.type
Number of training iterations; more iterations can converge to better accuracy
- final def setOutputCol(value: String): DependencyParserApproach.this.type
Overrides the annotation column name when transforming
- Definition Classes
- HasOutputAnnotationCol
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- Identifiable → AnyRef → Any
- def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DependencyParserModel
- Definition Classes
- DependencyParserApproach → AnnotatorApproach
- final def transformSchema(schema: StructType): StructType
Requirement for pipeline transformation validation. It is called on fit().
- Definition Classes
- AnnotatorApproach → PipelineStage
- def transformSchema(schema: StructType, logging: Boolean): StructType
- Attributes
- protected
- Definition Classes
- PipelineStage
- Annotations
- @DeveloperApi()
- def transformToSentences(cleanConllUSentence: Array[String]): Sentence
- val uid: String
- Definition Classes
- DependencyParserApproach → Identifiable
- def validate(schema: StructType): Boolean
Takes a schema and checks whether all the required annotation types are present.
- schema
to be validated
- returns
True if all the required types are present, else false
- Attributes
- protected
- Definition Classes
- AnnotatorApproach
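The check performed by validate can be pictured with plain Scala collections. In this sketch a `Map` from column name to annotator type stands in for a Spark `StructType`, and the type names are plain string labels chosen for illustration. `SchemaCheckSketch` is hypothetical and does not reproduce how Spark NLP reads annotator types from schema metadata.

```scala
object SchemaCheckSketch {
  // Input annotation types this annotator requires (DOCUMENT, POS, TOKEN).
  val requiredTypes: Seq[String] = Seq("DOCUMENT", "POS", "TOKEN")

  // Hypothetical stand-in for a schema: column name -> annotator type.
  // Returns true only if every required type is carried by some column.
  def validate(columnTypes: Map[String, String]): Boolean = {
    val present = columnTypes.values.toSet
    requiredTypes.forall(present.contains)
  }
}
```

With the columns from the example above, "sentence" (a DOCUMENT-type column), "pos" and "token" together satisfy the check; dropping "pos" makes it fail.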
- def validateTrainingFiles(): Unit
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- def write: MLWriter
- Definition Classes
- DefaultParamsWritable → MLWritable
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)
Inherited from AnnotatorApproach[DependencyParserModel]
Inherited from CanBeLazy
Inherited from DefaultParamsWritable
Inherited from MLWritable
Inherited from HasOutputAnnotatorType
Inherited from HasOutputAnnotationCol
Inherited from HasInputAnnotationCols
Inherited from Estimator[DependencyParserModel]
Inherited from PipelineStage
Inherited from Logging
Inherited from Params
Inherited from Serializable
Inherited from Identifiable
Inherited from AnyRef
Inherited from Any
Parameters
A list of (hyper-)parameter keys this annotator can take. Users can set and get the parameter values through setters and getters, respectively.
Annotator types
Required input and expected output annotator types