io.smartdatalake.workflow.action

CustomScriptAction

case class CustomScriptAction(id: ActionId, inputIds: Seq[DataObjectId], outputIds: Seq[DataObjectId], scripts: Seq[ParsableScriptDef] = Seq(), executionCondition: Option[Condition] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends ScriptActionImpl with Product with Serializable

Action that executes scripts once multiple input DataObjects are ready and notifies multiple output DataObjects when the scripts succeed.

inputIds

input DataObjects

outputIds

output DataObjects

scripts

definition of scripts to execute

executionCondition

Optional Spark SQL expression evaluated against SubFeedsExpressionData. If it evaluates to true, the Action is executed; otherwise it is skipped. See Condition for details.
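
The execute-or-skip behaviour of executionCondition can be pictured with a small self-contained Scala sketch. It does not use the SDL or Spark APIs: ExpressionData, Outcome and decide are hypothetical names, the skippedInputs field is invented for illustration, and a plain Scala predicate stands in for the Spark SQL expression.

```scala
object ExecutionConditionSketch {
  // Hypothetical stand-in for the data the expression is evaluated against
  // (SubFeedsExpressionData in SDL); the field is invented for illustration.
  final case class ExpressionData(skippedInputs: Int)

  sealed trait Outcome
  case object Executed extends Outcome
  case object Skipped extends Outcome

  // A Scala predicate plays the role of the Spark SQL expression:
  // true means the action is executed, false means it is skipped.
  def decide(condition: ExpressionData => Boolean, data: ExpressionData): Outcome =
    if (condition(data)) Executed else Skipped

  def main(args: Array[String]): Unit = {
    val noInputSkipped: ExpressionData => Boolean = _.skippedInputs == 0
    println(decide(noInputSkipped, ExpressionData(skippedInputs = 0)))
    println(decide(noInputSkipped, ExpressionData(skippedInputs = 2)))
  }
}
```

The sketch deliberately leaves out what happens when no condition is configured, since the description above does not state the default.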

Linear Supertypes
Serializable, Serializable, Product, Equals, ScriptActionImpl, ActionSubFeedsImpl[ScriptSubFeed], Action, AtlasExportable, SmartDataLakeLogger, DAGNode, ParsableFromConfig[Action], SdlConfigObject, AnyRef, Any

Instance Constructors

  1. new CustomScriptAction(id: ActionId, inputIds: Seq[DataObjectId], outputIds: Seq[DataObjectId], scripts: Seq[ParsableScriptDef] = Seq(), executionCondition: Option[Condition] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry)

    inputIds

    input DataObjects

    outputIds

    output DataObjects

    scripts

    definition of scripts to execute

    executionCondition

    Optional Spark SQL expression evaluated against SubFeedsExpressionData. If it evaluates to true, the Action is executed; otherwise it is skipped. See Condition for details.
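
To make the shape of the constructor easier to see, here is a minimal sketch built from simplified stand-in types. ActionId, DataObjectId and Condition below are local, reduced versions of the SDL types of the same name; ScriptDef and ScriptActionSketch are hypothetical names, and the metadata parameter and the implicit InstanceRegistry are left out. All identifiers such as "export-orders" are invented.

```scala
object ConstructorSketch {
  // Simplified local stand-ins; the real SDL types carry more behaviour.
  final case class ActionId(id: String)
  final case class DataObjectId(id: String)
  final case class Condition(expression: String)
  final case class ScriptDef(name: String) // hypothetical stand-in for ParsableScriptDef

  // Mirrors the documented parameter order and defaults
  // (metadata and the implicit InstanceRegistry are omitted).
  final case class ScriptActionSketch(
    id: ActionId,
    inputIds: Seq[DataObjectId],
    outputIds: Seq[DataObjectId],
    scripts: Seq[ScriptDef] = Seq(),
    executionCondition: Option[Condition] = None
  )

  // Only the three leading parameters are required.
  val minimal: ScriptActionSketch = ScriptActionSketch(
    id = ActionId("export-orders"),
    inputIds = Seq(DataObjectId("orders"), DataObjectId("customers")),
    outputIds = Seq(DataObjectId("orders-export"))
  )

  // Optional parameters are supplied by name.
  val withScript: ScriptActionSketch =
    minimal.copy(scripts = Seq(ScriptDef("export")))
}
```

In a real project the action would normally be created from configuration through the factory described under Value Members rather than by calling the constructor directly.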

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def addRuntimeEvent(executionId: ExecutionId, phase: ExecutionPhase, state: RuntimeEventState, msg: Option[String] = None, results: Seq[SubFeed] = Seq(), tstmp: LocalDateTime = LocalDateTime.now): Unit

    Adds a runtime event for this Action

    Definition Classes
    Action
  5. def addRuntimeMetrics(executionId: Option[ExecutionId], dataObjectId: Option[DataObjectId], metric: ActionMetrics): Unit

    Adds a runtime metric for this Action

    Definition Classes
    Action
  6. def applyExecutionMode(mainInput: DataObject, mainOutput: DataObject, subFeed: SubFeed, partitionValuesTransform: (Seq[PartitionValues]) ⇒ Map[PartitionValues, PartitionValues])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Applies the executionMode and stores the result in the executionModeResult variable

    Attributes
    protected
    Definition Classes
    Action
  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def atlasName: String

    Definition Classes
    Action → AtlasExportable
  9. def atlasQualifiedName(prefix: String): String

    Definition Classes
    AtlasExportable
  10. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  11. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  12. final def exec(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[SubFeed]

    Executes the main task of an action. In this step the data of the SubFeeds is moved from input to output DataObjects.

    subFeeds

    SparkSubFeeds to be processed

    returns

    processed SparkSubFeeds

    Definition Classes
    ActionSubFeedsImpl → Action
  13. def execScript(inputSubFeeds: Seq[ScriptSubFeed], outputSubFeeds: Seq[ScriptSubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[ScriptSubFeed]

    To be implemented by subclasses

    Attributes
    protected
    Definition Classes
    CustomScriptAction → ScriptActionImpl
  14. val executionCondition: Option[Condition]

    Optional Spark SQL expression evaluated against SubFeedsExpressionData. If it evaluates to true, the Action is executed; otherwise it is skipped. See Condition for details.

    Definition Classes
    CustomScriptAction → Action
  15. var executionConditionResult: Option[(Boolean, Option[String])]

    Attributes
    protected
    Definition Classes
    Action
  16. val executionMode: Option[ExecutionMode]

    execution mode for this action.

    Definition Classes
    ScriptActionImpl → Action
  17. var executionModeResult: Option[Try[Option[ExecutionModeResult]]]

    Attributes
    protected
    Definition Classes
    Action
  18. def factory: FromConfigFactory[Action]

    Returns the factory that can parse this type (that is, type CO).

    Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.

    returns

    the factory (object) for this class.

    Definition Classes
    CustomScriptAction → ParsableFromConfig
  19. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  20. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  21. def getDataObjectsState: Seq[DataObjectState]

    Get potential state of input DataObjects when executionMode is DataObjectStateIncrementalMode.

    Definition Classes
    Action
  22. def getInputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T

    Attributes
    protected
    Definition Classes
    Action
  23. def getLatestRuntimeEventState: Option[RuntimeEventState]

    Get latest runtime state

    Definition Classes
    Action
  24. def getMainInput(inputSubFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): DataObject

    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  25. def getMainPartitionValues(inputSubFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): Seq[PartitionValues]

    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  26. def getOutputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T

    Attributes
    protected
    Definition Classes
    Action
  27. def getRuntimeDataImpl: RuntimeData

    Attributes
    protected
    Definition Classes
    Action
  28. def getRuntimeInfo(executionId: Option[ExecutionId] = None): Option[RuntimeInfo]

    Get summarized runtime information for a given ExecutionId.

    executionId

    ExecutionId to get runtime information for. If empty, runtime information for the last ExecutionId is returned.

    Definition Classes
    Action
  29. def getRuntimeMetrics(executionId: Option[ExecutionId] = None): Map[DataObjectId, Option[ActionMetrics]]

    Get the latest metrics for all DataObjects and a given SDLExecutionId.

    executionId

    ExecutionId to get metrics for. If empty, metrics for the last ExecutionId are returned.

    Definition Classes
    Action
  30. val id: ActionId

    A unique identifier for this instance.

    Definition Classes
    CustomScriptAction → Action → SdlConfigObject
  31. final def init(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[SubFeed]

    Initialize Action with SubFeeds to be processed. In this step the execution mode is evaluated and the result is stored for the exec phase. If successful, the DAG and the Spark DataFrame lineage can be built.

    subFeeds

    SparkSubFeeds to be processed

    returns

    processed SparkSubFeeds

    Definition Classes
    ActionSubFeedsImpl → Action
  32. val inputIds: Seq[DataObjectId]

    input DataObjects

  33. def inputIdsToIgnoreFilter: Seq[DataObjectId]

    Definition Classes
    ActionSubFeedsImpl
  34. val inputs: Seq[DataObject]

    Input DataObjects. To be implemented by subclasses.

    Definition Classes
    CustomScriptAction → ScriptActionImpl → Action
  35. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  36. def logWritingFinished(subFeed: ScriptSubFeed, noData: Option[Boolean], duration: Duration)(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  37. def logWritingStarted(subFeed: ScriptSubFeed)(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  38. lazy val logger: Logger

    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
  39. def mainInputId: Option[DataObjectId]

    Definition Classes
    ActionSubFeedsImpl
  40. lazy val mainOutput: DataObject

    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  41. def mainOutputId: Option[DataObjectId]

    Definition Classes
    ActionSubFeedsImpl
  42. val metadata: Option[ActionMetadata]

    Additional metadata for the Action

    Definition Classes
    CustomScriptAction → Action
  43. def metricsFailCondition: Option[String]

    Spark SQL condition evaluated as where-clause against dataframe of metrics. Available columns are dataObjectId, key, value. If there are any rows passing the where clause, a MetricCheckFailed exception is thrown.

    Definition Classes
    ScriptActionImpl → Action
  44. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  45. def nodeId: String

    provide an implementation of the DAG node id

    Definition Classes
    Action → DAGNode
  46. final def notify(): Unit

    Definition Classes
    AnyRef
  47. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  48. val outputIds: Seq[DataObjectId]

    output DataObjects

  49. val outputs: Seq[DataObject with CanReceiveScriptNotification]

    Output DataObjects. To be implemented by subclasses.

    Definition Classes
    CustomScriptAction → ScriptActionImpl → Action
  50. def postExec(inputSubFeeds: Seq[SubFeed], outputSubFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Executes operations needed after executing an action. In this step any task on input or output DataObjects needed after the main task is executed, e.g. JdbcTableDataObject's postWriteSql or CopyAction's deleteInputData.

    Definition Classes
    ActionSubFeedsImpl → Action
  51. def postExecFailed(implicit session: SparkSession): Unit

    Executes operations needed to clean up after executing an action failed.

    Definition Classes
    Action
  52. def postprocessOutputSubFeedCustomized(subFeed: ScriptSubFeed)(implicit session: SparkSession, context: ActionPipelineContext): ScriptSubFeed

    Implement additional processing logic for SubFeeds after transformation. Can be implemented by subclass.

    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  53. def postprocessOutputSubFeeds(subFeeds: Seq[ScriptSubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[ScriptSubFeed]

    Definition Classes
    ActionSubFeedsImpl
  54. def preExec(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Executes operations needed before executing an action. In this step any task on input or output DataObjects needed before the main task is executed, e.g. JdbcTableDataObject's preWriteSql.

    Definition Classes
    Action
  55. def preInit(subFeeds: Seq[SubFeed], dataObjectsState: Seq[DataObjectState])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Checks before initialization of the Action. In this step the execution condition is evaluated, and Action init is skipped if the result is false.

    Definition Classes
    Action
  56. def prepare(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Prepare DataObjects prerequisites. In this step preconditions are prepared and tested:
    - connections can be created
    - needed structures exist, e.g. Kafka topic or JDBC table

    This runs during the "prepare" phase of the DAG.

    Definition Classes
    ActionSubFeedsImpl → Action
  57. def prepareInputSubFeeds(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): (Seq[ScriptSubFeed], Seq[ScriptSubFeed])

    Definition Classes
    ActionSubFeedsImpl
  58. def preprocessInputSubFeedCustomized(subFeed: ScriptSubFeed, ignoreFilter: Boolean, isRecursive: Boolean)(implicit session: SparkSession, context: ActionPipelineContext): ScriptSubFeed

    Implement additional preprocessing logic for SubFeeds before transformation. Can be implemented by subclass.

    ignoreFilter

    If filters should be ignored for this feed

    isRecursive

    If subfeed is recursive (input & output)

    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  59. lazy val prioritizedMainInputCandidates: Seq[DataObject]

    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  60. def recursiveInputs: Seq[DataObject]

    Recursive inputs are DataObjects that are used as output and input in the same action. This is usually prohibited as it creates loops in the DAG. In special cases this makes sense, e.g. when building a complex comparison/update logic.

    Usage: add DataObjects used as Output and Input as outputIds and recursiveInputIds, but not as inputIds.

    Definition Classes
    Action
  61. val scripts: Seq[ParsableScriptDef]

    definition of scripts to execute

  62. def setSparkJobMetadata(operation: Option[String] = None)(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Sets the util job description for better traceability in the Spark UI

    Note: This sets Spark local properties, which are propagated to the respective executor tasks. We rely on this to match metrics back to Actions and DataObjects. As writing to a DataObject on the Driver happens uninterrupted in the same exclusive thread, this is suitable.

    operation

    phase description (be short...)

    Definition Classes
    Action
  63. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  64. final def toString(executionId: Option[ExecutionId]): String

    Definition Classes
    Action
  65. final def toString(): String

    This is displayed in ascii graph visualization

    Definition Classes
    Action → AnyRef → Any
  66. def toStringMedium: String

    Definition Classes
    Action
  67. def toStringShort: String

    Definition Classes
    Action
  68. def transform(inputSubFeeds: Seq[ScriptSubFeed], outputSubFeeds: Seq[ScriptSubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[ScriptSubFeed]

    Transform subfeed content. To be implemented by subclass.

    Attributes
    protected
    Definition Classes
    ScriptActionImpl → ActionSubFeedsImpl
  69. def transformPartitionValues(partitionValues: Seq[PartitionValues])(implicit session: SparkSession, context: ActionPipelineContext): Map[PartitionValues, PartitionValues]

    Transform partition values. Can be implemented by subclass.

    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  70. def validateConfig(): Unit

    put configuration validation checks here

    Definition Classes
    ActionSubFeedsImpl → Action
  71. def validatePartitionValuesExisting(dataObject: DataObject with CanHandlePartitions, subFeed: SubFeed)(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  72. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  73. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  74. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  75. def writeOutputSubFeeds(subFeeds: Seq[ScriptSubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Definition Classes
    ActionSubFeedsImpl
  76. def writeSubFeed(subFeed: ScriptSubFeed, isRecursive: Boolean)(implicit session: SparkSession, context: ActionPipelineContext): WriteSubFeedResult

    Write subfeed data to output. To be implemented by subclass.

    isRecursive

    If subfeed is recursive (input & output)

    returns

    false if there was no data to process, otherwise true.

    Definition Classes
    ScriptActionImpl → ActionSubFeedsImpl
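
The metricsFailCondition member listed above is described as a where-clause over metric rows with the columns dataObjectId, key and value, failing the action as soon as any row matches. The following self-contained sketch imitates that rule without Spark: a Scala predicate stands in for the Spark SQL where-clause, MetricRow, MetricCheckFailedSketch and check are hypothetical names, and the metric key "records_written" is invented for illustration.

```scala
object MetricsFailConditionSketch {
  // Columns named in the metricsFailCondition description.
  final case class MetricRow(dataObjectId: String, key: String, value: String)

  // Local stand-in for the MetricCheckFailed exception mentioned in the docs.
  final class MetricCheckFailedSketch(msg: String) extends RuntimeException(msg)

  // The predicate plays the role of the Spark SQL where-clause:
  // if at least one row passes it, the check fails with an exception.
  def check(metrics: Seq[MetricRow], failCondition: MetricRow => Boolean): Unit = {
    val offending = metrics.filter(failCondition)
    if (offending.nonEmpty)
      throw new MetricCheckFailedSketch(
        s"${offending.size} metric row(s) matched the fail condition")
  }

  def main(args: Array[String]): Unit = {
    val metrics = Seq(MetricRow("orders-export", "records_written", "0"))
    // Rough analogue of: key = 'records_written' and value = '0'
    val nothingWritten: MetricRow => Boolean =
      row => row.key == "records_written" && row.value == "0"
    try check(metrics, nothingWritten)
    catch { case e: MetricCheckFailedSketch => println(s"failed: ${e.getMessage}") }
  }
}
```

Note the inverted sense compared with executionCondition: here a matching row is a failure, not a signal to run.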
