io.smartdatalake.workflow.action
input DataObjects
output DataObjects
definition of scripts to execute
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true the Action is executed, otherwise it is skipped. See Condition for details.
Adds a runtime event for this Action
Adds a runtime metric for this Action
Applies the executionMode and stores the result in the executionModeResult variable
Executes the main task of an action. In this step the data of the SubFeeds is moved from the input to the output DataObjects.
SparkSubFeeds to be processed
processed SparkSubFeeds
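To make the contract concrete, here is a toy sketch with simplified stand-in types (not the real SDLB signatures, which additionally take an implicit context): an Action consumes the SubFeeds of its inputs and returns processed SubFeeds addressed to its outputs.
{{{
// Toy model of the exec contract (stand-in types, not the real SDLB API).
case class SubFeed(dataObjectId: String, partitionValues: Seq[Map[String, String]] = Seq())

trait SimpleAction {
  def exec(subFeeds: Seq[SubFeed]): Seq[SubFeed]
}

// A pass-through action: data "moves" from input to output by re-addressing
// the SubFeed to the (hypothetical) output DataObject id "out-1".
object PassthroughAction extends SimpleAction {
  override def exec(subFeeds: Seq[SubFeed]): Seq[SubFeed] =
    subFeeds.map(_.copy(dataObjectId = "out-1"))
}
}}}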
To be implemented by subclasses
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true the Action is executed, otherwise it is skipped. See Condition for details.
execution mode for this action.
Returns the factory that can parse this type (that is, type CO).
Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
the factory (object) for this class.
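A minimal sketch of this pattern with simplified stand-ins (the real FromConfigFactory and InstanceRegistry live in io.smartdatalake.config and may differ in detail):
{{{
import com.typesafe.config.Config

// Simplified stand-ins for illustration (assumptions, not the real API):
class InstanceRegistry
trait FromConfigFactory[+CO] {
  def fromConfig(config: Config)(implicit registry: InstanceRegistry): CO
}

// The documented pattern: the class returns its companion object as factory,
// and the companion parses new instances from a Config.
case class MyScriptAction(id: String) {
  def factory: FromConfigFactory[MyScriptAction] = MyScriptAction
}
object MyScriptAction extends FromConfigFactory[MyScriptAction] {
  override def fromConfig(config: Config)(implicit registry: InstanceRegistry): MyScriptAction =
    MyScriptAction(config.getString("id")) // manual extraction, simplified
}
}}}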
Get potential state of input DataObjects when executionMode is DataObjectStateIncrementalMode.
Get the latest runtime state
Get summarized runtime information for a given ExecutionId.
ExecutionId to get runtime information for. If empty, runtime information for the last ExecutionId is returned.
Get the latest metrics for all DataObjects and a given SDLExecutionId.
ExecutionId to get metrics for. If empty, metrics for the last ExecutionId are returned.
A unique identifier for this instance.
Initialize Action with SubFeeds to be processed. In this step the execution mode is evaluated and the result is stored for the exec phase. If successful, the DAG can be built and the Spark DataFrame lineage can be created.
SparkSubFeeds to be processed
processed SparkSubFeeds
input DataObjects
Input DataObjects. To be implemented by subclasses
Additional metadata for the Action
Spark SQL condition evaluated as a where-clause against the dataframe of metrics. Available columns are dataObjectId, key and value. If there are any rows passing the where clause, a MetricCheckFailed exception is thrown.
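A minimal sketch of such a condition, assuming an invented DataObject id ("int-output") and a metric key ("records_written") that may vary by version:
{{{
// Sketch of a metricsFailCondition expression. Columns per the documentation
// above: dataObjectId, key, value. Here the run fails if nothing was written
// to the hypothetical DataObject "int-output".
val metricsFailCondition: String =
  "dataObjectId = 'int-output' and key = 'records_written' and value = 0"
}}}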
Provide an implementation of the DAG node id
output DataObjects
Output DataObjects. To be implemented by subclasses
Executes operations needed after executing an action. In this step any task on input or output DataObjects needed after the main task is executed, e.g. a JdbcTableDataObject's postWriteSql or a CopyAction's deleteInputData.
Executes operations needed to clean up after executing an action failed.
Implement additional processing logic for SubFeeds after transformation. Can be implemented by subclasses.
Executes operations needed before executing an action. In this step any task on input or output DataObjects needed before the main task is executed, e.g. a JdbcTableDataObject's preWriteSql.
Checks before initialization of the Action. In this step the execution condition is evaluated and the Action's init is skipped if the result is false.
Prepare DataObjects prerequisites. In this step preconditions are prepared and tested: connections can be created, and needed structures exist, e.g. a Kafka topic or a JDBC table.
This runs during the "prepare" phase of the DAG.
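As a self-contained toy model (simplified stand-in types, not the real SDLB API), the phase can be pictured like this:
{{{
// Toy model of the prepare phase: every Action checks its preconditions and
// lets its DataObjects verify their own prerequisites before init/exec run.
trait DataObject { def id: String; def prepare(): Unit }

case class JdbcTable(id: String) extends DataObject {
  override def prepare(): Unit = println(s"$id: connection can be created, table exists")
}

trait SimplePreparableAction {
  def inputs: Seq[DataObject]
  def outputs: Seq[DataObject]
  def prepare(): Unit = {
    require(inputs.nonEmpty && outputs.nonEmpty, "action needs inputs and outputs")
    (inputs ++ outputs).distinct.foreach(_.prepare())
  }
}
}}}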
Implement additional preprocess logic for SubFeeds before transformation. Can be implemented by subclasses.
Whether filters should be ignored for this feed
Whether the subfeed is recursive (input & output)
Recursive inputs are DataObjects that are used as output and input in the same action. This is usually prohibited as it creates loops in the DAG. In special cases it makes sense, e.g. when building complex comparison/update logic.
Usage: add DataObjects used as output and input to outputIds and recursiveInputIds, but not to inputIds, as sketched below.
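A hypothetical HOCON sketch of this rule (all action, DataObject and type names are invented or assumed; the exact action type depends on the SDLB version), parsed here via typesafe-config for illustration:
{{{
import com.typesafe.config.ConfigFactory

// "state-table" is written and read by the same action, so it appears in
// outputIds and recursiveInputIds, but not in inputIds.
val recursiveConf = ConfigFactory.parseString(
  """actions {
    |  update-state {
    |    type = CustomDataFrameAction   # assumed action type
    |    inputIds = [new-events]
    |    outputIds = [state-table]
    |    recursiveInputIds = [state-table]
    |  }
    |}""".stripMargin)
}}}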
definition of scripts to execute
Sets the Spark job description for better traceability in the Spark UI
Note: This sets Spark local properties, which are propagated to the respective executor tasks. We rely on this to match metrics back to Actions and DataObjects. As writing to a DataObject on the Driver happens uninterrupted in the same exclusive thread, this is suitable.
phase description (be short...)
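A minimal sketch of the underlying Spark mechanism (plain Spark API, independent of SDLB; the job description and ids are invented):
{{{
import org.apache.spark.sql.SparkSession

// setJobDescription stores a Spark local property; it is attached to every
// job started from this thread and propagated to its tasks, which is what
// allows matching metrics back to the Action/DataObject that runs the write.
val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
spark.sparkContext.setJobDescription("copy-airports: writing stg-airports")
spark.range(10).write.format("noop").mode("overwrite").save()
}}}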
This is displayed in the ASCII graph visualization
Transform subfeed content. To be implemented by subclasses.
Transform partition values. Can be implemented by subclasses.
Put configuration validation checks here
Write subfeed data to output. To be implemented by subclasses.
Whether the subfeed is recursive (input & output)
false if there was no data to process, otherwise true.
Action that executes a script after multiple input DataObjects are ready, notifying multiple output DataObjects when the script succeeded.
input DataObjects
output DataObjects
definition of scripts to execute
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true the Action is executed, otherwise it is skipped. See Condition for details.
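A hypothetical configuration sketch tying these parameters together (all ids are invented; the action and script type names, and the expression attribute, are assumptions that may differ across SDLB versions):
{{{
import com.typesafe.config.ConfigFactory

// Script action guarded by an executionCondition: the action runs only when
// the expression evaluates to true against SubFeedsExpressionData.
val scriptActionConf = ConfigFactory.parseString(
  """actions {
    |  load-with-script {
    |    type = CustomScriptAction        # assumed action type name
    |    inputIds = [stg-input]
    |    outputIds = [int-output]
    |    scripts = [{ type = CmdScript, linuxCmd = "./load.sh" }]   # assumed script type
    |    executionCondition = {
    |      expression = "runId > 1"       # assumed attribute of SubFeedsExpressionData
    |      description = "skip the very first run"
    |    }
    |  }
    |}""".stripMargin)
}}}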