io.smartdatalake.workflow.action
inputs DataObject
output DataObject
filter of data to be processed by historization. It can be used to exclude historical data not needed to create new history, for performance reasons.
optional list of columns to ignore when comparing two records in historization. Cannot be used together with historizeWhitelist.
optional final list of columns to use when comparing two records in historization. Cannot be used together with historizeBlacklist.
if true, remove no longer existing columns in Schema Evolution
if true, remove no longer existing columns from nested data types in Schema Evolution. Keeping deleted columns in complex data types has a performance impact, as all future data has to be converted by a complex function.
optional execution mode if this Action is a start node of a DAG run
Adds an action event
Stop propagating input DataFrame through action and instead get a new DataFrame from DataObject. This can help to save memory and performance if the input DataFrame includes many transformations from previous Actions. The new DataFrame will be initialized according to the SubFeed's partitionValues.
Runtime metrics
Note: runtime metrics are disabled by default, because they are only collected when running Actions from an ActionDAG. This is not the case for tests or other use cases. If enabled, exceptions are thrown when metrics are not found.
Action.exec implementation
SparkSubFeeds to be processed
processed SparkSubFeeds
Returns the factory that can parse this type (that is, type CO).
Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
the factory (object) for this class.
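The companion-object pattern described above can be sketched as follows. This is a simplified illustration only: the `FromConfigFactory` trait shown here is a hypothetical stand-in that parses a plain `Map[String, String]` and does not reproduce the library's actual signature, and `ParsableFromConfig` and `DemoAction` are invented names.

```scala
// Hypothetical, simplified stand-in for the real factory trait.
trait FromConfigFactory[CO] {
  def fromConfig(config: Map[String, String]): CO
}

// Hypothetical base trait: every parsable type exposes its factory.
trait ParsableFromConfig[CO] {
  def factory: FromConfigFactory[CO]
}

// Invented example class; its factory is its own companion object.
case class DemoAction(id: String, inputId: String, outputId: String)
    extends ParsableFromConfig[DemoAction] {
  override def factory: FromConfigFactory[DemoAction] = DemoAction
}

// The companion object implements the factory trait.
object DemoAction extends FromConfigFactory[DemoAction] {
  override def fromConfig(config: Map[String, String]): DemoAction =
    DemoAction(config("id"), config("inputId"), config("outputId"))
}
```

Returning the companion object keeps parsing logic next to the class it creates, so a generic loader only needs the `factory` member to build instances.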
filter of data to be processed by historization. It can be used to exclude historical data not needed to create new history, for performance reasons.
get latest runtime state and duration if successfully finished.
optional list of columns to ignore when comparing two records in historization. Cannot be used together with historizeWhitelist.
optional final list of columns to use when comparing two records in historization. Cannot be used together with historizeBlacklist.
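How the blacklist and whitelist might narrow the set of compared columns can be sketched in plain Scala. This is not the library's implementation: `HistorizeColumns` and `columnsToCompare` are hypothetical names, and the sketch assumes that technical columns have already been removed from `allColumns`.

```scala
object HistorizeColumns {
  // Returns the columns used to compare two records.
  // Assumption: allColumns contains no technical (valid-from/to) columns.
  def columnsToCompare(
      allColumns: Seq[String],
      blacklist: Option[Seq[String]],
      whitelist: Option[Seq[String]]
  ): Seq[String] = {
    // The two options are mutually exclusive.
    require(blacklist.isEmpty || whitelist.isEmpty,
      "historizeBlacklist and historizeWhitelist cannot be used together")
    (blacklist, whitelist) match {
      case (Some(ignored), _) => allColumns.filterNot(c => ignored.contains(c))
      case (_, Some(only))    => allColumns.filter(c => only.contains(c))
      case _                  => allColumns
    }
  }
}
```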
A unique identifier for this instance.
if true, remove no longer existing columns in Schema Evolution
if true, remove no longer existing columns from nested data types in Schema Evolution. Keeping deleted columns in complex data types has a performance impact, as all future data has to be converted by a complex function.
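The effect of dropping or keeping deleted columns can be illustrated for the flat (non-nested) case. This is a minimal sketch on column names only; `SchemaEvolutionSketch` and `targetColumns` are hypothetical names, and appending kept columns at the end is an assumption rather than documented behaviour.

```scala
object SchemaEvolutionSketch {
  // Computes the target column list when incoming data has a changed schema.
  def targetColumns(
      existing: Seq[String],
      incoming: Seq[String],
      ignoreOldDeletedColumns: Boolean
  ): Seq[String] = {
    // Columns present in the existing table but missing from the new data.
    val deleted = existing.filterNot(c => incoming.contains(c))
    if (ignoreOldDeletedColumns) incoming // drop deleted columns
    else incoming ++ deleted              // keep them (assumed: appended last)
  }
}
```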
Action.init implementation
SparkSubFeeds to be processed
processed SparkSubFeeds
optional execution mode if this Action is a start node of a DAG run
Input DataObject which implements CanCreateDataFrame
inputs DataObject
Input DataObjects. To be implemented by subclasses.
Additional metadata for the Action
provide an implementation of the DAG node id
Output DataObject which implements CanWriteDataFrame
output DataObject
Output DataObjects. To be implemented by subclasses.
Force persisting the DataFrame on disk. This helps to reduce the memory needed for caching the DataFrame content and can serve as a recovery point in case a task gets lost.
Executes operations needed after executing an action. In this step, any operation on input or output DataObjects needed after the main task is executed, e.g. JdbcTableDataObject's postSql or CopyAction's deleteInputData.
Executes operations needed before executing an action. In this step, any operation on input or output DataObjects needed before the main task is executed, e.g. JdbcTableDataObject's preSql.
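The ordering of the pre- and post-execution hooks around the main task can be sketched as follows. `SketchAction` and `LoggingAction` are hypothetical types for illustration and not the library's `Action` trait; skipping the post hook when the main task fails is a property of this sketch only.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical minimal action with the three execution hooks.
trait SketchAction {
  def preExec(): Unit
  def exec(): Unit
  def postExec(): Unit
  // Runs the hooks in order; in this sketch postExec is skipped if exec throws.
  final def run(): Unit = { preExec(); exec(); postExec() }
}

// Records which hook ran, standing in for preSql / write / postSql.
class LoggingAction(log: ArrayBuffer[String]) extends SketchAction {
  def preExec(): Unit = { log += "preSql"; () }
  def exec(): Unit = { log += "write"; () }
  def postExec(): Unit = { log += "postSql"; () }
}
```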
Prepare DataObjects prerequisites. In this step preconditions are prepared and tested:
- directories exist or can be created
- connections can be created
This runs during the "prepare" operation of the DAG.
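A precondition check of the "directory exists or can be created" kind could look like the sketch below. `PrepareSketch` and `ensureDirectory` are hypothetical names; the example uses only the JDK's `java.nio.file.Files` and operates on the local file system, which is an assumption made for illustration.

```scala
import java.nio.file.{Files, Path}
import scala.util.Try

object PrepareSketch {
  // Succeeds if the directory already exists or could be created.
  // Fails, for example, if the path exists but is a regular file.
  def ensureDirectory(dir: Path): Try[Path] =
    Try(Files.createDirectories(dir))
}
```

Returning a `Try` lets the caller collect all failed preconditions before any data is processed, instead of stopping at the first one.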
Sets the util job description for better traceability in the Spark UI
Note: This sets Spark local properties, which are propagated to the respective executor tasks. We rely on this to match metrics back to Actions and DataObjects. As writing to a DataObject on the Driver happens uninterrupted in the same exclusive thread, this is suitable.
operation description (be short...)
This is displayed in the ASCII graph visualization
Transform a SparkSubFeed. To be implemented by subclasses.
SparkSubFeed to be transformed
transformed SparkSubFeed
Action to historize a subfeed. Historization creates a technical history of data by creating valid-from/to columns. It needs a transactional table as output with defined primary keys.
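The idea of a technical history with valid-from/to columns keyed by primary key can be sketched in plain Scala without Spark. This is not the library's implementation: `HistorizeSketch`, `Versioned`, `historize`, and the far-future `OpenEnd` marker are all hypothetical, and closing a version exactly at the new load timestamp is an assumed convention.

```scala
import java.time.LocalDateTime

object HistorizeSketch {
  // Assumed convention: a far-future timestamp marks the currently valid version.
  val OpenEnd: LocalDateTime = LocalDateTime.of(9999, 12, 31, 0, 0)

  case class Versioned(key: String, attrs: Map[String, String],
                       validFrom: LocalDateTime, validTo: LocalDateTime)

  // Merges a new snapshot (primary key -> attributes) into the existing history.
  def historize(history: Seq[Versioned],
                snapshot: Map[String, Map[String, String]],
                now: LocalDateTime): Seq[Versioned] = {
    val (open, closed) = history.partition(_.validTo == OpenEnd)
    // Open versions whose key disappeared or whose attributes changed are closed.
    val updatedOpen = open.map { v =>
      if (snapshot.get(v.key).contains(v.attrs)) v else v.copy(validTo = now)
    }
    // Keys that are new or changed get a fresh open version.
    val unchangedKeys = updatedOpen.filter(_.validTo == OpenEnd).map(_.key).toSet
    val added = snapshot.toSeq.collect {
      case (key, attrs) if !unchangedKeys(key) => Versioned(key, attrs, now, OpenEnd)
    }
    closed ++ updatedOpen ++ added
  }
}
```

Because every record is matched by key, the output needs defined primary keys; and because existing rows are updated (closed) in place, it needs a table that supports transactional writes.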