Stop propagating input FileRefs through the action and instead get new FileRefs from the DataObject according to the SubFeed's partitionValues. This is needed to reprocess all files of a path/partition instead of only the FileRefs passed from the previous Action.
If true, delete files after they are successfully processed.
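Conceptually, instead of forwarding the incoming FileRefs, the action re-lists files from the DataObject for the SubFeed's partition values. The following is a simplified, hypothetical sketch; the names `FileRefDataObject.listFileRefs` and `resolveInputFileRefs` are illustrative stand-ins, not the library's actual API:

```scala
// Simplified sketch: re-list FileRefs from the DataObject for given partition
// values instead of propagating the FileRefs of the previous Action.
case class FileRef(path: String)

trait FileRefDataObject {
  def listFileRefs(partitionValues: Map[String, String]): Seq[FileRef]
}

def resolveInputFileRefs(
    incoming: Seq[FileRef],
    partitionValues: Map[String, String],
    dataObject: FileRefDataObject,
    reprocessAll: Boolean): Seq[FileRef] =
  if (reprocessAll) dataObject.listFileRefs(partitionValues) // all files of the partition
  else incoming // forward FileRefs from the previous Action
```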
Executes Action for a given FileSubFeed
subFeed to be processed (referencing files to be read)
processed subFeed (referencing files written by this action)
Execution mode if this Action is a start node of a DAG run
Returns the factory that can parse this type (that is, type CO).
Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
the factory (object) for this class.
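The companion-object convention described above can be sketched as follows. This is a minimal, hypothetical illustration of the pattern; `FromConfigFactory` and `MyAction` are simplified stand-ins, not the library's real signatures:

```scala
// Illustrative sketch of the companion-object factory pattern:
// the companion object implements the factory trait.
trait FromConfigFactory[+T] {
  def fromConfig(config: Map[String, String]): T
}

case class MyAction(id: String) {
  // returns the companion object, which implements FromConfigFactory
  def factory: FromConfigFactory[MyAction] = MyAction
}

object MyAction extends FromConfigFactory[MyAction] {
  override def fromConfig(config: Map[String, String]): MyAction =
    MyAction(config.getOrElse("id", "unknown"))
}
```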
A unique identifier for this instance.
Initialize Action with a given FileSubFeed. Note that this only checks the prerequisites for processing and simulates the output FileRefs that would be created.
subFeed to be processed (referencing files to be read)
processed subFeed (referencing files that would be written by this action)
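The split between init (simulate outputs, no side effects) and exec (actually process) might be sketched like this; all names here are illustrative assumptions, not the actual API:

```scala
// Illustrative sketch: init only simulates the output FileRefs, exec does the work.
case class FileRef(path: String)
case class FileSubFeed(fileRefs: Seq[FileRef], dataObjectId: String)

trait FileAction {
  /** checks prerequisites and simulates the FileRefs that would be written */
  def init(subFeed: FileSubFeed): FileSubFeed
  /** actually processes the files */
  def exec(subFeed: FileSubFeed): FileSubFeed
}

class CopyFileAction(targetDir: String) extends FileAction {
  private def targetRef(r: FileRef) = FileRef(s"$targetDir/${r.path.split('/').last}")
  override def init(subFeed: FileSubFeed): FileSubFeed =
    subFeed.copy(fileRefs = subFeed.fileRefs.map(targetRef)) // no side effects
  override def exec(subFeed: FileSubFeed): FileSubFeed = {
    // a real implementation would copy file contents here
    subFeed.copy(fileRefs = subFeed.fileRefs.map(targetRef))
  }
}
```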
Input FileRefDataObject implementing CanCreateInputStream.
Input DataObjects. To be implemented by subclasses.
Additional metadata for the Action
Spark SQL condition evaluated as a where-clause against the dataframe of metrics. Available columns are dataObjectId, key, value. If any rows pass the where-clause, a MetricsCheckFailed exception is thrown.
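As a hedged sketch of how such a check could be evaluated (assuming a local SparkSession; the exception class and metrics layout are simplified stand-ins):

```scala
import org.apache.spark.sql.SparkSession

// simplified stand-in for the real exception type
case class MetricsCheckFailed(msg: String) extends Exception(msg)

val spark = SparkSession.builder().master("local[1]").appName("metricsCheck").getOrCreate()
import spark.implicits._

// metrics as rows of (dataObjectId, key, value)
val metrics = Seq(("tgt1", "records_written", 0L), ("tgt1", "no_data", 1L))
  .toDF("dataObjectId", "key", "value")

// the configured condition is applied as a where-clause
val condition = "key = 'records_written' and value = 0"
if (metrics.where(condition).count > 0)
  throw MetricsCheckFailed(s"metrics check failed: $condition")
```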
Output FileRefDataObject implementing CanCreateOutputStream.
Output DataObjects. To be implemented by subclasses.
Adds an action event
Runtime metrics
Note: runtime metrics are disabled by default because they are only collected when running Actions from an ActionDAG, which is not the case for tests or other use cases. If enabled, an exception is thrown when metrics are not found.
Action.exec implementation
SparkSubFeeds to be processed
processed SparkSubFeeds
get latest runtime state
get latest runtime information for this action
Action.init implementation
SparkSubFeeds to be processed
processed SparkSubFeeds
provide an implementation of the DAG node id
Executes operations needed after executing an action. In this step, any operations on Input- or Output-DataObjects needed after the main task are executed, e.g. a JdbcTableDataObject's postWriteSql or a CopyAction's deleteInputData.
Executes operations needed before executing an action. In this step, any operations on Input- or Output-DataObjects needed before the main task are executed, e.g. a JdbcTableDataObject's preWriteSql.
Prepare DataObjects prerequisites. In this step preconditions are prepared & tested:
- connections can be created
- needed structures exist, e.g. Kafka topic or Jdbc table
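The hook ordering described above (prepare, then preExec, then the main task, then postExec) can be sketched as a small lifecycle trait. The names mirror the hooks in the documentation, but the trait itself is an illustrative sketch, not the real class hierarchy:

```scala
// Illustrative lifecycle sketch: prepare -> preExec -> (main task) -> postExec.
trait ActionLifecycle {
  def prepare(): Unit  = ()  // test preconditions, e.g. connections, needed tables
  def preExec(): Unit  = ()  // e.g. preWriteSql
  def exec(): Unit           // main task
  def postExec(): Unit = ()  // e.g. postWriteSql, deleteInputData
  final def run(): Unit = { prepare(); preExec(); exec(); postExec() }
}
```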
This runs during the "prepare" phase of the DAG.
Recursive inputs on FileSubFeeds are not supported, so an empty Seq is set.
Resets the runtime state of this Action. This is mainly used for testing.
Sets the Spark job description for better traceability in the Spark UI.
Note: This sets Spark local properties, which are propagated to the respective executor tasks. We rely on this to match metrics back to Actions and DataObjects. As writing to a DataObject on the Driver happens uninterrupted in the same exclusive thread, this is suitable.
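Setting the job description and a local property uses standard SparkContext methods; local properties set on the driver thread are inherited by tasks launched from that thread, which is what the metrics matching relies on. The property key below is illustrative, not the library's actual key:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("desc").getOrCreate()

// Local properties are propagated to tasks started from this thread, so task
// metrics can be matched back to the Action / DataObject being written.
spark.sparkContext.setJobDescription("Action copy-act: writing DataObject tgt1")
spark.sparkContext.setLocalProperty("sdl.actionId", "copy-act") // illustrative key
```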
phase description (be short...)
This is displayed in the ASCII graph visualization.