io.smartdatalake.workflow.action
input DataObject
output DataObject
a custom file transformer, which reads a file from HadoopFileDataObject and writes it back to another HadoopFileDataObject
whether the input files should be deleted after successful processing
number of files per Spark partition
Adds an action event
Stop propagating input FileRefs through the action and instead get new FileRefs from the DataObject according to the SubFeed's partition values. This is needed to reprocess all files of a path/partition instead of only the FileRefs passed on from the previous Action.
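A minimal sketch of how an implementation might honor this flag; `getFileRefs` and the field names are illustrative assumptions modeled on the description above, not the verified SDL API:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: names are assumptions, not the verified SDL API.
def resolveInputFileRefs(subFeed: FileSubFeed, input: FileRefDataObject, breakFileRefLineage: Boolean)
                        (implicit session: SparkSession): Seq[FileRef] = {
  if (breakFileRefLineage || subFeed.fileRefs.isEmpty) {
    // ignore FileRefs passed on from the previous Action and list all files
    // of the SubFeed's partition values directly from the DataObject
    input.getFileRefs(subFeed.partitionValues)
  } else subFeed.fileRefs.get
}
```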
whether the input files should be deleted after successful processing
Action.exec implementation
SparkSubFeeds to be processed
processed SparkSubFeeds
Executes Action for a given FileSubFeed
subFeed to be processed (referencing files to be read)
processed subFeed (referencing files written by this action)
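Roughly, such an exec implementation reads the files referenced by the incoming SubFeed, processes them, and returns a SubFeed referencing the written files. A hedged sketch (method names like `getFileRefs` and `processFile` are illustrative assumptions, not the verified SDL API):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: processFile stands in for the action's actual per-file work.
def exec(subFeed: FileSubFeed)(implicit session: SparkSession): FileSubFeed = {
  val inputRefs = subFeed.fileRefs.getOrElse(input.getFileRefs(subFeed.partitionValues))
  // read each input file, write the result to the output DataObject
  val outputRefs = inputRefs.map(processFile)
  subFeed.copy(fileRefs = Some(outputRefs), dataObjectId = output.id)
}
```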
Returns the factory that can parse this type (that is, type CO).
Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
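The typical shape of this pattern is sketched below; the exact `FromConfigFactory` signature and the parsing body are assumptions modeled on common config-parsing companions, not the verified SDL API:

```scala
import com.typesafe.config.Config

// Assumed factory interface, shown here for a self-contained example.
trait FromConfigFactory[+T] {
  def fromConfig(config: Config): T
}

// Sketch: the class returns its companion object as factory ...
case class MyCustomAction(id: String, inputId: String, outputId: String) {
  def factory: FromConfigFactory[MyCustomAction] = MyCustomAction
}

// ... and the companion object knows how to parse a Config into an instance.
object MyCustomAction extends FromConfigFactory[MyCustomAction] {
  override def fromConfig(config: Config): MyCustomAction =
    MyCustomAction(
      id = config.getString("id"),
      inputId = config.getString("inputId"),
      outputId = config.getString("outputId")
    )
}
```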
the factory (object) for this class.
number of files per Spark partition
A unique identifier for this instance.
Action.init implementation
SparkSubFeeds to be processed
processed SparkSubFeeds
Execution mode if this Action is a start node of a DAG run
Initialize Action with a given FileSubFeed. Note that this only checks the prerequisites for processing and simulates the output FileRefs that would be created.
subFeed to be processed (referencing files to be read)
processed subFeed (referencing files that would be written by this action)
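In contrast to exec, init must not touch any data; it only simulates the outcome. A sketch under the assumption that the output DataObject can translate input file names to output FileRefs (`translateFileRefs` and `getFileRefs` are illustrative names, not the verified SDL API):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: no files are read or written here.
def init(subFeed: FileSubFeed)(implicit session: SparkSession): FileSubFeed = {
  val inputRefs = subFeed.fileRefs.getOrElse(input.getFileRefs(subFeed.partitionValues))
  // simulate the FileRefs that exec would create in the output DataObject
  val simulatedOutputRefs = output.translateFileRefs(inputRefs)
  subFeed.copy(fileRefs = Some(simulatedOutputRefs), dataObjectId = output.id)
}
```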
Input FileRefDataObject which implements CanCreateInputStream
input DataObject
Input DataObjects. To be implemented by subclasses.
Additional metadata for the Action
provide an implementation of the DAG node id
Output FileRefDataObject which implements CanCreateOutputStream
output DataObject
Output DataObjects. To be implemented by subclasses.
Executes operations needed after executing an action. In this step, any operation on input or output DataObjects needed after the main task is executed, e.g. JdbcTableDataObject's postSql or CopyAction's deleteInputData.
Executes operations needed before executing an action. In this step, any operation on input or output DataObjects needed before the main task is executed, e.g. JdbcTableDataObject's preSql.
Prepare DataObject prerequisites. In this step preconditions are prepared and tested:
- directories exist or can be created
- connections can be established
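A sketch of what such a prepare step might look like, assuming each DataObject exposes its own prepare check (names are illustrative, not the verified SDL API):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: delegate precondition checks to every involved DataObject.
def prepare(implicit session: SparkSession): Unit = {
  (inputs ++ outputs).foreach { dataObject =>
    // each DataObject verifies its own prerequisites, e.g. that its
    // directory exists or can be created and its connection can be established
    dataObject.prepare
  }
}
```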
This runs during the "prepare" operation of the DAG.
Sets the Spark job description for better traceability in the Spark UI
operation description (be short...)
Spark session
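Spark's `SparkContext.setJobDescription` is the underlying API for this; the helper wrapped around it below is a sketch (the `id` interpolation is an assumption):

```scala
import org.apache.spark.sql.SparkSession

// setJobDescription is a real Spark API; jobs started afterwards show this
// text in the Spark UI's job list until the description is changed again.
def setSparkJobDescription(operation: String)(implicit session: SparkSession): Unit = {
  // id: this Action's identifier (assumed to be in scope)
  session.sparkContext.setJobDescription(s"$id: $operation")
}
```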
This is displayed in the ASCII graph visualization
Action to transform files between two Hadoop Data Objects. The transformation is executed in distributed mode on the Spark executors. A custom file transformer must be given, which reads a file from Hadoop and writes it back to Hadoop.
input DataObject
output DataObject
a custom file transformer, which reads a file from HadoopFileDataObject and writes it back to another HadoopFileDataObject
whether the input files should be deleted after successful processing
number of files per Spark partition
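A hedged sketch of such a custom file transformer; the trait name `CustomFileTransformer` and the stream-based `transform` signature are assumptions modeled on the description above, not the verified SDL API:

```scala
import java.io.{InputStream, OutputStream}

// Assumed interface: transform one input file into one output file.
trait CustomFileTransformer {
  def transform(input: InputStream, output: OutputStream): Unit
}

// Example implementation: uppercase the whole file content.
class UpperCaseTransformer extends CustomFileTransformer {
  override def transform(input: InputStream, output: OutputStream): Unit = {
    val content = scala.io.Source.fromInputStream(input, "UTF-8").mkString
    output.write(content.toUpperCase.getBytes("UTF-8"))
    output.flush()
  }
}
```

Because the transformation runs per file on the Spark executors, such a transformer must be serializable and should avoid holding driver-side state.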