io.smartdatalake.workflow.action.sparktransformer
name of the transformer
Optional description of the transformer
Number of Spark tasks to create per partition value by repartitioning the DataFrame.
Optional key columns to distribute records over Spark tasks inside a partition value.
Optional description of the transformer
Returns the factory that can parse this type (that is, type CO).
Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
the factory (object) for this class.
Optional key columns to distribute records over Spark tasks inside a partition value.
name of the transformer
Number of Spark tasks to create per partition value by repartitioning the DataFrame.
Optional function to implement validations in prepare phase.
Function to be implemented to define the transformation between an input and output DataFrame (1:1)
Optional function to define the transformation of input to output partition values. For example, this makes it possible to implement aggregations where multiple input partitions are combined into one output partition. Note that by default the output partition values equal the input partition values, which should be correct for most use cases.
id of the action which executes this transformation. This is mainly used to prefix error messages.
partition values to transform
Map of input to output partition values. This allows partition values to be mapped forward and backward, which is needed by execution modes. Return None if the mapping is 1:1.
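The input-to-output mapping described above can be illustrated with a minimal plain-Scala sketch. It assumes daily input partitions (dt=yyyyMMdd) that are aggregated into monthly output partitions (month=yyyyMM). The names PartValues, dayToMonth and inputsFor are hypothetical stand-ins chosen for this illustration; they are not the library's types or signatures.

```scala
object PartitionMappingSketch {
  // Hypothetical stand-in for the library's partition-values type.
  final case class PartValues(elements: Map[String, String])

  // Maps each daily input partition (dt=yyyyMMdd) to its monthly output
  // partition (month=yyyyMM). Returns None when no input carries a "dt"
  // key, i.e. when this sketch has nothing to remap.
  def dayToMonth(partitionValues: Seq[PartValues]): Option[Map[PartValues, PartValues]] = {
    val mapping = partitionValues.collect {
      case pv if pv.elements.contains("dt") =>
        pv -> PartValues(Map("month" -> pv.elements("dt").take(6)))
    }.toMap
    if (mapping.isEmpty) None else Some(mapping)
  }

  // Backward lookup: which input partitions feed a given output partition?
  def inputsFor(mapping: Map[PartValues, PartValues], output: PartValues): Set[PartValues] =
    mapping.collect { case (in, out) if out == output => in }.toSet
}
```

Looking up an input key in the returned map gives the forward direction; inputsFor gives the backward direction, which is the kind of lookup the description above says execution modes need.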
Repartition DataFrame. For a detailed description of repartitioning DataFrames, see also SparkRepartitionDef.
name of the transformer
Optional description of the transformer
Number of Spark tasks to create per partition value by repartitioning the DataFrame.
Optional key columns to distribute records over Spark tasks inside a partition value.
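To make the roles of the two repartitioning parameters more concrete, here is a plain-Scala illustration of the idea. It is not the Spark implementation and does not use the Spark API. It assumes that a record's task is determined by its partition value plus a bucket derived from hashing the key columns modulo the number of tasks per partition. Record, taskKey and the column names in the test data are hypothetical.

```scala
object RepartitionSketch {
  // Hypothetical record representation: column name -> value.
  type Record = Map[String, String]

  // Returns (partition value, bucket) for a record. Records with the same
  // partition value and the same key-column values always get the same bucket,
  // and each partition value is spread over at most numberOfTasksPerPartition buckets.
  def taskKey(
      record: Record,
      partitionCols: Seq[String],
      keyCols: Seq[String],
      numberOfTasksPerPartition: Int
  ): (Seq[String], Int) = {
    val partitionValue = partitionCols.map(record)
    // floorMod keeps the bucket non-negative even for negative hash codes.
    val bucket = Math.floorMod(keyCols.map(record).hashCode, numberOfTasksPerPartition)
    (partitionValue, bucket)
  }
}
```

Grouping records by the returned pair yields at most numberOfTasksPerPartition groups per partition value, which is the effect the parameter descriptions above refer to.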