Package io.smartdatalake.workflow.action

package action


Type Members

  1. case class ActionMetadata(name: Option[String] = None, description: Option[String] = None, feed: Option[String] = None, tags: Seq[String] = Seq()) extends Product with Serializable

    Additional metadata for an Action.

    name: Readable name of the Action
    description: Description of the content of the Action
    feed: Name of the feed this Action belongs to
    tags: Optional custom tags for this object
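
    For illustration, metadata could be attached to an action as follows (all values are made up):

      val meta = ActionMetadata(
        name = Some("copy-airports"),
        description = Some("copy airport master data from stage to integration"),
        feed = Some("airports"),
        tags = Seq("masterdata")
      )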

  2. case class CopyAction(id: ActionObjectId, inputId: DataObjectId, outputId: DataObjectId, deleteDataAfterRead: Boolean = false, transformer: Option[CustomDfTransformerConfig] = None, columnBlacklist: Option[Seq[String]] = None, columnWhitelist: Option[Seq[String]] = None, filterClause: Option[String] = None, standardizeDatatypes: Boolean = false, breakDataFrameLineage: Boolean = false, persist: Boolean = false, initExecutionMode: Option[ExecutionMode] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends SparkSubFeedAction with Product with Serializable

    Action to copy files (i.e. from stage to integration).

    inputId: input DataObject
    outputId: output DataObject
    deleteDataAfterRead: a flag to enable deletion of input partitions after copying
    transformer: a custom transformation that is applied to each SubFeed separately
    initExecutionMode: optional execution mode if this Action is a start node of a DAG run
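
    A minimal sketch of constructing a CopyAction programmatically, assuming an implicit InstanceRegistry in scope and that the id case classes wrap plain strings; all object ids are hypothetical:

      implicit val registry: InstanceRegistry = new InstanceRegistry()
      val copy = CopyAction(
        id = ActionObjectId("copy-airports"),   // hypothetical action id
        inputId = DataObjectId("stg-airports"), // hypothetical input DataObject
        outputId = DataObjectId("int-airports") // hypothetical output DataObject
      )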

  3. case class CustomFileAction(id: ActionObjectId, inputId: DataObjectId, outputId: DataObjectId, transformer: CustomFileTransformerConfig, deleteDataAfterRead: Boolean = false, filesPerPartition: Int = 10, breakFileRefLineage: Boolean = false, initExecutionMode: Option[ExecutionMode] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends FileSubFeedAction with SmartDataLakeLogger with Product with Serializable

    Action to transform files between two Hadoop Data Objects. The transformation is executed in distributed mode on the Spark executors. A custom file transformer must be given, which reads a file from Hadoop and writes it back to Hadoop.

    inputId: input DataObject
    outputId: output DataObject
    transformer: a custom file transformer, which reads a file from a HadoopFileDataObject and writes it back to another HadoopFileDataObject
    deleteDataAfterRead: if true, input files are deleted after successful processing
    filesPerPartition: number of files per Spark partition
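
    The following sketch shows the kind of read-transform-write cycle such a transformer performs, using the plain Hadoop FileSystem API rather than the SDL transformer interface itself; the paths and the transformation are made up:

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}
      import scala.io.Source

      val fs = FileSystem.get(new Configuration())
      // read one input file from Hadoop (hypothetical path)
      val in = fs.open(new Path("/stage/input.csv"))
      val transformed = Source.fromInputStream(in).mkString.toUpperCase // trivial example transformation
      in.close()
      // write the result back to Hadoop (hypothetical path)
      val out = fs.create(new Path("/integration/output.csv"))
      out.write(transformed.getBytes("UTF-8"))
      out.close()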

  4. case class CustomSparkAction(id: ActionObjectId, inputIds: Seq[DataObjectId], outputIds: Seq[DataObjectId], transformer: CustomDfsTransformerConfig, breakDataFrameLineage: Boolean = false, persist: Boolean = false, initExecutionMode: Option[ExecutionMode] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends SparkSubFeedsAction with Product with Serializable

    Action to transform data according to a custom transformer. Allows transforming multiple input DataFrames into multiple output DataFrames.

    inputIds: input DataObjects
    outputIds: output DataObjects
    transformer: custom transformer to transform a Seq[DataFrame]
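
    A sketch of the transformation shape such a transformer provides: several named input DataFrames in, several named output DataFrames out. This is plain Spark code for illustration (the actual wiring happens through CustomDfsTransformerConfig), and all names are made up:

      import org.apache.spark.sql.DataFrame

      def transform(dfs: Map[String, DataFrame]): Map[String, DataFrame] = {
        // hypothetical join of two input DataObjects into one output
        val enriched = dfs("orders").join(dfs("customers"), "customer_id")
        Map("orders-enriched" -> enriched)
      }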

  5. case class DeduplicateAction(id: ActionObjectId, inputId: DataObjectId, outputId: DataObjectId, transformer: Option[CustomDfTransformerConfig] = None, columnBlacklist: Option[Seq[String]] = None, columnWhitelist: Option[Seq[String]] = None, filterClause: Option[String] = None, standardizeDatatypes: Boolean = false, ignoreOldDeletedColumns: Boolean = false, ignoreOldDeletedNestedColumns: Boolean = true, breakDataFrameLineage: Boolean = false, persist: Boolean = false, initExecutionMode: Option[ExecutionMode] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends SparkSubFeedAction with Product with Serializable

    Action to deduplicate a SubFeed. Deduplication keeps the last record for every key, even after it has been deleted in the source. It needs a transactional table as output with defined primary keys.

    inputId: input DataObject
    outputId: output DataObject
    ignoreOldDeletedColumns: if true, remove no longer existing columns during schema evolution
    ignoreOldDeletedNestedColumns: if true, remove no longer existing columns from nested data types during schema evolution. Keeping deleted columns in complex data types has a performance impact, as all future data has to be converted by a complex function.
    initExecutionMode: optional execution mode if this Action is a start node of a DAG run
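
    An illustration of the keep-last-record-per-key semantics in plain Spark (not the SDL implementation; column names are made up):

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.expressions.Window
      import org.apache.spark.sql.functions.{col, row_number}

      val spark = SparkSession.builder.master("local[*]").getOrCreate()
      import spark.implicits._

      val df = Seq((1, "a", 1L), (1, "b", 2L), (2, "c", 1L)).toDF("key", "value", "captured_at")
      val w = Window.partitionBy("key").orderBy(col("captured_at").desc)
      val deduped = df
        .withColumn("rn", row_number().over(w))
        .where(col("rn") === 1) // keep only the latest record per key
        .drop("rn")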

  6. abstract class FileSubFeedAction extends Action

  7. case class FileTransferAction(id: ActionObjectId, inputId: DataObjectId, outputId: DataObjectId, deleteDataAfterRead: Boolean = false, overwrite: Boolean = true, breakFileRefLineage: Boolean = false, initExecutionMode: Option[ExecutionMode] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends FileSubFeedAction with Product with Serializable

    Action to transfer files between SFTP, Hadoop, and the local file system.

    inputId: input DataObject
    outputId: output DataObject
    deleteDataAfterRead: if true, input files are deleted after successful processing
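
    A hypothetical sketch of moving files from an SFTP input to a Hadoop output by enabling deletion after a successful read (object ids are made up, and an implicit InstanceRegistry is assumed in scope):

      val transfer = FileTransferAction(
        id = ActionObjectId("xfer-exports"),
        inputId = DataObjectId("sftp-exports"),
        outputId = DataObjectId("hdfs-exports"),
        deleteDataAfterRead = true,
        overwrite = true
      )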

  8. case class HistorizeAction(id: ActionObjectId, inputId: DataObjectId, outputId: DataObjectId, transformer: Option[CustomDfTransformerConfig] = None, columnBlacklist: Option[Seq[String]] = None, columnWhitelist: Option[Seq[String]] = None, standardizeDatatypes: Boolean = false, filterClause: Option[String] = None, historizeBlacklist: Option[Seq[String]] = None, historizeWhitelist: Option[Seq[String]] = None, ignoreOldDeletedColumns: Boolean = false, ignoreOldDeletedNestedColumns: Boolean = true, breakDataFrameLineage: Boolean = false, persist: Boolean = false, initExecutionMode: Option[ExecutionMode] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends SparkSubFeedAction with Product with Serializable

    Action to historize a SubFeed. Historization creates a technical history of data by adding valid-from/valid-to columns. It needs a transactional table as output with defined primary keys.

    inputId: input DataObject
    outputId: output DataObject
    filterClause: filter for the data to be processed by historization. It can be used to exclude, for performance reasons, historical data that is not needed to create the new history.
    historizeBlacklist: optional list of columns to ignore when comparing two records during historization. Cannot be used together with historizeWhitelist.
    historizeWhitelist: optional final list of columns to use when comparing two records during historization. Cannot be used together with historizeBlacklist.
    ignoreOldDeletedColumns: if true, remove no longer existing columns during schema evolution
    ignoreOldDeletedNestedColumns: if true, remove no longer existing columns from nested data types during schema evolution. Keeping deleted columns in complex data types has a performance impact, as all future data has to be converted by a complex function.
    initExecutionMode: optional execution mode if this Action is a start node of a DAG run
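
    An illustration of the valid-from/valid-to idea in plain Spark (not the SDL implementation; column names and the far-future end date are made up). Existing versions whose key reappears get their open period closed, and the changed records are appended as new open versions:

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.functions.{current_timestamp, lit, to_timestamp}

      val spark = SparkSession.builder.master("local[*]").getOrCreate()
      import spark.implicits._

      val history = Seq((1, "old")).toDF("pk", "value") // current open versions
      val changes = Seq((1, "new")).toDF("pk", "value") // incoming changed records

      // close the open period of versions whose key reappears in the changes
      val closed = history
        .join(changes.select("pk"), Seq("pk"), "left_semi")
        .withColumn("valid_to", current_timestamp())

      // append the changed records as new open versions
      val opened = changes
        .withColumn("valid_from", current_timestamp())
        .withColumn("valid_to", to_timestamp(lit("9999-12-31 23:59:59")))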

  9. case class NoDataToProcessWarning(actionId: NodeId, msg: String) extends TaskSkippedWarning with Product with Serializable

  10. abstract class SparkSubFeedAction extends Action

  11. abstract class SparkSubFeedsAction extends Action

Value Members

  1. object ActionHelper extends SmartDataLakeLogger

  2. object CopyAction extends FromConfigFactory[Action] with Serializable

  3. object CustomFileAction extends FromConfigFactory[Action] with Serializable

  4. object CustomSparkAction extends FromConfigFactory[Action] with Serializable

  5. object DeduplicateAction extends FromConfigFactory[Action] with Serializable

  6. object FileTransferAction extends FromConfigFactory[Action] with Serializable

  7. object HistorizeAction extends FromConfigFactory[Action] with Serializable

  8. package customlogic
