Class/Object

io.smartdatalake.workflow.action

CustomSparkAction


case class CustomSparkAction(id: ActionObjectId, inputIds: Seq[DataObjectId], outputIds: Seq[DataObjectId], transformer: CustomDfsTransformerConfig, breakDataFrameLineage: Boolean = false, persist: Boolean = false, initExecutionMode: Option[ExecutionMode] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends SparkSubFeedsAction with Product with Serializable

Action to transform data according to a custom transformer. Allows transforming multiple input and output DataFrames.

inputIds

input DataObjects

outputIds

output DataObjects

transformer

Custom transformer to transform a Seq of DataFrames
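In practice a CustomSparkAction is declared in the Smart Data Lake HOCON configuration rather than instantiated directly. A minimal sketch of such a declaration; the action and DataObject ids and the transformer class name are illustrative assumptions, not taken from this page:

```hocon
actions {
  join-example {
    type = CustomSparkAction
    inputIds = [stg-departures, int-airports]
    outputIds = [btl-joined]
    transformer {
      # fully qualified name of a CustomDfsTransformer implementation (illustrative)
      className = com.example.MyJoinTransformer
    }
    # optional flags from the constructor, shown with their defaults
    # breakDataFrameLineage = false
    # persist = false
  }
}
```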

Linear Supertypes
Serializable, Serializable, Product, Equals, SparkSubFeedsAction, Action, SmartDataLakeLogger, DAGNode, ParsableFromConfig[Action], SdlConfigObject, AnyRef, Any

Instance Constructors

  1. new CustomSparkAction(id: ActionObjectId, inputIds: Seq[DataObjectId], outputIds: Seq[DataObjectId], transformer: CustomDfsTransformerConfig, breakDataFrameLineage: Boolean = false, persist: Boolean = false, initExecutionMode: Option[ExecutionMode] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry)

    inputIds

    input DataObjects

    outputIds

    output DataObjects

    transformer

    Custom transformer to transform a Seq of DataFrames
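The transformer argument points at user code that maps the input DataFrames to the output DataFrames. A minimal sketch of such a transformer, assuming a CustomDfsTransformer interface that receives and returns DataFrames keyed by DataObject id (the package path, method signature, ids, and join key are assumptions, not taken from this page):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import io.smartdatalake.workflow.action.customlogic.CustomDfsTransformer

// Illustrative transformer: joins two input DataFrames and emits one output.
// The DataObject ids and the join column "id" are assumptions for this sketch.
class MyJoinTransformer extends CustomDfsTransformer {
  override def transform(session: SparkSession,
                         options: Map[String, String],
                         dfs: Map[String, DataFrame]): Map[String, DataFrame] = {
    val departures = dfs("stg-departures")   // keyed by input DataObject id
    val airports   = dfs("int-airports")
    val joined = departures.join(airports, Seq("id"))
    Map("btl-joined" -> joined)              // keyed by output DataObject id
  }
}
```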

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def addRuntimeEvent(phase: String, state: RuntimeEventState, msg: String): Unit

    Adds an action event

    Definition Classes
    Action
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. val breakDataFrameLineage: Boolean

    Stop propagating the input DataFrame through this action and instead get a new DataFrame from the DataObject. This is needed if the input DataFrame includes many transformations from previous Actions.

    Definition Classes
    CustomSparkAction → SparkSubFeedsAction
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def enrichSubFeedsDataFrame(inputs: Seq[DataObject with CanCreateDataFrame], subFeeds: Seq[SparkSubFeed])(implicit session: SparkSession): Seq[SparkSubFeed]

    Enriches SparkSubFeeds with a DataFrame if not already present

    inputs

    input data objects.

    subFeeds

    input SubFeeds.

    Attributes
    protected
    Definition Classes
    SparkSubFeedsAction
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. final def exec(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[SubFeed]

    Action.exec implementation

    subFeeds

    SparkSubFeeds to be processed

    returns

    processed SparkSubFeeds

    Definition Classes
    SparkSubFeedsAction → Action
  11. def factory: FromConfigFactory[Action]

    Returns the factory that can parse this type (that is, type CO).

    Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.

    returns

    the factory (object) for this class.

    Definition Classes
    CustomSparkAction → ParsableFromConfig
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def getInputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T

    Attributes
    protected
    Definition Classes
    Action
  15. def getOutputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T

    Attributes
    protected
    Definition Classes
    Action
  16. def getRuntimeState: Option[String]


    Definition Classes
    Action
  17. val id: ActionObjectId

    A unique identifier for this instance.

    Definition Classes
    CustomSparkAction → Action → SdlConfigObject
  18. final def init(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[SubFeed]

    Generic init implementation for Action.init

    subFeeds

    SparkSubFeeds to be processed

    returns

    processed SparkSubFeeds

    Definition Classes
    SparkSubFeedsAction → Action
  19. val initExecutionMode: Option[ExecutionMode]

    Execution mode if this Action is a start node of a DAG run

    Definition Classes
    CustomSparkAction → SparkSubFeedsAction
  20. val inputIds: Seq[DataObjectId]

    input DataObjects

  21. val inputs: Seq[DataObject with CanCreateDataFrame]

    Input DataObjects. To be implemented by subclasses.

    Definition Classes
    CustomSparkAction → SparkSubFeedsAction → Action
  22. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  23. lazy val logger: Logger

    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
  24. lazy val mainInput: Option[DataObject with CanCreateDataFrame]

    Attributes
    protected
    Definition Classes
    SparkSubFeedsAction
  25. lazy val mainOutput: Option[DataObject with CanWriteDataFrame]

    Attributes
    protected
    Definition Classes
    SparkSubFeedsAction
  26. val metadata: Option[ActionMetadata]

    Additional metadata for the Action

    Definition Classes
    CustomSparkAction → Action
  27. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  28. def nodeId: String

    Provides an implementation of the DAG node id

    Definition Classes
    Action → DAGNode
  29. final def notify(): Unit

    Definition Classes
    AnyRef
  30. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  31. val outputIds: Seq[DataObjectId]

    output DataObjects

  32. val outputs: Seq[DataObject with CanWriteDataFrame]

    Output DataObjects. To be implemented by subclasses.

    Definition Classes
    CustomSparkAction → SparkSubFeedsAction → Action
  33. val persist: Boolean

    Force persisting the DataFrame on disk. This helps to reduce the memory needed for caching the DataFrame content and can serve as a recovery point in case a task gets lost.

    Definition Classes
    CustomSparkAction → SparkSubFeedsAction
  34. def postExec(inputSubFeed: Seq[SubFeed], outputSubFeed: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Executes operations needed after executing an action. In this step any operation on Input- or Output-DataObjects needed after the main task is executed, e.g. a JdbcTableDataObject's postSql or a CopyAction's deleteInputData.

    Definition Classes
    Action
  35. def preExec(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Executes operations needed before executing an action. In this step any operation on Input- or Output-DataObjects needed before the main task is executed, e.g. a JdbcTableDataObject's preSql.

    Definition Classes
    Action
  36. def prepare(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Prepare DataObject prerequisites. In this step preconditions are prepared and tested:

      - directories exist or can be created
      - connections can be created

    This runs during the "prepare" operation of the DAG.

    Definition Classes
    Action
  37. def setSparkJobDescription(operation: String)(implicit session: SparkSession): Unit

    Sets the Spark job description for better traceability in the Spark UI

    operation

    operation description (be short...)

    session

    Spark session

    Definition Classes
    Action
  38. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  39. final def toString(): String

    This is displayed in the ASCII graph visualization

    Definition Classes
    Action → AnyRef → Any
  40. def toStringMedium: String

    Definition Classes
    Action
  41. def toStringShort: String

    Definition Classes
    Action
  42. def transform(subFeeds: Seq[SparkSubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[SparkSubFeed]

    Transform SparkSubFeeds. To be implemented by subclasses.

    subFeeds

    SparkSubFeeds to be transformed

    returns

    transformed SparkSubFeeds

    Definition Classes
    CustomSparkAction → SparkSubFeedsAction
  43. val transformer: CustomDfsTransformerConfig

    Custom transformer to transform a Seq of DataFrames

  44. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  45. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  46. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
