Class

io.smartdatalake.workflow.action

FileSubFeedAction

Related Doc: package action

abstract class FileSubFeedAction extends Action

Linear Supertypes
Action, AtlasExportable, SmartDataLakeLogger, DAGNode, ParsableFromConfig[Action], SdlConfigObject, AnyRef, Any

Instance Constructors

  1. new FileSubFeedAction()

Abstract Value Members

  1. abstract def breakFileRefLineage: Boolean

    Stop propagating input FileRefs through the action and instead get new FileRefs from the DataObject according to the SubFeed's partitionValue. This is needed to reprocess all files of a path/partition instead of only the FileRefs passed from the previous Action.

  2. abstract def deleteDataAfterRead(): Boolean

    If true, delete files after they are successfully processed.

  3. abstract def doTransform(inputSubFeed: FileSubFeed, outputSubFeed: FileSubFeed, doExec: Boolean)(implicit session: SparkSession, context: ActionPipelineContext): FileSubFeed

    "Transforms" a given FileSubFeed. Note the usage of doExec to choose between initialization and actual execution.

    inputSubFeed

    subFeed to be processed (referencing files to be read)

    outputSubFeed

    prepared output subFeed

    doExec

    true if the action should be executed. If false, this only checks the prerequisites for the processing and simulates the output FileRefs that would be created.

    returns

    processed output subFeed (referencing files written by this action)

  4. abstract def executionCondition: Option[Condition]

    Execution condition for this action.

    Definition Classes
    Action
  5. abstract def executionMode: Option[ExecutionMode]

    Execution mode for this action.

    Definition Classes
    Action
  6. abstract def factory: FromConfigFactory[Action]

    Returns the factory that can parse this type (that is, type CO).

    Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.

    returns

    the factory (object) for this class.

    Definition Classes
    ParsableFromConfig
  7. abstract val id: ActionId

    A unique identifier for this instance.

    Definition Classes
    Action → SdlConfigObject
  8. abstract def input: FileRefDataObject with CanCreateInputStream

    Input FileRefDataObject that implements CanCreateInputStream.

  9. abstract def inputs: Seq[DataObject]

    Input DataObjects. To be implemented by subclasses.

    Definition Classes
    Action
  10. abstract def metadata: Option[ActionMetadata]

    Additional metadata for the Action

    Definition Classes
    Action
  11. abstract def metricsFailCondition: Option[String]

    Spark SQL condition evaluated as a where-clause against a DataFrame of metrics. Available columns are dataObjectId, key, value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.

    Definition Classes
    Action
  12. abstract def output: FileRefDataObject with CanCreateOutputStream

    Output FileRefDataObject that implements CanCreateOutputStream.

  13. abstract def outputs: Seq[DataObject]

    Output DataObjects. To be implemented by subclasses.

    Definition Classes
    Action
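
The interplay of doExec and breakFileRefLineage described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: FileRef, FileSubFeed and InMemoryFileStore are simplified stand-ins with assumed fields, and the path rewrite from "/in/" to "/out/" is purely hypothetical.

```scala
import scala.collection.mutable

// Simplified stand-ins for illustration only, NOT the real Smart Data Lake types.
object FileSubFeedSketch {
  final case class FileRef(fullPath: String, partitionValue: String)
  final case class FileSubFeed(fileRefs: Option[Seq[FileRef]], partitionValue: String)

  // Hypothetical DataObject that can list its files for a partition.
  final class InMemoryFileStore(files: Seq[FileRef]) {
    def getFileRefs(partitionValue: String): Seq[FileRef] =
      files.filter(_.partitionValue == partitionValue)
  }

  // breakFileRefLineage = true: ignore the FileRefs handed over by the previous
  // action and list all files of the partition again.
  def resolveInputRefs(in: FileSubFeed, store: InMemoryFileStore,
                       breakFileRefLineage: Boolean): Seq[FileRef] =
    if (breakFileRefLineage) store.getFileRefs(in.partitionValue)
    else in.fileRefs.getOrElse(store.getFileRefs(in.partitionValue))

  // doExec = false: only simulate the output FileRefs (initialization).
  // doExec = true: additionally "write" each output, recorded in `written`.
  def doTransform(in: FileSubFeed, out: FileSubFeed, doExec: Boolean,
                  store: InMemoryFileStore, breakFileRefLineage: Boolean,
                  written: mutable.Buffer[String]): FileSubFeed = {
    val inputRefs = resolveInputRefs(in, store, breakFileRefLineage)
    val outputRefs = inputRefs.map(r => r.copy(fullPath = r.fullPath.replace("/in/", "/out/")))
    if (doExec) outputRefs.foreach(r => written += r.fullPath)
    out.copy(fileRefs = Some(outputRefs))
  }
}
```

In this sketch, calling doTransform with doExec = false returns the same output FileRefs as a real run would, but leaves `written` untouched, which mirrors the "check and simulate" role of the initialization call.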

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def addRuntimeEvent(phase: ExecutionPhase, state: RuntimeEventState, msg: Option[String] = None, results: Seq[SubFeed] = Seq()): Unit

    Adds an action event

    Definition Classes
    Action
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def atlasName: String

    Definition Classes
    Action → AtlasExportable
  7. def atlasQualifiedName(prefix: String): String

    Definition Classes
    AtlasExportable
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def enableRuntimeMetrics(): Unit

    Runtime metrics

    Note: runtime metrics are disabled by default, because they are only collected when running Actions from an ActionDAG. This is not the case for tests or other use cases. If enabled, exceptions are thrown when metrics are not found.

    Definition Classes
    Action
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. final def exec(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[SubFeed]

    Action.exec implementation

    subFeeds

    SubFeeds to be processed

    returns

    processed SubFeeds

    Definition Classes
    FileSubFeedAction → Action
  13. var executionConditionResult: (Boolean, Option[String])

    Attributes
    protected
    Definition Classes
    Action
  14. var executionModeResult: Try[Option[ExecutionModeResult]]

    Attributes
    protected
    Definition Classes
    Action
  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. def getAllLatestMetrics: Map[DataObjectId, Option[ActionMetrics]]

    Definition Classes
    Action
  17. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  18. def getFinalMetrics(dataObjectId: DataObjectId): Option[ActionMetrics]

    Definition Classes
    Action
  19. def getInputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T

    Attributes
    protected
    Definition Classes
    Action
  20. def getLatestMetrics(dataObjectId: DataObjectId): Option[ActionMetrics]

    Definition Classes
    Action
  21. def getLatestRuntimeState: Option[RuntimeEventState]

    Gets the latest runtime state.

    Definition Classes
    Action
  22. def getOutputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T

    Attributes
    protected
    Definition Classes
    Action
  23. def getRuntimeInfo: Option[RuntimeInfo]

    Gets the latest runtime information for this action.

    Definition Classes
    Action
  24. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  25. final def init(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Seq[SubFeed]

    Action.init implementation

    subFeeds

    SubFeeds to be processed

    returns

    processed SubFeeds

    Definition Classes
    FileSubFeedAction → Action
  26. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  27. lazy val logger: Logger

    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
  28. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  29. def nodeId: String

    Provides an implementation of the DAG node id.

    Definition Classes
    Action → DAGNode
  30. final def notify(): Unit

    Definition Classes
    AnyRef
  31. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  32. def onRuntimeMetrics(dataObjectId: Option[DataObjectId], metrics: ActionMetrics): Unit

    Definition Classes
    Action
  33. final def postExec(inputSubFeeds: Seq[SubFeed], outputSubFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Executes operations needed after executing an action. In this step, any task on input or output DataObjects that is needed after the main task is executed, e.g. JdbcTableDataObject's postWriteSql or CopyAction's deleteInputData.

    Definition Classes
    FileSubFeedAction → Action
  34. def postExecSubFeed(inputSubFeed: SubFeed, outputSubFeed: SubFeed)(implicit session: SparkSession, context: ActionPipelineContext): Unit

  35. def preExec(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Executes operations needed before executing an action. In this step, any task on input or output DataObjects that is needed before the main task is executed, e.g. JdbcTableDataObject's preWriteSql.

    Definition Classes
    Action
  36. def preInit(subFeeds: Seq[SubFeed])(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Checks before initialization of the Action. In this step the execution condition is evaluated, and Action init is skipped if the result is false.

    Definition Classes
    Action
  37. def prepare(implicit session: SparkSession, context: ActionPipelineContext): Unit

    Prepare DataObjects prerequisites. In this step preconditions are prepared and tested:
    - connections can be created
    - needed structures exist, e.g. Kafka topic or Jdbc table

    This runs during the "prepare" phase of the DAG.

    Definition Classes
    FileSubFeedAction → Action
  38. def recursiveInputs: Seq[FileRefDataObject with CanCreateInputStream]

    Recursive inputs on FileSubFeeds are not supported, so an empty Seq is returned.

    Definition Classes
    FileSubFeedAction → Action
  39. def reset(): Unit

    Resets the runtime state of this Action. This is mainly used for testing.

    Definition Classes
    Action
  40. def setSparkJobMetadata(operation: Option[String] = None)(implicit session: SparkSession): Unit

    Sets the Spark job description for better traceability in the Spark UI.

    Note: This sets Spark local properties, which are propagated to the respective executor tasks. We rely on this to match metrics back to Actions and DataObjects. As writing to a DataObject on the Driver happens uninterrupted in the same exclusive thread, this is suitable.

    operation

    phase description (be short...)

    Definition Classes
    Action
  41. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  42. final def toString(): String

    This is displayed in the ASCII graph visualization.

    Definition Classes
    Action → AnyRef → Any
  43. def toStringMedium: String

    Definition Classes
    Action
  44. def toStringShort: String

    Definition Classes
    Action
  45. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  46. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  47. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
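
The lifecycle hooks listed above (prepare, preInit, init, preExec, exec, postExec) can be pictured with the following sketch. It is an assumption-laden simplification: RecordingAction is a hypothetical stand-in, the real framework runs each phase across all actions of the DAG rather than per action, and the sketch assumes that an action whose execution condition is false skips its execution hooks as well as init.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an Action; it only records which hooks were called.
object ActionLifecycleSketch {
  final class RecordingAction(executionConditionHolds: Boolean) {
    val calls: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]
    private var skipped = false

    def prepare(): Unit = calls += "prepare"
    // preInit evaluates the execution condition and marks the action as skipped if it is false.
    def preInit(): Unit = { calls += "preInit"; skipped = !executionConditionHolds }
    def init(): Unit = calls += "init"
    def preExec(): Unit = calls += "preExec"
    def exec(): Unit = calls += "exec"
    def postExec(): Unit = calls += "postExec"
    def isSkipped: Boolean = skipped
  }

  // Runs the hooks for a single action in the order the descriptions suggest.
  def run(action: RecordingAction): List[String] = {
    action.prepare()
    action.preInit()
    if (!action.isSkipped) {
      action.init()
      action.preExec()
      action.exec()
      action.postExec()
    }
    action.calls.toList
  }
}
```

Running the sketch with a RecordingAction whose condition holds records all six hooks in order, while one whose condition is false stops after preInit.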
