Stop propagating input FileRefs through the action and instead get new FileRefs from the DataObject according to the SubFeed's partitionValues. This is needed to reprocess all files of a path/partition instead of only the FileRefs passed from the previous Action.
If true, delete files after they are successfully processed.
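Conceptually, instead of forwarding the incoming FileRefs, the action re-lists files from the DataObject for the SubFeed's partition values. The following is a simplified, hypothetical sketch; the names `FileRefDataObject.listFileRefs` and `resolveInputFileRefs` are illustrative stand-ins, not the library's actual API:

```scala
// Simplified sketch: re-list FileRefs from the DataObject for given partition
// values instead of propagating the FileRefs of the previous Action.
case class FileRef(path: String)

trait FileRefDataObject {
  def listFileRefs(partitionValues: Map[String, String]): Seq[FileRef]
}

def resolveInputFileRefs(
    incoming: Seq[FileRef],
    partitionValues: Map[String, String],
    dataObject: FileRefDataObject,
    reprocessAll: Boolean): Seq[FileRef] =
  if (reprocessAll) dataObject.listFileRefs(partitionValues) // all files of the partition
  else incoming // forward FileRefs from the previous Action
```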
Executes Action for a given FileSubFeed
subFeed to be processed (referencing files to be read)
processed subFeed (referencing files written by this action)
Execution mode if this Action is a start node of a DAG run
Returns the factory that can parse this type (that is, type CO).
Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
the factory (object) for this class.
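The companion-object convention described above can be sketched as follows. This is a minimal, hypothetical illustration of the pattern; `FromConfigFactory` and `MyAction` are simplified stand-ins, not the library's real signatures:

```scala
// Illustrative sketch of the companion-object factory pattern:
// the companion object implements the factory trait.
trait FromConfigFactory[+T] {
  def fromConfig(config: Map[String, String]): T
}

case class MyAction(id: String) {
  // returns the companion object, which implements FromConfigFactory
  def factory: FromConfigFactory[MyAction] = MyAction
}

object MyAction extends FromConfigFactory[MyAction] {
  override def fromConfig(config: Map[String, String]): MyAction =
    MyAction(config.getOrElse("id", "unknown"))
}
```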
A unique identifier for this instance.
Initialize Action with a given FileSubFeed. Note that this only checks the prerequisites for processing and simulates the output FileRefs that would be created.
subFeed to be processed (referencing files to be read)
processed subFeed (referencing files that would be written by this action)
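The split between init (simulate outputs, no side effects) and exec (actually process) might be sketched like this; all names here are illustrative assumptions, not the actual API:

```scala
// Illustrative sketch: init only simulates the output FileRefs, exec does the work.
case class FileRef(path: String)
case class FileSubFeed(fileRefs: Seq[FileRef], dataObjectId: String)

trait FileAction {
  /** checks prerequisites and simulates the FileRefs that would be written */
  def init(subFeed: FileSubFeed): FileSubFeed
  /** actually processes the files */
  def exec(subFeed: FileSubFeed): FileSubFeed
}

class CopyFileAction(targetDir: String) extends FileAction {
  private def targetRef(r: FileRef) = FileRef(s"$targetDir/${r.path.split('/').last}")
  override def init(subFeed: FileSubFeed): FileSubFeed =
    subFeed.copy(fileRefs = subFeed.fileRefs.map(targetRef)) // no side effects
  override def exec(subFeed: FileSubFeed): FileSubFeed = {
    // a real implementation would copy file contents here
    subFeed.copy(fileRefs = subFeed.fileRefs.map(targetRef))
  }
}
```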
Input FileRefDataObject implementing CanCreateInputStream.
Input DataObjects. To be implemented by subclasses.
Additional metadata for the Action
Spark SQL condition evaluated as a where-clause against the dataframe of metrics. Available columns are dataObjectId, key, value. If any rows pass the where-clause, a MetricsCheckFailed exception is thrown.
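As a hedged sketch of how such a check could be evaluated (assuming a local SparkSession; the exception class and metrics layout are simplified stand-ins):

```scala
import org.apache.spark.sql.SparkSession

// simplified stand-in for the real exception type
case class MetricsCheckFailed(msg: String) extends Exception(msg)

val spark = SparkSession.builder().master("local[1]").appName("metricsCheck").getOrCreate()
import spark.implicits._

// metrics as rows of (dataObjectId, key, value)
val metrics = Seq(("tgt1", "records_written", 0L), ("tgt1", "no_data", 1L))
  .toDF("dataObjectId", "key", "value")

// the configured condition is applied as a where-clause
val condition = "key = 'records_written' and value = 0"
if (metrics.where(condition).count > 0)
  throw MetricsCheckFailed(s"metrics check failed: $condition")
```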
Output FileRefDataObject implementing CanCreateOutputStream.
Output DataObjects. To be implemented by subclasses.
Adds an action event
Runtime metrics
Note: runtime metrics are disabled by default because they are only collected when running Actions from an ActionDAG, which is not the case for tests or other use cases. If enabled, an exception is thrown when metrics are not found.
Action.exec implementation
SparkSubFeeds to be processed
processed SparkSubFeeds
get latest runtime state
get latest runtime information for this action
Action.init implementation
SparkSubFeeds to be processed
processed SparkSubFeeds
provide an implementation of the DAG node id
Executes operations needed after executing an action. In this step, any operations on Input- or Output-DataObjects needed after the main task are executed, e.g. a JdbcTableDataObject's postWriteSql or a CopyAction's deleteInputData.
Executes operations needed before executing an action. In this step, any operations on Input- or Output-DataObjects needed before the main task are executed, e.g. a JdbcTableDataObject's preWriteSql.
Prepare DataObjects prerequisites. In this step preconditions are prepared & tested:
- connections can be created
- needed structures exist, e.g. Kafka topic or Jdbc table
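The hook ordering described above (prepare, then preExec, then the main task, then postExec) can be sketched as a small lifecycle trait. The names mirror the hooks in the documentation, but the trait itself is an illustrative sketch, not the real class hierarchy:

```scala
// Illustrative lifecycle sketch: prepare -> preExec -> (main task) -> postExec.
trait ActionLifecycle {
  def prepare(): Unit  = ()  // test preconditions, e.g. connections, needed tables
  def preExec(): Unit  = ()  // e.g. preWriteSql
  def exec(): Unit           // main task
  def postExec(): Unit = ()  // e.g. postWriteSql, deleteInputData
  final def run(): Unit = { prepare(); preExec(); exec(); postExec() }
}
```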
This runs during the "prepare" phase of the DAG.
Recursive inputs on FileSubFeeds are not supported, so an empty Seq is set.
Resets the runtime state of this Action. This is mainly used for testing.
Sets the Spark job description for better traceability in the Spark UI.
Note: This sets Spark local properties, which are propagated to the respective executor tasks. We rely on this to match metrics back to Actions and DataObjects. As writing to a DataObject on the Driver happens uninterrupted in the same exclusive thread, this is suitable.
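Setting the job description and a local property uses standard SparkContext methods; local properties set on the driver thread are inherited by tasks launched from that thread, which is what the metrics matching relies on. The property key below is illustrative, not the library's actual key:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("desc").getOrCreate()

// Local properties are propagated to tasks started from this thread, so task
// metrics can be matched back to the Action / DataObject being written.
spark.sparkContext.setJobDescription("Action copy-act: writing DataObject tgt1")
spark.sparkContext.setLocalProperty("sdl.actionId", "copy-act") // illustrative key
```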
phase description (be short...)
This is displayed in the ASCII graph visualization.