Class SparkIncrementalMode

Package: io.smartdatalake.definitions

case class SparkIncrementalMode(compareCol: String, alternativeOutputId: Option[DataObjectId] = None, stopIfNoData: Boolean = true, applyCondition: Option[Condition] = None) extends ExecutionMode with ExecutionModeWithMainInputOutput with Product with Serializable

Compares the maximum value of a "compare column" between mainOutput and mainInput and incrementally loads only the delta. This mode works only with SparkSubFeeds. The resulting filter is not propagated to subsequent actions.
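
Conceptually, the delta selection resembles the following sketch (a hypothetical helper assuming Spark DataFrames for mainInput and mainOutput; the framework's actual implementation may differ):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, max}

    // Hypothetical illustration of the incremental delta selection.
    def selectDelta(dfInput: DataFrame, dfOutput: DataFrame, compareCol: String): DataFrame = {
      // Highest value of the compare column already present in mainOutput
      // (empty-output handling omitted for brevity).
      val maxOutputValue = dfOutput.agg(max(col(compareCol))).head().get(0)
      // Keep only input rows whose compare column value is greater, i.e. newer records.
      dfInput.filter(col(compareCol) > maxOutputValue)
    }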

compareCol

a comparable column name that exists in both mainInput and mainOutput and is used to identify the delta. Column values should be greater for newer records.

alternativeOutputId

optional alternative outputId of a DataObject further downstream in the DAG. It replaces the mainOutputId and can be used to ensure that all partitions are processed across multiple actions in case of errors.

stopIfNoData

optional setting that controls whether subsequent actions are skipped when this action has no data to process (default is true). Set stopIfNoData=false to run subsequent actions anyway; they will receive the unfiltered output DataObject as input.

applyCondition

Condition that decides whether the execution mode is applied. Define a Spark SQL expression over the attributes of DefaultExecutionModeExpressionData that returns a boolean. By default, the execution mode is applied if the given partition values (partition values from the command line or passed from the previous action) are not empty.
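
For illustration, an instance might be constructed as follows (a minimal sketch; the column name "last_modified" and the applyCondition expression are assumptions, as is the Condition constructor taking the SQL expression as its first argument):

    import io.smartdatalake.definitions.{Condition, SparkIncrementalMode}

    val incrementalMode = SparkIncrementalMode(
      compareCol = "last_modified",  // comparable column present in input and output
      stopIfNoData = false,          // run downstream actions even without new data
      applyCondition = Some(Condition("isStartNode")) // assumed attribute of DefaultExecutionModeExpressionData
    )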

Linear Supertypes
Serializable, Serializable, Product, Equals, ExecutionModeWithMainInputOutput, ExecutionMode, SmartDataLakeLogger, AnyRef, Any

Instance Constructors

  1. new SparkIncrementalMode(compareCol: String, alternativeOutputId: Option[DataObjectId] = None, stopIfNoData: Boolean = true, applyCondition: Option[Condition] = None)

Value Members

  1. def alternativeOutput(implicit context: ActionPipelineContext): Option[DataObject]

    Definition Classes
    ExecutionModeWithMainInputOutput
  2. val alternativeOutputId: Option[DataObjectId]

    Optional alternative outputId of a DataObject further downstream in the DAG. It replaces the mainOutputId and can be used to ensure that all partitions are processed across multiple actions in case of errors.

    Definition Classes
    SparkIncrementalMode → ExecutionModeWithMainInputOutput
  3. val applyCondition: Option[Condition]

    Condition that decides whether the execution mode is applied. Define a Spark SQL expression over the attributes of DefaultExecutionModeExpressionData that returns a boolean. By default, the execution mode is applied if the given partition values are not empty.

  4. val compareCol: String

    A comparable column name that exists in both mainInput and mainOutput and is used to identify the delta. Column values should be greater for newer records.

  5. lazy val logger: Logger

    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
  6. val stopIfNoData: Boolean

    Optional setting that controls whether subsequent actions are skipped when this action has no data to process (default is true). Set stopIfNoData=false to run subsequent actions anyway; they will receive the unfiltered output DataObject as input.

