Class

io.smartdatalake.definitions

PartitionDiffMode

Related Doc: package definitions

Permalink

case class PartitionDiffMode(partitionColNb: Option[Int] = None, alternativeOutputId: Option[DataObjectId] = None, nbOfPartitionValuesPerRun: Option[Int] = None, applyCondition: Option[String] = None, failCondition: Option[String] = None, failConditions: Seq[Condition] = Seq(), stopIfNoData: Boolean = true, selectExpression: Option[String] = None, applyPartitionValuesTransform: Boolean = false, selectAdditionalInputExpression: Option[String] = None) extends ExecutionMode with ExecutionModeWithMainInputOutput with Product with Serializable

Partition difference execution mode lists partitions on mainInput & mainOutput DataObject and starts loading all missing partitions. Partition columns to be used for comparision need to be a common 'init' of input and output partition columns. This mode needs mainInput/Output DataObjects which CanHandlePartitions to list partitions. Partition values are passed to following actions for partition columns which they have in common.

partitionColNb

optional number of partition columns to use as a common 'init'.

alternativeOutputId

optional alternative outputId of DataObject later in the DAG. This replaces the mainOutputId. It can be used to ensure processing all partitions over multiple actions in case of errors.

nbOfPartitionValuesPerRun

optional restriction of the number of partition values per run.

applyCondition

Condition to decide if execution mode should be applied or not. Define a spark sql expression working with attributes of DefaultExecutionModeExpressionData returning a boolean. Default is to apply the execution mode if given partition values (partition values from command line or passed from previous action) are not empty.

failConditions

List of conditions to fail application of execution mode if true. Define as spark sql expressions working with attributes of PartitionDiffModeExpressionData returning a boolean. Default is that the application of the PartitionDiffMode does not fail the action. If there is no data to process, the following actions are skipped. Multiple conditions are evaluated individually and every condition may fail the execution mode (or-logic)

stopIfNoData

Optional setting if further actions should be skipped if this action has no data to process (default). Set stopIfNoData=false if you want to run further actions nevertheless. They will receive output dataObject unfiltered as input.

selectExpression

optional expression to define or refine the list of selected output partitions. Define a spark sql expression working with the attributes of PartitionDiffModeExpressionData returning a list<map<string,string>>. Default is to return the originally selected output partitions found in attribute selectedOutputPartitionValues.

applyPartitionValuesTransform

If true applies the partition values transform of custom transformations on input partition values before comparison with output partition values. If enabled input and output partition columns can be different. Default is to disable the transformation of partition values.

selectAdditionalInputExpression

optional expression to refine the list of selected input partitions. Note that primarily output partitions are selected by PartitionDiffMode. The selected output partitions are then transformed back to the input partitions needed to create the selected output partitions. This is one-to-one except if applyPartitionValuesTransform=true. And sometimes there is a need for additional input data to create the output partitions, e.g. if you aggregate a window of 7 days for every day. You can customize selected input partitions by defining a spark sql expression working with the attributes of PartitionDiffModeExpressionData returning a list<map<string,string>>. Default is to return the originally selected input partitions found in attribute selectedInputPartitionValues.

Linear Supertypes
Serializable, Serializable, Product, Equals, ExecutionModeWithMainInputOutput, ExecutionMode, SmartDataLakeLogger, AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. PartitionDiffMode
  2. Serializable
  3. Serializable
  4. Product
  5. Equals
  6. ExecutionModeWithMainInputOutput
  7. ExecutionMode
  8. SmartDataLakeLogger
  9. AnyRef
  10. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Instance Constructors

  1. new PartitionDiffMode(partitionColNb: Option[Int] = None, alternativeOutputId: Option[DataObjectId] = None, nbOfPartitionValuesPerRun: Option[Int] = None, applyCondition: Option[String] = None, failCondition: Option[String] = None, failConditions: Seq[Condition] = Seq(), stopIfNoData: Boolean = true, selectExpression: Option[String] = None, applyPartitionValuesTransform: Boolean = false, selectAdditionalInputExpression: Option[String] = None)

    Permalink

    partitionColNb

    optional number of partition columns to use as a common 'init'.

    alternativeOutputId

    optional alternative outputId of DataObject later in the DAG. This replaces the mainOutputId. It can be used to ensure processing all partitions over multiple actions in case of errors.

    nbOfPartitionValuesPerRun

    optional restriction of the number of partition values per run.

    applyCondition

    Condition to decide if execution mode should be applied or not. Define a spark sql expression working with attributes of DefaultExecutionModeExpressionData returning a boolean. Default is to apply the execution mode if given partition values (partition values from command line or passed from previous action) are not empty.

    failConditions

    List of conditions to fail application of execution mode if true. Define as spark sql expressions working with attributes of PartitionDiffModeExpressionData returning a boolean. Default is that the application of the PartitionDiffMode does not fail the action. If there is no data to process, the following actions are skipped. Multiple conditions are evaluated individually and every condition may fail the execution mode (or-logic)

    stopIfNoData

    Optional setting if further actions should be skipped if this action has no data to process (default). Set stopIfNoData=false if you want to run further actions nevertheless. They will receive output dataObject unfiltered as input.

    selectExpression

    optional expression to define or refine the list of selected output partitions. Define a spark sql expression working with the attributes of PartitionDiffModeExpressionData returning a list<map<string,string>>. Default is to return the originally selected output partitions found in attribute selectedOutputPartitionValues.

    applyPartitionValuesTransform

    If true applies the partition values transform of custom transformations on input partition values before comparison with output partition values. If enabled input and output partition columns can be different. Default is to disable the transformation of partition values.

    selectAdditionalInputExpression

    optional expression to refine the list of selected input partitions. Note that primarily output partitions are selected by PartitionDiffMode. The selected output partitions are then transformed back to the input partitions needed to create the selected output partitions. This is one-to-one except if applyPartitionValuesTransform=true. And sometimes there is a need for additional input data to create the output partitions, e.g. if you aggregate a window of 7 days for every day. You can customize selected input partitions by defining a spark sql expression working with the attributes of PartitionDiffModeExpressionData returning a list<map<string,string>>. Default is to return the originally selected input partitions found in attribute selectedInputPartitionValues.

Value Members

  1. final def !=(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  4. def alternativeOutput(implicit context: ActionPipelineContext): Option[DataObject]

    Permalink
    Definition Classes
    ExecutionModeWithMainInputOutput
  5. val alternativeOutputId: Option[DataObjectId]

    Permalink

    optional alternative outputId of DataObject later in the DAG.

    optional alternative outputId of DataObject later in the DAG. This replaces the mainOutputId. It can be used to ensure processing all partitions over multiple actions in case of errors.

    Definition Classes
    PartitionDiffMode → ExecutionModeWithMainInputOutput
  6. val applyCondition: Option[String]

    Permalink

    Condition to decide if execution mode should be applied or not.

    Condition to decide if execution mode should be applied or not. Define a spark sql expression working with attributes of DefaultExecutionModeExpressionData returning a boolean. Default is to apply the execution mode if given partition values (partition values from command line or passed from previous action) are not empty.

  7. val applyPartitionValuesTransform: Boolean

    Permalink

    If true applies the partition values transform of custom transformations on input partition values before comparison with output partition values.

    If true applies the partition values transform of custom transformations on input partition values before comparison with output partition values. If enabled input and output partition columns can be different. Default is to disable the transformation of partition values.

  8. final def asInstanceOf[T0]: T0

    Permalink
    Definition Classes
    Any
  9. def clone(): AnyRef

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. final def eq(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  11. val failCondition: Option[String]

    Permalink
  12. val failConditions: Seq[Condition]

    Permalink

    List of conditions to fail application of execution mode if true.

    List of conditions to fail application of execution mode if true. Define as spark sql expressions working with attributes of PartitionDiffModeExpressionData returning a boolean. Default is that the application of the PartitionDiffMode does not fail the action. If there is no data to process, the following actions are skipped. Multiple conditions are evaluated individually and every condition may fail the execution mode (or-logic)

  13. def finalize(): Unit

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]

    Permalink
    Definition Classes
    AnyRef → Any
  15. final def isInstanceOf[T0]: Boolean

    Permalink
    Definition Classes
    Any
  16. lazy val logger: Logger

    Permalink
    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
  17. val nbOfPartitionValuesPerRun: Option[Int]

    Permalink

    optional restriction of the number of partition values per run.

  18. final def ne(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  19. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  20. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  21. val partitionColNb: Option[Int]

    Permalink

    optional number of partition columns to use as a common 'init'.

  22. val selectAdditionalInputExpression: Option[String]

    Permalink

    optional expression to refine the list of selected input partitions.

    optional expression to refine the list of selected input partitions. Note that primarily output partitions are selected by PartitionDiffMode. The selected output partitions are then transformed back to the input partitions needed to create the selected output partitions. This is one-to-one except if applyPartitionValuesTransform=true. And sometimes there is a need for additional input data to create the output partitions, e.g. if you aggregate a window of 7 days for every day. You can customize selected input partitions by defining a spark sql expression working with the attributes of PartitionDiffModeExpressionData returning a list<map<string,string>>. Default is to return the originally selected input partitions found in attribute selectedInputPartitionValues.

  23. val selectExpression: Option[String]

    Permalink

    optional expression to define or refine the list of selected output partitions.

    optional expression to define or refine the list of selected output partitions. Define a spark sql expression working with the attributes of PartitionDiffModeExpressionData returning a list<map<string,string>>. Default is to return the originally selected output partitions found in attribute selectedOutputPartitionValues.

  24. final def synchronized[T0](arg0: ⇒ T0): T0

    Permalink
    Definition Classes
    AnyRef
  25. final def wait(): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long, arg1: Int): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. val stopIfNoData: Boolean

    Permalink

    Optional setting if further actions should be skipped if this action has no data to process (default).

    Optional setting if further actions should be skipped if this action has no data to process (default). Set stopIfNoData=false if you want to run further actions nevertheless. They will receive output dataObject unfiltered as input.

    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.3) use following actions executionCondition=true & executionMode=ProcessAll instead

Inherited from Serializable

Inherited from Serializable

Inherited from Product

Inherited from Equals

Inherited from ExecutionModeWithMainInputOutput

Inherited from ExecutionMode

Inherited from SmartDataLakeLogger

Inherited from AnyRef

Inherited from Any

Ungrouped