io.smartdatalake.workflow

SparkSubFeed

case class SparkSubFeed(dataFrame: Option[DataFrame], dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isDAGStart: Boolean = false, isSkipped: Boolean = false, isDummy: Boolean = false, filter: Option[String] = None) extends SubFeed with Product with Serializable

A SparkSubFeed is used to transport DataFrames between Actions; a construction sketch follows the parameter descriptions below.

dataFrame

Spark DataFrame to be processed. The DataFrame should not be saved to state (@transient).

dataObjectId

id of the DataObject this SubFeed corresponds to

partitionValues

values of the partitions transported by this SubFeed

isDAGStart

true if this SubFeed is a start node of the DAG

isDummy

true if this SubFeed only contains a dummy DataFrame. Dummy DataFrames can be used to validate the lineage in the init phase, but not in the exec phase.

filter

a Spark SQL filter expression. This is used by SparkIncrementalMode.
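
A minimal construction sketch (not part of the original Scaladoc). The import paths for DataObjectId and PartitionValues are assumptions that may vary between Smart Data Lake Builder versions, and the DataObject id "src1" and partition column "dt" are hypothetical.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import io.smartdatalake.config.SdlConfigObject.DataObjectId // assumed import path
    import io.smartdatalake.util.hdfs.PartitionValues           // assumed import path
    import io.smartdatalake.workflow.SparkSubFeed

    val session: SparkSession = SparkSession.builder()
      .master("local[*]").appName("SparkSubFeedDemo").getOrCreate()
    import session.implicits._

    // Hypothetical input data, partitioned by column "dt".
    val df: DataFrame = Seq(("2021-01-01", 1), ("2021-01-02", 2)).toDF("dt", "cnt")

    // Wrap the DataFrame for transport to the next Action.
    val subFeed = SparkSubFeed(
      dataFrame       = Some(df),
      dataObjectId    = DataObjectId("src1"),
      partitionValues = Seq(PartitionValues(Map("dt" -> "2021-01-01"))),
      filter          = Some("dt = '2021-01-01'") // optional Spark SQL filter expression
    )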

Linear Supertypes
Serializable, Serializable, Product, Equals, SubFeed, SmartDataLakeLogger, DAGResult, AnyRef, Any

Instance Constructors

  1. new SparkSubFeed(dataFrame: Option[DataFrame], dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isDAGStart: Boolean = false, isSkipped: Boolean = false, isDummy: Boolean = false, filter: Option[String] = None)


Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def breakLineage(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

    Break lineage. This means discarding an existing DataFrame or list of FileRefs, so that it is requested again from the DataObject. On the one hand, this can be used to break long DataFrame lineages spanning multiple Actions and instead re-read the data from an intermediate table. On the other hand, it is needed if partition values or the filter condition are changed.

    Definition Classes
    SparkSubFeed → SubFeed
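
    A hedged usage sketch: inside an Action, the implicit SparkSession and ActionPipelineContext are provided by the framework; they are left abstract here (???) because constructing an ActionPipelineContext by hand is version-specific. subFeed is the instance from the construction sketch above.

      implicit val session: SparkSession = ???          // provided by the framework at runtime
      implicit val context: ActionPipelineContext = ??? // provided by the framework at runtime

      // Discard the DataFrame so the next Action re-reads the data from the
      // DataObject, e.g. from an intermediate table that was just written.
      val broken: SparkSubFeed = subFeed.breakLineage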
  6. def clearDAGStart(): SparkSubFeed

    Definition Classes
    SparkSubFeed → SubFeed
  7. def clearFilter(breakLineageOnChange: Boolean = true)(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

  8. def clearPartitionValues(breakLineageOnChange: Boolean = true)(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

    Definition Classes
    SparkSubFeed → SubFeed
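
    A sketch continuing the example above (same assumed subFeed, session and context):

      // With the default breakLineageOnChange = true, clearing the filter or the
      // partition values presumably also discards an existing DataFrame, since it
      // may have been computed under the old filter or partitions.
      val unfiltered: SparkSubFeed    = subFeed.clearFilter()
      val unpartitioned: SparkSubFeed = subFeed.clearPartitionValues()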
  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. val dataFrame: Option[DataFrame]

    Spark DataFrame to be processed. The DataFrame should not be saved to state (@transient).

  11. val dataObjectId: DataObjectId

    id of the DataObject this SubFeed corresponds to

    Definition Classes
    SparkSubFeed → SubFeed
  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. val filter: Option[String]

    a Spark SQL filter expression. This is used by SparkIncrementalMode.

  14. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  16. def getFilterCol: Option[Column]

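    A small sketch combining dataFrame and getFilterCol (same assumed subFeed as above):

      import org.apache.spark.sql.DataFrame

      // Apply the filter expression, if one is set, to the transported DataFrame.
      val maybeFiltered: Option[DataFrame] =
        for {
          df  <- subFeed.dataFrame
          col <- subFeed.getFilterCol
        } yield df.where(col)
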
  17. def hasReusableDataFrame: Boolean

  18. val isDAGStart: Boolean

    true if this SubFeed is a start node of the DAG

    Definition Classes
    SparkSubFeed → SubFeed
  19. val isDummy: Boolean

    true if this SubFeed only contains a dummy DataFrame. Dummy DataFrames can be used to validate the lineage in the init phase, but not in the exec phase.

  20. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  21. val isSkipped: Boolean

    Definition Classes
    SparkSubFeed → SubFeed
  22. def isStreaming: Option[Boolean]

  23. lazy val logger: Logger

    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
  24. def movePartitionColumnsLast(partitions: Seq[String]): SparkSubFeed

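    A sketch; "dt" is the hypothetical partition column from the construction example above:

      // Reorder the DataFrame's columns so that partition columns come last, as
      // partitioned table formats such as Hive expect on write.
      val reordered: SparkSubFeed = subFeed.movePartitionColumnsLast(Seq("dt"))
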
  25. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  26. final def notify(): Unit

    Definition Classes
    AnyRef
  27. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  28. val partitionValues: Seq[PartitionValues]

    values of the partitions transported by this SubFeed

    Definition Classes
    SparkSubFeed → SubFeed
  29. def persist: SparkSubFeed

  30. def resultId: String

    Definition Classes
    SubFeed → DAGResult
  31. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  32. def toOutput(dataObjectId: DataObjectId): SparkSubFeed

    Definition Classes
    SparkSubFeed → SubFeed
  33. def union(other: SubFeed)(implicit session: SparkSession, context: ActionPipelineContext): SubFeed

    Definition Classes
    SparkSubFeed → SubFeed
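
    A sketch under the same assumptions (implicit session and context in scope); subFeedFeb is a hypothetical second SubFeed of the same DataObject:

      // Hypothetical second SubFeed for another partition of the same DataObject.
      val subFeedFeb: SparkSubFeed = subFeed.copy(
        dataFrame       = Some(Seq(("2021-02-01", 3)).toDF("dt", "cnt")),
        partitionValues = Seq(PartitionValues(Map("dt" -> "2021-02-01")))
      )

      // The DataFrames are unioned and the partition values merged
      // (see unionPartitionValues below).
      val combined: SubFeed = subFeed.union(subFeedFeb)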
  34. def unionPartitionValues(otherPartitionValues: Seq[PartitionValues]): Seq[PartitionValues]

    Definition Classes
    SubFeed
  35. def updatePartitionValues(partitions: Seq[String], breakLineageOnChange: Boolean = true, newPartitionValues: Option[Seq[PartitionValues]] = None)(implicit session: SparkSession, context: ActionPipelineContext): SparkSubFeed

    Definition Classes
    SparkSubFeed → SubFeed
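
    A sketch under the same assumptions as above:

      // Align the transported partition values with the given partition columns;
      // with the default breakLineageOnChange = true, a resulting change presumably
      // discards an existing DataFrame.
      val updated: SparkSubFeed = subFeed.updatePartitionValues(partitions = Seq("dt"))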
  36. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
