
class AsyncProgressTrackingMicroBatchExecution extends MicroBatchExecution

Class to execute micro-batches when async progress tracking is enabled
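This class is internal and not constructed directly; `MicroBatchExecution` selects it when async progress tracking is enabled on the query. A hedged sketch of the user-facing side, assuming Spark 3.4+ option names (`asyncProgressTrackingEnabled`, `asyncProgressTrackingCheckpointIntervalMs`) and a hypothetical Kafka-to-Kafka pipeline:

```scala
// Sketch only: option availability varies by Spark version and sink.
val query = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "input")
  .load()
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("topic", "output")
  .option("checkpointLocation", "/tmp/checkpoints/q1")
  // enables AsyncProgressTrackingMicroBatchExecution for this query
  .option("asyncProgressTrackingEnabled", "true")
  // how often offset/commit logs are persisted asynchronously (milliseconds)
  .option("asyncProgressTrackingCheckpointIntervalMs", "2000")
  .start()
```

With this mode enabled, offset and commit log writes happen off the critical path of batch execution, trading stricter progress-tracking guarantees for lower per-batch latency.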

Linear Supertypes
MicroBatchExecution, AsyncLogPurge, StreamExecution, Logging, StreamingQuery, StreamingQuery, AnyRef, Any

Instance Constructors

  1. new AsyncProgressTrackingMicroBatchExecution(sparkSession: classic.SparkSession, trigger: Trigger, triggerClock: Clock, extraOptions: Map[String, String], plan: WriteToStream)

Type Members

  1. implicit class LogStringContext extends AnyRef
    Definition Classes
    Logging

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val analyzedPlan: LogicalPlan
    Definition Classes
    StreamExecution
  5. def areWritesPendingOrInProgress(): Boolean
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def asyncLogPurgeShutdown(): Unit
    Attributes
    protected
    Definition Classes
    AsyncLogPurge
  8. val asyncProgressTrackingCheckpointingIntervalMs: Long
    Attributes
    protected
  9. val asyncWritesExecutorService: ThreadPoolExecutor
    Attributes
    protected
  10. def availableOffsets: StreamProgress

    Get the end (formerly known as "available") offsets of the latest batch that has been planned.

    Definition Classes
    StreamExecution
  11. def awaitInitialization(timeoutMs: Long): Unit

    Await until all fields of the query have been initialized.

    Definition Classes
    StreamExecution
  12. val awaitProgressLock: ReentrantLock

    A lock used to wait/notify when batches complete. Use a fair lock to avoid thread starvation.

    Attributes
    protected
    Definition Classes
    StreamExecution
  13. val awaitProgressLockCondition: Condition
    Attributes
    protected
    Definition Classes
    StreamExecution
  14. def awaitTermination(timeoutMs: Long): Boolean
    Definition Classes
    StreamExecution → StreamingQuery
  15. def awaitTermination(): Unit
    Definition Classes
    StreamExecution → StreamingQuery
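The two `awaitTermination` variants above differ only in blocking behavior. A hedged usage sketch, where `query` is assumed to be a `StreamingQuery` handle obtained from `DataStreamWriter.start()`:

```scala
// Returns true if the query terminated within the timeout, false otherwise.
val finished = query.awaitTermination(30000L)

// Blocks indefinitely until stop() is called or the query fails;
// a failure surfaces as a StreamingQueryException.
query.awaitTermination()
```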
  16. def checkpointFile(name: String): String

    Returns the path of a file with name in the checkpoint directory.

    Attributes
    protected
    Definition Classes
    StreamExecution
  17. val checkpointMetadata: StreamingQueryCheckpointMetadata

    Manages the metadata from this checkpoint location.

    Attributes
    protected
    Definition Classes
    StreamExecution
  18. def cleanUpLastExecutedMicroBatch(execCtx: MicroBatchExecutionContext): Unit
  19. def cleanup(): Unit

    Any clean-up that needs to happen when the query is stopped or exits.

    Definition Classes
    AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution → StreamExecution
  20. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  21. lazy val commitLog: AsyncCommitLog
  22. def commitSources(offsetSeq: OffsetSeq): Unit
    Attributes
    protected
    Definition Classes
    MicroBatchExecution
  23. var committedOffsets: StreamProgress

    Tracks how much data we have processed and committed to the sink or state store from each input source. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.

    Definition Classes
    StreamExecution
  24. def createWrite(table: SupportsWrite, options: Map[String, String], inputPlan: LogicalPlan): Write
    Attributes
    protected
    Definition Classes
    StreamExecution
  25. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  26. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  27. val errorNotifier: ErrorNotifier
    Attributes
    protected[sql]
    Definition Classes
    MicroBatchExecution → AsyncLogPurge
  28. def exception: Option[StreamingQueryException]

    Returns the StreamingQueryException if the query was terminated by an exception.

    Definition Classes
    StreamExecution → StreamingQuery
  29. def explain(): Unit
    Definition Classes
    StreamExecution → StreamingQuery
  30. def explain(extended: Boolean): Unit
    Definition Classes
    StreamExecution → StreamingQuery
  31. def explainInternal(extended: Boolean): String

    Exposed for tests.

    Definition Classes
    StreamExecution
  32. def getBatchDescriptionString: String
    Attributes
    protected
    Definition Classes
    StreamExecution
  33. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  34. def getLatestExecutionContext(): StreamExecutionContext

    Get the latest execution context.

    Definition Classes
    MicroBatchExecution → StreamExecution
  35. def getStartOffsetsOfLatestBatch: StreamProgress

    Get the start offsets of the latest batch that has been planned.

    Definition Classes
    StreamExecution
  36. def getTrigger(): TriggerExecutor
    Attributes
    protected
    Definition Classes
    AsyncProgressTrackingMicroBatchExecutionMicroBatchExecution
  37. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  38. val id: UUID
    Definition Classes
    StreamExecution → StreamingQuery
  39. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  40. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  41. def interruptAndAwaitExecutionThreadTermination(): Unit

    Interrupts the query execution thread and awaits its termination until it exceeds the timeout. The timeout can be set via "spark.sql.streaming.stopTimeout".

    Attributes
    protected
    Definition Classes
    StreamExecution
    Annotations
    @throws(classOf[java.util.concurrent.TimeoutException])
    Exceptions thrown

    TimeoutException If the thread cannot be stopped within the timeout

  42. def isActive: Boolean

    Whether the query is currently active or not.

    Definition Classes
    StreamExecution → StreamingQuery
  43. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  44. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  45. def lastExecution: IncrementalExecution
    Definition Classes
    StreamExecution
  46. def lastProgress: StreamingQueryProgress
    Definition Classes
    StreamExecution → StreamingQuery
  47. def latestOffsets: StreamProgress
    Definition Classes
    StreamExecution
  48. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  49. def logBasedOnLevel(level: Level)(f: => MessageWithContext): Unit
    Attributes
    protected
    Definition Classes
    Logging
  50. def logDebug(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  51. def logDebug(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  52. def logDebug(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  53. def logDebug(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  54. def logError(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  55. def logError(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  56. def logError(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  57. def logError(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  58. def logInfo(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  59. def logInfo(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  60. def logInfo(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  61. def logInfo(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  62. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  63. def logTrace(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  64. def logTrace(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  65. def logTrace(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  66. def logTrace(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  67. def logWarning(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  68. def logWarning(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  69. def logWarning(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  70. def logWarning(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  71. var loggingThreadContext: Instance
    Attributes
    protected
    Definition Classes
    StreamExecution
  72. lazy val logicalPlan: LogicalPlan

    The base logical plan which will be used across batch runs. Once the value is set, it should not be modified.

    Definition Classes
    MicroBatchExecution → StreamExecution
  73. def markMicroBatchEnd(execCtx: MicroBatchExecutionContext): Unit

    Called after the microbatch has completed execution. It takes care of committing the offset to the commit log and other bookkeeping.

    Definition Classes
    AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
  74. def markMicroBatchExecutionStart(execCtx: MicroBatchExecutionContext): Unit

    Method called once after the planning is done and before the start of the microbatch execution. It can be used to perform any pre-execution tasks.

    Definition Classes
    AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
  75. def markMicroBatchStart(execCtx: MicroBatchExecutionContext): Unit

    Should not call the super method, as we need to do something completely different here for async progress tracking.

    Definition Classes
    AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
  76. val minLogEntriesToMaintain: Int
    Attributes
    protected
    Definition Classes
    StreamExecution
  77. val name: String
    Definition Classes
    StreamExecution → StreamingQuery
  78. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  79. var noNewData: Boolean

    A flag to indicate that a batch has completed with no new data available.

    Attributes
    protected
    Definition Classes
    StreamExecution
  80. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  81. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  82. lazy val offsetLog: AsyncOffsetSeqLog
  83. val outputMode: OutputMode
    Definition Classes
    StreamExecution
  84. val pollingDelayMs: Long
    Attributes
    protected
    Definition Classes
    StreamExecution
  85. def populateStartOffsets(execCtx: MicroBatchExecutionContext, sparkSessionToRunBatches: classic.SparkSession): Unit

    Populate the start offsets to start the execution at the current offsets stored in the sink (i.e. avoid reprocessing data that we have already processed). This function must be called before any processing occurs and will populate the following fields in the execution context of this micro-batch:

    • batchId
    • startOffset
    • endOffsets

    The basic structure of this method is as follows:

    Identify (from the offset log) the offsets used to run the last batch
    IF last batch exists THEN
      Set the next batch to be executed as the last recovered batch
      Check the commit log to see which batch was committed last
      IF the last batch was committed THEN
        Call getBatch using the last batch start and end offsets
        // ^^^^ above line is needed since some sources assume last batch always re-executes
        Setup for a new batch i.e., start = last batch end, and identify new end
      DONE
    ELSE
      Identify a brand new batch
    DONE

    Attributes
    protected
    Definition Classes
    MicroBatchExecution
  86. def postEvent(event: Event): Unit
    Attributes
    protected
    Definition Classes
    StreamExecution
  87. val prettyIdString: String

    Pretty identifier string for printing in logs. Format: if name is set, "queryName [id = xyz, runId = abc]"; otherwise "[id = xyz, runId = abc]".

    Attributes
    protected
    Definition Classes
    StreamExecution
  88. def processAllAvailable(): Unit
    Definition Classes
    StreamExecution → StreamingQuery
  89. val progressReporter: ProgressReporter
    Attributes
    protected
    Definition Classes
    StreamExecution
  90. def purge(threshold: Long): Unit
  91. def purgeAsync(batchId: Long): Unit
    Attributes
    protected
    Definition Classes
    AsyncLogPurge
  92. def purgeStatefulMetadata(plan: SparkPlan): Unit
    Attributes
    protected
    Definition Classes
    StreamExecution
  93. def purgeStatefulMetadataAsync(plan: SparkPlan): Unit
    Attributes
    protected
    Definition Classes
    AsyncLogPurge
  94. val queryExecutionThread: QueryExecutionThread

    The thread that runs the micro-batches of this stream. Note that this thread must be org.apache.spark.util.UninterruptibleThread to work around KAFKA-1894: interrupting a running KafkaConsumer may cause an endless loop.

    Definition Classes
    StreamExecution
  95. def recentProgress: Array[StreamingQueryProgress]
    Definition Classes
    StreamExecution → StreamingQuery
  96. val resolvedCheckpointRoot: String
    Definition Classes
    StreamExecution
  97. def runActivatedStream(sparkSessionForStream: classic.SparkSession): Unit

    Repeatedly attempts to run batches as data arrives.

    Attributes
    protected
    Definition Classes
    MicroBatchExecution → StreamExecution
  98. val runId: UUID
    Definition Classes
    StreamExecution → StreamingQuery
  99. def setLatestExecutionContext(ctx: StreamExecutionContext): Unit

    We only set latestExecutionContext if the batch id is larger than the batch id of the current latestExecutionContext. This ensures we always track the latest execution context, i.e. we never set latestExecutionContext to an earlier / older batch.

    Definition Classes
    MicroBatchExecution
  100. val sink: Table
    Definition Classes
    StreamExecution
  101. var sources: Seq[SparkDataStream]

    The list of stream instances which will be used across batch runs. Once the value is set, it should not be modified.

    Attributes
    protected
    Definition Classes
    MicroBatchExecution → StreamExecution
  102. val sparkSession: classic.SparkSession
    Definition Classes
    StreamExecution → StreamingQuery → StreamingQuery
  103. val sparkSessionForStream: classic.SparkSession

    Isolated spark session to run the batches with.

    Attributes
    protected
    Definition Classes
    StreamExecution
  104. def start(): Unit

    Starts the execution. This returns only after the thread has started and QueryStartedEvent has been posted to all the listeners.

    Definition Classes
    StreamExecution
  105. val state: AtomicReference[State]

    Defines the internal state of execution

    Attributes
    protected
    Definition Classes
    StreamExecution
  106. def status: StreamingQueryStatus
    Definition Classes
    StreamExecution → StreamingQuery
  107. def stop(): Unit

    Signals to the thread executing micro-batches that it should stop running after the next batch. This method blocks until the thread stops running.

    Definition Classes
    MicroBatchExecution → StreamingQuery
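Because `stop()` blocks until the execution thread exits, the query is guaranteed inactive once it returns. A hedged sketch, with `query` assumed to be a running `StreamingQuery` handle:

```scala
// Signals the micro-batch thread and blocks until it has stopped.
query.stop()
assert(!query.isActive)
```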
  108. def stopSources(): Unit

    Stops all streaming sources safely.

    Attributes
    protected
    Definition Classes
    StreamExecution
  109. var streamDeathCause: StreamingQueryException
    Attributes
    protected
    Definition Classes
    StreamExecution
  110. lazy val streamMetrics: MetricsReporter

    Used to report metrics to coda-hale. This uses id for easier tracking across restarts.

    Definition Classes
    StreamExecution
  111. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  112. def toString(): String
    Definition Classes
    StreamExecution → AnyRef → Any
  113. val trigger: Trigger
    Definition Classes
    StreamExecution
  114. val triggerClock: Clock
    Definition Classes
    StreamExecution
  115. var triggerExecutor: TriggerExecutor
    Attributes
    protected[sql]
    Definition Classes
    MicroBatchExecution
  116. var uniqueSources: Map[SparkDataStream, ReadLimit]

    A list of unique sources in the query plan. This will be set when generating the logical plan.

    Attributes
    protected
    Definition Classes
    StreamExecution
  117. lazy val useAsyncPurge: Boolean
    Attributes
    protected
    Definition Classes
    AsyncLogPurge
  118. def validateOffsetLogAndGetPrevOffset(latestBatchId: Long): Option[OffsetSeq]

    Conduct sanity checks on the offset log to make sure it is correct and as expected. Also returns the previous offset written to the offset log.

    latestBatchId

    the batch id of the current micro-batch

    returns

    An option containing the offset of the previously written batch

    Definition Classes
    AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
  119. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  120. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  121. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  122. val watermarkMsMap: Map[Int, Long]

    A map of current watermarks, keyed by the position of the watermark operator in the physical plan.

    This state is 'soft state', which does not affect the correctness and semantics of watermarks and is not persisted across query restarts. The fault-tolerant watermark state is in offsetSeqMetadata.

    Attributes
    protected
    Definition Classes
    StreamExecution
  123. var watermarkTracker: WatermarkTracker
    Attributes
    protected
    Definition Classes
    MicroBatchExecution
  124. def withLogContext(context: Map[String, String])(body: => Unit): Unit
    Attributes
    protected
    Definition Classes
    Logging

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)
