
class AsyncProgressTrackingMicroBatchExecution extends MicroBatchExecution

Class to execute micro-batches when async progress tracking is enabled.

Inherited
  1. MicroBatchExecution
  2. AsyncLogPurge
  3. StreamExecution
  4. ProgressReporter
  5. Logging
  6. StreamingQuery
  7. AnyRef
  8. Any

Instance Constructors

  1. new AsyncProgressTrackingMicroBatchExecution(sparkSession: SparkSession, trigger: Trigger, triggerClock: Clock, extraOptions: Map[String, String], plan: WriteToStream)

Type Members

  1. case class ExecutionStats(inputRows: Map[SparkDataStream, Long], stateOperators: Seq[StateOperatorProgress], eventTimeStats: Map[String, String]) extends Product with Serializable
    Definition Classes
    ProgressReporter

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val analyzedPlan: LogicalPlan
    Definition Classes
    StreamExecution
  5. def areWritesPendingOrInProgress(): Boolean
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def asyncLogPurgeShutdown(): Unit
    Attributes
    protected
    Definition Classes
    AsyncLogPurge
  8. val asyncProgressTrackingCheckpointingIntervalMs: Long
    Attributes
    protected
  9. val asyncWritesExecutorService: ThreadPoolExecutor
    Attributes
    protected
  10. var availableOffsets: StreamProgress

    Tracks the offsets that are available to be processed, but have not yet been committed to the sink. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.

    Definition Classes
    StreamExecution
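The "shallow copy" rule above can be sketched with a minimal stand-in (not Spark's StreamProgress; `OffsetsExample` and its members are hypothetical names): a reader takes one snapshot of the field and uses it consistently, instead of re-reading a value that the scheduler thread may swap at any moment.

```scala
// Hypothetical stand-in illustrating the "shallow copy" guidance.
object OffsetsExample {
  type Progress = Map[String, Long]

  // Only the scheduler thread reassigns this, always in one atomic step.
  @volatile var offsets: Progress = Map("rate" -> 0L)

  // Other threads copy the reference once rather than re-reading the field,
  // since its value may be swapped between reads.
  def describe(): String = {
    val snapshot = offsets // single read
    s"${snapshot.size} source(s), max offset ${snapshot.values.max}"
  }
}
```

Because the map itself is immutable, copying the reference once is enough to get a consistent view.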
  11. def awaitInitialization(timeoutMs: Long): Unit

    Await until all fields of the query have been initialized.

    Definition Classes
    StreamExecution
  12. val awaitProgressLock: ReentrantLock

    A lock used to wait/notify when batches complete. Use a fair lock to avoid thread starvation.

    Attributes
    protected
    Definition Classes
    StreamExecution
  13. val awaitProgressLockCondition: Condition
    Attributes
    protected
    Definition Classes
    StreamExecution
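The lock/condition pair above follows the standard wait/notify pattern. A minimal sketch with a fair `ReentrantLock` (not Spark's code; `ProgressSignal` and its members are hypothetical names):

```scala
import java.util.concurrent.locks.ReentrantLock

// Sketch of waiting for batch completion on a fair lock's Condition.
object ProgressSignal {
  private val lock = new ReentrantLock(true) // fair, to avoid starvation
  private val batchDone = lock.newCondition()
  private var completedBatches = 0L

  def signalBatchComplete(): Unit = {
    lock.lock()
    try {
      completedBatches += 1
      batchDone.signalAll() // wake every thread waiting on progress
    } finally lock.unlock()
  }

  def awaitBatches(min: Long): Unit = {
    lock.lock()
    try {
      while (completedBatches < min) batchDone.await() // guard against spurious wakeups
    } finally lock.unlock()
  }

  def completed: Long = {
    lock.lock()
    try completedBatches finally lock.unlock()
  }
}
```

The `while` loop around `await()` is what makes the pattern robust to spurious wakeups.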
  14. def awaitTermination(timeoutMs: Long): Boolean

    Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, the exception will be thrown. Otherwise, it returns whether the query terminated within timeoutMs milliseconds.

    If the query has terminated, then all subsequent calls to this method will either return true immediately (if the query was terminated by stop()), or throw the exception immediately (if the query terminated with an exception).

    Definition Classes
    StreamExecution → StreamingQuery
    Since

    2.0.0

    Exceptions thrown

    StreamingQueryException if the query has terminated with an exception
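The contract above can be illustrated with a simplified stand-in (not Spark's implementation; `FakeQuery` and `finish` are hypothetical names): a latch models termination, and a recorded failure is rethrown instead of returned.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Simplified stand-in for the documented awaitTermination(timeoutMs) contract.
class FakeQuery {
  private val terminated = new CountDownLatch(1)
  @volatile private var failure: Option[Exception] = None

  // Simulates the query ending, either cleanly or with an error.
  def finish(error: Option[Exception] = None): Unit = {
    failure = error
    terminated.countDown()
  }

  def awaitTermination(timeoutMs: Long): Boolean = {
    val done = terminated.await(timeoutMs, TimeUnit.MILLISECONDS)
    if (done) failure.foreach(e => throw e) // rethrow a recorded failure
    done // otherwise: did the query terminate within the timeout?
  }
}
```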

  15. def awaitTermination(): Unit

    Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, the exception will be thrown.

    If the query has terminated, then all subsequent calls to this method will either return immediately (if the query was terminated by stop()), or throw the exception immediately (if the query terminated with an exception).

    Definition Classes
    StreamExecution → StreamingQuery
    Since

    2.0.0

    Exceptions thrown

    StreamingQueryException if the query has terminated with an exception.

  16. def checkpointFile(name: String): String

    Returns the path of a file with name in the checkpoint directory.

    Attributes
    protected
    Definition Classes
    StreamExecution
  17. def cleanUpLastExecutedMicroBatch(): Unit
  18. def cleanup(): Unit

    Any clean-up that needs to happen when the query is stopped or exits.

    Definition Classes
    AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution → StreamExecution
  19. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  20. val commitLog: AsyncCommitLog
  21. def commitSources(offsetSeq: OffsetSeq): Unit
    Attributes
    protected
    Definition Classes
    MicroBatchExecution
  22. var committedOffsets: StreamProgress

    Tracks how much data we have processed and committed to the sink or state store from each input source. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.

    Definition Classes
    StreamExecution
  23. def createWrite(table: SupportsWrite, options: Map[String, String], inputPlan: LogicalPlan): Write
    Attributes
    protected
    Definition Classes
    StreamExecution
  24. var currentBatchId: Long

    The current batchId, or -1 if execution has not yet been initialized.

    Attributes
    protected
    Definition Classes
    StreamExecution → ProgressReporter
  25. val currentStatus: StreamingQueryStatus
    Attributes
    protected
    Definition Classes
    ProgressReporter
    Annotations
    @volatile()
  26. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  27. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  28. val errorNotifier: ErrorNotifier
    Attributes
    protected[sql]
    Definition Classes
    MicroBatchExecution → AsyncLogPurge
  29. def exception: Option[StreamingQueryException]

    Returns the StreamingQueryException if the query was terminated by an exception.

    Definition Classes
    StreamExecution → StreamingQuery
  30. def explain(): Unit

    Prints the physical plan to the console for debugging purposes.

    Definition Classes
    StreamExecution → StreamingQuery
    Since

    2.0.0

  31. def explain(extended: Boolean): Unit

    Prints the physical plan to the console for debugging purposes.

    extended

    whether to do extended explain or not

    Definition Classes
    StreamExecution → StreamingQuery
    Since

    2.0.0

  32. def explainInternal(extended: Boolean): String

    Exposed for tests.

    Definition Classes
    StreamExecution
  33. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  34. def finishTrigger(hasNewData: Boolean, hasExecuted: Boolean): Unit

    Finalizes the query progress and adds it to the list of recent status updates.

    hasNewData

    Whether the sources of this stream had new data for this trigger.

    hasExecuted

    Whether any batch was executed during this trigger. Streaming queries that perform stateful aggregations with timeouts can still run batches even though the sources don't have any new data.

    Attributes
    protected
    Definition Classes
    ProgressReporter
  35. def formatTimestamp(millis: Long): String
    Attributes
    protected
    Definition Classes
    ProgressReporter
  36. def getBatchDescriptionString: String
    Attributes
    protected
    Definition Classes
    StreamExecution
  37. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  38. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  39. val id: UUID

    Returns the unique id of this query that persists across restarts from checkpoint data. That is, this id is generated when a query is started for the first time, and will be the same every time it is restarted from checkpoint data. Also see runId.

    Definition Classes
    StreamExecution → ProgressReporter → StreamingQuery
    Since

    2.1.0
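The id/runId split can be sketched in a few lines (a hypothetical helper, not a Spark API): `id` is reused from the checkpoint when present, while every start draws a fresh `runId`.

```scala
import java.util.UUID

// Illustration of the id vs. runId contract.
object QueryIdExample {
  final case class QueryIds(id: UUID, runId: UUID)

  def startQuery(checkpointedId: Option[UUID]): QueryIds =
    QueryIds(
      id = checkpointedId.getOrElse(UUID.randomUUID()), // stable across restarts
      runId = UUID.randomUUID()                         // fresh on every (re)start
    )
}
```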

  40. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  41. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  42. def interruptAndAwaitExecutionThreadTermination(): Unit

    Interrupts the query execution thread and awaits its termination until it exceeds the timeout. The timeout can be set via "spark.sql.streaming.stopTimeout".

    Attributes
    protected
    Definition Classes
    StreamExecution
    Annotations
    @throws(classOf[java.util.concurrent.TimeoutException])
    Exceptions thrown

    TimeoutException If the thread cannot be stopped within the timeout

  43. def isActive: Boolean

    Whether the query is currently active or not.

    Definition Classes
    StreamExecution → StreamingQuery
  44. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  45. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  46. var lastExecution: IncrementalExecution
    Definition Classes
    StreamExecution → ProgressReporter
  47. def lastProgress: StreamingQueryProgress

    Returns the most recent query progress update or null if there were no progress updates.

    Definition Classes
    ProgressReporter
  48. var latestOffsets: StreamProgress

    Tracks the latest offsets for each input source. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.

    Definition Classes
    StreamExecution
  49. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  50. def logDebug(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  51. def logDebug(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  52. def logError(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  53. def logError(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  54. def logInfo(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  55. def logInfo(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  56. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  57. def logTrace(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  58. def logTrace(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  59. def logWarning(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  60. def logWarning(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  61. lazy val logicalPlan: LogicalPlan
  62. def markMicroBatchEnd(): Unit

    Called after the micro-batch has completed execution. It takes care of committing the offset to the commit log and other bookkeeping.

    Definition Classes
    AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
  63. def markMicroBatchExecutionStart(): Unit

    Called once after planning is done and before the start of the micro-batch execution. It can be used to perform any pre-execution tasks.

    Definition Classes
    AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
  64. def markMicroBatchStart(): Unit

    Should not call the super method, as async progress tracking requires completely different behavior in this method.

    Definition Classes
    AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
  65. val minLogEntriesToMaintain: Int
    Attributes
    protected
    Definition Classes
    StreamExecution
  66. val name: String

    Returns the user-specified name of the query, or null if not specified. This name can be specified in the org.apache.spark.sql.streaming.DataStreamWriter as dataframe.writeStream.queryName("query").start(). This name, if set, must be unique across all active queries.

    Definition Classes
    StreamExecution → ProgressReporter → StreamingQuery
    Since

    2.0.0

  67. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  68. var newData: Map[SparkDataStream, LogicalPlan]

    Holds the most recent input data for each source.

    Attributes
    protected
    Definition Classes
    StreamExecution → ProgressReporter
  69. var noNewData: Boolean

    A flag to indicate that a batch has completed with no new data available.

    Attributes
    protected
    Definition Classes
    StreamExecution
  70. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  71. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  72. val offsetLog: AsyncOffsetSeqLog
  73. var offsetSeqMetadata: OffsetSeqMetadata

    Metadata associated with the offset seq of a batch in the query.

    Attributes
    protected
    Definition Classes
    StreamExecution → ProgressReporter
  74. val outputMode: OutputMode
    Definition Classes
    StreamExecution
  75. val pollingDelayMs: Long
    Attributes
    protected
    Definition Classes
    StreamExecution
  76. def postEvent(event: Event): Unit
    Attributes
    protected
    Definition Classes
    StreamExecution → ProgressReporter
  77. val prettyIdString: String

    Pretty identifier string for printing in logs. The format is "queryName [id = xyz, runId = abc]" if name is set, else "[id = xyz, runId = abc]".

    Attributes
    protected
    Definition Classes
    StreamExecution
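The documented format is easy to express as a standalone helper (hypothetical, not Spark's code):

```scala
import java.util.UUID

// Builds the documented log identifier: with a name, "queryName [id = ..., runId = ...]",
// without one, just the bracketed part.
object PrettyId {
  def apply(name: Option[String], id: UUID, runId: UUID): String = {
    val ids = s"[id = $id, runId = $runId]"
    name.fold(ids)(n => s"$n $ids")
  }
}
```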
  78. def processAllAvailable(): Unit

    Blocks until all available data in the source has been processed and committed to the sink. This method is intended for testing. Note that in the case of continually arriving data, this method may block forever. Additionally, it is only guaranteed to block until processing data that had been synchronously appended to an org.apache.spark.sql.execution.streaming.Source prior to invocation (i.e. getOffset must immediately reflect the addition).

    Definition Classes
    StreamExecution → StreamingQuery
    Since

    2.0.0

  79. def purge(threshold: Long): Unit
  80. def purgeAsync(): Unit
    Attributes
    protected
    Definition Classes
    AsyncLogPurge
  81. val queryExecutionThread: QueryExecutionThread

    The thread that runs the micro-batches of this stream. Note that this thread must be an org.apache.spark.util.UninterruptibleThread to work around KAFKA-1894: interrupting a running KafkaConsumer may cause an endless loop.

    Definition Classes
    StreamExecution
  82. def recentProgress: Array[StreamingQueryProgress]

    Returns an array containing the most recent query progress updates.

    Definition Classes
    ProgressReporter
  83. def recordTriggerOffsets(from: StreamProgress, to: StreamProgress, latest: StreamProgress): Unit

    Record the offsets range this trigger will process. Call this before updating committedOffsets in StreamExecution to make sure that the correct range is recorded.

    Attributes
    protected
    Definition Classes
    ProgressReporter
  84. def reportTimeTaken[T](triggerDetailKey: String)(body: => T): T

    Records the duration of running body for the next query progress update.

    Attributes
    protected
    Definition Classes
    ProgressReporter
  85. val resolvedCheckpointRoot: String
    Definition Classes
    StreamExecution
  86. def runActivatedStream(sparkSessionForStream: SparkSession): Unit

    Repeatedly attempts to run batches as data arrives.

    Attributes
    protected
    Definition Classes
    MicroBatchExecution → StreamExecution
  87. val runId: UUID

    Returns the unique id of this run of the query. That is, every start/restart of a query will generate a unique runId. Therefore, every time a query is restarted from a checkpoint, it will have the same id but a different runId.

    Definition Classes
    StreamExecution → ProgressReporter → StreamingQuery
  88. val sink: Table
    Definition Classes
    StreamExecution → ProgressReporter
  89. var sinkCommitProgress: Option[StreamWriterCommitProgress]
    Definition Classes
    StreamExecution → ProgressReporter
  90. var sources: Seq[SparkDataStream]
    Attributes
    protected
    Definition Classes
    MicroBatchExecution → ProgressReporter
  91. val sparkSession: SparkSession

    Returns the SparkSession associated with this query.

    Definition Classes
    StreamExecution → ProgressReporter → StreamingQuery
    Since

    2.0.0

  92. def start(): Unit

    Starts the execution. This returns only after the thread has started and QueryStartedEvent has been posted to all the listeners.

    Definition Classes
    StreamExecution
  93. def startTrigger(): Unit

    Begins recording statistics about query progress for a given trigger.

    Attributes
    protected
    Definition Classes
    MicroBatchExecution → ProgressReporter
  94. val state: AtomicReference[State]

    Defines the internal state of execution.

    Attributes
    protected
    Definition Classes
    StreamExecution
  95. def status: StreamingQueryStatus

    Returns the current status of the query.

    Definition Classes
    ProgressReporter
  96. def stop(): Unit

    Signals to the thread executing micro-batches that it should stop running after the next batch. This method blocks until the thread stops running.

    Definition Classes
    MicroBatchExecution → StreamingQuery
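The signal-then-join behaviour described above can be sketched without Spark (hypothetical `BatchRunner`, not Spark's code): a volatile flag is checked between "batches", and stop() joins the thread so it only returns once the loop has exited.

```scala
// Simplified sketch of the documented stop() behaviour.
object BatchRunner {
  @volatile private var active = true
  @volatile var batchesRun = 0

  private val thread = new Thread(() => {
    while (active) {
      batchesRun += 1 // stand-in for executing one micro-batch
      Thread.sleep(1)
    }
  })
  thread.start()

  def stop(): Unit = {
    active = false // the in-flight batch finishes; no new batch starts
    thread.join()  // block until the loop exits, as stop() does
  }

  def isAlive: Boolean = thread.isAlive
}
```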
  97. def stopSources(): Unit

    Stops all streaming sources safely.

    Attributes
    protected
    Definition Classes
    StreamExecution
  98. var streamDeathCause: StreamingQueryException
    Attributes
    protected
    Definition Classes
    StreamExecution
  99. val streamMetadata: StreamMetadata

    Metadata associated with the whole query.

    Attributes
    protected
    Definition Classes
    StreamExecution
  100. lazy val streamMetrics: MetricsReporter

    Used to report metrics to Codahale (Dropwizard Metrics). This uses id for easier tracking across restarts.

    Definition Classes
    StreamExecution
  101. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  102. def toString(): String
    Definition Classes
    StreamExecution → AnyRef → Any
  103. val trigger: Trigger
    Definition Classes
    StreamExecution
  104. val triggerClock: Clock
    Definition Classes
    StreamExecutionProgressReporter
  105. val triggerExecutor: TriggerExecutor
  106. var uniqueSources: Map[SparkDataStream, ReadLimit]

    A list of unique sources in the query plan. This will be set when generating the logical plan.

    Attributes
    protected
    Definition Classes
    StreamExecution
  107. def updateStatusMessage(message: String): Unit

    Updates the message returned in status.

    Attributes
    protected
    Definition Classes
    ProgressReporter
  108. lazy val useAsyncPurge: Boolean
    Attributes
    protected
    Definition Classes
    AsyncLogPurge
  109. def validateOffsetLogAndGetPrevOffset(latestBatchId: Long): Option[OffsetSeq]

    Conducts sanity checks on the offset log to make sure it is correct and as expected. Also returns the previous offset written to the offset log.

    latestBatchId

    the batch id of the current micro-batch

    returns

    an Option containing the offset of the previously written batch

    Definition Classes
    AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
  110. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  111. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  112. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  113. val watermarkMsMap: Map[Int, Long]

    A map of current watermarks, keyed by the position of the watermark operator in the physical plan.

    This state is 'soft state', which does not affect the correctness and semantics of watermarks and is not persisted across query restarts. The fault-tolerant watermark state is in offsetSeqMetadata.

    Attributes
    protected
    Definition Classes
    StreamExecution
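Combining the per-operator entries into one query-level watermark can be sketched as follows (hypothetical helper, not a Spark API; by default Spark's multiple-watermark policy takes the minimum across operators):

```scala
// Derives a query-level watermark from per-operator values,
// keyed by operator position as watermarkMsMap is.
object WatermarkExample {
  def globalWatermarkMs(perOperator: Map[Int, Long]): Long =
    if (perOperator.isEmpty) 0L else perOperator.values.min // "min" policy
}
```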
  114. var watermarkTracker: WatermarkTracker
    Attributes
    protected
    Definition Classes
    MicroBatchExecution
