class AsyncProgressTrackingMicroBatchExecution extends MicroBatchExecution
Class to execute micro-batches when async progress tracking is enabled
- By Inheritance
- AsyncProgressTrackingMicroBatchExecution
- MicroBatchExecution
- AsyncLogPurge
- StreamExecution
- ProgressReporter
- Logging
- StreamingQuery
- AnyRef
- Any
Instance Constructors
- new AsyncProgressTrackingMicroBatchExecution(sparkSession: SparkSession, trigger: Trigger, triggerClock: Clock, extraOptions: Map[String, String], plan: WriteToStream)
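This class is internal to Spark and is instantiated by the engine rather than by user code. As a minimal sketch (the option names are assumed from the async progress tracking feature in recent Spark releases, and the feature has restrictions on supported sinks and triggers), enabling it from the DataStream API might look like:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("async-progress-demo").getOrCreate()

// A rate source feeding a console sink with async progress tracking turned on.
// "asyncProgressTrackingEnabled" and "asyncProgressTrackingCheckpointIntervalMs"
// are assumed option names; the interval corresponds to the
// asyncProgressTrackingCheckpointingIntervalMs member below.
val query = spark.readStream
  .format("rate")
  .load()
  .writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/async-progress-demo")
  .option("asyncProgressTrackingEnabled", "true")
  .option("asyncProgressTrackingCheckpointIntervalMs", "1000")
  .start()
```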
Type Members
- case class ExecutionStats(inputRows: Map[SparkDataStream, Long], stateOperators: Seq[StateOperatorProgress], eventTimeStats: Map[String, String]) extends Product with Serializable
- Definition Classes
- ProgressReporter
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val analyzedPlan: LogicalPlan
- Definition Classes
- StreamExecution
- def areWritesPendingOrInProgress(): Boolean
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def asyncLogPurgeShutdown(): Unit
- Attributes
- protected
- Definition Classes
- AsyncLogPurge
- val asyncProgressTrackingCheckpointingIntervalMs: Long
- Attributes
- protected
- val asyncWritesExecutorService: ThreadPoolExecutor
- Attributes
- protected
- var availableOffsets: StreamProgress
Tracks the offsets that are available to be processed, but have not yet been committed to the sink. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.
- Definition Classes
- StreamExecution
- def awaitInitialization(timeoutMs: Long): Unit
Await until all fields of the query have been initialized.
- Definition Classes
- StreamExecution
- val awaitProgressLock: ReentrantLock
A lock used to wait/notify when batches complete. Use a fair lock to avoid thread starvation.
- Attributes
- protected
- Definition Classes
- StreamExecution
- val awaitProgressLockCondition: Condition
- Attributes
- protected
- Definition Classes
- StreamExecution
- def awaitTermination(timeoutMs: Long): Boolean
Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, then the exception will be thrown. Otherwise, it returns whether the query has terminated or not within the timeoutMs milliseconds. If the query has terminated, then all subsequent calls to this method will either return true immediately (if the query was terminated by stop()), or throw the exception immediately (if the query has terminated with an exception).
- Definition Classes
- StreamExecution → StreamingQuery
- Since
2.0.0
- Exceptions thrown
StreamingQueryException
if the query has terminated with an exception
- def awaitTermination(): Unit
Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, then the exception will be thrown. If the query has terminated, then all subsequent calls to this method will either return immediately (if the query was terminated by stop()), or throw the exception immediately (if the query has terminated with an exception).
- Definition Classes
- StreamExecution → StreamingQuery
- Since
2.0.0
- Exceptions thrown
StreamingQueryException
if the query has terminated with an exception.
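A minimal usage sketch for both overloads, assuming query is a started StreamingQuery:

```scala
// Wait up to 30 seconds; returns false if the query is still running.
if (!query.awaitTermination(30000L)) {
  println("query still running after 30s")
}

// Or block indefinitely until stop() is called or the query fails.
query.awaitTermination()
```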
- def checkpointFile(name: String): String
Returns the path of a file with the given name in the checkpoint directory.
- Attributes
- protected
- Definition Classes
- StreamExecution
- def cleanUpLastExecutedMicroBatch(): Unit
- Definition Classes
- AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
- def cleanup(): Unit
Any clean up that needs to happen when the query is stopped or exits.
- Definition Classes
- AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution → StreamExecution
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- val commitLog: AsyncCommitLog
- Definition Classes
- AsyncProgressTrackingMicroBatchExecution → StreamExecution
- def commitSources(offsetSeq: OffsetSeq): Unit
- Attributes
- protected
- Definition Classes
- MicroBatchExecution
- var committedOffsets: StreamProgress
Tracks how much data we have processed and committed to the sink or state store from each input source. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.
- Definition Classes
- StreamExecution
- def createWrite(table: SupportsWrite, options: Map[String, String], inputPlan: LogicalPlan): Write
- Attributes
- protected
- Definition Classes
- StreamExecution
- var currentBatchId: Long
The current batchId or -1 if execution has not yet been initialized.
- Attributes
- protected
- Definition Classes
- StreamExecution → ProgressReporter
- val currentStatus: StreamingQueryStatus
- Attributes
- protected
- Definition Classes
- ProgressReporter
- Annotations
- @volatile()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- val errorNotifier: ErrorNotifier
- Attributes
- protected[sql]
- Definition Classes
- MicroBatchExecution → AsyncLogPurge
- def exception: Option[StreamingQueryException]
Returns the StreamingQueryException if the query was terminated by an exception.
- Definition Classes
- StreamExecution → StreamingQuery
- def explain(): Unit
Prints the physical plan to the console for debugging purposes.
- Definition Classes
- StreamExecution → StreamingQuery
- Since
2.0.0
- def explain(extended: Boolean): Unit
Prints the physical plan to the console for debugging purposes.
- extended
whether to do extended explain or not
- Definition Classes
- StreamExecution → StreamingQuery
- Since
2.0.0
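A usage sketch, assuming query is a started StreamingQuery:

```scala
query.explain()                // compact physical plan of the last execution
query.explain(extended = true) // more detailed plan output for debugging
```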
- def explainInternal(extended: Boolean): String
Exposed for tests.
- Definition Classes
- StreamExecution
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def finishTrigger(hasNewData: Boolean, hasExecuted: Boolean): Unit
Finalizes the query progress and adds it to the list of recent status updates.
- hasNewData
Whether the sources of this stream had new data for this trigger.
- hasExecuted
Whether any batch was executed during this trigger. Streaming queries that perform stateful aggregations with timeouts can still run batches even though the sources don't have any new data.
- Attributes
- protected
- Definition Classes
- ProgressReporter
- def formatTimestamp(millis: Long): String
- Attributes
- protected
- Definition Classes
- ProgressReporter
- def getBatchDescriptionString: String
- Attributes
- protected
- Definition Classes
- StreamExecution
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- val id: UUID
Returns the unique id of this query that persists across restarts from checkpoint data. That is, this id is generated when a query is started for the first time, and will be the same every time it is restarted from checkpoint data. Also see runId.
- Definition Classes
- StreamExecution → ProgressReporter → StreamingQuery
- Since
2.1.0
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def interruptAndAwaitExecutionThreadTermination(): Unit
Interrupts the query execution thread and awaits its termination until it exceeds the timeout. The timeout can be set via the "spark.sql.streaming.stopTimeout" configuration.
- Attributes
- protected
- Definition Classes
- StreamExecution
- Annotations
- @throws(classOf[java.util.concurrent.TimeoutException])
- Exceptions thrown
TimeoutException
If the thread cannot be stopped within the timeout
- def isActive: Boolean
Whether the query is currently active or not.
- Definition Classes
- StreamExecution → StreamingQuery
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- var lastExecution: IncrementalExecution
- Definition Classes
- StreamExecution → ProgressReporter
- def lastProgress: StreamingQueryProgress
Returns the most recent query progress update or null if there were no progress updates.
- Definition Classes
- ProgressReporter
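A monitoring sketch, assuming query is a started StreamingQuery; lastProgress may be null before the first progress update:

```scala
Option(query.lastProgress).foreach { p =>
  println(s"batch=${p.batchId} inputRows=${p.numInputRows} " +
    s"rows/s=${p.processedRowsPerSecond}")
}
```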
- var latestOffsets: StreamProgress
Tracks the latest offsets for each input source. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.
- Definition Classes
- StreamExecution
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- lazy val logicalPlan: LogicalPlan
- Definition Classes
- MicroBatchExecution → StreamExecution → ProgressReporter
- def markMicroBatchEnd(): Unit
Called after the microbatch has completed execution. It takes care of committing the offset to commit log and other bookkeeping.
- Definition Classes
- AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
- def markMicroBatchExecutionStart(): Unit
Method called once after the planning is done and before the start of the microbatch execution. It can be used to perform any pre-execution tasks.
- Definition Classes
- AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
- def markMicroBatchStart(): Unit
Deliberately does not call the super method, as async progress tracking requires completely different behavior in this method.
- Definition Classes
- AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
- val minLogEntriesToMaintain: Int
- Attributes
- protected
- Definition Classes
- StreamExecution
- val name: String
Returns the user-specified name of the query, or null if not specified. This name can be specified in org.apache.spark.sql.streaming.DataStreamWriter as dataframe.writeStream.queryName("query").start(). This name, if set, must be unique across all active queries.
- Definition Classes
- StreamExecution → ProgressReporter → StreamingQuery
- Since
2.0.0
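A naming sketch, assuming df is a streaming DataFrame (the name and paths are hypothetical):

```scala
val named = df.writeStream
  .queryName("events_console") // must be unique among active queries
  .format("console")
  .option("checkpointLocation", "/tmp/events-console")
  .start()
assert(named.name == "events_console")
```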
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- var newData: Map[SparkDataStream, LogicalPlan]
Holds the most recent input data for each source.
- Attributes
- protected
- Definition Classes
- StreamExecution → ProgressReporter
- var noNewData: Boolean
A flag to indicate that a batch has completed with no new data available.
- Attributes
- protected
- Definition Classes
- StreamExecution
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val offsetLog: AsyncOffsetSeqLog
- Definition Classes
- AsyncProgressTrackingMicroBatchExecution → StreamExecution
- var offsetSeqMetadata: OffsetSeqMetadata
Metadata associated with the offset seq of a batch in the query.
- Attributes
- protected
- Definition Classes
- StreamExecution → ProgressReporter
- val outputMode: OutputMode
- Definition Classes
- StreamExecution
- val pollingDelayMs: Long
- Attributes
- protected
- Definition Classes
- StreamExecution
- def postEvent(event: Event): Unit
- Attributes
- protected
- Definition Classes
- StreamExecution → ProgressReporter
- val prettyIdString: String
Pretty identifier string for printing in logs. If name is set, the format is "queryName [id = xyz, runId = abc]"; otherwise "[id = xyz, runId = abc]".
- Attributes
- protected
- Definition Classes
- StreamExecution
- def processAllAvailable(): Unit
Blocks until all available data in the source has been processed and committed to the sink. This method is intended for testing. Note that in the case of continually arriving data, this method may block forever. Additionally, this method is only guaranteed to block until data that has been synchronously appended to an org.apache.spark.sql.execution.streaming.Source prior to invocation has been processed (i.e. getOffset must immediately reflect the addition).
- Definition Classes
- StreamExecution → StreamingQuery
- Since
2.0.0
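A test-only sketch; on an unbounded source this can block forever:

```scala
// Drain whatever the source has already produced, then shut down.
query.processAllAvailable()
query.stop()
```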
- def purge(threshold: Long): Unit
- Definition Classes
- AsyncProgressTrackingMicroBatchExecution → AsyncLogPurge → StreamExecution
- def purgeAsync(): Unit
- Attributes
- protected
- Definition Classes
- AsyncLogPurge
- val queryExecutionThread: QueryExecutionThread
The thread that runs the micro-batches of this stream. Note that this thread must be an org.apache.spark.util.UninterruptibleThread to work around KAFKA-1894: interrupting a running KafkaConsumer may cause an endless loop.
- Definition Classes
- StreamExecution
- def recentProgress: Array[StreamingQueryProgress]
Returns an array containing the most recent query progress updates.
- Definition Classes
- ProgressReporter
- def recordTriggerOffsets(from: StreamProgress, to: StreamProgress, latest: StreamProgress): Unit
Records the offset range this trigger will process. Call this before updating committedOffsets in StreamExecution to make sure that the correct range is recorded.
- Attributes
- protected
- Definition Classes
- ProgressReporter
- def reportTimeTaken[T](triggerDetailKey: String)(body: => T): T
Records the duration of running body for the next query progress update.
- Attributes
- protected
- Definition Classes
- ProgressReporter
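An illustrative fragment, only meaningful inside a ProgressReporter subclass where this protected method is in scope; the body is hypothetical:

```scala
// The block's wall-clock duration is recorded under the given
// trigger-detail key and surfaces in the next StreamingQueryProgress.
val plan = reportTimeTaken("queryPlanning") {
  buildPlanSomehow() // hypothetical work whose duration is recorded
}
```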
- val resolvedCheckpointRoot: String
- Definition Classes
- StreamExecution
- def runActivatedStream(sparkSessionForStream: SparkSession): Unit
Repeatedly attempts to run batches as data arrives.
- Attributes
- protected
- Definition Classes
- MicroBatchExecution → StreamExecution
- val runId: UUID
Returns the unique id of this run of the query. That is, every start/restart of a query will generate a unique runId. Therefore, every time a query is restarted from checkpoint, it will have the same id but different runIds.
- Definition Classes
- StreamExecution → ProgressReporter → StreamingQuery
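A quick sketch of the id/runId distinction, assuming query is a started StreamingQuery:

```scala
// id survives restarts from the same checkpoint; runId is fresh per (re)start.
println(s"id=${query.id} runId=${query.runId}")
```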
- val sink: Table
- Definition Classes
- StreamExecution → ProgressReporter
- var sinkCommitProgress: Option[StreamWriterCommitProgress]
- Definition Classes
- StreamExecution → ProgressReporter
- var sources: Seq[SparkDataStream]
- Attributes
- protected
- Definition Classes
- MicroBatchExecution → ProgressReporter
- val sparkSession: SparkSession
Returns the SparkSession associated with this query.
- Definition Classes
- StreamExecution → ProgressReporter → StreamingQuery
- Since
2.0.0
- def start(): Unit
Starts the execution. This returns only after the thread has started and QueryStartedEvent has been posted to all the listeners.
- Definition Classes
- StreamExecution
- def startTrigger(): Unit
Begins recording statistics about query progress for a given trigger.
- Attributes
- protected
- Definition Classes
- MicroBatchExecution → ProgressReporter
- val state: AtomicReference[State]
Defines the internal state of execution.
- Attributes
- protected
- Definition Classes
- StreamExecution
- def status: StreamingQueryStatus
Returns the current status of the query.
- Definition Classes
- ProgressReporter
- def stop(): Unit
Signals to the thread executing micro-batches that it should stop running after the next batch. This method blocks until the thread stops running.
- Definition Classes
- MicroBatchExecution → StreamingQuery
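A shutdown sketch; stop() may throw a TimeoutException if the execution thread does not stop within spark.sql.streaming.stopTimeout (see interruptAndAwaitExecutionThreadTermination above):

```scala
import java.util.concurrent.TimeoutException

try {
  query.stop() // blocks until the micro-batch thread has stopped
} catch {
  case _: TimeoutException =>
    // thread did not stop within spark.sql.streaming.stopTimeout
    println("timed out waiting for the query to stop")
}
```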
- def stopSources(): Unit
Stops all streaming sources safely.
- Attributes
- protected
- Definition Classes
- StreamExecution
- var streamDeathCause: StreamingQueryException
- Attributes
- protected
- Definition Classes
- StreamExecution
- val streamMetadata: StreamMetadata
Metadata associated with the whole query.
- Attributes
- protected
- Definition Classes
- StreamExecution
- lazy val streamMetrics: MetricsReporter
Used to report metrics to coda-hale. This uses id for easier tracking across restarts.
- Definition Classes
- StreamExecution
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- StreamExecution → AnyRef → Any
- val trigger: Trigger
- Definition Classes
- StreamExecution
- val triggerClock: Clock
- Definition Classes
- StreamExecution → ProgressReporter
- val triggerExecutor: TriggerExecutor
- Definition Classes
- AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
- var uniqueSources: Map[SparkDataStream, ReadLimit]
A list of unique sources in the query plan. This will be set when generating the logical plan.
- Attributes
- protected
- Definition Classes
- StreamExecution
- def updateStatusMessage(message: String): Unit
Updates the message returned in status.
- Attributes
- protected
- Definition Classes
- ProgressReporter
- lazy val useAsyncPurge: Boolean
- Attributes
- protected
- Definition Classes
- AsyncLogPurge
- def validateOffsetLogAndGetPrevOffset(latestBatchId: Long): Option[OffsetSeq]
Conducts sanity checks on the offset log to make sure it is correct and as expected, and returns the previous offset written to the offset log.
- latestBatchId
the batch id of the current micro batch
- returns
An option containing the offset of the previously written batch
- Definition Classes
- AsyncProgressTrackingMicroBatchExecution → MicroBatchExecution
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- val watermarkMsMap: Map[Int, Long]
A map of current watermarks, keyed by the position of the watermark operator in the physical plan.
This state is 'soft state', which does not affect the correctness and semantics of watermarks and is not persisted across query restarts. The fault-tolerant watermark state is in offsetSeqMetadata.
- Attributes
- protected
- Definition Classes
- StreamExecution
- var watermarkTracker: WatermarkTracker
- Attributes
- protected
- Definition Classes
- MicroBatchExecution