Spark Project SQL 3.1.3 API - org.apache.spark.sql.execution.streaming.MicroBatchExecution

final def !=(arg0: Any): Boolean

Definition Classes: AnyRef → Any

final def ##(): Int

Definition Classes: AnyRef → Any

final def ==(arg0: Any): Boolean

Definition Classes: AnyRef → Any

final def asInstanceOf[T0]: T0

Definition Classes: Any

var availableOffsets: StreamProgress

Tracks the offsets that are available to be processed but have not yet been committed to the sink. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.

Definition Classes: StreamExecution
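
The shallow-copy advice above can be sketched as follows. This is a minimal illustration rather than Spark's code: `OffsetSnapshotSketch`, `Progress`, `advance`, and `describe` are hypothetical names, and an immutable map from a source name to an offset stands in for StreamProgress.

```scala
object OffsetSnapshotSketch {
  // Hypothetical stand-in for StreamProgress: an immutable map of source name -> offset.
  type Progress = Map[String, Long]

  // The field is replaced as a whole, never mutated in place.
  @volatile private var available: Progress = Map("source-0" -> 0L)

  // Writer side: assumes a single scheduler thread, so the read-modify-write is safe.
  def advance(source: String, offset: Long): Unit = {
    available = available + (source -> offset)
  }

  // Reader side: capture the reference once, then use only the local copy,
  // so both reads below see the same version of the map.
  def describe(): String = {
    val snapshot = available
    val total = snapshot.values.sum
    s"${snapshot.size} source(s), offsets sum to $total"
  }
}
```

Because the map is immutable, copying the reference is enough; no deep copy is required for a consistent view.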

def awaitInitialization(timeoutMs: Long): Unit

Await until all fields of the query have been initialized.

Definition Classes: StreamExecution

val awaitProgressLock: ReentrantLock

A lock used to wait/notify when batches complete. Use a fair lock to avoid thread starvation.

Attributes: protected
Definition Classes: StreamExecution

val awaitProgressLockCondition: Condition

Attributes: protected
Definition Classes: StreamExecution
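
A fair lock paired with a condition, as described for awaitProgressLock and awaitProgressLockCondition, might be used along these lines. The sketch uses only java.util.concurrent; `ProgressLockSketch`, `markBatchComplete`, and `awaitBatches` are hypothetical names and the counter is an assumption made for illustration.

```scala
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantLock

object ProgressLockSketch {
  // Passing true requests a fair lock: the longest-waiting thread acquires it first.
  private val lock = new ReentrantLock(true)
  private val batchDone = lock.newCondition()
  private var completedBatches = 0L // guarded by lock

  // Called by the (hypothetical) execution thread after each batch.
  def markBatchComplete(): Unit = {
    lock.lock()
    try {
      completedBatches += 1
      batchDone.signalAll() // wake every thread waiting for progress
    } finally lock.unlock()
  }

  // Waits until at least `target` batches have completed or the timeout elapses.
  def awaitBatches(target: Long, timeoutMs: Long): Boolean = {
    val deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs)
    lock.lock()
    try {
      var remaining = deadline - System.nanoTime()
      // Loop to guard against spurious wakeups.
      while (completedBatches < target && remaining > 0) {
        remaining = batchDone.awaitNanos(remaining)
      }
      completedBatches >= target
    } finally lock.unlock()
  }
}
```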

def awaitTermination(timeoutMs: Long): Boolean

Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, then the exception will be thrown. Otherwise, it returns whether the query has terminated or not within the timeoutMs milliseconds.

If the query has terminated, then all subsequent calls to this method will either return true immediately (if the query was terminated by stop()) or throw the exception immediately (if the query has terminated with an exception).

Definition Classes: StreamExecution → StreamingQuery
Since: 2.0.0
Exceptions thrown: StreamingQueryException if the query has terminated with an exception
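
The return-or-throw contract of awaitTermination(timeoutMs) can be modelled with a small toy class. This is a sketch of the documented behaviour only, not the Spark implementation: `TerminationSketch`, `ToyQuery`, and `fail` are hypothetical, and a plain Exception stands in for StreamingQueryException.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

object TerminationSketch {
  final class ToyQuery {
    private val terminated = new CountDownLatch(1)
    @volatile private var failure: Option[Exception] = None

    // Normal termination.
    def stop(): Unit = terminated.countDown()

    // Termination caused by an error: record the cause before releasing waiters.
    def fail(cause: Exception): Unit = {
      failure = Some(cause)
      terminated.countDown()
    }

    // Returns true if terminated within the timeout, false if still running,
    // and rethrows the recorded cause if the query terminated with an exception.
    def awaitTermination(timeoutMs: Long): Boolean = {
      val finished = terminated.await(timeoutMs, TimeUnit.MILLISECONDS)
      if (finished) failure.foreach(e => throw e)
      finished
    }
  }
}
```

Once the latch has been released, every later call returns or throws immediately, which mirrors the behaviour described for subsequent calls.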

def awaitTermination(): Unit

Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, then the exception will be thrown.

If the query has terminated, then all subsequent calls to this method will either return immediately (if the query was terminated by stop()) or throw the exception immediately (if the query has terminated with an exception).

Definition Classes: StreamExecution → StreamingQuery
Since: 2.0.0
Exceptions thrown: StreamingQueryException if the query has terminated with an exception.

def checkpointFile(name: String): String

Returns the path of a file with name in the checkpoint directory.

Attributes: protected
Definition Classes: StreamExecution

def clone(): AnyRef

Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( ... ) @native()

val commitLog: CommitLog

A log that records the batch ids that have completed. This is used to check if a batch was fully processed, and its output was committed to the sink, hence no need to process it again. This is used (for instance) during restart, to help identify which batch to run next.

Definition Classes: StreamExecution

var committedOffsets: StreamProgress

Tracks how much data we have processed and committed to the sink or state store from each input source. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.

Definition Classes: StreamExecution

def createStreamingWrite(table: SupportsWrite, options: Map[String, String], inputPlan: LogicalPlan): StreamingWrite

Attributes: protected
Definition Classes: StreamExecution

var currentBatchId: Long

The current batchId or -1 if execution has not yet been initialized.

Attributes: protected
Definition Classes: StreamExecution → ProgressReporter

val currentStatus: StreamingQueryStatus

Attributes: protected
Definition Classes: ProgressReporter
Annotations: @volatile()

final def eq(arg0: AnyRef): Boolean

Definition Classes: AnyRef

def equals(arg0: Any): Boolean

Definition Classes: AnyRef → Any

def exception: Option[StreamingQueryException]

Returns the StreamingQueryException if the query was terminated by an exception.

Definition Classes: StreamExecution → StreamingQuery

def explain(): Unit

Prints the physical plan to the console for debugging purposes.

Definition Classes: StreamExecution → StreamingQuery
Since: 2.0.0

def explain(extended: Boolean): Unit

Prints the physical plan to the console for debugging purposes.

extended: whether to do extended explain or not

Definition Classes: StreamExecution → StreamingQuery
Since: 2.0.0

def explainInternal(extended: Boolean): String

Exposed for tests.

Definition Classes: StreamExecution

def finalize(): Unit

Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( classOf[java.lang.Throwable] )

def finishTrigger(hasNewData: Boolean, hasExecuted: Boolean): Unit

Finalizes the query progress and adds it to list of recent status updates.

hasNewData: Whether the sources of this stream had new data for this trigger.
hasExecuted: Whether any batch was executed during this trigger. Streaming queries that perform stateful aggregations with timeouts can still run batches even though the sources don't have any new data.

Attributes: protected
Definition Classes: ProgressReporter

def formatTimestamp(millis: Long): String

Attributes: protected
Definition Classes: ProgressReporter

def getBatchDescriptionString: String

Attributes: protected
Definition Classes: StreamExecution

final def getClass(): Class[_]

Definition Classes: AnyRef → Any
Annotations: @native()

def hashCode(): Int

Definition Classes: AnyRef → Any
Annotations: @native()

val id: UUID

Returns the unique id of this query that persists across restarts from checkpoint data. That is, this id is generated when a query is started for the first time, and will be the same every time it is restarted from checkpoint data. Also see runId.

Definition Classes: StreamExecution → ProgressReporter → StreamingQuery
Since: 2.1.0

def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

Attributes: protected
Definition Classes: Logging

def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes: protected
Definition Classes: Logging

def interruptAndAwaitExecutionThreadTermination(): Unit

Interrupts the query execution thread and awaits its termination until the timeout is exceeded. The timeout is set with "spark.sql.streaming.stopTimeout".

Attributes: protected
Definition Classes: StreamExecution
Annotations: @throws( ... )
Exceptions thrown: TimeoutException If the thread cannot be stopped within the timeout
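
The interrupt-then-wait pattern described above can be sketched with plain threads. This is an assumption-laden illustration, not the Spark source: `InterruptSketch` and `interruptAndAwait` are hypothetical names, and the timeout is passed as a parameter instead of being read from configuration.

```scala
import java.util.concurrent.TimeoutException

object InterruptSketch {
  // Interrupts `thread`, waits up to `timeoutMs` for it to exit, and throws
  // TimeoutException if it is still alive afterwards.
  // Note: Thread.join(0) waits indefinitely.
  def interruptAndAwait(thread: Thread, timeoutMs: Long): Unit = {
    thread.interrupt()
    thread.join(timeoutMs)
    if (thread.isAlive) {
      throw new TimeoutException(
        s"Thread ${thread.getName} did not stop within $timeoutMs ms")
    }
  }
}
```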

def isActive: Boolean

Whether the query is currently active.

Definition Classes: StreamExecution → StreamingQuery

final def isInstanceOf[T0]: Boolean

Definition Classes: Any

def isTraceEnabled(): Boolean

Attributes: protected
Definition Classes: Logging

var lastExecution: IncrementalExecution

Definition Classes: StreamExecution → ProgressReporter

def lastProgress: StreamingQueryProgress

Returns the most recent query progress update or null if there were no progress updates.

Definition Classes: ProgressReporter
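
Since lastProgress may return null, callers often wrap the result in Option before using it. The sketch below shows that pattern against hypothetical stand-in types (`Progress`, `HasProgress`); it does not depend on Spark, and the field names are assumptions chosen for illustration.

```scala
object LastProgressSketch {
  // Hypothetical minimal stand-ins for StreamingQueryProgress and a query handle.
  final case class Progress(batchId: Long, numInputRows: Long)
  trait HasProgress { def lastProgress: Progress }

  // Option(x) yields None when x is null, which avoids a NullPointerException
  // when no progress update has been reported yet.
  def describe(q: HasProgress): String =
    Option(q.lastProgress) match {
      case Some(p) => s"batch ${p.batchId}: ${p.numInputRows} rows"
      case None    => "no progress yet"
    }
}
```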

def log: Logger

Attributes: protected
Definition Classes: Logging

def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logDebug(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logError(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logInfo(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logName: String

Attributes: protected
Definition Classes: Logging

def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logTrace(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logWarning(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

lazy val logicalPlan: LogicalPlan

Definition Classes: MicroBatchExecution → StreamExecution → ProgressReporter

val minLogEntriesToMaintain: Int

Attributes: protected
Definition Classes: StreamExecution

val name: String

Returns the user-specified name of the query, or null if not specified. This name can be specified in the org.apache.spark.sql.streaming.DataStreamWriter as dataframe.writeStream.queryName("query").start(). This name, if set, must be unique across all active queries.

Definition Classes: StreamExecution → ProgressReporter → StreamingQuery
Since: 2.0.0

final def ne(arg0: AnyRef): Boolean

Definition Classes: AnyRef

var newData: Map[SparkDataStream, LogicalPlan]

Holds the most recent input data for each source.

Attributes: protected
Definition Classes: StreamExecution → ProgressReporter

var noNewData: Boolean

A flag to indicate that a batch has completed with no new data available.

Attributes: protected
Definition Classes: StreamExecution

final def notify(): Unit

Definition Classes: AnyRef
Annotations: @native()

final def notifyAll(): Unit

Definition Classes: AnyRef
Annotations: @native()

val offsetLog: OffsetSeqLog

A write-ahead log that records the offsets that are present in each batch. To ensure that a given batch always consists of the same data, we write to this log *before* any processing is done. Thus, the Nth record in this log indicates data that is currently being processed, and the (N-1)th entry indicates which offsets have been durably committed to the sink.

Definition Classes: StreamExecution
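
One way to picture how the offset log and the commit log together identify the batch to run after a restart is the simplified decision function below. It is a sketch under stated assumptions, not the logic Spark executes: `RestartSketch` and `nextBatch` are hypothetical, and each log is reduced to the id of its latest entry.

```scala
object RestartSketch {
  // Returns (batch id to run next, whether that batch is a re-run of a batch
  // whose offsets were already written to the offset log).
  def nextBatch(
      latestOffsetBatch: Option[Long],
      latestCommittedBatch: Option[Long]): (Long, Boolean) =
    (latestOffsetBatch, latestCommittedBatch) match {
      // Nothing was ever planned: start from batch 0.
      case (None, _) => (0L, false)
      // The latest planned batch was also committed: move on to a new batch.
      case (Some(planned), Some(committed)) if committed == planned =>
        (planned + 1, false)
      // The latest planned batch has no matching commit: run it again with
      // the same offsets so that it consists of the same data.
      case (Some(planned), _) => (planned, true)
    }
}
```

Writing offsets before processing is what makes the re-run case safe: the batch boundaries are already fixed when the restart happens.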

var offsetSeqMetadata: OffsetSeqMetadata

Metadata associated with the offset seq of a batch in the query.

Attributes: protected
Definition Classes: StreamExecution → ProgressReporter

val outputMode: OutputMode

Definition Classes: StreamExecution

val pollingDelayMs: Long

Attributes: protected
Definition Classes: StreamExecution

def postEvent(event: Event): Unit

Attributes: protected
Definition Classes: StreamExecution → ProgressReporter

val prettyIdString: String

Pretty identifier string for printing in logs. The format is "queryName [id = xyz, runId = abc]" if name is set, else "[id = xyz, runId = abc]".

Attributes: protected
Definition Classes: StreamExecution
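
The two formats can be reproduced with a short helper. This is a sketch of the documented format only; `PrettyIdSketch` is a hypothetical name and the optional name is modelled with Option instead of a nullable String.

```scala
import java.util.UUID

object PrettyIdSketch {
  // Builds "queryName [id = ..., runId = ...]" when a name is present,
  // otherwise just "[id = ..., runId = ...]".
  def prettyIdString(name: Option[String], id: UUID, runId: UUID): String = {
    val ids = s"[id = $id, runId = $runId]"
    name.map(n => s"$n $ids").getOrElse(ids)
  }
}
```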

def processAllAvailable(): Unit

Blocks until all available data in the source has been processed and committed to the sink. This method is intended for testing. Note that in the case of continually arriving data, this method may block forever. Additionally, this method is only guaranteed to block until data that has been synchronously appended to an org.apache.spark.sql.execution.streaming.Source prior to invocation has been processed (i.e. getOffset must immediately reflect the addition).

Definition Classes: StreamExecution → StreamingQuery
Since: 2.0.0

def purge(threshold: Long): Unit

Attributes: protected
Definition Classes: StreamExecution

val queryExecutionThread: QueryExecutionThread

The thread that runs the micro-batches of this stream. Note that this thread must be an org.apache.spark.util.UninterruptibleThread to work around KAFKA-1894: interrupting a running KafkaConsumer may cause an endless loop.

Definition Classes: StreamExecution

def recentProgress: Array[StreamingQueryProgress]

Returns an array containing the most recent query progress updates.

Definition Classes: ProgressReporter

def recordTriggerOffsets(from: StreamProgress, to: StreamProgress): Unit

Record the offsets range this trigger will process. Call this before updating committedOffsets in StreamExecution to make sure that the correct range is recorded.

Attributes: protected
Definition Classes: ProgressReporter

def reportTimeTaken[T](triggerDetailKey: String)(body: ⇒ T): T

Records the duration of running body for the next query progress update.

Attributes: protected
Definition Classes: ProgressReporter
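
The signature of reportTimeTaken, with a curried by-name `body`, lets callers wrap an arbitrary block and still get its result back. A minimal sketch of that shape follows; `TimingSketch`, the in-memory map, and the use of System.nanoTime are assumptions made for illustration and not the ProgressReporter internals.

```scala
import scala.collection.mutable

object TimingSketch {
  // Hypothetical store of accumulated durations in milliseconds, keyed by detail name.
  private val durationsMs = mutable.Map.empty[String, Long]

  // `body` is a by-name parameter, so it is evaluated inside this method,
  // between the two clock readings.
  def reportTimeTaken[T](key: String)(body: => T): T = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    // Accumulate, so repeated calls with the same key add up.
    durationsMs(key) = durationsMs.getOrElse(key, 0L) + elapsedMs
    result
  }

  // Immutable snapshot of what has been recorded so far.
  def recorded: Map[String, Long] = durationsMs.toMap
}
```

A call such as `TimingSketch.reportTimeTaken("addBatch") { compute() }` returns whatever `compute()` returns, where `compute` is a placeholder for the timed work.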

val resolvedCheckpointRoot: String

Definition Classes: StreamExecution

def runActivatedStream(sparkSessionForStream: SparkSession): Unit

Repeatedly attempts to run batches as data arrives.

Attributes: protected
Definition Classes: MicroBatchExecution → StreamExecution

val runId: UUID

Returns the unique id of this run of the query. That is, every start/restart of a query generates a unique runId. Therefore, every time a query is restarted from checkpoint, it will have the same id but a different runId.

Definition Classes: StreamExecution → ProgressReporter → StreamingQuery
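
The relationship between id and runId can be illustrated with a toy model in which the id is stored with the checkpoint and the runId is generated on every start. All names here (`QueryIdSketch`, `Run`, `start`) are hypothetical, and an in-memory map stands in for checkpoint metadata.

```scala
import java.util.UUID
import scala.collection.mutable

object QueryIdSketch {
  // Hypothetical stand-in for checkpoint metadata: checkpoint path -> persisted id.
  private val checkpoints = mutable.Map.empty[String, UUID]

  final case class Run(id: UUID, runId: UUID)

  def start(checkpointPath: String): Run = {
    // The id is created on the first start and reused on every restart
    // from the same checkpoint.
    val id = checkpoints.getOrElseUpdate(checkpointPath, UUID.randomUUID())
    // The runId is fresh for every start or restart.
    Run(id, UUID.randomUUID())
  }
}
```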

val sink: Table

Definition Classes: StreamExecution → ProgressReporter

var sinkCommitProgress: Option[StreamWriterCommitProgress]

Definition Classes: StreamExecution → ProgressReporter

var sources: Seq[SparkDataStream]

Attributes: protected
Definition Classes: MicroBatchExecution → ProgressReporter

val sparkSession: SparkSession

Returns the SparkSession associated with this.

Definition Classes: StreamExecution → ProgressReporter → StreamingQuery
Since: 2.0.0

def start(): Unit

Starts the execution. This returns only after the thread has started and QueryStartedEvent has been posted to all the listeners.

Definition Classes: StreamExecution

def startTrigger(): Unit

Begins recording statistics about query progress for a given trigger.

Attributes: protected
Definition Classes: MicroBatchExecution → ProgressReporter

val state: AtomicReference[State]

Defines the internal state of execution

Attributes: protected
Definition Classes: StreamExecution

def status: StreamingQueryStatus

Returns the current status of the query.

Definition Classes: ProgressReporter

def stop(): Unit

Signals to the thread executing micro-batches that it should stop running after the next batch. This method blocks until the thread stops running.

Definition Classes: MicroBatchExecution → StreamingQuery
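
The signal-then-block behaviour described for stop() can be modelled with a flag that the batch loop checks between batches. This is a toy sketch of the documented contract, not how MicroBatchExecution is implemented: `StopSketch`, `ToyStream`, and the sleep that simulates batch work are all assumptions.

```scala
object StopSketch {
  final class ToyStream {
    @volatile private var active = true
    @volatile private var batches = 0 // written only by the worker thread

    private val worker = new Thread(new Runnable {
      // The flag is checked between batches, so a batch in flight
      // finishes before the loop exits.
      def run(): Unit = while (active) {
        batches += 1
        Thread.sleep(5) // simulated batch work
      }
    })

    def start(): Unit = worker.start()

    // Signals the loop to exit, then blocks until the thread has finished.
    def stop(): Unit = {
      active = false
      worker.join()
    }

    def isAlive: Boolean = worker.isAlive
    def completedBatches: Int = batches
  }
}
```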

def stopSources(): Unit

Stops all streaming sources safely.

Attributes: protected
Definition Classes: StreamExecution

var streamDeathCause: StreamingQueryException

Attributes: protected
Definition Classes: StreamExecution

val streamMetadata: StreamMetadata

Metadata associated with the whole query

Attributes: protected
Definition Classes: StreamExecution

lazy val streamMetrics: MetricsReporter

Used to report metrics to coda-hale. This uses id for easier tracking across restarts.

Definition Classes: StreamExecution

final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes: AnyRef

def toString(): String

Definition Classes: StreamExecution → AnyRef → Any

val trigger: Trigger

Definition Classes: StreamExecution

val triggerClock: Clock

Definition Classes: StreamExecution → ProgressReporter

var uniqueSources: Map[SparkDataStream, ReadLimit]

The unique sources in the query plan, each mapped to its ReadLimit. This is set when the logical plan is generated.

Attributes: protected
Definition Classes: StreamExecution

def updateStatusMessage(message: String): Unit

Updates the message returned in status.

Attributes: protected
Definition Classes: ProgressReporter

final def wait(): Unit

Definition Classes: AnyRef
Annotations: @throws( ... )

final def wait(arg0: Long, arg1: Int): Unit

Definition Classes: AnyRef
Annotations: @throws( ... )

final def wait(arg0: Long): Unit

Definition Classes: AnyRef
Annotations: @throws( ... ) @native()

val watermarkMsMap: Map[Int, Long]

A map of current watermarks, keyed by the position of the watermark operator in the physical plan.

This state is 'soft state', which does not affect the correctness and semantics of watermarks and is not persisted across query restarts. The fault-tolerant watermark state is in offsetSeqMetadata.

Attributes: protected
Definition Classes: StreamExecution

class MicroBatchExecution extends StreamExecution
