org.apache.spark.sql.execution.streaming
MicroBatchExecution
Companion object MicroBatchExecution
class MicroBatchExecution extends StreamExecution
- MicroBatchExecution
- StreamExecution
- ProgressReporter
- Logging
- StreamingQuery
- AnyRef
- Any
Instance Constructors
- new MicroBatchExecution(sparkSession: SparkSession, name: String, checkpointRoot: String, analyzedPlan: LogicalPlan, sink: Table, trigger: Trigger, triggerClock: Clock, outputMode: OutputMode, extraOptions: Map[String, String], deleteCheckpointOnStop: Boolean)
Type Members
-
case class
ExecutionStats(inputRows: Map[SparkDataStream, Long], stateOperators: Seq[StateOperatorProgress], eventTimeStats: Map[String, String]) extends Product with Serializable
- Definition Classes
- ProgressReporter
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
var
availableOffsets: StreamProgress
Tracks the offsets that are available to be processed, but have not yet been committed to the sink. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.
- Definition Classes
- StreamExecution
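The shallow-copy advice above can be illustrated with a minimal sketch. It is not Spark code: OffsetHolder is a hypothetical class, and a plain Map[String, Long] stands in for StreamProgress. The point is only that a reader takes one read of the shared field into a local val and then works on that snapshot.

```scala
// Illustrative sketch only: OffsetHolder and the Map[String, Long] type are
// hypothetical stand-ins for StreamExecution and StreamProgress.
object ShallowCopySketch {
  final class OffsetHolder {
    // Replaced as a whole by a single writer thread (one reference write per update).
    @volatile var availableOffsets: Map[String, Long] = Map.empty
  }

  // Reads the shared field once into a local val, so both lookups see the same
  // snapshot even if the writer thread swaps the field in between.
  def lag(holder: OffsetHolder, a: String, b: String): Long = {
    val snapshot = holder.availableOffsets // single read of the shared field
    snapshot.getOrElse(a, 0L) - snapshot.getOrElse(b, 0L)
  }
}
```

Reading holder.availableOffsets twice inside lag could mix values from two different updates, which is the situation the note warns about.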
-
def
awaitInitialization(timeoutMs: Long): Unit
Waits until all fields of the query have been initialized.
- Definition Classes
- StreamExecution
-
val
awaitProgressLock: ReentrantLock
A lock used to wait/notify when batches complete. Use a fair lock to avoid thread starvation.
- Attributes
- protected
- Definition Classes
- StreamExecution
-
val
awaitProgressLockCondition: Condition
- Attributes
- protected
- Definition Classes
- StreamExecution
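As a rough illustration of how a fair lock and its condition can implement wait/notify on batch completion, consider the sketch below. BatchSignal and its members are hypothetical names, not part of Spark; only java.util.concurrent classes are used.

```scala
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantLock

// Illustrative sketch only: BatchSignal is a hypothetical class, not Spark code.
final class BatchSignal {
  // Passing `true` requests a fair lock: the longest-waiting thread acquires it first.
  private val lock = new ReentrantLock(true)
  private val batchCompleted = lock.newCondition()
  private var completedBatches = 0L // guarded by `lock`

  // Called by the thread that runs batches.
  def markBatchCompleted(): Unit = {
    lock.lock()
    try {
      completedBatches += 1
      batchCompleted.signalAll() // wake every waiter so each can re-check its predicate
    } finally lock.unlock()
  }

  // Blocks until at least `n` batches have completed or the timeout elapses.
  // Returns true if the target was reached.
  def awaitBatches(n: Long, timeoutMs: Long): Boolean = {
    var remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs)
    lock.lock()
    try {
      // Loop to guard against spurious wake-ups.
      while (completedBatches < n && remainingNanos > 0) {
        remainingNanos = batchCompleted.awaitNanos(remainingNanos)
      }
      completedBatches >= n
    } finally lock.unlock()
  }
}
```

With a non-fair lock, a thread that repeatedly re-acquires the lock can keep waiters from ever running; fairness trades some throughput for that guarantee.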
-
def
awaitTermination(timeoutMs: Long): Boolean
Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, then the exception will be thrown. Otherwise, it returns whether the query has terminated or not within the timeoutMs milliseconds.
If the query has terminated, then all subsequent calls to this method will either return true immediately (if the query was terminated by stop()), or throw the exception immediately (if the query has terminated with exception).
- Definition Classes
- StreamExecution → StreamingQuery
- Since
2.0.0
- Exceptions thrown
StreamingQueryException
if the query has terminated with an exception
-
def
awaitTermination(): Unit
Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, then the exception will be thrown.
If the query has terminated, then all subsequent calls to this method will either return immediately (if the query was terminated by stop()), or throw the exception immediately (if the query has terminated with exception).
- Definition Classes
- StreamExecution → StreamingQuery
- Since
2.0.0
- Exceptions thrown
StreamingQueryException
if the query has terminated with an exception.
-
def
checkpointFile(name: String): String
Returns the path of a file with name in the checkpoint directory.
- Attributes
- protected
- Definition Classes
- StreamExecution
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
val
commitLog: CommitLog
A log that records the batch ids that have completed. This is used to check if a batch was fully processed, and its output was committed to the sink, hence no need to process it again. This is used (for instance) during restart, to help identify which batch to run next.
- Definition Classes
- StreamExecution
-
var
committedOffsets: StreamProgress
Tracks how much data we have processed and committed to the sink or state store from each input source. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.
- Definition Classes
- StreamExecution
-
def
createStreamingWrite(table: SupportsWrite, options: Map[String, String], inputPlan: LogicalPlan): StreamingWrite
- Attributes
- protected
- Definition Classes
- StreamExecution
-
var
currentBatchId: Long
The current batchId or -1 if execution has not yet been initialized.
- Attributes
- protected
- Definition Classes
- StreamExecution → ProgressReporter
-
val
currentStatus: StreamingQueryStatus
- Attributes
- protected
- Definition Classes
- ProgressReporter
- Annotations
- @volatile()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
exception: Option[StreamingQueryException]
Returns the StreamingQueryException if the query was terminated by an exception.
- Definition Classes
- StreamExecution → StreamingQuery
-
def
explain(): Unit
Prints the physical plan to the console for debugging purposes.
- Definition Classes
- StreamExecution → StreamingQuery
- Since
2.0.0
-
def
explain(extended: Boolean): Unit
Prints the physical plan to the console for debugging purposes.
- extended
whether to do extended explain or not
- Definition Classes
- StreamExecution → StreamingQuery
- Since
2.0.0
-
def
explainInternal(extended: Boolean): String
Exposed for tests.
- Definition Classes
- StreamExecution
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
finishTrigger(hasNewData: Boolean, hasExecuted: Boolean): Unit
Finalizes the query progress and adds it to the list of recent status updates.
- hasNewData
Whether the sources of this stream had new data for this trigger.
- hasExecuted
Whether any batch was executed during this trigger. Streaming queries that perform stateful aggregations with timeouts can still run batches even though the sources don't have any new data.
- Attributes
- protected
- Definition Classes
- ProgressReporter
-
def
formatTimestamp(millis: Long): String
- Attributes
- protected
- Definition Classes
- ProgressReporter
-
def
getBatchDescriptionString: String
- Attributes
- protected
- Definition Classes
- StreamExecution
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
val
id: UUID
Returns the unique id of this query that persists across restarts from checkpoint data. That is, this id is generated when a query is started for the first time, and will be the same every time it is restarted from checkpoint data. Also see runId.
- Definition Classes
- StreamExecution → ProgressReporter → StreamingQuery
- Since
2.1.0
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
interruptAndAwaitExecutionThreadTermination(): Unit
Interrupts the query execution thread and awaits its termination until the timeout is exceeded. The timeout can be set via "spark.sql.streaming.stopTimeout".
- Attributes
- protected
- Definition Classes
- StreamExecution
- Annotations
- @throws( ... )
- Exceptions thrown
TimeoutException
If the thread cannot be stopped within the timeout
-
def
isActive: Boolean
Whether the query is currently active or not
- Definition Classes
- StreamExecution → StreamingQuery
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
var
lastExecution: IncrementalExecution
- Definition Classes
- StreamExecution → ProgressReporter
-
def
lastProgress: StreamingQueryProgress
Returns the most recent query progress update or null if there were no progress updates.
- Definition Classes
- ProgressReporter
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
lazy val
logicalPlan: LogicalPlan
- Definition Classes
- MicroBatchExecution → StreamExecution → ProgressReporter
-
val
minLogEntriesToMaintain: Int
- Attributes
- protected
- Definition Classes
- StreamExecution
-
val
name: String
Returns the user-specified name of the query, or null if not specified. This name can be specified in the org.apache.spark.sql.streaming.DataStreamWriter as dataframe.writeStream.queryName("query").start(). This name, if set, must be unique across all active queries.
- Definition Classes
- StreamExecution → ProgressReporter → StreamingQuery
- Since
2.0.0
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
var
newData: Map[SparkDataStream, LogicalPlan]
Holds the most recent input data for each source.
- Attributes
- protected
- Definition Classes
- StreamExecution → ProgressReporter
-
var
noNewData: Boolean
A flag to indicate that a batch has completed with no new data available.
- Attributes
- protected
- Definition Classes
- StreamExecution
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
val
offsetLog: OffsetSeqLog
A write-ahead log that records the offsets that are present in each batch. In order to ensure that a given batch will always consist of the same data, we write to this log *before* any processing is done. Thus, the Nth record in this log indicates data that is currently being processed and the N-1th entry indicates which offsets have been durably committed to the sink.
- Definition Classes
- StreamExecution
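The interplay between the offset log and the commit log on restart can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual recovery code: RestartPlan, nextBatch, and the case names are hypothetical, and the two Option[Long] arguments stand in for the latest batch ids found in each log.

```scala
// Illustrative sketch only: all names here are hypothetical, and the two
// Option[Long] arguments stand in for the latest batch ids read from the
// offset log and the commit log.
object RestartSketch {
  sealed trait RestartPlan
  case object StartFromScratch extends RestartPlan
  final case class RerunBatch(batchId: Long) extends RestartPlan
  final case class StartNewBatch(batchId: Long) extends RestartPlan

  def nextBatch(latestOffsetBatch: Option[Long], latestCommitBatch: Option[Long]): RestartPlan =
    (latestOffsetBatch, latestCommitBatch) match {
      // Nothing was ever planned: this is a fresh query.
      case (None, _) => StartFromScratch
      // Batch n was planned and also committed: move on to n + 1.
      case (Some(n), Some(c)) if c == n => StartNewBatch(n + 1)
      // Batch n was planned but has no matching commit: run it again with the
      // same offsets. Any other combination is treated the same way here.
      case (Some(n), _) => RerunBatch(n)
    }
}
```

Because the offsets for batch n are written before processing starts, re-running batch n after a failure reads exactly the same data as the first attempt.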
-
var
offsetSeqMetadata: OffsetSeqMetadata
Metadata associated with the offset seq of a batch in the query.
- Attributes
- protected
- Definition Classes
- StreamExecution → ProgressReporter
-
val
outputMode: OutputMode
- Definition Classes
- StreamExecution
-
val
pollingDelayMs: Long
- Attributes
- protected
- Definition Classes
- StreamExecution
-
def
postEvent(event: Event): Unit
- Attributes
- protected
- Definition Classes
- StreamExecution → ProgressReporter
-
val
prettyIdString: String
Pretty identifier string for printing in logs. The format is "queryName [id = xyz, runId = abc]" if the name is set, else "[id = xyz, runId = abc]".
- Attributes
- protected
- Definition Classes
- StreamExecution
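The described log format can be reproduced with a small helper. This is a sketch based only on the format stated above; PrettyIdSketch and its method signature are hypothetical, and the real field may be computed differently.

```scala
import java.util.UUID

// Illustrative sketch only: PrettyIdSketch is a hypothetical helper that
// follows the format given in the description, not the Spark implementation.
object PrettyIdSketch {
  // `name` may be null when the user did not set a query name.
  def prettyIdString(name: String, id: UUID, runId: UUID): String = {
    val ids = s"[id = $id, runId = $runId]"
    Option(name).map(n => s"$n $ids").getOrElse(ids)
  }
}
```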
-
def
processAllAvailable(): Unit
Blocks until all available data in the source has been processed and committed to the sink. This method is intended for testing. Note that in the case of continually arriving data, this method may block forever. Additionally, this method is only guaranteed to block until data that has been synchronously appended to an org.apache.spark.sql.execution.streaming.Source prior to invocation has been processed (i.e. getOffset must immediately reflect the addition).
- Definition Classes
- StreamExecution → StreamingQuery
- Since
2.0.0
-
def
purge(threshold: Long): Unit
- Attributes
- protected
- Definition Classes
- StreamExecution
-
val
queryExecutionThread: QueryExecutionThread
The thread that runs the micro-batches of this stream. Note that this thread must be an org.apache.spark.util.UninterruptibleThread to work around KAFKA-1894: interrupting a running KafkaConsumer may cause an endless loop.
- Definition Classes
- StreamExecution
-
def
recentProgress: Array[StreamingQueryProgress]
Returns an array containing the most recent query progress updates.
- Definition Classes
- ProgressReporter
-
def
recordTriggerOffsets(from: StreamProgress, to: StreamProgress): Unit
Records the offset range this trigger will process. Call this before updating committedOffsets in StreamExecution to make sure that the correct range is recorded.
- Attributes
- protected
- Definition Classes
- ProgressReporter
-
def
reportTimeTaken[T](triggerDetailKey: String)(body: ⇒ T): T
Records the duration of running body for the next query progress update.
- Attributes
- protected
- Definition Classes
- ProgressReporter
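The curried signature with a by-name body suggests a timing wrapper along the following lines. This is a minimal sketch under that assumption: TimingSketch and durationsMs are hypothetical names, and how ProgressReporter actually stores the measurements is not shown in this listing.

```scala
import scala.collection.mutable

// Illustrative sketch only: TimingSketch and durationsMs are hypothetical and
// do not reflect how ProgressReporter stores its measurements.
final class TimingSketch {
  // Accumulated duration in milliseconds per key.
  val durationsMs: mutable.Map[String, Long] = mutable.HashMap.empty[String, Long]

  // `body` is a by-name parameter, so it is evaluated inside this method,
  // between the two clock reads.
  def reportTimeTaken[T](triggerDetailKey: String)(body: => T): T = {
    val startNanos = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - startNanos) / 1000000L
    durationsMs(triggerDetailKey) = durationsMs.getOrElse(triggerDetailKey, 0L) + elapsedMs
    result
  }
}
```

A call such as timing.reportTimeTaken("getBatch") { fetch() } returns whatever fetch() returns, where fetch is a placeholder for any expression, so the wrapper can be dropped around existing code without changing its result.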
-
val
resolvedCheckpointRoot: String
- Definition Classes
- StreamExecution
-
def
runActivatedStream(sparkSessionForStream: SparkSession): Unit
Repeatedly attempts to run batches as data arrives.
- Attributes
- protected
- Definition Classes
- MicroBatchExecution → StreamExecution
-
val
runId: UUID
Returns the unique id of this run of the query. That is, every start/restart of a query will generate a unique runId. Therefore, every time a query is restarted from checkpoint, it will have the same id but different runIds.
- Definition Classes
- StreamExecution → ProgressReporter → StreamingQuery
-
val
sink: Table
- Definition Classes
- StreamExecution → ProgressReporter
-
var
sinkCommitProgress: Option[StreamWriterCommitProgress]
- Definition Classes
- StreamExecution → ProgressReporter
-
var
sources: Seq[SparkDataStream]
- Attributes
- protected
- Definition Classes
- MicroBatchExecution → ProgressReporter
-
val
sparkSession: SparkSession
Returns the SparkSession associated with this.
- Definition Classes
- StreamExecution → ProgressReporter → StreamingQuery
- Since
2.0.0
-
def
start(): Unit
Starts the execution. This returns only after the thread has started and QueryStartedEvent has been posted to all the listeners.
- Definition Classes
- StreamExecution
-
def
startTrigger(): Unit
Begins recording statistics about query progress for a given trigger.
- Attributes
- protected
- Definition Classes
- MicroBatchExecution → ProgressReporter
-
val
state: AtomicReference[State]
Defines the internal state of execution
- Attributes
- protected
- Definition Classes
- StreamExecution
-
def
status: StreamingQueryStatus
Returns the current status of the query.
- Definition Classes
- ProgressReporter
-
def
stop(): Unit
Signals to the thread executing micro-batches that it should stop running after the next batch. This method blocks until the thread stops running.
- Definition Classes
- MicroBatchExecution → StreamingQuery
-
def
stopSources(): Unit
Stops all streaming sources safely.
- Attributes
- protected
- Definition Classes
- StreamExecution
-
var
streamDeathCause: StreamingQueryException
- Attributes
- protected
- Definition Classes
- StreamExecution
-
val
streamMetadata: StreamMetadata
Metadata associated with the whole query
- Attributes
- protected
- Definition Classes
- StreamExecution
-
lazy val
streamMetrics: MetricsReporter
Used to report metrics to coda-hale. This uses id for easier tracking across restarts.
- Definition Classes
- StreamExecution
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- StreamExecution → AnyRef → Any
-
val
trigger: Trigger
- Definition Classes
- StreamExecution
-
val
triggerClock: Clock
- Definition Classes
- StreamExecution → ProgressReporter
-
var
uniqueSources: Map[SparkDataStream, ReadLimit]
A list of unique sources in the query plan. This will be set when generating logical plan.
- Attributes
- protected
- Definition Classes
- StreamExecution
-
def
updateStatusMessage(message: String): Unit
Updates the message returned in status.
- Attributes
- protected
- Definition Classes
- ProgressReporter
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
val
watermarkMsMap: Map[Int, Long]
A map of current watermarks, keyed by the position of the watermark operator in the physical plan.
This state is 'soft state', which does not affect the correctness and semantics of watermarks and is not persisted across query restarts. The fault-tolerant watermark state is in offsetSeqMetadata.
- Attributes
- protected
- Definition Classes
- StreamExecution