package state
Type Members
- abstract class BaseStateStoreRDD[T, U] extends RDD[U]
- class InvalidUnsafeRowException extends RuntimeException
An exception thrown when an invalid UnsafeRow is detected in state store.
- trait ReadStateStore extends AnyRef
Base trait for a versioned key-value store which provides read operations. Each instance of a ReadStateStore represents a specific version of state data, and such instances are created through a StateStoreProvider. The abort method will be called when the task is completed; clean up the resources in that method.
- class ReadStateStoreRDD[T, U] extends BaseStateStoreRDD[T, U]
An RDD that allows computations to be executed against ReadStateStores. It uses the StateStoreCoordinator to get the locations of loaded state stores and uses them as the preferred locations.
- class StateSchemaCompatibilityChecker extends Logging
- case class StateSchemaNotCompatible(message: String) extends Exception with Product with Serializable
- trait StateStore extends ReadStateStore
Base trait for a versioned key-value store which provides both read and write operations. Each instance of a StateStore represents a specific version of state data, and such instances are created through a StateStoreProvider. Unlike ReadStateStore, the abort method may not be called if the commit method succeeds in committing the change (hasCommitted returns true). Otherwise, the abort method will be called. Implementations should handle resource cleanup in both methods, and also need to guard against double resource cleanup.
- class StateStoreConf extends Serializable
A class that contains configuration parameters for StateStores.
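The StateStore entry above asks implementations to clean up in both commit and abort while guarding against double cleanup. One possible way to do that is a run-once flag, sketched here. GuardedStore, releaseOnce, releaseCount and the returned version number are hypothetical names and values chosen for illustration; this is not Spark's StateStore trait.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a store handle; NOT Spark's StateStore trait.
class GuardedStore {
  private val released = new AtomicBoolean(false)
  private var committed = false

  // Exposed only so the sketch can be checked; counts actual releases.
  var releaseCount = 0

  def hasCommitted: Boolean = committed

  // Releases resources at most once, however many times it is called.
  private def releaseOnce(): Unit =
    if (released.compareAndSet(false, true)) releaseCount += 1

  // Marks the store as committed and releases resources.
  def commit(): Long = {
    require(!committed, "already committed")
    committed = true
    releaseOnce()
    1L // hypothetical new version number
  }

  // Safe to call after commit, or several times in a row.
  def abort(): Unit = releaseOnce()
}
```

With this guard, calling abort after a successful commit, or calling abort twice, releases the resources only once.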
- class StateStoreCoordinatorRef extends AnyRef
Reference to a StateStoreCoordinator that can be used to coordinate instances of StateStores across all the executors, and get their locations for job scheduling.
- trait StateStoreCustomMetric extends AnyRef
Name and description of custom implementation-specific metrics that a state store may wish to expose.
- case class StateStoreCustomSizeMetric(name: String, desc: String) extends StateStoreCustomMetric with Product with Serializable
- case class StateStoreCustomSumMetric(name: String, desc: String) extends StateStoreCustomMetric with Product with Serializable
- case class StateStoreCustomTimingMetric(name: String, desc: String) extends StateStoreCustomMetric with Product with Serializable
- case class StateStoreId(checkpointRootLocation: String, operatorId: Long, partitionId: Int, storeName: String = StateStoreId.DEFAULT_STORE_NAME) extends Product with Serializable
Unique identifier for a bunch of keyed state data.
  - checkpointRootLocation: Root directory where all the state data of a query is stored
  - operatorId: Unique id of a stateful operator
  - partitionId: Index of the partition of an operator's state data
  - storeName: Optional name of the store. Each partition can optionally use multiple state stores, but they have to be identified by distinct names.
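To illustrate how distinct store names separate several stores within one partition, here is a minimal sketch that mirrors the StateStoreId signature shown above. StoreIdSketch, StoreId, the value "default" and the checkpoint path are all assumptions made for the example; the code does not use the Spark class itself.

```scala
// Hypothetical mirror of the StateStoreId signature; NOT the Spark class.
object StoreIdSketch {
  // Assumed default name, for illustration only.
  val DefaultStoreName = "default"

  case class StoreId(
      checkpointRootLocation: String,
      operatorId: Long,
      partitionId: Int,
      storeName: String = DefaultStoreName)

  // Two stores for the same operator partition, told apart by name only.
  val left: StoreId = StoreId("/tmp/checkpoint/state", 0L, 3, "left")
  val right: StoreId = left.copy(storeName = "right")

  // Omitting the name falls back to the default store name.
  val unnamed: StoreId = StoreId("/tmp/checkpoint/state", 0L, 3)
}
```

Because the identifier is a case class, two ids with the same location, operator and partition compare as different as soon as their store names differ.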
- case class StateStoreMetrics(numKeys: Long, memoryUsedBytes: Long, customMetrics: Map[StateStoreCustomMetric, Long]) extends Product with Serializable
Metrics reported by a state store.
  - numKeys: Number of keys in the state store
  - memoryUsedBytes: Memory used by the state store
  - customMetrics: Custom implementation-specific metrics. The metrics reported through this must have the same name as those reported by StateStoreProvider.customMetrics.
- implicit class StateStoreOps[T] extends AnyRef
- trait StateStoreProvider extends AnyRef
Trait representing a provider that provides StateStore instances representing versions of state data.
The life cycle of a provider and the stores it provides is as follows.
  - A StateStoreProvider is created in an executor for each unique StateStoreId when the first batch of a streaming query is executed on the executor. All subsequent batches reuse this provider instance until the query is stopped.
  - Every batch of streaming data requests a specific version of the state data by invoking getStore(version), which returns an instance of StateStore through which the required version of the data can be accessed. It is the responsibility of the provider to populate this store with context information like the schema of keys and values, etc.
  - After the streaming query is stopped, the created provider instances are lazily disposed of.
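The provider life cycle described above can be sketched with a small in-memory model. Everything here (ProviderLifecycleSketch, Provider, VersionedStore, the string ids) is hypothetical and greatly simplified; in particular, the sketch disposes of providers eagerly in stopQuery, whereas the text says real providers are disposed of lazily.

```scala
import scala.collection.mutable

object ProviderLifecycleSketch {
  // Hypothetical, simplified types; Spark's real traits have many more members.
  final case class VersionedStore(providerId: String, version: Long)

  final class Provider(val id: String) {
    // Returns a handle for one specific version of the state data.
    def getStore(version: Long): VersionedStore = VersionedStore(id, version)
  }

  // One provider per store id on this (pretend) executor.
  private val loaded = mutable.Map.empty[String, Provider]

  // Created on first use, then reused by every later batch.
  def providerFor(id: String): Provider =
    loaded.getOrElseUpdate(id, new Provider(id))

  // Each batch asks the reused provider for the version it needs.
  def runBatch(id: String, version: Long): VersionedStore =
    providerFor(id).getStore(version)

  // Stand-in for disposal after the query stops (eager here, for simplicity).
  def stopQuery(): Unit = loaded.clear()

  def loadedCount: Int = loaded.size
}
```

Running two batches against the same id creates a single provider and two store handles, one per requested version.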
- case class StateStoreProviderId(storeId: StateStoreId, queryRunId: UUID) extends Product with Serializable
Unique identifier for a provider, used to identify when providers can be reused. Note that queryRunId is used to uniquely identify a provider, so that the same provider instance is not reused across query restarts.
- class StateStoreRDD[T, U] extends BaseStateStoreRDD[T, U]
An RDD that allows computations to be executed against StateStores. It uses the StateStoreCoordinator to get the locations of loaded state stores and uses them as the preferred locations.
- sealed trait StreamingAggregationStateManager extends Serializable
Base trait for state managers intended for use by streaming aggregations.
- abstract class StreamingAggregationStateManagerBaseImpl extends StreamingAggregationStateManager
- class StreamingAggregationStateManagerImplV1 extends StreamingAggregationStateManagerBaseImpl
The implementation of StreamingAggregationStateManager for state version 1. In state version 1, the schemas of key and value in state are as follows:
  - key: Same as key expressions.
  - value: Same as input row attributes. The schema of value contains key expressions as well.
- class StreamingAggregationStateManagerImplV2 extends StreamingAggregationStateManagerBaseImpl
The implementation of StreamingAggregationStateManager for state version 2. In state version 2, the schemas of key and value in state are as follows:
  - key: Same as key expressions.
  - value: The diff between input row attributes and key expressions.
The schema of value is changed to optimize the memory/space usage in state by removing columns duplicated across the key-value pair. Hence key columns are excluded from the schema of value.
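The difference between the two state versions can be shown with plain column names standing in for attributes. The helper valueColumns and the column names in the usage note are hypothetical; this is an illustrative sketch, not the logic of the Spark classes.

```scala
object AggStateSchemaSketch {
  // Column names stand in for attributes; hypothetical helper, NOT Spark API.
  def valueColumns(
      stateVersion: Int,
      keyCols: Seq[String],
      inputCols: Seq[String]): Seq[String] =
    stateVersion match {
      // Version 1: the value repeats every input column, key columns included.
      case 1 => inputCols
      // Version 2: key columns are dropped from the value.
      case 2 => inputCols.filterNot(c => keyCols.contains(c))
      case v => throw new IllegalArgumentException(s"unsupported state version: $v")
    }
}
```

For key columns window and userId and input columns window, userId and count, version 1 stores all three columns in the value, while version 2 stores only count.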
- class SymmetricHashJoinStateManager extends Logging
Helper class to manage state required by a single side of org.apache.spark.sql.execution.streaming.StreamingSymmetricHashJoinExec. The interface of this class is basically that of a multi-map:
  - Get: Returns an iterator of multiple values for a given key
  - Append: Appends a new value to the given key
  - Remove Data by predicate: Drops any state using a predicate condition on keys or values
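The multi-map interface listed above can be sketched as a small in-memory structure. MultiMapSketch and its method names are chosen for this example only; the real class keeps its data in state stores, and this sketch makes no claim about its internals.

```scala
import scala.collection.mutable

// Hypothetical in-memory multi-map; the real class persists to state stores.
class MultiMapSketch[K, V] {
  private val data = mutable.Map.empty[K, mutable.ArrayBuffer[V]]

  // Get: iterator over all values stored for the key (empty if none).
  def get(key: K): Iterator[V] =
    data.get(key).map(_.iterator).getOrElse(Iterator.empty)

  // Append: add one more value under the key.
  def append(key: K, value: V): Unit = {
    data.getOrElseUpdate(key, mutable.ArrayBuffer.empty[V]) += value
  }

  // Remove by predicate on keys: drop every key that matches.
  def removeByKeyCondition(drop: K => Boolean): Unit = {
    val doomed = data.keys.filter(drop).toList
    doomed.foreach(k => data.remove(k))
  }

  // Remove by predicate on values: drop matching values, then empty keys.
  def removeByValueCondition(drop: V => Boolean): Unit = {
    for (key <- data.keys.toList) {
      val kept = data(key).filterNot(drop)
      if (kept.isEmpty) data.remove(key) else data(key) = kept
    }
  }
}
```

Keys are snapshotted with toList before removal so that the map is never mutated while it is being iterated.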
- class UnsafeRowPair extends AnyRef
Mutable and reusable class for representing a pair of UnsafeRows.
- class WrappedReadStateStore extends ReadStateStore
Wraps the instance of StateStore to make the instance read-only.
Value Members
- object FlatMapGroupsWithStateExecHelper
- object SchemaHelper
Helper classes for reading/writing state schema.
- object StateSchemaCompatibilityChecker
- object StateStore extends Logging
Companion object to StateStore that provides helper methods to create and retrieve stores by their unique ids. In addition, when a SparkContext is active (i.e. SparkEnv.get is not null), it also runs a periodic background task to do maintenance on the loaded stores. For each store, it uses the StateStoreCoordinator to check whether the currently loaded instance of the store is the active instance. Accordingly, it either keeps it loaded and performs maintenance, or unloads the store.
- object StateStoreConf extends Serializable
- object StateStoreCoordinatorRef extends Logging
Helper object used to create a reference to StateStoreCoordinator.
- object StateStoreId extends Serializable
- object StateStoreMetrics extends Serializable
- object StateStoreProvider
- object StateStoreProviderId extends Serializable
- object StreamingAggregationStateManager extends Logging with Serializable
- object SymmetricHashJoinStateManager