org.apache.spark.sql.execution.streaming

StreamingSymmetricHashJoinExec

case class StreamingSymmetricHashJoinExec(leftKeys: Seq[Expression], rightKeys: Seq[Expression], joinType: JoinType, condition: JoinConditionSplitPredicates, stateInfo: Option[StatefulOperatorStateInfo], eventTimeWatermarkForLateEvents: Option[Long], eventTimeWatermarkForEviction: Option[Long], stateWatermarkPredicates: JoinStateWatermarkPredicates, stateFormatVersion: Int, left: SparkPlan, right: SparkPlan) extends SparkPlan with BinaryExecNode with StateStoreWriter with Product with Serializable

Performs a stream-stream join using the symmetric hash join algorithm. It works as follows.

Each join side buffers past input rows as streaming state so that the past input can be joined with future input on the other side. This buffer state is effectively a multi-map from the equi-join key to the list of past input rows received with that join key.

For each input row in each side, the following operations take place.
- Calculate join key from the row.
- Use the join key to append the row to the buffer state of the side that the row came from.
- Find past buffered values for the key from the other side. For each such value, emit the "joined row" (left-row, right-row).
- Apply the optional condition to filter the joined rows as the final output.
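The per-row steps above can be sketched in plain Scala. This is a minimal, single-partition model under stated assumptions, not Spark's implementation: the names `SymmetricHashJoinSketch`, `Row`, `Joiner`, and `process` are hypothetical, the join key is a plain `Int`, and the buffer state lives in in-memory maps rather than a state store.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of one join partition; not Spark's implementation.
object SymmetricHashJoinSketch {
  final case class Row(key: Int, payload: String)

  sealed trait Side
  case object LeftSide extends Side
  case object RightSide extends Side

  final class Joiner(condition: (Row, Row) => Boolean = (_, _) => true) {
    // Buffer state per side: join key -> past rows received with that key.
    private val leftState = mutable.Map.empty[Int, mutable.ArrayBuffer[Row]]
    private val rightState = mutable.Map.empty[Int, mutable.ArrayBuffer[Row]]

    /** Processes one input row and returns the joined (left, right) pairs it produces. */
    def process(side: Side, row: Row): Seq[(Row, Row)] = {
      val (own, other) = side match {
        case LeftSide  => (leftState, rightState)
        case RightSide => (rightState, leftState)
      }
      // Steps 1 and 2: use the join key to append the row to its own side's buffer.
      own.getOrElseUpdate(row.key, mutable.ArrayBuffer.empty[Row]) += row
      // Step 3: find rows buffered on the other side under the same key.
      val matches = other.getOrElse(row.key, mutable.ArrayBuffer.empty[Row]).toSeq
      val joined = side match {
        case LeftSide  => matches.map(r => (row, r))
        case RightSide => matches.map(l => (l, row))
      }
      // Step 4: apply the optional condition to the joined rows.
      joined.filter { case (l, r) => condition(l, r) }
    }
  }
}
```

Because each row is buffered before the other side is probed, a matching pair is emitted exactly once, by whichever of the two rows arrives later.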

If a timestamp column with an event time watermark is present in the join keys or in the input data, then the operator uses the watermark to figure out which rows in the buffer will not join with the new data, and therefore can be discarded. Depending on the provided query conditions, we can define thresholds on both the state key (i.e. joining keys) and the state value (i.e. input rows). There are three kinds of queries possible regarding this, as explained below. Assume that a watermark has been defined on both the leftTime and rightTime columns used below.

1. When timestamp/time-window + watermark is in the join keys. Example (pseudo-SQL):

SELECT * FROM leftTable JOIN rightTable ON leftKey = rightKey AND window(leftTime, "1 hour") = window(rightTime, "1 hour") -- 1hr tumbling windows

In this case, this operator will join rows newer than the watermark which fall in the same 1 hour window. Say the event-time watermark is "12:34" (both left and right input). Then input rows can only have time > 12:34. Hence, they can only join with buffered rows whose window is the 12:00 to 1:00 window or later, and all buffered rows with join window < 12:00 can be discarded. In other words, the operator will discard all state where the window in the state key (i.e. join key) < event time watermark. This threshold is called the State Key Watermark.
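The 12:34 example can be worked through with a small sketch. It assumes times expressed as minutes since midnight and state keyed only by the start of a 1-hour tumbling window; `StateKeyWatermarkSketch`, `windowStart`, and `evict` are hypothetical names for illustration, not Spark APIs.

```scala
// Hypothetical illustration of state-key-watermark eviction; not Spark's implementation.
object StateKeyWatermarkSketch {
  // Times are minutes since midnight; windows are 1-hour tumbling windows.
  val WindowMinutes = 60

  /** Start of the tumbling window that contains the given event time. */
  def windowStart(eventTime: Int): Int = eventTime - (eventTime % WindowMinutes)

  /** Keeps only buffered entries whose window can still match input newer than the watermark. */
  def evict[V](state: Map[Int, V], watermark: Int): Map[Int, V] = {
    val threshold = windowStart(watermark)
    state.filter { case (winStart, _) => winStart >= threshold }
  }
}
```

With a watermark of 12:34 (754 minutes), the threshold is the 12:00 window (720 minutes), so an entry keyed by the 11:00 window is dropped while the 12:00 and 13:00 windows are kept.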

2. When timestamp range conditions are provided (no time/window + watermark in join keys). E.g.

SELECT * FROM leftTable JOIN rightTable ON leftKey = rightKey AND leftTime > rightTime - INTERVAL 8 MINUTES AND leftTime < rightTime + INTERVAL 1 HOUR

In this case, the event-time watermark and the time range condition can be used to calculate a state watermark, i.e., a time threshold for the state rows that can be discarded. For example, say each join side has a time column, named "leftTime" and "rightTime", and there is a join condition "leftTime > rightTime - 8 min". While processing, say the watermark on the right input is "12:34". This means that from now on, only right input rows with "rightTime > 12:34" will be processed, and any older rows will be considered "too late" and therefore dropped. Then, the left side buffer only needs to keep rows where "leftTime > rightTime - 8 min > 12:34 - 8 min = 12:26". That is, the left state watermark is 12:26, and any rows older than that can be dropped from the state. In other words, the operator will discard all state where the timestamp in the state value (input rows) < state watermark. This threshold is called the State Value Watermark (to distinguish it from the state key watermark).

Note:

The event watermark value of one side is used to calculate the state watermark of the other side. That is, a condition ~ "leftTime > rightTime + X" with the right side event watermark is used to calculate the left side state watermark. Conversely, a condition ~ "leftTime < rightTime + Y" with the left side event watermark is used to calculate the right side state watermark.
Depending on the conditions, the state watermark may be different for the left and right side. In the above example, the left state keeps rows with leftTime > 12:26, and the right state keeps rows with rightTime > 12:34 - 1 hour = 11:34.
State can be dropped from BOTH sides only when there are conditions of the above forms that define time bounds on timestamp in both directions.
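The arithmetic behind both state value watermarks can be written out as a short sketch. It assumes times in minutes since midnight and a condition of the form "leftTime > rightTime - lower AND leftTime < rightTime + upper"; the names `StateValueWatermarkSketch`, `RangeCondition`, `leftStateWatermark`, and `rightStateWatermark` are hypothetical and only illustrate the calculation.

```scala
// Hypothetical illustration of the state value watermark arithmetic; not Spark's implementation.
object StateValueWatermarkSketch {
  /** Models: leftTime > rightTime - lowerMin AND leftTime < rightTime + upperMin (minutes). */
  final case class RangeCondition(lowerMin: Int, upperMin: Int)

  /** Converts a wall-clock time to minutes since midnight. */
  def hhmm(hours: Int, minutes: Int): Int = hours * 60 + minutes

  // Future right rows have rightTime > rightEventWatermark, and leftTime > rightTime - lowerMin,
  // so buffered left rows are needed only if leftTime > rightEventWatermark - lowerMin.
  def leftStateWatermark(c: RangeCondition, rightEventWatermark: Int): Int =
    rightEventWatermark - c.lowerMin

  // Future left rows have leftTime > leftEventWatermark, and rightTime > leftTime - upperMin,
  // so buffered right rows are needed only if rightTime > leftEventWatermark - upperMin.
  def rightStateWatermark(c: RangeCondition, leftEventWatermark: Int): Int =
    leftEventWatermark - c.upperMin
}
```

For the query above (8 minutes and 1 hour) with both event watermarks at 12:34, this gives a left state watermark of 12:26 and a right state watermark of 11:34, matching the values in the note.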

3. When both window in join key and time range conditions are present, case 1 + 2. In this case, since window equality is a stricter condition than the time range, we can use the State Key Watermark = event time watermark to discard state (similar to case 1).

leftKeys: Expression to generate key rows for joining from left input
rightKeys: Expression to generate key rows for joining from right input
joinType: Type of join (inner, left outer, etc.)
condition: Conditions to filter rows, split by left, right, and joined. See JoinConditionSplitPredicates
stateInfo: Version information required to read join state (buffered rows)
eventTimeWatermarkForLateEvents: Watermark for filtering late events, same for both sides
eventTimeWatermarkForEviction: Watermark for state eviction
stateWatermarkPredicates: Predicates for removal of state, see JoinStateWatermarkPredicates
left: Left child plan
right: Right child plan

Linear Supertypes

StateStoreWriter, PythonSQLMetrics, StatefulOperator, BinaryExecNode, BinaryLike[SparkPlan], SparkPlan, Serializable, Logging, QueryPlan[SparkPlan], SQLConfHelper, TreeNode[SparkPlan], WithOrigin, TreePatternBits, Product, Equals, AnyRef, Any

Instance Constructors

new StreamingSymmetricHashJoinExec(leftKeys: Seq[Expression], rightKeys: Seq[Expression], joinType: JoinType, condition: Option[Expression], stateFormatVersion: Int, left: SparkPlan, right: SparkPlan)
new StreamingSymmetricHashJoinExec(leftKeys: Seq[Expression], rightKeys: Seq[Expression], joinType: JoinType, condition: JoinConditionSplitPredicates, stateInfo: Option[StatefulOperatorStateInfo], eventTimeWatermarkForLateEvents: Option[Long], eventTimeWatermarkForEviction: Option[Long], stateWatermarkPredicates: JoinStateWatermarkPredicates, stateFormatVersion: Int, left: SparkPlan, right: SparkPlan)

Type Members

implicit class LogStringContext extends AnyRef
Definition Classes
Logging

Value Members

final def !=(arg0: Any): Boolean
Definition Classes
AnyRef → Any
final def ##: Int
Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean
Definition Classes
AnyRef → Any
def allAttributes: AttributeSeq
Definition Classes
QueryPlan
def apply(number: Int): TreeNode[_]
Definition Classes
TreeNode
def applyRemovingRowsOlderThanWatermark(iter: Iterator[InternalRow], predicateDropRowByWatermark: BasePredicate): Iterator[InternalRow]
Attributes
protected
Definition Classes
StateStoreWriter
def argString(maxFields: Int): String
Definition Classes
TreeNode
def asCode: String
Definition Classes
TreeNode
final def asInstanceOf[T0]: T0
Definition Classes
Any
def canonicalized: SparkPlan
Definition Classes
QueryPlan
val checkpointInfoAccumulator: CollectionAccumulator[StatefulOpStateStoreCheckpointInfo]
Aggregator used for the executors to pass new state store checkpoints' IDs to driver. For the general checkpoint ID workflow, see the comments of class StatefulOperatorStateInfo.
Definition Classes
StateStoreWriter
final lazy val children: Seq[SparkPlan]
Definition Classes
BinaryLike
Annotations
@transient()
def cleanupResources(): Unit
Cleans up the resources used by the physical operator (if any). In general, all the resources should be cleaned up when the task finishes but operators like SortMergeJoinExec and LimitExec may want eager cleanup to free up tight resources (e.g., memory).
Attributes
protected[sql]
Definition Classes
SparkPlan
def clone(): SparkPlan
Definition Classes
TreeNode → AnyRef
def collect[B](pf: PartialFunction[SparkPlan, B]): Seq[B]
Definition Classes
TreeNode
def collectFirst[B](pf: PartialFunction[SparkPlan, B]): Option[B]
Definition Classes
TreeNode
def collectLeaves(): Seq[SparkPlan]
Definition Classes
TreeNode
def collectWithSubqueries[B](f: PartialFunction[SparkPlan, B]): Seq[B]
Definition Classes
QueryPlan
val condition: JoinConditionSplitPredicates
def conf: SQLConf
Definition Classes
SparkPlan → SQLConfHelper
final def containsAllPatterns(patterns: TreePattern*): Boolean
Definition Classes
TreePatternBits
final def containsAnyPattern(patterns: TreePattern*): Boolean
Definition Classes
TreePatternBits
lazy val containsChild: Set[TreeNode[_]]
Definition Classes
TreeNode
final def containsPattern(t: TreePattern): Boolean
Definition Classes
TreePatternBits
Annotations
@inline()
def copyTagsFrom(other: SparkPlan): Unit
Definition Classes
TreeNode
def customStatefulOperatorMetrics: Seq[StatefulOperatorCustomMetric]
Set of stateful operator custom metrics. These are captured as part of the generic key-value map StateOperatorProgress.customMetrics. Stateful operators can extend this method to provide their own unique custom metrics.
Definition Classes
StreamingSymmetricHashJoinExec → StateStoreWriter
def deterministic: Boolean
Definition Classes
QueryPlan
def doCanonicalize(): SparkPlan
Attributes
protected
Definition Classes
QueryPlan
def doExecute(): RDD[InternalRow]
Produces the result of the query as an RDD[InternalRow]
Overridden by concrete implementations of SparkPlan.
Attributes
protected
Definition Classes
StreamingSymmetricHashJoinExec → SparkPlan
def doExecuteBroadcast[T](): Broadcast[T]
Produces the result of the query as a broadcast variable.
Overridden by concrete implementations of SparkPlan.
Attributes
protected[sql]
Definition Classes
SparkPlan
def doExecuteColumnar(): RDD[ColumnarBatch]
Produces the result of the query as an RDD[ColumnarBatch] if supportsColumnar returns true. By convention the executor that creates a ColumnarBatch is responsible for closing it when it is no longer needed. This allows input formats to be able to reuse batches if needed.
Attributes
protected
Definition Classes
SparkPlan
def doExecuteWrite(writeFilesSpec: WriteFilesSpec): RDD[WriterCommitMessage]
Produces the result of the writes as an RDD[WriterCommitMessage]
Overridden by concrete implementations of SparkPlan.
Attributes
protected
Definition Classes
SparkPlan
def doPrepare(): Unit
Overridden by concrete implementations of SparkPlan. It is guaranteed to run before any execute of SparkPlan. This is helpful if we want to set up some state before executing the query, e.g., BroadcastHashJoin uses it to broadcast asynchronously.
Attributes
protected
Definition Classes
SparkPlan
Note
prepare method has already walked down the tree, so the implementation doesn't have to call children's prepare methods. This will only be called once, protected by this.
final def eq(arg0: AnyRef): Boolean
Definition Classes
AnyRef
val eventTimeWatermarkForEviction: Option[Long]
val eventTimeWatermarkForLateEvents: Option[Long]
final def execute(): RDD[InternalRow]
Returns the result of this query as an RDD[InternalRow] by delegating to doExecute after preparations.
Concrete implementations of SparkPlan should override doExecute.
Definition Classes
SparkPlan
final def executeBroadcast[T](): Broadcast[T]
Returns the result of this query as a broadcast variable by delegating to doExecuteBroadcast after preparations.
Concrete implementations of SparkPlan should override doExecuteBroadcast.
Definition Classes
SparkPlan
def executeCollect(): Array[InternalRow]
Runs this query returning the result as an array.
Definition Classes
SparkPlan
def executeCollectPublic(): Array[Row]
Runs this query returning the result as an array, using external Row format.
Definition Classes
SparkPlan
final def executeColumnar(): RDD[ColumnarBatch]
Returns the result of this query as an RDD[ColumnarBatch] by delegating to doExecuteColumnar after preparations.
Concrete implementations of SparkPlan should override doExecuteColumnar if supportsColumnar returns true.
Definition Classes
SparkPlan
final def executeQuery[T](query: => T): T
Executes a query after preparing the query and adding query plan information to created RDDs for visualization.
Attributes
protected
Definition Classes
SparkPlan
def executeTail(n: Int): Array[InternalRow]
Runs this query returning the last n rows as an array.
This is modeled after RDD.take but never runs any job locally on the driver.
Definition Classes
SparkPlan
def executeTake(n: Int): Array[InternalRow]
Runs this query returning the first n rows as an array.
This is modeled after RDD.take but never runs any job locally on the driver.
Definition Classes
SparkPlan
def executeToIterator(): Iterator[InternalRow]
Runs this query returning the result as an iterator of InternalRow.
Definition Classes
SparkPlan
Note
Triggers multiple jobs (one for each partition).
def executeWrite(writeFilesSpec: WriteFilesSpec): RDD[WriterCommitMessage]
Returns the result of writes as an RDD[WriterCommitMessage] variable by delegating to doExecuteWrite after preparations.
Concrete implementations of SparkPlan should override doExecuteWrite.
Definition Classes
SparkPlan
def exists(f: (SparkPlan) => Boolean): Boolean
Definition Classes
TreeNode
final def expressions: Seq[Expression]
Definition Classes
QueryPlan
def fastEquals(other: TreeNode[_]): Boolean
Definition Classes
TreeNode
def find(f: (SparkPlan) => Boolean): Option[SparkPlan]
Definition Classes
TreeNode
def flatMap[A](f: (SparkPlan) => IterableOnce[A]): Seq[A]
Definition Classes
TreeNode
def foreach(f: (SparkPlan) => Unit): Unit
Definition Classes
TreeNode
def foreachUp(f: (SparkPlan) => Unit): Unit
Definition Classes
TreeNode
def foreachWithSubqueries(f: (SparkPlan) => Unit): Unit
Definition Classes
QueryPlan
def formattedNodeName: String
Attributes
protected
Definition Classes
QueryPlan
def generateTreeString(depth: Int, lastChildren: ArrayList[Boolean], append: (String) => Unit, verbose: Boolean, prefix: String, addSuffix: Boolean, maxFields: Int, printNodeId: Boolean, indent: Int): Unit
Definition Classes
TreeNode
final def getClass(): Class[_ <: AnyRef]
Definition Classes
AnyRef → Any
Annotations
@IntrinsicCandidate() @native()
def getDefaultTreePatternBits: BitSet
Attributes
protected
Definition Classes
TreeNode
def getProgress(): StateOperatorProgress
Get the progress made by this stateful operator after execution. This should be called in the driver after this SparkPlan has been executed and metrics have been updated.
Definition Classes
StateStoreWriter
def getStateInfo: StatefulOperatorStateInfo
Definition Classes
StatefulOperator
def getStateStoreCheckpointInfo(): Array[StatefulOpStateStoreCheckpointInfo]
Get aggregated checkpoint ID info for all shuffle partitions. For the general checkpoint ID workflow, see the comments of class StatefulOperatorStateInfo.
Definition Classes
StateStoreWriter
def getTagValue[T](tag: TreeNodeTag[T]): Option[T]
Definition Classes
TreeNode
def hashCode(): Int
Definition Classes
TreeNode → AnyRef → Any
lazy val height: Int
Definition Classes
TreeNode
val id: Int
Definition Classes
SparkPlan
def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
Attributes
protected
Definition Classes
Logging
def initializeLogIfNecessary(isInterpreter: Boolean): Unit
Attributes
protected
Definition Classes
Logging
def innerChildren: Seq[QueryPlan[_]]
Definition Classes
QueryPlan → TreeNode
def inputSet: AttributeSet
Definition Classes
QueryPlan
def isCanonicalizedPlan: Boolean
Attributes
protected
Definition Classes
QueryPlan
final def isInstanceOf[T0]: Boolean
Definition Classes
Any
def isRuleIneffective(ruleId: RuleId): Boolean
Attributes
protected
Definition Classes
TreeNode
def isTagsEmpty: Boolean
Definition Classes
TreeNode
def isTraceEnabled(): Boolean
Attributes
protected
Definition Classes
Logging
val joinType: JoinType
def jsonFields: List[JField]
Attributes
protected
Definition Classes
TreeNode
val left: SparkPlan
Definition Classes
StreamingSymmetricHashJoinExec → BinaryLike
val leftKeys: Seq[Expression]
final def legacyWithNewChildren(newChildren: Seq[SparkPlan]): SparkPlan
Attributes
protected
Definition Classes
TreeNode
def log: Logger
Attributes
protected
Definition Classes
Logging
def logDebug(msg: => String, throwable: Throwable): Unit
Attributes
protected
Definition Classes
Logging
def logDebug(entry: LogEntry, throwable: Throwable): Unit
Attributes
protected
Definition Classes
Logging
def logDebug(entry: LogEntry): Unit
Attributes
protected
Definition Classes
Logging
def logDebug(msg: => String): Unit
Attributes
protected
Definition Classes
Logging
def logError(msg: => String, throwable: Throwable): Unit
Attributes
protected
Definition Classes
Logging
def logError(entry: LogEntry, throwable: Throwable): Unit
Attributes
protected
Definition Classes
Logging
def logError(entry: LogEntry): Unit
Attributes
protected
Definition Classes
Logging
def logError(msg: => String): Unit
Attributes
protected
Definition Classes
Logging
def logInfo(msg: => String, throwable: Throwable): Unit
Attributes
protected
Definition Classes
Logging
def logInfo(entry: LogEntry, throwable: Throwable): Unit
Attributes
protected
Definition Classes
Logging
def logInfo(entry: LogEntry): Unit
Attributes
protected
Definition Classes
Logging
def logInfo(msg: => String): Unit
Attributes
protected
Definition Classes
Logging
def logName: String
Attributes
protected
Definition Classes
Logging
def logTrace(msg: => String, throwable: Throwable): Unit
Attributes
protected
Definition Classes
Logging
def logTrace(entry: LogEntry, throwable: Throwable): Unit
Attributes
protected
Definition Classes
Logging
def logTrace(entry: LogEntry): Unit
Attributes
protected
Definition Classes
Logging
def logTrace(msg: => String): Unit
Attributes
protected
Definition Classes
Logging
def logWarning(msg: => String, throwable: Throwable): Unit
Attributes
protected
Definition Classes
Logging
def logWarning(entry: LogEntry, throwable: Throwable): Unit
Attributes
protected
Definition Classes
Logging
def logWarning(entry: LogEntry): Unit
Attributes
protected
Definition Classes
Logging
def logWarning(msg: => String): Unit
Attributes
protected
Definition Classes
Logging
def logicalLink: Option[LogicalPlan]
returns
The logical plan this plan is linked to.
Definition Classes
SparkPlan
def longMetric(name: String): SQLMetric
returns
SQLMetric for the name.
Definition Classes
SparkPlan
def makeCopy(newArgs: Array[AnyRef]): SparkPlan
Overridden make copy also propagates sqlContext to copied plan.
Definition Classes
SparkPlan → TreeNode
def map[A](f: (SparkPlan) => A): Seq[A]
Definition Classes
TreeNode
final def mapChildren(f: (SparkPlan) => SparkPlan): SparkPlan
Definition Classes
BinaryLike
def mapExpressions(f: (Expression) => Expression): StreamingSymmetricHashJoinExec.this.type
Definition Classes
QueryPlan
def mapProductIterator[B](f: (Any) => B)(implicit arg0: ClassTag[B]): Array[B]
Attributes
protected
Definition Classes
TreeNode
def markRuleAsIneffective(ruleId: RuleId): Unit
Attributes
protected
Definition Classes
TreeNode
def metadataFilePath(): Path
Definition Classes
StatefulOperator
lazy val metrics: Map[String, SQLMetric]
returns
All metrics containing metrics of this SparkPlan.
Definition Classes
StateStoreWriter → PythonSQLMetrics → SparkPlan
final def missingInput: AttributeSet
Definition Classes
QueryPlan
def multiTransformDown(rule: PartialFunction[SparkPlan, Seq[SparkPlan]]): LazyList[SparkPlan]
Definition Classes
TreeNode
def multiTransformDownWithPruning(cond: (TreePatternBits) => Boolean, ruleId: RuleId)(rule: PartialFunction[SparkPlan, Seq[SparkPlan]]): LazyList[SparkPlan]
Definition Classes
TreeNode
final def ne(arg0: AnyRef): Boolean
Definition Classes
AnyRef
def nodeName: String
Definition Classes
TreeNode
val nodePatterns: Seq[TreePattern]
Attributes
protected
Definition Classes
TreeNode
final def notify(): Unit
Definition Classes
AnyRef
Annotations
@IntrinsicCandidate() @native()
final def notifyAll(): Unit
Definition Classes
AnyRef
Annotations
@IntrinsicCandidate() @native()
val nullLeft: GenericInternalRow
val nullRight: GenericInternalRow
def numberedTreeString: String
Definition Classes
TreeNode
def operatorStateMetadata(stateSchemaPaths: List[List[String]] = List.empty): OperatorStateMetadata
Metadata of this stateful operator and its state stores.
Definition Classes
StreamingSymmetricHashJoinExec → StateStoreWriter
def operatorStateMetadataVersion: Int
Definition Classes
StateStoreWriter
val origin: Origin
Definition Classes
TreeNode → WithOrigin
def otherCopyArgs: Seq[AnyRef]
Attributes
protected
Definition Classes
TreeNode
def output: Seq[Attribute]
Definition Classes
StreamingSymmetricHashJoinExec → QueryPlan
def outputOrdering: Seq[SortOrder]
Definition Classes
QueryPlan
def outputPartitioning: Partitioning
Specifies how data is partitioned across different nodes in the cluster. Note this method may fail if it is invoked before EnsureRequirements is applied since PartitioningCollection requires all its partitionings to have the same number of partitions.
Definition Classes
StreamingSymmetricHashJoinExec → SparkPlan
def outputSet: AttributeSet
Definition Classes
QueryPlan
def p(number: Int): SparkPlan
Definition Classes
TreeNode
final def prepare(): Unit
Prepares this SparkPlan for execution. It's idempotent.
Definition Classes
SparkPlan
def prepareSubqueries(): Unit
Finds scalar subquery expressions in this plan node and starts evaluating them.
Attributes
protected
Definition Classes
SparkPlan
def prettyJson: String
Definition Classes
TreeNode
def printSchema(): Unit
Definition Classes
QueryPlan
def produceOutputWatermark(inputWatermarkMs: Long): Option[Long]
Produce the output watermark for the given input watermark (ms).
In most cases, this is the same as the criteria of state eviction, as most stateful operators produce output of two different kinds:
1. without buffering
2. with buffering (state)
The state eviction happens when event time exceeds a "certain threshold of timestamp", which denotes a lower bound of event time values for output (output watermark).
The default implementation provides the input watermark as it is. Most built-in operators will evict based on the min input watermark and ensure it will be the minimum of the event time value for the output so far (including output from eviction). Operators which behave differently (e.g. different criteria on eviction) must override this method.
Note that the default behavior will advance the watermark aggressively to simplify the logic, but it does not break the semantics of the output watermark, which are as follows:
An operator guarantees that it will not emit a record with an event timestamp lower than its output watermark.
For example, for a 5-minute time window aggregation, the advancement of the watermark can happen "before" the window has been evicted and produced as output. Say there is a window in state: [0, 5) and input watermark = 3. Although there is no output for this operator, this operator will produce an output watermark of 3. It still respects the guarantee, as the operator will produce the window [0, 5) only when the output watermark is equal to or greater than 5, and the downstream operator will process the input data "and then" advance the watermark. Hence this window is considered a "non-late" record.
Definition Classes
StreamingSymmetricHashJoinExec → StateStoreWriter
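The window example in the produceOutputWatermark description can be illustrated with a small sketch. It models only the behaviour the text describes (the default passes the input watermark through, and a window is emitted once the watermark reaches its end); `OutputWatermarkSketch`, `Window`, `defaultOutputWatermark`, and `emitted` are hypothetical names, and this is not the override used by this operator.

```scala
// Hypothetical illustration of the output watermark example; not Spark's implementation.
object OutputWatermarkSketch {
  /** A time window [start, end). */
  final case class Window(start: Long, end: Long)

  // Default behaviour as described in the text: the input watermark is passed through as it is.
  def defaultOutputWatermark(inputWatermarkMs: Long): Option[Long] = Some(inputWatermarkMs)

  // A buffered window is evicted and emitted only once the watermark is equal to or
  // greater than the window end.
  def emitted(windows: Seq[Window], watermark: Long): Seq[Window] =
    windows.filter(_.end <= watermark)
}
```

With the window [0, 5) in state and an input watermark of 3, the output watermark is already 3 while nothing has been emitted; the window is produced only once the watermark reaches 5.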
def producedAttributes: AttributeSet
Definition Classes
QueryPlan
def productElementNames: Iterator[String]
Definition Classes
Product
val pythonMetrics: Map[String, SQLMetric]
Attributes
protected
Definition Classes
PythonSQLMetrics
def references: AttributeSet
Definition Classes
QueryPlan
def requiredChildDistribution: Seq[Distribution]
Specifies the data distribution requirements of all the children for this operator. By default it's UnspecifiedDistribution for each child, which means each child can have any distribution.
If an operator overrides this method and specifies distribution requirements (excluding UnspecifiedDistribution and BroadcastDistribution) for more than one child, Spark guarantees that the outputs of these children will have the same number of partitions, so that the operator can safely zip partitions of these children's result RDDs. Some operators can leverage this guarantee to satisfy some interesting requirement, e.g., non-broadcast joins can specify HashClusteredDistribution(a,b) for the left child and HashClusteredDistribution(c,d) for the right child; it is then guaranteed that the left and right children are co-partitioned by a,b/c,d, which means tuples of the same value are in the partitions of the same index, e.g., (a=1,b=2) and (c=1,d=2) are both in the second partition of the left and right child.
Definition Classes
StreamingSymmetricHashJoinExec → SparkPlan
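The co-partitioning guarantee described for requiredChildDistribution can be sketched as follows. This assumes a simple hash partitioner (non-negative modulus of the key tuple's hash code), which stands in for whatever partitioning Spark actually applies; `CoPartitioningSketch` and `partitionIndex` are hypothetical names.

```scala
// Hypothetical illustration of co-partitioning by join key; not Spark's partitioner.
object CoPartitioningSketch {
  /** Maps a tuple of key values to a partition index in [0, numPartitions). */
  def partitionIndex(keys: Seq[Any], numPartitions: Int): Int = {
    val mod = keys.hashCode() % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }
}
```

When both children use the same function and the same number of partitions, a left row with (a=1, b=2) and a right row with (c=1, d=2) land at the same partition index, so the operator can zip the two children's partitions and join them pairwise.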
def requiredChildOrdering: Seq[Seq[SortOrder]]
Specifies the sort order required for each partition of the input data for this operator.
Definition Classes
SparkPlan
def resetMetrics(): Unit
Resets all the metrics.
Definition Classes
SparkPlan
def rewriteAttrs(attrMap: AttributeMap[Attribute]): SparkPlan
Definition Classes
QueryPlan
val right: SparkPlan
Definition Classes
StreamingSymmetricHashJoinExec → BinaryLike
val rightKeys: Seq[Expression]
final def sameResult(other: SparkPlan): Boolean
Definition Classes
QueryPlan
def schema: StructType
Definition Classes
QueryPlan
def schemaString: String
Definition Classes
QueryPlan
final def semanticHash(): Int
Definition Classes
QueryPlan
final val session: classic.SparkSession
Definition Classes
SparkPlan
def setLogicalLink(logicalPlan: LogicalPlan): Unit
Set logical plan link recursively if unset.
Definition Classes
SparkPlan
def setOperatorMetrics(numStateStoreInstances: Int = 1): Unit
Set the operator level metrics
Attributes
protected
Definition Classes
StateStoreWriter
def setStateStoreCheckpointInfo(checkpointInfo: StatefulOpStateStoreCheckpointInfo): Unit
The executor reports its state store checkpoint ID, which would be sent back to the driver. For the general checkpoint ID workflow, see the comments of class StatefulOperatorStateInfo.
Attributes
protected
Definition Classes
StateStoreWriter
def setStoreMetrics(store: StateStore): Unit
Set the SQL metrics related to the state store. This should be called in that task after the store has been updated.
Attributes
protected
Definition Classes
StateStoreWriter
def setTagValue[T](tag: TreeNodeTag[T], value: T): Unit
Definition Classes
TreeNode
def shortName: String
Name to output in StreamingOperatorProgress to identify operator type
Definition Classes
StreamingSymmetricHashJoinExec → StateStoreWriter
def shouldRunAnotherBatch(newInputWatermark: Long): Boolean
Should the MicroBatchExecution run another batch based on this stateful operator and the new input watermark.
Definition Classes
StreamingSymmetricHashJoinExec → StateStoreWriter
def simpleString(maxFields: Int): String
Definition Classes
QueryPlan → TreeNode
def simpleStringWithNodeId(): String
Definition Classes
QueryPlan → TreeNode
def sparkContext: SparkContext
Attributes
protected
Definition Classes
SparkPlan
val stateFormatVersion: Int
val stateInfo: Option[StatefulOperatorStateInfo]
Definition Classes
StreamingSymmetricHashJoinExec → StatefulOperator
def statePrefix: String
Attributes
protected
Definition Classes
QueryPlan
def stateSchemaDirPath(storeName: Option[String] = None): Path
Definition Classes
StateStoreWriter
def stateSchemaList(stateSchemaValidationResults: List[StateSchemaValidationResult], oldMetadata: Option[OperatorStateMetadata]): List[List[String]]
Definition Classes
StateStoreWriter
val stateWatermarkPredicates: JoinStateWatermarkPredicates
def stringArgs: Iterator[Any]
Attributes
protected
Definition Classes
TreeNode
def subqueries: Seq[SparkPlan]
Definition Classes
QueryPlan
def subqueriesAll: Seq[SparkPlan]
Definition Classes
QueryPlan
def supportsColumnar: Boolean
Return true if this stage of the plan supports columnar execution. A plan can also support row-based execution (see supportsRowBased). Spark will decide which execution to be called during query planning.
Definition Classes
SparkPlan
def supportsRowBased: Boolean
Return true if this stage of the plan supports row-based execution. A plan can also support columnar execution (see supportsColumnar). Spark will decide which execution to be called during query planning.
Definition Classes
SparkPlan
def supportsSchemaEvolution: Boolean
Definition Classes
StateStoreWriter
final def synchronized[T0](arg0: => T0): T0
Definition Classes
AnyRef
def timeTakenMs(body: => Unit): Long
Records the duration of running body for the next query progress update.
Attributes
protected
Definition Classes
StateStoreWriter
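The by-name parameter in timeTakenMs(body: => Unit) means the block is evaluated inside the method, so the method can time it. A minimal standalone sketch of that pattern follows. TimeTakenSketch is a hypothetical name, and the use of System.nanoTime is an assumption about how such a helper could be written, not a copy of the StateStoreWriter implementation.

```scala
import java.util.concurrent.TimeUnit

object TimeTakenSketch {
  // Runs `body` exactly once and returns the elapsed wall-clock time in milliseconds.
  def timeTakenMs(body: => Unit): Long = {
    val start = System.nanoTime()
    body // the by-name argument is evaluated here, between the two clock reads
    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
  }

  def main(args: Array[String]): Unit = {
    var sum = 0L
    val elapsed = timeTakenMs {
      (1 to 1000).foreach(i => sum += i)
    }
    println(s"sum=$sum elapsedMs=$elapsed")
  }
}
```

A caller would typically add the returned value to a metric, which is how the duration could end up in the next query progress update.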
def toJSON: String
Definition Classes
TreeNode
def toRowBased: SparkPlan
Converts the output of this plan to row-based if it is a columnar plan.
Definition Classes
SparkPlan
def toString(): String
Definition Classes
TreeNode → AnyRef → Any
def transform(rule: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
Definition Classes
TreeNode
def transformAllExpressions(rule: PartialFunction[Expression, Expression]): StreamingSymmetricHashJoinExec.this.type
Definition Classes
QueryPlan
def transformAllExpressionsWithPruning(cond: (TreePatternBits) => Boolean, ruleId: RuleId)(rule: PartialFunction[Expression, Expression]): StreamingSymmetricHashJoinExec.this.type
Definition Classes
QueryPlan
def transformAllExpressionsWithSubqueries(rule: PartialFunction[Expression, Expression]): StreamingSymmetricHashJoinExec.this.type
Definition Classes
QueryPlan
def transformDown(rule: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
Definition Classes
TreeNode
def transformDownWithPruning(cond: (TreePatternBits) => Boolean, ruleId: RuleId)(rule: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
Definition Classes
TreeNode
def transformDownWithSubqueries(f: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
Definition Classes
QueryPlan
def transformDownWithSubqueriesAndPruning(cond: (TreePatternBits) => Boolean, ruleId: RuleId)(f: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
Definition Classes
QueryPlan
def transformExpressions(rule: PartialFunction[Expression, Expression]): StreamingSymmetricHashJoinExec.this.type
Definition Classes
QueryPlan
def transformExpressionsDown(rule: PartialFunction[Expression, Expression]): StreamingSymmetricHashJoinExec.this.type
Definition Classes
QueryPlan
def transformExpressionsDownWithPruning(cond: (TreePatternBits) => Boolean, ruleId: RuleId)(rule: PartialFunction[Expression, Expression]): StreamingSymmetricHashJoinExec.this.type
Definition Classes
QueryPlan
def transformExpressionsUp(rule: PartialFunction[Expression, Expression]): StreamingSymmetricHashJoinExec.this.type
Definition Classes
QueryPlan
def transformExpressionsUpWithPruning(cond: (TreePatternBits) => Boolean, ruleId: RuleId)(rule: PartialFunction[Expression, Expression]): StreamingSymmetricHashJoinExec.this.type
Definition Classes
QueryPlan
def transformExpressionsWithPruning(cond: (TreePatternBits) => Boolean, ruleId: RuleId)(rule: PartialFunction[Expression, Expression]): StreamingSymmetricHashJoinExec.this.type
Definition Classes
QueryPlan
def transformUp(rule: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
Definition Classes
TreeNode
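The transformDown and transformUp entries both take a PartialFunction rule, and the difference between them is traversal order. The sketch below illustrates that difference on a toy tree, assuming transformDown applies the rule to a node before its children (pre-order) and transformUp applies it after the children are rewritten (post-order). Node, Leaf, Add, and withChildren are hypothetical names for illustration and are not part of Spark's TreeNode API.

```scala
// Toy tree; a hypothetical stand-in for a plan tree, not Spark's TreeNode.
sealed trait Node {
  def children: Seq[Node]
  def withChildren(newChildren: Seq[Node]): Node

  // Pre-order: apply the rule to this node first, then recurse into the result's children.
  def transformDown(rule: PartialFunction[Node, Node]): Node = {
    val afterRule = rule.applyOrElse(this, (n: Node) => n)
    afterRule.withChildren(afterRule.children.map(_.transformDown(rule)))
  }

  // Post-order: rewrite the children first, then apply the rule to the rebuilt node.
  def transformUp(rule: PartialFunction[Node, Node]): Node = {
    val afterChildren = withChildren(children.map(_.transformUp(rule)))
    rule.applyOrElse(afterChildren, (n: Node) => n)
  }
}

final case class Leaf(value: Int) extends Node {
  def children: Seq[Node] = Nil
  def withChildren(newChildren: Seq[Node]): Node = this
}

final case class Add(left: Node, right: Node) extends Node {
  def children: Seq[Node] = Seq(left, right)
  def withChildren(newChildren: Seq[Node]): Node = Add(newChildren(0), newChildren(1))
}

object TransformSketch {
  // Folds an addition only when both operands are already leaves.
  val fold: PartialFunction[Node, Node] = {
    case Add(Leaf(a), Leaf(b)) => Leaf(a + b)
  }

  def main(args: Array[String]): Unit = {
    val tree = Add(Add(Leaf(1), Leaf(2)), Leaf(3))
    println(tree.transformUp(fold))   // Leaf(6): inner sum folds first, then the outer one
    println(tree.transformDown(fold)) // Add(Leaf(3),Leaf(3)): root is visited before its child folds
  }
}
```

With a rule like this one, which only matches once its children are simplified, the post-order traversal finishes in one pass while the pre-order traversal would need to be repeated.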
def transformUpWithBeforeAndAfterRuleOnChildren(cond: (SparkPlan) => Boolean, ruleId: RuleId)(rule: PartialFunction[(SparkPlan, SparkPlan), SparkPlan]): SparkPlan
Definition Classes
TreeNode
def transformUpWithNewOutput(rule: PartialFunction[SparkPlan, (SparkPlan, Seq[(Attribute, Attribute)])], skipCond: (SparkPlan) => Boolean, canGetOutput: (SparkPlan) => Boolean): SparkPlan
Definition Classes
QueryPlan
def transformUpWithPruning(cond: (TreePatternBits) => Boolean, ruleId: RuleId)(rule: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
Definition Classes
TreeNode
def transformUpWithSubqueries(f: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
Definition Classes
QueryPlan
def transformUpWithSubqueriesAndPruning(cond: (TreePatternBits) => Boolean, ruleId: RuleId)(f: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
Definition Classes
QueryPlan
def transformWithPruning(cond: (TreePatternBits) => Boolean, ruleId: RuleId)(rule: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
Definition Classes
TreeNode
def transformWithSubqueries(f: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
Definition Classes
QueryPlan
lazy val treePatternBits: BitSet
Definition Classes
QueryPlan → TreeNode → TreePatternBits
def treeString(append: (String) => Unit, verbose: Boolean, addSuffix: Boolean, maxFields: Int, printOperatorId: Boolean): Unit
Definition Classes
TreeNode
final def treeString(verbose: Boolean, addSuffix: Boolean, maxFields: Int, printOperatorId: Boolean): String
Definition Classes
TreeNode
final def treeString: String
Definition Classes
TreeNode
def unsetTagValue[T](tag: TreeNodeTag[T]): Unit
Definition Classes
TreeNode
def updateOuterReferencesInSubquery(plan: SparkPlan, attrMap: AttributeMap[Attribute]): SparkPlan
Attributes
protected
Definition Classes
QueryPlan
def validateAndMaybeEvolveStateSchema(hadoopConf: Configuration, batchId: Long, stateSchemaVersion: Int): List[StateSchemaValidationResult]
Definition Classes
StreamingSymmetricHashJoinExec → StatefulOperator
def validateNewMetadata(oldMetadata: OperatorStateMetadata, newMetadata: OperatorStateMetadata): Unit
Definition Classes
StateStoreWriter
def vectorTypes: Option[Seq[String]]
The exact Java types of the columns that are output in columnar processing mode. This is a performance optimization for code generation and is optional.
Definition Classes
SparkPlan
def verboseString(maxFields: Int): String
Definition Classes
QueryPlan → TreeNode
def verboseStringWithOperatorId(): String
Definition Classes
BinaryExecNode → QueryPlan
def verboseStringWithSuffix(maxFields: Int): String
Definition Classes
TreeNode
final def wait(arg0: Long, arg1: Int): Unit
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.InterruptedException])
final def wait(arg0: Long): Unit
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.InterruptedException]) @native()
final def wait(): Unit
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.InterruptedException])
def waitForSubqueries(): Unit
Blocks the thread until all subqueries finish evaluation and update the results.
Attributes
protected
Definition Classes
SparkPlan
def withLogContext(context: Map[String, String])(body: => Unit): Unit
Attributes
protected
Definition Classes
Logging
final def withNewChildren(newChildren: Seq[SparkPlan]): SparkPlan
Definition Classes
TreeNode
def withNewChildrenInternal(newLeft: SparkPlan, newRight: SparkPlan): StreamingSymmetricHashJoinExec
Attributes
protected
Definition Classes
StreamingSymmetricHashJoinExec → BinaryLike
final def withNewChildrenInternal(newChildren: IndexedSeq[SparkPlan]): SparkPlan
Definition Classes
BinaryLike
def withSQLConf[T](pairs: (String, String)*)(f: => T): T
Attributes
protected
Definition Classes
SQLConfHelper

Deprecated Value Members

def finalize(): Unit
Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.Throwable]) @Deprecated
Deprecated
(Since version 9)

Packages

StreamingSymmetricHashJoinExec

Instance Constructors

Type Members

Value Members

Deprecated Value Members

Inherited from StateStoreWriter

Inherited from PythonSQLMetrics

Inherited from StatefulOperator

Inherited from BinaryExecNode

Inherited from BinaryLike[SparkPlan]

Inherited from SparkPlan

Inherited from Serializable

Inherited from Logging

Inherited from QueryPlan[SparkPlan]

Inherited from SQLConfHelper

Inherited from TreeNode[SparkPlan]

Inherited from WithOrigin

Inherited from TreePatternBits

Inherited from Product

Inherited from Equals

Inherited from AnyRef

Inherited from Any

Ungrouped
