abstract class SparkStrategies extends QueryPlanner[SparkPlan]
Instance Constructors
- new SparkStrategies()
Type Members
- case class StreamingGlobalLimitStrategy(outputMode: OutputMode) extends Strategy with Product with Serializable
Used to plan the streaming global limit operator for streams in append mode. We need to check for either a direct Limit or a Limit wrapped in a ReturnAnswer operator, following the example of the SpecialLimits Strategy above.
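The two plan shapes described above can be illustrated with a toy pattern match. This is only a sketch of the matching idea: `Plan`, `Scan`, `Limit`, `ReturnAnswer` and `matchGlobalLimit` are hypothetical stand-ins defined here, not Spark's Catalyst classes or the strategy's actual code.

```scala
object LimitMatchSketch {
  // Hypothetical stand-ins for logical plan nodes; not Spark's real classes.
  sealed trait Plan
  final case class Scan(name: String) extends Plan
  final case class Limit(n: Int, child: Plan) extends Plan
  final case class ReturnAnswer(child: Plan) extends Plan

  // Returns the limit value and the limited child when the plan is either
  // a Limit wrapped in ReturnAnswer or a direct Limit; None otherwise.
  def matchGlobalLimit(plan: Plan): Option[(Int, Plan)] = plan match {
    case ReturnAnswer(Limit(n, child)) => Some((n, child))
    case Limit(n, child)               => Some((n, child))
    case _                             => None
  }
}
```

Both the wrapped and the unwrapped form yield the same result, which is why the strategy has to check for each of them.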
Abstract Value Members
- abstract def collectPlaceholders(plan: SparkPlan): Seq[(SparkPlan, LogicalPlan)]
- Attributes
- protected
- Definition Classes
- QueryPlanner
- abstract def prunePlans(plans: Iterator[SparkPlan]): Iterator[SparkPlan]
- Attributes
- protected
- Definition Classes
- QueryPlanner
- abstract def strategies: Seq[GenericStrategy[SparkPlan]]
- Definition Classes
- QueryPlanner
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def plan(plan: LogicalPlan): Iterator[SparkPlan]
- Definition Classes
- SparkStrategies → QueryPlanner
- lazy val singleRowRdd: RDD[InternalRow]
- Attributes
- protected
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- object Aggregation extends Strategy
Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface.
- object BasicOperators extends Strategy
- object FlatMapGroupsWithStateStrategy extends Strategy
Strategy to convert FlatMapGroupsWithState logical operator to physical operator in streaming plans. Conversion for batch plans is handled by BasicOperators.
- object InMemoryScans extends Strategy
- object JoinSelection extends Strategy with PredicateHelper with JoinSelectionHelper
Select the proper physical plan for join based on join strategy hints, the availability of equi-join keys and the sizes of joining relations. Below are the existing join strategies, their characteristics and their limitations.
- Broadcast hash join (BHJ): Only supported for equi-joins, while the join keys do not need to be sortable. Supported for all join types except full outer joins. BHJ usually performs faster than the other join algorithms when the broadcast side is small. However, broadcasting tables is a network-intensive operation and it could cause OOM or perform badly in some cases, especially when the build/broadcast side is big.
- Shuffle hash join: Only supported for equi-joins, while the join keys do not need to be sortable. Supported for all join types. Building hash map from table is a memory-intensive operation and it could cause OOM when the build side is big.
- Shuffle sort merge join (SMJ): Only supported for equi-joins and the join keys have to be sortable. Supported for all join types.
- Broadcast nested loop join (BNLJ): Supports both equi-joins and non-equi-joins. Supports all the join types, but the implementation is optimized for: 1) broadcasting the left side in a right outer join; 2) broadcasting the right side in a left outer, left semi, left anti or existence join; 3) broadcasting either side in an inner-like join. For other cases, we need to scan the data multiple times, which can be rather slow.
- Shuffle-and-replicate nested loop join (a.k.a. cartesian product join): Supports both equi-joins and non-equi-joins. Supports only inner like joins.
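The support constraints listed above can be summarized as a small eligibility check. The sketch below is a simplified model built only from that list: it covers which strategies are permitted for a given join, not the order in which JoinSelection tries them, and it ignores hints and relation sizes. All names (`JoinEligibilitySketch`, `eligible`, the case objects) are hypothetical, and treating inner and cross joins as the "inner like" joins is an assumption.

```scala
object JoinEligibilitySketch {
  // Hypothetical stand-ins for join types; not Spark's real classes.
  sealed trait JoinType
  case object Inner extends JoinType
  case object Cross extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object FullOuter extends JoinType
  case object LeftSemi extends JoinType
  case object LeftAnti extends JoinType
  case object ExistenceJoin extends JoinType

  sealed trait JoinStrategy
  case object BroadcastHash extends JoinStrategy
  case object ShuffleHash extends JoinStrategy
  case object SortMerge extends JoinStrategy
  case object BroadcastNestedLoop extends JoinStrategy
  case object CartesianProduct extends JoinStrategy

  // Assumption: "inner like" means inner and cross joins.
  def isInnerLike(joinType: JoinType): Boolean = joinType match {
    case Inner | Cross => true
    case _             => false
  }

  // Strategies permitted by the constraints in the list above.
  def eligible(
      isEquiJoin: Boolean,
      keysSortable: Boolean,
      joinType: JoinType): Set[JoinStrategy] = {
    val candidates = Seq[(JoinStrategy, Boolean)](
      // Equi-joins only; every join type except full outer.
      BroadcastHash -> (isEquiJoin && joinType != FullOuter),
      // Equi-joins only; all join types.
      ShuffleHash -> isEquiJoin,
      // Equi-joins only, and the keys must be sortable.
      SortMerge -> (isEquiJoin && keysSortable),
      // Equi-joins and non-equi-joins; all join types.
      BroadcastNestedLoop -> true,
      // Equi-joins and non-equi-joins; inner-like joins only.
      CartesianProduct -> isInnerLike(joinType)
    )
    candidates.collect { case (strategy, true) => strategy }.toSet
  }
}
```

For example, a non-equi inner join leaves only the two nested loop variants, while a full outer equi-join with sortable keys rules out broadcast hash join and the cartesian product.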
- object PythonEvals extends Strategy
Strategy to convert EvalPython logical operator to physical operator.
- object SparkScripts extends Strategy
- object SpecialLimits extends Strategy
Plans special cases of limit operators.
- object StatefulAggregationStrategy extends Strategy
Used to plan streaming aggregation queries that are computed incrementally as part of an org.apache.spark.sql.streaming.StreamingQuery. Currently this rule is injected into the planner on demand, only when planning in an org.apache.spark.sql.execution.streaming.StreamExecution.
- object StreamingDeduplicationStrategy extends Strategy
Used to plan the streaming deduplicate operator.
- object StreamingJoinStrategy extends Strategy
- object StreamingRelationStrategy extends Strategy
This strategy is just for explaining Dataset/DataFrame created by spark.readStream. It won't affect the execution, because StreamingRelation will be replaced with StreamingExecutionRelation in StreamingQueryManager and StreamingExecutionRelation will be replaced with the real relation using the Source in StreamExecution.
- object Window extends Strategy