class SparkPlanner extends SparkStrategies with SQLConfHelper
Linearization: SparkPlanner, SQLConfHelper, SparkStrategies, QueryPlanner, AnyRef, Any
Instance Constructors
- new SparkPlanner(session: SparkSession, experimentalMethods: ExperimentalMethods)
Type Members
- case class StreamingGlobalLimitStrategy(outputMode: OutputMode) extends Strategy with Product with Serializable
Used to plan the streaming global limit operator for streams in append mode. We need to check for either a direct Limit or a Limit wrapped in a ReturnAnswer operator, following the example of the SpecialLimits strategy.
- Definition Classes
- SparkStrategies
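The Limit/ReturnAnswer check described above can be sketched as a pattern match. This is a minimal, self-contained model under stated assumptions: Plan, Relation, Limit, ReturnAnswer and StreamingGlobalLimitSketch are hypothetical toy definitions, not Spark's Catalyst classes, and the appendMode flag is a simplified stand-in for the outputMode check.

```scala
// Hypothetical toy logical-plan ADT; these are NOT Spark's Catalyst classes.
sealed trait Plan
final case class Relation(name: String) extends Plan
final case class Limit(n: Int, child: Plan) extends Plan
final case class ReturnAnswer(child: Plan) extends Plan

object StreamingGlobalLimitSketch {
  // Returns the limit when the plan is a Limit, either directly or wrapped in
  // a ReturnAnswer node, and the (assumed) append-mode flag is set; else None.
  def streamingLimit(plan: Plan, appendMode: Boolean): Option[Int] =
    if (!appendMode) None
    else plan match {
      case ReturnAnswer(Limit(n, _)) => Some(n)
      case Limit(n, _)               => Some(n)
      case _                         => None
    }
}
```

Returning None plays the role of a strategy returning no plan, which lets the planner fall through to other strategies.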
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def collectPlaceholders(plan: SparkPlan): Seq[(SparkPlan, LogicalPlan)]
- Attributes
- protected
- Definition Classes
- SparkPlanner → QueryPlanner
- def conf: SQLConf
- Definition Classes
- SQLConfHelper
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- val experimentalMethods: ExperimentalMethods
- def extraPlanningStrategies: Seq[Strategy]
Override to add extra planning strategies to the planner. These strategies are tried after the strategies defined in ExperimentalMethods, and before the regular strategies.
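The ordering described above (ExperimentalMethods strategies first, then extraPlanningStrategies, then the regular strategies) can be illustrated with a small model. ToyStrategy and ToyPlanner are hypothetical; plans are plain strings, and the planner simply returns the output of the first strategy that applies, which is a simplification of the real planner rather than its implementation.

```scala
// Hypothetical stand-in for a planning strategy: a name plus a partial function
// from a logical-plan label to a physical-plan label.
final case class ToyStrategy(name: String, rule: PartialFunction[String, String])

// Simplified planner that concatenates strategies in the documented order.
class ToyPlanner(experimental: Seq[ToyStrategy], builtIn: Seq[ToyStrategy]) {
  // Hook analogous to extraPlanningStrategies; subclasses override it.
  def extraPlanningStrategies: Seq[ToyStrategy] = Nil

  def strategies: Seq[ToyStrategy] =
    experimental ++ extraPlanningStrategies ++ builtIn

  // The first strategy whose rule is defined for the plan wins.
  def plan(logical: String): Option[String] =
    strategies.collectFirst {
      case s if s.rule.isDefinedAt(logical) => s.rule(logical)
    }
}

// Usage: a subclass injects a custom strategy between the two groups.
val planner = new ToyPlanner(
  experimental = Seq(ToyStrategy("experimental", { case "Sort" => "ExperimentalSortExec" })),
  builtIn = Seq(ToyStrategy("basic", {
    case "Filter" => "FilterExec"
    case "Sort"   => "SortExec"
  }))
) {
  override def extraPlanningStrategies: Seq[ToyStrategy] =
    Seq(ToyStrategy("custom", { case "Filter" => "CustomFilterExec" }))
}
```

Because the custom strategy sits before the built-in ones, it shadows the built-in Filter rule, while the experimental Sort rule still takes precedence over both.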
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def numPartitions: Int
- def plan(plan: LogicalPlan): Iterator[SparkPlan]
- Definition Classes
- SparkStrategies → QueryPlanner
- def pruneFilterProject(projectList: Seq[NamedExpression], filterPredicates: Seq[Expression], prunePushedDownFilters: (Seq[Expression]) => Seq[Expression], scanBuilder: (Seq[Attribute]) => SparkPlan): SparkPlan
Used to build table scan operators where complex projection and filtering are done using separate physical operators. This function returns the given scan operator with Project and Filter nodes added only when needed. For example, a Project operator is only used when the final desired output requires complex expressions to be evaluated or when columns can be further eliminated after filtering has been done.
The prunePushedDownFilters parameter is used to remove those filters that can be optimized away by the filter pushdown optimization. The required attributes for both filtering and expression evaluation are passed to the provided scanBuilder function so that it can avoid unnecessary column materialization.
- def prunePlans(plans: Iterator[SparkPlan]): Iterator[SparkPlan]
- Attributes
- protected
- Definition Classes
- SparkPlanner → QueryPlanner
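The pruneFilterProject behaviour described above (add Project and Filter nodes only when needed) can be sketched with a simplified model. ProjItem, Pred, PruneFilterProjectSketch, scanToString and keepAllFilters are hypothetical; columns are plain strings, plans are rendered as strings, and a projection item counts as a plain column only when its output name equals its single referenced column.

```scala
// Hypothetical simplified expression model; NOT Spark's Catalyst classes.
final case class ProjItem(outputName: String, references: Set[String]) {
  // A plain column reference outputs exactly the one column it reads.
  def isPlainColumn: Boolean = references == Set(outputName)
}
final case class Pred(text: String, references: Set[String])

object PruneFilterProjectSketch {
  def pruneFilterProject(
      projectList: Seq[ProjItem],
      filterPredicates: Seq[Pred],
      prunePushedDownFilters: Seq[Pred] => Seq[Pred],
      scanBuilder: Seq[String] => String): String = {
    val projectSet = projectList.flatMap(_.references).toSet
    val filterSet = filterPredicates.flatMap(_.references).toSet
    // Filters handled by pushdown are dropped; the remaining ones are ANDed.
    val remaining = prunePushedDownFilters(filterPredicates)
    val condition =
      if (remaining.isEmpty) None else Some(remaining.map(_.text).mkString(" AND "))

    def withFilter(child: String): String =
      condition.map(c => s"Filter($c, $child)").getOrElse(child)

    if (projectList.forall(_.isPlainColumn) && filterSet.subsetOf(projectSet)) {
      // Column pruning alone yields the output and covers every filter column:
      // scan plus optional filter, no Project node.
      withFilter(scanBuilder(projectList.map(_.outputName)))
    } else {
      // Read every column needed by the projection or the filters, then project.
      val scan = scanBuilder((projectSet ++ filterSet).toSeq.sorted)
      s"Project(${projectList.map(_.outputName).mkString(", ")}, ${withFilter(scan)})"
    }
  }
}

// Hypothetical helpers used in the example calls.
val scanToString: Seq[String] => String = cols => s"Scan(${cols.mkString(", ")})"
val keepAllFilters: Seq[Pred] => Seq[Pred] = ps => ps

// Filtering on b while projecting only a needs a Project to drop b afterwards:
// returns "Project(a, Filter(b > 1, Scan(a, b)))"
val example = PruneFilterProjectSketch.pruneFilterProject(
  Seq(ProjItem("a", Set("a"))), Seq(Pred("b > 1", Set("b"))), keepAllFilters, scanToString)
```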
- val session: SparkSession
- lazy val singleRowRdd: RDD[InternalRow]
- Attributes
- protected
- Definition Classes
- SparkStrategies
- def strategies: Seq[Strategy]
- Definition Classes
- SparkPlanner → QueryPlanner
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- object Aggregation extends Strategy
Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface.
- Definition Classes
- SparkStrategies
- object BasicOperators extends Strategy
- Definition Classes
- SparkStrategies
- object FlatMapGroupsWithStateStrategy extends Strategy
Strategy to convert FlatMapGroupsWithState logical operator to physical operator in streaming plans. Conversion for batch plans is handled by BasicOperators.
- Definition Classes
- SparkStrategies
- object InMemoryScans extends Strategy
- Definition Classes
- SparkStrategies
- object JoinSelection extends Strategy with PredicateHelper with JoinSelectionHelper
Select the proper physical plan for join based on join strategy hints, the availability of equi-join keys and the sizes of joining relations. Below are the existing join strategies, their characteristics and their limitations.
- Broadcast hash join (BHJ): Only supported for equi-joins, while the join keys do not need to be sortable. Supported for all join types except full outer joins. BHJ usually performs faster than the other join algorithms when the broadcast side is small. However, broadcasting tables is a network-intensive operation and it could cause OOM or perform badly in some cases, especially when the build/broadcast side is big.
- Shuffle hash join: Only supported for equi-joins, while the join keys do not need to be sortable. Supported for all join types. Building a hash map from a table is a memory-intensive operation and it could cause OOM when the build side is big.
- Shuffle sort merge join (SMJ): Only supported for equi-joins and the join keys have to be sortable. Supported for all join types.
- Broadcast nested loop join (BNLJ): Supports both equi-joins and non-equi-joins. Supports all the join types, but the implementation is optimized for: 1) broadcasting the left side in a right outer join; 2) broadcasting the right side in a left outer, left semi, left anti or existence join; 3) broadcasting either side in an inner-like join. For other cases, we need to scan the data multiple times, which can be rather slow.
- Shuffle-and-replicate nested loop join (a.k.a. cartesian product join): Supports both equi-joins and non-equi-joins. Supports only inner-like joins.
- Definition Classes
- SparkStrategies
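The join-strategy constraints listed above can be restated as a small capability check. This sketch covers only the documented join-type and key requirements; it deliberately ignores hints, relation sizes and configuration, so it lists candidate strategies rather than choosing one. JoinType, Side, JoinCapabilities and the case objects are hypothetical names, and Inner stands in for all inner-like joins.

```scala
// Hypothetical join types; Inner stands in for every inner-like join.
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object RightOuter extends JoinType
case object FullOuter extends JoinType
case object LeftSemi extends JoinType
case object LeftAnti extends JoinType
case object Existence extends JoinType

// Hypothetical broadcast side for a nested loop join.
sealed trait Side
case object BuildLeft extends Side
case object BuildRight extends Side

object JoinCapabilities {
  // Strategies whose documented constraints admit this join (no size or hint logic).
  def supported(isEquiJoin: Boolean, keysSortable: Boolean, jt: JoinType): Seq[String] = {
    val equiOnly =
      if (!isEquiJoin) Nil
      else Seq(
        if (jt != FullOuter) Some("BroadcastHashJoin") else None, // all but full outer
        Some("ShuffleHashJoin"),                                  // all join types
        if (keysSortable) Some("SortMergeJoin") else None         // needs sortable keys
      ).flatten
    // Nested loop joins accept non-equi conditions; cartesian is inner-like only.
    val nestedLoop =
      Seq("BroadcastNestedLoopJoin") ++ (if (jt == Inner) Seq("CartesianProduct") else Nil)
    equiOnly ++ nestedLoop
  }

  // True when the broadcast side matches one of BNLJ's optimized cases.
  def bnljOptimized(jt: JoinType, side: Side): Boolean = (jt, side) match {
    case (RightOuter, BuildLeft)                                    => true
    case (LeftOuter | LeftSemi | LeftAnti | Existence, BuildRight) => true
    case (Inner, _)                                                 => true
    case _                                                          => false
  }
}
```

For example, a full outer equi-join on unsortable keys leaves only the shuffle hash join and the broadcast nested loop join as candidates.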
- object PythonEvals extends Strategy
Strategy to convert EvalPython logical operator to physical operator.
- Definition Classes
- SparkStrategies
- object SparkScripts extends Strategy
- Definition Classes
- SparkStrategies
- object SpecialLimits extends Strategy
Plans special cases of limit operators.
- Definition Classes
- SparkStrategies
- object StatefulAggregationStrategy extends Strategy
Used to plan streaming aggregation queries that are computed incrementally as part of an org.apache.spark.sql.streaming.StreamingQuery. Currently this rule is injected into the planner on demand, only when planning in an org.apache.spark.sql.execution.streaming.StreamExecution.
- Definition Classes
- SparkStrategies
- object StreamingDeduplicationStrategy extends Strategy
Used to plan the streaming deduplicate operator.
- Definition Classes
- SparkStrategies
- object StreamingJoinStrategy extends Strategy
- Definition Classes
- SparkStrategies
- object StreamingRelationStrategy extends Strategy
This strategy exists only for explaining a Dataset/DataFrame created by spark.readStream. It does not affect execution, because StreamingRelation is replaced with StreamingExecutionRelation in StreamingQueryManager, and StreamingExecutionRelation is in turn replaced with the real relation using the Source in StreamExecution.
- Definition Classes
- SparkStrategies
- object Window extends Strategy
- Definition Classes
- SparkStrategies