Packages

package adaptive

Type Members

  1. case class AdaptiveExecutionContext(session: SparkSession, qe: QueryExecution) extends Product with Serializable

    The execution context shared between the main query and all sub-queries.

  2. case class AdaptiveSparkPlanExec(initialPlan: SparkPlan, context: AdaptiveExecutionContext, preprocessingRules: Seq[Rule[SparkPlan]], isSubquery: Boolean) extends SparkPlan with LeafExecNode with Product with Serializable

    A root node to execute the query plan adaptively. It splits the query plan into independent stages and executes them in order according to their dependencies. Each query stage materializes its output when it completes. When one stage completes, the data statistics of the materialized output will be used to optimize the remainder of the query.

    To create query stages, we traverse the query tree bottom up. When we hit an exchange node, and if all the child query stages of this exchange node are materialized, we create a new query stage for this exchange node. The new stage is then materialized asynchronously once it is created.

    When one query stage finishes materialization, the rest of the query is re-optimized and re-planned based on the latest statistics provided by all materialized stages. Then we traverse the query plan again and create more stages if possible. After all stages have been materialized, we execute the rest of the plan.
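
    The traversal described above can be sketched with a toy plan model. This is only an illustration: the types `Scan`, `Op`, `Exchange` and `Stage` and the object `StageLoopSketch` are hypothetical and are not Spark classes, materialization is modelled as an instantaneous flag flip rather than an asynchronous job, and the re-optimization step between rounds is omitted.

```scala
object StageLoopSketch {
  // Hypothetical toy plan nodes, not Spark's SparkPlan hierarchy.
  sealed trait Node
  case class Scan(table: String) extends Node
  case class Op(name: String, children: List[Node]) extends Node
  case class Exchange(child: Node) extends Node
  case class Stage(id: Int, body: Node, materialized: Boolean) extends Node

  // True when every stage beneath this node has finished materializing.
  def allStagesMaterialized(n: Node): Boolean = n match {
    case Scan(_)        => true
    case Op(_, cs)      => cs.forall(allStagesMaterialized)
    case Exchange(c)    => allStagesMaterialized(c)
    case Stage(_, _, m) => m
  }

  // One bottom-up pass: wrap each eligible exchange in a new, unmaterialized stage.
  def createStages(n: Node, nextId: Int): (Node, Int) = n match {
    case s: Scan  => (s, nextId)
    case s: Stage => (s, nextId)
    case Op(name, cs) =>
      val (newCs, id) = cs.foldLeft((List.empty[Node], nextId)) {
        case ((acc, i), c) =>
          val (nc, ni) = createStages(c, i)
          (acc :+ nc, ni)
      }
      (Op(name, newCs), id)
    case Exchange(c) =>
      val (nc, id) = createStages(c, nextId)
      if (allStagesMaterialized(nc)) (Stage(id, Exchange(nc), materialized = false), id + 1)
      else (Exchange(nc), id)
  }

  // Stand-in for materialization: mark every visible stage as finished.
  def materializeAll(n: Node): Node = n match {
    case Stage(id, body, _) => Stage(id, body, materialized = true)
    case Op(name, cs)       => Op(name, cs.map(materializeAll))
    case Exchange(c)        => Exchange(materializeAll(c))
    case s: Scan            => s
  }

  // Repeats "create stages, then materialize" until no new stage appears.
  // Returns the number of rounds that created at least one stage.
  def run(plan: Node): Int = {
    var current = plan
    var nextId = 0
    var rounds = 0
    var progressed = true
    while (progressed) {
      val (p, id) = createStages(current, nextId)
      progressed = id > nextId
      if (progressed) {
        rounds += 1
        current = materializeAll(p)
        nextId = id
      }
    }
    rounds
  }
}
```

    For a plan with two exchanges feeding a join that sits under a third exchange, the two lower exchanges become stages in the first round and the upper exchange becomes a stage only in the second round, once its child stages are materialized.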

  3. trait AdaptiveSparkPlanHelper extends AnyRef

    This trait provides utility methods related to tree traversal of an AdaptiveSparkPlanExec plan. Unlike their counterparts in org.apache.spark.sql.catalyst.trees.TreeNode or org.apache.spark.sql.catalyst.plans.QueryPlan, these methods traverse down leaf nodes of adaptive plans, i.e., AdaptiveSparkPlanExec and QueryStageExec.

  4. case class BroadcastQueryStageExec(id: Int, plan: SparkPlan) extends QueryStageExec with Product with Serializable

    A broadcast query stage whose child is a BroadcastExchangeLike or ReusedExchangeExec.

  5. case class CoalesceShufflePartitions(session: SparkSession) extends Rule[SparkPlan] with Product with Serializable

    A rule to coalesce the shuffle partitions based on the map output statistics, which can avoid many small reduce tasks that hurt performance.
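
    As a rough illustration of the idea, the sketch below greedily packs adjacent partitions until a target size would be exceeded. It is an assumption-laden simplification, not the algorithm this rule actually uses; `CoalesceSketch`, `coalesce` and `targetSize` are hypothetical names.

```scala
object CoalesceSketch {
  // Greedily packs adjacent partitions: a new coalesced partition starts
  // whenever adding the next partition would exceed targetSize.
  // Returns the start index of each coalesced partition.
  def coalesce(sizes: Seq[Long], targetSize: Long): Seq[Int] = {
    val starts = scala.collection.mutable.ArrayBuffer.empty[Int]
    var current = 0L
    for ((size, i) <- sizes.zipWithIndex) {
      if (i == 0 || current + size > targetSize) {
        starts += i
        current = size
      } else {
        current += size
      }
    }
    starts.toSeq
  }
}
```

    With map output sizes `Seq(10L, 20L, 30L, 100L, 5L, 5L)` and a target of 64, the six original partitions collapse into three reduce tasks starting at indices 0, 3 and 4.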

  6. trait Cost extends Ordered[Cost]

    Represents the cost of a plan.

  7. trait CostEvaluator extends AnyRef

    Evaluates the cost of a physical plan.

  8. case class CustomShuffleReaderExec extends SparkPlan with UnaryExecNode with Product with Serializable

    A wrapper of shuffle query stage, which follows the given partition arrangement.

  9. case class DemoteBroadcastHashJoin(conf: SQLConf) extends Rule[LogicalPlan] with Product with Serializable

    This optimization rule detects a join child that has a high ratio of empty partitions and adds a no-broadcast-hash-join hint to avoid it being broadcast.

  10. case class InsertAdaptiveSparkPlan(adaptiveExecutionContext: AdaptiveExecutionContext) extends Rule[SparkPlan] with Product with Serializable

    This rule wraps the query plan with an AdaptiveSparkPlanExec, which executes the query plan and re-optimizes it during execution based on runtime data statistics.

    Note that this rule is stateful and thus should not be reused across query executions.

  11. case class LogicalQueryStage(logicalPlan: LogicalPlan, physicalPlan: SparkPlan) extends LeafNode with Product with Serializable

    The LogicalPlan wrapper for a QueryStageExec, or a snippet of physical plan containing a QueryStageExec, in which all ancestor nodes of the QueryStageExec are linked to the same logical node.

    For example, a logical Aggregate can be transformed into FinalAgg - Shuffle - PartialAgg, in which the Shuffle will be wrapped into a QueryStageExec, thus the LogicalQueryStage will have FinalAgg - QueryStageExec as its physical plan.

  12. case class OptimizeLocalShuffleReader(conf: SQLConf) extends Rule[SparkPlan] with Product with Serializable

    A rule to optimize the shuffle reader to a local reader iff no additional shuffles will be introduced:

    1. If the input plan is a shuffle, add a local reader directly, as this can never introduce extra shuffles.
    2. Otherwise, add a local reader to the probe side of the broadcast hash join, then run EnsureRequirements to check whether an additional shuffle is introduced. If one is, revert all the local readers.

  13. case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[SparkPlan] with Product with Serializable

    A rule to optimize skewed joins to avoid straggler tasks whose share of data is significantly larger than that of the other tasks.

    The general idea is to divide each skew partition into smaller partitions and replicate its matching partition on the other side of the join so that they can run in parallel tasks. Note that when matching partitions from the left side and the right side both have skew, it will become a cartesian product of splits from left and right joining together.

    For example, assume the Sort-Merge join has 4 partitions:

      left:  [L1, L2, L3, L4]
      right: [R1, R2, R3, R4]

    Let's say L2, L4 and R3, R4 are skewed, and each of them gets split into 2 sub-partitions. The join is initially scheduled to run 4 tasks: (L1, R1), (L2, R2), (L3, R3), (L4, R4). This rule expands it to 9 tasks to increase parallelism: (L1, R1), (L2-1, R2), (L2-2, R2), (L3, R3-1), (L3, R3-2), (L4-1, R4-1), (L4-2, R4-1), (L4-1, R4-2), (L4-2, R4-2)

    Note that, when this rule is enabled, it also coalesces non-skewed partitions like CoalesceShufflePartitions does.
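
    The task expansion in the example above can be reproduced with a small sketch. It only models the pairing of sub-partitions, taking the number of splits per partition as given; `SkewSplitSketch` and `expand` are hypothetical names, and nothing here reflects how the rule detects skew or chooses split points.

```scala
object SkewSplitSketch {
  // leftSplits(i) and rightSplits(i) give the number of sub-partitions for
  // partition i + 1 on each side; 1 means the partition is not skewed.
  // Each pair of matching partitions becomes the cartesian product of its splits.
  def expand(leftSplits: Seq[Int], rightSplits: Seq[Int]): Seq[(String, String)] =
    leftSplits.zip(rightSplits).zipWithIndex.flatMap { case ((l, r), i) =>
      val p = i + 1
      def names(side: String, n: Int): Seq[String] =
        if (n == 1) Seq(s"$side$p") else (1 to n).map(k => s"$side$p-$k")
      for (rn <- names("R", r); ln <- names("L", l)) yield (ln, rn)
    }
}
```

    Calling `SkewSplitSketch.expand(Seq(1, 2, 1, 2), Seq(1, 1, 2, 2))` yields the 9 task pairs listed above, with the four L4/R4 combinations arising from both sides being skewed.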

  14. case class PlanAdaptiveSubqueries(subqueryMap: Map[Long, SubqueryExec]) extends Rule[SparkPlan] with Product with Serializable
  15. abstract class QueryStageExec extends SparkPlan with LeafExecNode

    A query stage is an independent subgraph of the query plan. Query stage materializes its output before proceeding with further operators of the query plan. The data statistics of the materialized output can be used to optimize subsequent query stages.

    There are 2 kinds of query stages:

    1. Shuffle query stage. This stage materializes its output to shuffle files, and Spark launches another job to execute the further operators.
    2. Broadcast query stage. This stage materializes its output to an array in driver JVM. Spark broadcasts the array before executing the further operators.
  16. case class ReuseAdaptiveSubquery(conf: SQLConf, reuseMap: TrieMap[SparkPlan, BaseSubqueryExec]) extends Rule[SparkPlan] with Product with Serializable
  17. case class ShuffleQueryStageExec(id: Int, plan: SparkPlan) extends QueryStageExec with Product with Serializable

    A shuffle query stage whose child is a ShuffleExchangeLike or ReusedExchangeExec.

  18. case class SimpleCost(value: Long) extends Cost with Product with Serializable

    A simple implementation of Cost, which takes a Long as the cost value.

  19. case class StageFailure(stage: QueryStageExec, error: Throwable) extends StageMaterializationEvent with Product with Serializable

    The materialization of a query stage hit an error and failed.

  20. sealed trait StageMaterializationEvent extends AnyRef

    The event type for stage materialization.

  21. case class StageSuccess(stage: QueryStageExec, result: Any) extends StageMaterializationEvent with Product with Serializable

    The materialization of a query stage completed with success.

Value Members

  1. object AdaptiveSparkPlanExec extends Serializable
  2. object BroadcastQueryStageExec extends Serializable
  3. object CoalesceShufflePartitions extends Serializable
  4. object LogicalQueryStageStrategy extends Strategy with PredicateHelper

    Strategy for plans containing LogicalQueryStage nodes:

    1. Transforms a LogicalQueryStage into its corresponding physical plan, which is either being executed or has already completed execution.
    2. Transforms a Join that has one child relation already planned and executed as a BroadcastQueryStageExec. This prevents turning a broadcast stage back into a shuffle stage when the larger join child relation finishes before the smaller one.

    Note that this rule needs to be applied before regular join strategies.

  5. object OptimizeLocalShuffleReader extends Serializable
  6. object ShufflePartitionsUtil extends Logging
  7. object SimpleCostEvaluator extends CostEvaluator

    A simple implementation of CostEvaluator, which counts the number of ShuffleExchangeLike nodes in the plan.
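
    A self-contained sketch of how a shuffle-counting cost could be modelled is shown below. The `Cost` and `SimpleCost` signatures mirror the listing above, but the `compare` body, the toy plan types `ToyPlan`, `ToyShuffle` and `ToyOp`, and the `evaluate` function are assumptions made for illustration and are not taken from Spark's source.

```scala
object CostSketch {
  trait Cost extends Ordered[Cost]

  // Assumed comparison: order by the wrapped Long value.
  case class SimpleCost(value: Long) extends Cost {
    override def compare(that: Cost): Int = that match {
      case SimpleCost(v) => java.lang.Long.compare(value, v)
      case other => throw new IllegalArgumentException(s"Cannot compare with $other")
    }
  }

  // Hypothetical toy plan nodes standing in for physical plan operators.
  sealed trait ToyPlan { def children: Seq[ToyPlan] }
  case class ToyShuffle(child: ToyPlan) extends ToyPlan {
    def children: Seq[ToyPlan] = Seq(child)
  }
  case class ToyOp(children: Seq[ToyPlan]) extends ToyPlan

  // Counts shuffle nodes anywhere in the tree.
  def countShuffles(p: ToyPlan): Long = {
    val self = if (p.isInstanceOf[ToyShuffle]) 1L else 0L
    self + p.children.map(countShuffles).sum
  }

  // The cost of a plan is its number of shuffle nodes.
  def evaluate(p: ToyPlan): Cost = SimpleCost(countShuffles(p))
}
```

    Because `Cost` extends `Ordered[Cost]`, two candidate plans can be compared directly with `<`, so a plan with fewer shuffles is considered cheaper under this model.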
