Spark Project SQL 3.3.0 API

Packages

package sql
Allows the execution of relational queries, including those expressed in SQL using Spark.
package execution
The physical execution component of Spark SQL. Note that this is a private package: all classes in it are considered an internal API to Spark SQL and are subject to change between minor releases.
package adaptive
package aggregate
package analysis
package arrow
package bucketing
CoalesceBucketsInJoin
DisableUnnecessaryBucketedScan
ExtractJoinWithBuckets
package columnar
package command
package datasources
package debug
Contains methods for debugging query execution.
Usage:
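```
import org.apache.spark.sql.execution.debug._
// `sql` is SparkSession.sql, which spark-shell imports automatically
sql("SELECT 1").debug()         // runs the query and prints per-operator debug statistics
sql("SELECT 1").debugCodegen()  // prints the code generated for each WholeStageCodegen subtree
```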
Or, for a Structured Streaming query:
```
import org.apache.spark.sql.execution.debug._
val query = df.writeStream.<...>.start()
query.debugCodegen()
```
Note that debug() is not supported in Structured Streaming, because it does not make sense to execute the query once as a batch while the main streaming query is running concurrently.
package dynamicpruning
package exchange
package joins
package metric
package python
package r
package reuse
package stat
package streaming
package ui
package vectorized
package window

package org.apache.spark.sql.execution.bucketing


Value Members

object CoalesceBucketsInJoin extends Rule[SparkPlan]
This rule coalesces the buckets on one side of a SortMergeJoin or ShuffledHashJoin, so that both sides are read with the same (smaller) number of buckets, if the following conditions are met:
- Two bucketed tables are joined.
- The join keys match the output partitioning expressions on their respective sides.
- The larger bucket count is divisible by the smaller bucket count.
- COALESCE_BUCKETS_IN_JOIN_ENABLED is set to true.
- The ratio of the larger bucket count to the smaller one is less than the value set in COALESCE_BUCKETS_IN_JOIN_MAX_BUCKET_RATIO.
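The bucket-count checks above can be condensed into a small decision function. The following is a hypothetical, self-contained Scala model and not Spark's implementation. The names `CoalesceBucketsSketch` and `coalescedBucketCount` and its parameters are invented for illustration. The `enabled` and `maxBucketRatio` arguments stand in for the two configuration entries. The first two conditions (two bucketed tables, join keys matching the partitioning) are assumed to hold already. The ratio comparison follows the wording above, and Spark's own handling of the boundary case may differ.

```scala
// Hypothetical model of the coalescing decision described above; not Spark's API.
object CoalesceBucketsSketch {
  /**
   * Returns the bucket count both join sides would be read with after coalescing,
   * or None if the documented conditions are not met. Assumes both sides are
   * bucketed and the join keys already match each side's partitioning.
   */
  def coalescedBucketCount(
      leftBuckets: Int,
      rightBuckets: Int,
      enabled: Boolean,   // stands in for COALESCE_BUCKETS_IN_JOIN_ENABLED
      maxBucketRatio: Int // stands in for COALESCE_BUCKETS_IN_JOIN_MAX_BUCKET_RATIO
  ): Option[Int] = {
    require(leftBuckets > 0 && rightBuckets > 0, "bucket counts must be positive")
    val larger = math.max(leftBuckets, rightBuckets)
    val smaller = math.min(leftBuckets, rightBuckets)
    // Equal counts need no coalescing; otherwise the larger must be a multiple of the smaller.
    val divisible = larger != smaller && larger % smaller == 0
    val ratio = larger / smaller
    // The side with more buckets would be read as if it had `smaller` buckets.
    if (enabled && divisible && ratio < maxBucketRatio) Some(smaller) else None
  }
}
```

For example, `coalescedBucketCount(8, 4, enabled = true, maxBucketRatio = 4)` returns `Some(4)`, meaning the 8-bucket side would be read as 4 coalesced buckets. In a running Spark session, the corresponding settings are `spark.sql.bucketing.coalesceBucketsInJoin.enabled` and `spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio`.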
object DisableUnnecessaryBucketedScan extends Rule[SparkPlan]
Disables unnecessary bucketed table scans based on the actual physical query plan. NOTE: this rule is designed to be applied right after EnsureRequirements, by which point all ShuffleExchangeExec and SortExec operators have been added to the plan.
When BUCKETING_ENABLED and AUTO_BUCKETED_SCAN_ENABLED are both set to true, the rule walks the query plan to find where a bucketed table scan is unnecessary, and disables the bucketed scan if either of the following holds:
1. The sub-plan from the root to the bucketed table scan does not contain a hasInterestingPartition operator.
2. The sub-plan from the nearest downstream hasInterestingPartition operator to the bucketed table scan contains only isAllowedUnaryExecNode operators and at least one Exchange.
Examples:

1. No hasInterestingPartition operator:

        Project
          |
        Filter
          |
        Scan(t1: i, j)  (bucketed on column j, DISABLE bucketed scan)

2. Join:

                   SortMergeJoin(t1.i = t2.j)
                      /                \
                  Sort(i)            Sort(j)
                    /                    \
               Shuffle(i)            Scan(t2: i, j)
                  /             (bucketed on column j, enable bucketed scan)
          Scan(t1: i, j)
          (bucketed on column j, DISABLE bucketed scan)

3. Aggregate:

        HashAggregate(i, ..., Final)
          |
        Shuffle(i)
          |
        HashAggregate(i, ..., Partial)
          |
        Filter
          |
        Scan(t1: i, j)  (bucketed on column j, DISABLE bucketed scan)
The idea of hasInterestingPartition is inspired by the notion of "interesting order" in the paper "Access Path Selection in a Relational Database Management System" (https://dl.acm.org/doi/10.1145/582095.582099).
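The following hypothetical Scala sketch makes the two disabling conditions concrete. It models a physical plan with four invented node types: `Scan`, `Exchange`, `Interesting` (an operator for which hasInterestingPartition holds), and `Other` (whose `allowed` flag stands in for isAllowedUnaryExecNode). It walks the plan top-down, tracking whether an interesting operator lies above the current node, whether an Exchange has been seen since that operator, and whether every operator in between is allowed. It is a simplified model of the rule as described above, not Spark's code.

```scala
// Simplified, hypothetical plan model; these node types are not Spark classes.
object BucketedScanSketch {
  sealed trait Node
  /** A table scan; `bucketed` says whether it would currently be read as a bucketed scan. */
  final case class Scan(table: String, bucketed: Boolean) extends Node
  /** A shuffle (Exchange) operator. */
  final case class Exchange(child: Node) extends Node
  /** An operator for which hasInterestingPartition holds, e.g. a join or a final aggregate. */
  final case class Interesting(name: String, children: List[Node]) extends Node
  /** Any other operator; `allowed` models isAllowedUnaryExecNode. */
  final case class Other(name: String, children: List[Node], allowed: Boolean) extends Node

  /** Pre-order traversal returning the tables whose bucketed scan would be disabled. */
  def disabledScans(
      node: Node,
      underInteresting: Boolean = false, // an interesting operator lies above this node
      sawExchange: Boolean = false,      // an Exchange lies between that operator and this node
      onlyAllowed: Boolean = true        // every operator in between is an allowed node
  ): List[String] = node match {
    case Interesting(_, children) =>
      // Restart tracking below the nearest interesting operator.
      children.flatMap(disabledScans(_, underInteresting = true, sawExchange = false, onlyAllowed = true))
    case Exchange(child) =>
      disabledScans(child, underInteresting, sawExchange = true, onlyAllowed)
    case Scan(table, bucketed) =>
      // Condition 1: no interesting operator above.
      // Condition 2: an Exchange and only allowed operators below the nearest one.
      if (bucketed && (!underInteresting || (sawExchange && onlyAllowed))) List(table) else Nil
    case Other(_, children, allowed) =>
      children.flatMap(disabledScans(_, underInteresting, sawExchange, onlyAllowed && allowed))
  }
}
```

Encoding the three examples above with these node types yields `List("t1")` for each, that is, only the scan of t1 loses its bucketed read. In Spark itself, BUCKETING_ENABLED and AUTO_BUCKETED_SCAN_ENABLED correspond to the settings `spark.sql.sources.bucketing.enabled` and `spark.sql.sources.bucketing.autoBucketedScan.enabled`.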
object ExtractJoinWithBuckets
An extractor that matches a SortMergeJoinExec or ShuffledHashJoinExec in which both sides of the join read bucketed tables, each side consists only of a scan operation, and the bucket counts of the two sides differ but the larger is divisible by the smaller.
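In Scala, an extractor is an object with an `unapply` method, which lets a rule pattern-match on a plan shape. The hypothetical sketch below illustrates the idea on an invented two-node plan model: `Scan`, with an optional bucket count, and `Join`, labelled by a string kind. It is not Spark's ExtractJoinWithBuckets. It interprets "consists only of a scan operation" literally as a bare `Scan` node, which is a simplifying assumption.

```scala
// Hypothetical, simplified plan model used only to illustrate an extractor; not Spark's classes.
object JoinBucketsSketch {
  sealed trait Plan
  /** A table scan; `numBuckets` is None when the table is not bucketed. */
  final case class Scan(table: String, numBuckets: Option[Int]) extends Plan
  /** A join node; `kind` is e.g. "SortMergeJoin" or "ShuffledHashJoin". */
  final case class Join(kind: String, left: Plan, right: Plan) extends Plan

  /** Matches a supported join whose two sides are bucketed scans with unequal but divisible bucket counts. */
  object ExtractJoinWithBucketsSketch {
    private val supportedKinds = Set("SortMergeJoin", "ShuffledHashJoin")

    def unapply(plan: Plan): Option[(Join, Int, Int)] = plan match {
      case j @ Join(kind, Scan(_, Some(l)), Scan(_, Some(r)))
          if supportedKinds(kind) && l > 0 && r > 0 && l != r &&
            math.max(l, r) % math.min(l, r) == 0 =>
        // Return the join together with the left and right bucket counts.
        Some((j, l, r))
      case _ => None
    }
  }
}
```

Because `unapply` returns `Option[(Join, Int, Int)]`, a rule can write `case ExtractJoinWithBucketsSketch(join, numLeft, numRight) => ...` and receive both bucket counts directly. A rule such as CoalesceBucketsInJoin needs exactly this kind of match.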
