package bucketing

Value Members

  1. object CoalesceBucketsInJoin extends Rule[SparkPlan]

    This rule coalesces the buckets on one side of a SortMergeJoin or ShuffledHashJoin when all of the following conditions are met:

    • Two bucketed tables are joined.
    • The join keys match the output partitioning expressions on their respective sides.
    • The larger bucket number is divisible by the smaller bucket number.
    • COALESCE_BUCKETS_IN_JOIN_ENABLED is set to true.
    • The ratio of the larger bucket number to the smaller one is less than the value set in COALESCE_BUCKETS_IN_JOIN_MAX_BUCKET_RATIO.
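
The two numeric conditions above (divisibility and the bucket ratio limit) can be sketched in plain Scala. This is an illustrative sketch only, not Spark's implementation: `BucketCoalesceCheck`, `coalescedNumBuckets`, and the `maxRatio` parameter are hypothetical names, and the strict "less than" comparison simply follows the wording of the list above.

```scala
// Hypothetical helper, not part of Spark's API.
object BucketCoalesceCheck {
  /**
   * Returns Some(target bucket count) when the two bucket counts satisfy the
   * numeric conditions, else None. The target is the smaller count, since the
   * side with more buckets is the one that gets coalesced.
   */
  def coalescedNumBuckets(left: Int, right: Int, maxRatio: Int): Option[Int] = {
    val small = math.min(left, right)
    val large = math.max(left, right)
    val eligible =
      small > 0 &&                 // guard against division by zero
        left != right &&           // equal counts need no coalescing
        large % small == 0 &&      // larger count divisible by the smaller
        large / small < maxRatio   // strict comparison, per the wording above (assumption)
    if (eligible) Some(small) else None
  }
}
```

For example, bucket counts of 8 and 4 with a limit of 4 qualify (ratio 2), while 8 and 3 fail the divisibility check.
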
  2. object DisableUnnecessaryBucketedScan extends Rule[SparkPlan]

    Disables unnecessary bucketed table scans based on the actual physical query plan. NOTE: this rule is designed to be applied right after EnsureRequirements, by which point all ShuffleExchangeExec and SortExec operators have been added to the plan.

    When BUCKETING_ENABLED and AUTO_BUCKETED_SCAN_ENABLED are both set to true, the rule walks the query plan to find where a bucketed table scan is unnecessary, and disables the bucketed scan if either of the following holds:

    1. The sub-plan from the root to the bucketed table scan does not contain a hasInterestingPartition operator.

    2. The sub-plan from the nearest downstream hasInterestingPartition operator to the bucketed table scan contains only isAllowedUnaryExecNode operators and at least one Exchange.

    Examples:

    1. no hasInterestingPartition operator:

                   Project
                      |
                    Filter
                      |
                Scan(t1: i, j)
        (bucketed on column j, DISABLE bucketed scan)

    2. join:

              SortMergeJoin(t1.i = t2.j)
                 /            \
             Sort(i)        Sort(j)
               /               \
           Shuffle(i)       Scan(t2: i, j)
             /         (bucketed on column j, enable bucketed scan)
        Scan(t1: i, j)
      (bucketed on column j, DISABLE bucketed scan)

    3. aggregate:

              HashAggregate(i, ..., Final)
                           |
                       Shuffle(i)
                           |
              HashAggregate(i, ..., Partial)
                           |
                         Filter
                           |
                       Scan(t1: i, j)
        (bucketed on column j, DISABLE bucketed scan)

    The idea of hasInterestingPartition is inspired by the notion of "interesting order" in the paper "Access Path Selection in a Relational Database Management System" (https://dl.acm.org/doi/10.1145/582095.582099).
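
The two disabling conditions for DisableUnnecessaryBucketedScan can be illustrated on a toy plan tree. The sketch below is a simplified model under stated assumptions, not Spark's code: the `Plan` types, `disabledScans`, and its flags are hypothetical, every scan is assumed to be bucketed, and the partial aggregate is modelled as an allowed unary node because the aggregate example marks its scan as disabled.

```scala
// Toy model of a physical plan; all names here are hypothetical.
object BucketedScanSketch {
  sealed trait Plan
  final case class Scan(table: String) extends Plan
  // A unary operator; `allowed` stands in for isAllowedUnaryExecNode.
  final case class Unary(name: String, child: Plan, allowed: Boolean = true) extends Plan
  final case class Exchange(child: Plan) extends Plan
  // An operator for which hasInterestingPartition would hold (e.g. a join).
  final case class Interesting(name: String, children: List[Plan]) extends Plan

  /** Returns the tables whose bucketed scan would be disabled. */
  def disabledScans(
      plan: Plan,
      withInteresting: Boolean = false, // an interesting operator is above us
      withExchange: Boolean = false,    // an Exchange seen since that operator
      onlyAllowed: Boolean = true       // only allowed unary nodes since then
  ): Set[String] = plan match {
    case Interesting(_, children) =>
      // Reset the tracking below each interesting operator.
      children.flatMap { c =>
        disabledScans(c, withInteresting = true, withExchange = false, onlyAllowed = true)
      }.toSet
    case Exchange(child) =>
      disabledScans(child, withInteresting, withExchange = true, onlyAllowed)
    case Unary(_, child, allowed) =>
      disabledScans(child, withInteresting, withExchange, onlyAllowed && allowed)
    case Scan(table) =>
      // Condition 1: no interesting operator above.
      // Condition 2: an Exchange and only allowed unary nodes in between.
      if (!withInteresting || (withExchange && onlyAllowed)) Set(table) else Set.empty
  }
}
```

Applied to the join example, the left side (Sort over Shuffle over Scan(t1)) meets condition 2, so only t1 is reported, while t2 keeps its bucketed scan because no Exchange sits between it and the join.
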

  3. object ExtractJoinWithBuckets

    An extractor that matches SortMergeJoinExec and ShuffledHashJoinExec nodes where both sides of the join read bucketed tables, consist only of the scan operation, and have bucket numbers that are unequal but where the larger is divisible by the smaller.
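
For readers unfamiliar with Scala extractor objects, the following sketch shows the general shape of such a matcher on a toy plan. It is illustrative only: `ExtractorSketch`, its `Plan` types, `JoinWithBuckets`, and `describe` are hypothetical and do not reflect the signature of ExtractJoinWithBuckets itself.

```scala
// Toy model; none of these types are Spark classes.
object ExtractorSketch {
  sealed trait Plan
  final case class BucketedScan(table: String, numBuckets: Int) extends Plan
  final case class PlainScan(table: String) extends Plan
  final case class Join(left: Plan, right: Plan) extends Plan

  // Extractor object: `unapply` lets it be used directly in a pattern.
  object JoinWithBuckets {
    def unapply(plan: Plan): Option[(Int, Int)] = plan match {
      case Join(BucketedScan(_, l), BucketedScan(_, r))
          if l > 0 && r > 0 && l != r && math.max(l, r) % math.min(l, r) == 0 =>
        Some((l, r)) // bucket counts of the left and right side
      case _ => None
    }
  }

  def describe(plan: Plan): String = plan match {
    case JoinWithBuckets(l, r) => s"coalesce candidate: $l vs $r buckets"
    case _                     => "not a candidate"
  }
}
```

A rule can then pattern-match on the extractor and receive the bucket counts already validated, which keeps the matching logic separate from the rewrite logic.
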
