
object JoinSelection extends Strategy with PredicateHelper

Selects the proper physical plan for a join based on join strategy hints, the availability of equi-join keys, and the sizes of the joined relations. The existing join strategies, their characteristics, and their limitations are listed below.

- Broadcast hash join (BHJ): Only supported for equi-joins; the join keys do not need to be sortable. Supported for all join types except full outer joins. BHJ usually performs faster than the other join algorithms when the broadcast side is small. However, broadcasting tables is a network-intensive operation, and it can cause out-of-memory (OOM) errors or perform badly in some cases, especially when the build/broadcast side is big.

- Shuffle hash join: Only supported for equi-joins, while the join keys do not need to be sortable. Supported for all join types except full outer joins.

- Shuffle sort merge join (SMJ): Only supported for equi-joins, and the join keys must be sortable. Supported for all join types.

- Broadcast nested loop join (BNLJ): Supports both equi-joins and non-equi-joins. Supports all the join types, but the implementation is optimized for: 1) broadcasting the left side in a right outer join; 2) broadcasting the right side in a left outer, left semi, left anti or existence join; 3) broadcasting either side in an inner-like join. For other cases, we need to scan the data multiple times, which can be rather slow.

- Shuffle-and-replicate nested loop join (a.k.a. cartesian product join): Supports both equi-joins and non-equi-joins. Supports only inner-like joins.
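
The support rules in the list above can be sketched as a small, self-contained Scala model. This is only an illustration of the constraints as described here, not Spark's implementation: every name in it (`JoinSupportSketch`, `JoinKind`, `Algorithm`, `supported`) is hypothetical, it covers only a subset of join types, and it assumes "inner-like" means an inner or cross join. It reports which algorithms the description permits for a given join, and says nothing about the order in which JoinSelection prefers them or about hints and relation sizes.

```scala
// Simplified, hypothetical model of the support rules listed above.
// None of these names come from Spark; they exist only for this sketch.
object JoinSupportSketch {
  sealed trait JoinKind
  case object Inner extends JoinKind
  case object Cross extends JoinKind
  case object LeftOuter extends JoinKind
  case object RightOuter extends JoinKind
  case object FullOuter extends JoinKind
  case object LeftSemi extends JoinKind
  case object LeftAnti extends JoinKind

  sealed trait Algorithm
  case object BroadcastHash extends Algorithm
  case object ShuffleHash extends Algorithm
  case object SortMerge extends Algorithm
  case object BroadcastNestedLoop extends Algorithm
  case object CartesianProduct extends Algorithm

  /** Lists the algorithms that the description allows for this join. */
  def supported(
      kind: JoinKind,
      hasEquiKeys: Boolean,
      keysSortable: Boolean): List[Algorithm] = {
    // Assumption: "inner-like" covers inner and cross joins.
    val innerLike = kind == Inner || kind == Cross
    val notFullOuter = kind != FullOuter

    List[(Algorithm, Boolean)](
      // Equi-join only, keys need not be sortable, no full outer joins.
      BroadcastHash -> (hasEquiKeys && notFullOuter),
      ShuffleHash -> (hasEquiKeys && notFullOuter),
      // Equi-join only, keys must be sortable, any join type.
      SortMerge -> (hasEquiKeys && keysSortable),
      // Works for equi- and non-equi-joins of any type.
      BroadcastNestedLoop -> true,
      // Works for equi- and non-equi-joins, inner-like joins only.
      CartesianProduct -> innerLike
    ).collect { case (algorithm, true) => algorithm }
  }
}
```

Under this model, a full outer equi-join with sortable keys leaves only the sort merge join and the broadcast nested loop join, while an inner join without equi-join keys leaves only the two nested loop variants.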

Linear Supertypes
PredicateHelper, SparkStrategy, GenericStrategy[SparkPlan], Logging, AnyRef, Any

Value Members

  1. def apply(plan: LogicalPlan): Seq[SparkPlan]
    Definition Classes
    JoinSelection → GenericStrategy
  2. def findExpressionAndTrackLineageDown(exp: Expression, plan: LogicalPlan): Option[(Expression, LogicalPlan)]
    Definition Classes
    PredicateHelper