Package

org.apache.spark.sql.catalyst

optimizer

package optimizer

Type Members

  1. case class CheckCartesianProducts(conf: SQLConf) extends Rule[LogicalPlan] with PredicateHelper with Product with Serializable

    Checks whether there are any cartesian products between joins of any type in the optimized plan tree. Throws an error if a cartesian product is found without an explicit cross join specified. This rule is effectively disabled if the CROSS_JOINS_ENABLED flag is true.

    This rule must be run AFTER the ReorderJoin rule since the join conditions for each join must be collected before checking if it is a cartesian product. If you have SELECT * from R, S where R.r = S.s, the join between R and S is not a cartesian product and therefore should be allowed. The predicate R.r = S.s is not recognized as a join condition until the ReorderJoin rule.

  2. case class Cost(card: BigInt, size: BigInt) extends Product with Serializable

    This class defines the cost model for a plan.

    card

    Cardinality (number of rows).

    size

    Size in bytes.

  3. case class CostBasedJoinReorder(conf: SQLConf) extends Rule[LogicalPlan] with PredicateHelper with Product with Serializable

    Cost-based join reorder. We may have several join reorder algorithms in the future. This class is the entry point for these algorithms and chooses which one to use.

  4. case class DecimalAggregates(conf: SQLConf) extends Rule[LogicalPlan] with Product with Serializable

    Speeds up aggregates on fixed-precision decimals by executing them on unscaled Long values.

    This uses the same rules for increasing the precision and scale of the output as org.apache.spark.sql.catalyst.analysis.DecimalPrecision.

  5. case class EliminateOuterJoin(conf: SQLConf) extends Rule[LogicalPlan] with PredicateHelper with Product with Serializable

    Elimination of outer joins, if the predicates can restrict the result sets so that all null-supplying rows are eliminated:

    - full outer -> inner if both sides have such predicates
    - left outer -> inner if the right side has such predicates
    - right outer -> inner if the left side has such predicates
    - full outer -> left outer if only the left side has such predicates
    - full outer -> right outer if only the right side has such predicates

    This rule should be executed before pushing down the Filter.
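
The join-type rewrites listed for EliminateOuterJoin can be sketched as a small pure function. This is an illustrative model only, not Spark's implementation: the `JoinType` objects and the `leftRejectsNulls` / `rightRejectsNulls` flags are hypothetical stand-ins for "this side has a predicate that filters out its null-supplied rows".

```scala
object OuterJoinEliminationSketch {
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object FullOuter extends JoinType

  // leftRejectsNulls / rightRejectsNulls are hypothetical flags: true when a
  // filter above the join cannot pass a row whose columns from that side are
  // all null.
  def newJoinType(
      joinType: JoinType,
      leftRejectsNulls: Boolean,
      rightRejectsNulls: Boolean): JoinType = joinType match {
    case FullOuter if leftRejectsNulls && rightRejectsNulls => Inner
    case FullOuter if leftRejectsNulls => LeftOuter
    case FullOuter if rightRejectsNulls => RightOuter
    case LeftOuter if rightRejectsNulls => Inner
    case RightOuter if leftRejectsNulls => Inner
    case other => other // nothing can be eliminated
  }
}
```

Join types that do not appear in the list are returned unchanged.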

  6. case class GetCurrentDatabase(sessionCatalog: SessionCatalog) extends Rule[LogicalPlan] with Product with Serializable

    Replaces the expression of CurrentDatabase with the current database name.

  7. case class InferFiltersFromConstraints(conf: SQLConf) extends Rule[LogicalPlan] with PredicateHelper with Product with Serializable

    Generates a list of additional filters from an operator's existing constraints, but removes those that are either already part of the operator's condition or part of the operator's child constraints. These filters are currently inserted into the existing conditions of Filter operators and on either side of Join operators.

    Note: While this optimization is applicable to all types of join, it primarily benefits Inner and LeftSemi joins.

  8. case class JoinGraphInfo(starJoins: Set[Int], nonStarJoins: Set[Int]) extends Product with Serializable

    Helper class that keeps information about the join graph as sets of item/plan ids. It currently stores the star/non-star plans. It can be extended with the set of connected/unconnected plans.

  9. case class LimitPushDown(conf: SQLConf) extends Rule[LogicalPlan] with Product with Serializable

    Pushes down LocalLimit beneath UNION ALL and beneath the streamed inputs of outer joins.

  10. case class NullPropagation(conf: SQLConf) extends Rule[LogicalPlan] with Product with Serializable

    Replaces Expressions that can be statically evaluated with equivalent Literal values. This rule specifically handles null-value propagation from the bottom to the top of the expression tree.

  11. case class OptimizeCodegen(conf: SQLConf) extends Rule[LogicalPlan] with Product with Serializable

    Optimizes expressions by replacing them according to the CodeGen configuration.

  12. case class OptimizeIn(conf: SQLConf) extends Rule[LogicalPlan] with Product with Serializable

    Optimizes IN predicates:

    1. Removes literal repetitions.
    2. Replaces (value, seq[Literal]) with the optimized version (value, HashSet[Literal]), which is much faster.
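
As a rough illustration of why the OptimizeIn rewrite helps, the two steps can be modelled on plain Scala collections. This is a sketch under simplifying assumptions: literals are ordinary Scala values rather than Catalyst `Literal` nodes, SQL null semantics are ignored, and the helper names are hypothetical.

```scala
import scala.collection.immutable.HashSet

object OptimizeInSketch {
  // Step 1: drop repeated literals, keeping the first occurrence of each.
  def dedupe[A](literals: Seq[A]): Seq[A] = literals.distinct

  // Step 2: build a hash set once, so that each membership test is a hash
  // lookup rather than a scan over the literal list.
  def toHashSet[A](literals: Seq[A]): HashSet[A] = HashSet.empty[A] ++ literals

  // Evaluates `value IN (literals)` against the prebuilt set.
  def in[A](value: A, set: HashSet[A]): Boolean = set.contains(value)
}
```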

  13. abstract class Optimizer extends RuleExecutor[LogicalPlan]

    Abstract class that all optimizers should inherit from. It contains the standard batches (extending optimizers can override this).

  14. case class OrderedJoin(left: LogicalPlan, right: LogicalPlan, joinType: JoinType, condition: Option[Expression]) extends BinaryNode with Product with Serializable

    This is a mimic class for a join node that has been ordered.

  15. case class PruneFilters(conf: SQLConf) extends Rule[LogicalPlan] with PredicateHelper with Product with Serializable

    Removes filters that can be evaluated trivially. This can be done in the following ways:

    1) by eliding the filter for cases where it will always evaluate to true.
    2) by substituting a dummy empty relation when the filter will always evaluate to false.
    3) by eliminating the always-true conditions given the constraints on the child's output.

  16. case class ReorderJoin(conf: SQLConf) extends Rule[LogicalPlan] with PredicateHelper with Product with Serializable

    Reorders the joins and pushes all the conditions into the joins, so that the bottom ones have at least one condition.

    The order of joins will not be changed if all of them already have at least one condition.

    If star schema detection is enabled, reorder the star join plans based on heuristics.

  17. class SimpleTestOptimizer extends Optimizer

  18. case class StarSchemaDetection(conf: SQLConf) extends PredicateHelper with Product with Serializable

    Encapsulates star-schema detection logic.

Value Members

  1. object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper

    Simplifies boolean expressions:

    1. Simplifies expressions whose answer can be determined without evaluating both sides.
    2. Eliminates / extracts common factors.
    3. Merges identical expressions.
    4. Removes the Not operator.
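
A minimal sketch of some BooleanSimplification rewrites on a toy expression tree. The `Expr` types below are hypothetical and are not Catalyst classes. The sketch covers short-circuiting, merging identical operands and removing double negation, and it does not attempt common-factor extraction.

```scala
object BooleanSimplificationSketch {
  sealed trait Expr
  case class Lit(value: Boolean) extends Expr
  case class Attr(name: String) extends Expr
  case class And(left: Expr, right: Expr) extends Expr
  case class Or(left: Expr, right: Expr) extends Expr
  case class Not(child: Expr) extends Expr

  // Simplifies bottom-up: children first, then the current node.
  def simplify(e: Expr): Expr = e match {
    case And(l, r) =>
      (simplify(l), simplify(r)) match {
        case (Lit(false), _) | (_, Lit(false)) => Lit(false) // short-circuit
        case (Lit(true), x) => x
        case (x, Lit(true)) => x
        case (x, y) if x == y => x // merge identical operands
        case (x, y) => And(x, y)
      }
    case Or(l, r) =>
      (simplify(l), simplify(r)) match {
        case (Lit(true), _) | (_, Lit(true)) => Lit(true) // short-circuit
        case (Lit(false), x) => x
        case (x, Lit(false)) => x
        case (x, y) if x == y => x // merge identical operands
        case (x, y) => Or(x, y)
      }
    case Not(c) =>
      simplify(c) match {
        case Lit(b) => Lit(!b)
        case Not(x) => x // double negation
        case x => Not(x)
      }
    case leaf => leaf
  }
}
```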

  2. object CollapseProject extends Rule[LogicalPlan]

    Combines two adjacent Project operators into one and performs alias substitution, merging the expressions into one single expression.

  3. object CollapseRepartition extends Rule[LogicalPlan]

    Combines adjacent RepartitionOperation operators

  4. object CollapseWindow extends Rule[LogicalPlan]

    Collapses adjacent Window expressions. If the partition specs and order specs are the same and the window expressions are independent, they are collapsed into the parent.

  5. object ColumnPruning extends Rule[LogicalPlan]

    Attempts to eliminate the reading of unneeded columns from the query plan.

    Since adding Project before Filter conflicts with PushPredicatesThroughProject, this rule will remove the Project p2 in the following pattern:

    p1 @ Project(_, Filter(_, p2 @ Project(_, child))) if p2.outputSet.subsetOf(p2.inputSet)

    p2 is usually inserted by this rule and is useless; p1 can prune the columns anyway.

  6. object CombineFilters extends Rule[LogicalPlan] with PredicateHelper

    Combines two adjacent Filter operators into one, merging the non-redundant conditions into one conjunctive predicate.

  7. object CombineLimits extends Rule[LogicalPlan]

    Combines two adjacent Limit operators into one, merging the expressions into one single expression.

  8. object CombineTypedFilters extends Rule[LogicalPlan]

    Combines two adjacent TypedFilters that operate on the same object type in their conditions into one, merging the filter functions into one conjunctive function.

  9. object CombineUnions extends Rule[LogicalPlan]

    Combines all adjacent Union operators into a single Union.

  10. object ComputeCurrentTime extends Rule[LogicalPlan]

    Computes the current date and time to make sure we return the same result in a single query.

  11. object ConstantFolding extends Rule[LogicalPlan]

    Replaces Expressions that can be statically evaluated with equivalent Literal values.

  12. object ConvertToLocalRelation extends Rule[LogicalPlan]

    Converts local operations (i.e. ones that don't require data exchange) on LocalRelation to another LocalRelation.

    This is relatively simple as it currently handles only a single case: Project.

  13. object EliminateMapObjects extends Rule[LogicalPlan]

    Removes MapObjects when the following conditions are satisfied:

    1. Mapobject(... lambdavariable(..., false) ...), which means the input and output types are non-nullable primitive types.
    2. No custom collection class is specified.

  14. object EliminateSerialization extends Rule[LogicalPlan]

    Removes cases where we are unnecessarily going between the object and serialized (InternalRow) representations of a data item. For example, back-to-back map operations.

  15. object EliminateSorts extends Rule[LogicalPlan]

    Removes no-op SortOrder from Sort

  16. object FoldablePropagation extends Rule[LogicalPlan]

    Propagate foldable expressions: Replace attributes with aliases of the original foldable expressions if possible. Other optimizations will take advantage of the propagated foldable expressions.

    SELECT 1.0 x, 'abc' y, Now() z ORDER BY x, y, 3
    ==>  SELECT 1.0 x, 'abc' y, Now() z ORDER BY 1.0, 'abc', Now()
  17. object JoinReorderDP extends PredicateHelper with Logging

    Reorder the joins using a dynamic programming algorithm. This implementation is based on the paper: Access Path Selection in a Relational Database Management System. http://www.inf.ed.ac.uk/teaching/courses/adbs/AccessPath.pdf

    First we put all items (basic joined nodes) into level 0, then we build all two-way joins at level 1 from plans at level 0 (single items), then build all 3-way joins from plans at previous levels (two-way joins and single items), then 4-way joins ... etc, until we build all n-way joins and pick the best plan among them.

    When building m-way joins, we only keep the best plan (with the lowest cost) for the same set of m items. E.g., for 3-way joins, we keep only the best plan for items {A, B, C} among plans (A J B) J C, (A J C) J B and (B J C) J A. We also prune cartesian product candidates when building a new plan if there exists no join condition involving references from both left and right. This pruning strategy significantly reduces the search space. E.g., given A J B J C J D with join conditions A.k1 = B.k1 and B.k2 = C.k2 and C.k3 = D.k3, plans maintained for each level are as follows:

    level 0: p({A}), p({B}), p({C}), p({D})
    level 1: p({A, B}), p({B, C}), p({C, D})
    level 2: p({A, B, C}), p({B, C, D})
    level 3: p({A, B, C, D})

    where p({A, B, C, D}) is the final output plan.

    For cost evaluation, since physical costs for operators are not available currently, we use cardinalities and sizes to compute costs.
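
The level-by-level search described for JoinReorderDP can be sketched on plain Scala collections. This is an illustrative model, not Spark's code: the `Plan` class, the `conds` set of item pairs linked by a join condition, and the caller-supplied `card` estimator are all hypothetical, and cost is simply the sum of the estimated cardinalities of intermediate results.

```scala
object JoinReorderDPSketch {
  // A candidate plan: the joined items, a printable join tree, and its cost.
  case class Plan(items: Set[String], tree: String, cost: BigInt)

  // True when some join condition references one item from each side.
  def connected(a: Set[String], b: Set[String], conds: Set[(String, String)]): Boolean =
    conds.exists { case (x, y) => (a(x) && b(y)) || (a(y) && b(x)) }

  // Returns one map per level; levels(k) holds the best plan for every set of
  // k + 1 items that can be joined without a cartesian product.
  def search(
      items: Seq[String],
      conds: Set[(String, String)],
      card: Set[String] => BigInt): Vector[Map[Set[String], Plan]] = {
    val level0: Map[Set[String], Plan] =
      items.map(i => Set(i) -> Plan(Set(i), i, BigInt(0))).toMap
    (1 until items.size).foldLeft(Vector(level0)) { (levels, k) =>
      val candidates = for {
        i <- 0 until k
        left <- levels(i).values
        right <- levels(k - 1 - i).values
        if (left.items & right.items).isEmpty        // sides must not overlap
        if connected(left.items, right.items, conds) // prune cartesian products
      } yield {
        val joined = left.items ++ right.items
        Plan(joined, s"(${left.tree} J ${right.tree})",
          left.cost + right.cost + card(joined))
      }
      // Keep only the cheapest plan for each distinct set of items.
      val best = candidates.groupBy(_.items).map {
        case (set, plans) => set -> plans.minBy(_.cost)
      }
      levels :+ best
    }
  }
}
```

With items A, B, C, D and conditions A-B, B-C, C-D, the item sets kept at each level match the level 0 to level 3 listing in the description.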

  18. object JoinReorderDPFilters extends PredicateHelper

    Implements optional filters to reduce the search space for join enumeration.

    1) Star-join filters: Plan star-joins together since they are assumed to have an optimal execution based on their RI relationship.
    2) Cartesian products: Defer their planning later in the graph to avoid large intermediate results (expanding joins, in general).
    3) Composite inners: Don't generate "bushy tree" plans to avoid materializing intermediate results.

    Filters (2) and (3) are not implemented.

  19. object LikeSimplification extends Rule[LogicalPlan]

    Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition. For example, when the expression is just checking to see if a string starts with a given pattern.
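
To make the idea concrete, here is a hedged sketch of how simple LIKE patterns might be mapped to cheaper string predicates. It is not Spark's implementation: the `simplify` helper is hypothetical, and any pattern containing `_`, an escape character, or a `%` in the middle is left alone (`None`).

```scala
object LikeSimplificationSketch {
  // Literal text: anything except the LIKE wildcards '%', '_' and the escape '\'.
  private val Equal    = """([^%_\\]*)""".r
  private val Prefix   = """([^%_\\]+)%""".r
  private val Suffix   = """%([^%_\\]+)""".r
  private val Contains = """%([^%_\\]+)%""".r

  // Returns a cheap predicate for simple patterns, or None when full pattern
  // matching is still required.
  def simplify(pattern: String): Option[String => Boolean] = pattern match {
    case Equal(s)    => Some((in: String) => in == s)
    case Prefix(p)   => Some((in: String) => in.startsWith(p))
    case Suffix(p)   => Some((in: String) => in.endsWith(p))
    case Contains(p) => Some((in: String) => in.contains(p))
    case _           => None
  }
}
```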

  20. object PropagateEmptyRelation extends Rule[LogicalPlan] with PredicateHelper

    Collapses plans consisting of empty local relations generated by PruneFilters.

    1. Binary (or higher)-node logical plans
    • Union with all empty children.
    • Join with one or two empty children (including Intersect/Except).
    2. Unary-node logical plans
    • Project/Filter/Sample/Join/Limit/Repartition with all empty children.
    • Aggregate with all empty children and at least one grouping expression.
    • Generate(Explode) with all empty children. Others like Hive UDTF may return results.

  21. object PullupCorrelatedPredicates extends Rule[LogicalPlan] with PredicateHelper

    Pull out all (outer) correlated predicates from a given subquery. This method removes the correlated predicates from subquery Filters and adds the references of these predicates to all intermediate Project and Aggregate clauses (if they are missing) in order to be able to evaluate the predicates at the top level.

    TODO: Look to merge this rule with RewritePredicateSubquery.

  22. object PushDownPredicate extends Rule[LogicalPlan] with PredicateHelper

    Pushes Filter operators through many operators iff:

    1) the operator is deterministic
    2) the predicate is deterministic and the operator will not change any of the rows.

    This heuristic is valid assuming the expression evaluation cost is minimal.

  23. object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper

    Pushes down Filter operators where the condition can be evaluated using only the attributes of the left or right side of a join. Other Filter conditions are moved into the condition of the Join.

    It also pushes down the join filter where the condition can be evaluated using only the attributes of the left or right side of the subquery, when applicable.

    See https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior for more details.

  24. object PushProjectionThroughUnion extends Rule[LogicalPlan] with PredicateHelper

    Pushes Project operator to both sides of a Union operator. Operations that are safe to push down are listed as follows. Union: Right now, Union means UNION ALL, which does not de-duplicate rows, so it is safe to push down Filters and Projections through it. Filter pushdown is handled by another rule, PushDownPredicate. Once we add UNION DISTINCT, we will not be able to push down Projections.

  25. object RemoveDispensableExpressions extends Rule[LogicalPlan]

    Removes nodes that are not necessary.

  26. object RemoveLiteralFromGroupExpressions extends Rule[LogicalPlan]

    Removes literals from group expressions in Aggregate, as they have no effect on the result and only make the grouping key bigger.

  27. object RemoveRedundantAliases extends Rule[LogicalPlan]

    Remove redundant aliases from a query plan. A redundant alias is an alias that does not change the name or metadata of a column, and does not deduplicate it.

  28. object RemoveRedundantProject extends Rule[LogicalPlan]

    Remove projections from the query plan that do not make any modifications.

  29. object RemoveRepetitionFromGroupExpressions extends Rule[LogicalPlan]

    Removes repetition from group expressions in Aggregate, as they have no effect on the result and only make the grouping key bigger.

  30. object ReorderAssociativeOperator extends Rule[LogicalPlan]

    Reorder associative integral-type operators and fold all constants into one.
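
One way to picture the ReorderAssociativeOperator rewrite is to flatten a chain of additions, fold the constants into one, and rebuild the chain. The sketch below uses a hypothetical toy `Expr` tree (not Catalyst classes), handles only addition, and ignores overflow.

```scala
object ReorderAssociativeSketch {
  sealed trait Expr
  case class Const(value: Long) extends Expr
  case class Ref(name: String) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr

  // Collects the operands of a (possibly nested) chain of additions.
  def flatten(e: Expr): Seq[Expr] = e match {
    case Add(l, r) => flatten(l) ++ flatten(r)
    case other => Seq(other)
  }

  // Folds all constants of an addition chain into a single constant.
  def reorder(e: Expr): Expr = {
    val operands = flatten(e)
    val consts = operands.collect { case Const(v) => v }
    val others = operands.filterNot(_.isInstanceOf[Const])
    if (consts.size < 2) e // nothing to fold
    else {
      val folded = Const(consts.sum)
      if (others.isEmpty) folded
      else Add(others.reduceLeft[Expr]((a, b) => Add(a, b)), folded)
    }
  }
}
```

For example, `(a + 1) + (b + 2)` becomes `(a + b) + 3` in this model.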

  31. object ReplaceDeduplicateWithAggregate extends Rule[LogicalPlan]

    Replaces logical Deduplicate operator with an Aggregate operator.

  32. object ReplaceDistinctWithAggregate extends Rule[LogicalPlan]

    Replaces logical Distinct operator with an Aggregate operator.

    SELECT DISTINCT f1, f2 FROM t  ==>  SELECT f1, f2 FROM t GROUP BY f1, f2
  33. object ReplaceExceptWithAntiJoin extends Rule[LogicalPlan]

    Replaces logical Except operator with a left-anti Join operator.

    SELECT a1, a2 FROM Tab1 EXCEPT SELECT b1, b2 FROM Tab2
    ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT ANTI JOIN Tab2 ON a1<=>b1 AND a2<=>b2

    Note:

    1. This rule is only applicable to EXCEPT DISTINCT. Do not use it for EXCEPT ALL.
    2. This rule has to be done after de-duplicating the attributes; otherwise, the generated join conditions will be incorrect.
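
The equivalence behind ReplaceExceptWithAntiJoin can be checked on in-memory rows. In this sketch (an illustration on plain Scala collections, not Spark code) nullable columns are modelled as `Option[Int]`, whose `==` behaves like the null-safe `<=>` operator because `None == None` is true. The `Row` alias and function names are hypothetical.

```scala
object ExceptAsAntiJoinSketch {
  type Row = (Option[Int], Option[Int])

  // Null-safe equality: None (null) is considered equal to None.
  def nullSafeEq(a: Option[Int], b: Option[Int]): Boolean = a == b

  // SELECT DISTINCT a1, a2 FROM tab1 LEFT ANTI JOIN tab2 ON a1<=>b1 AND a2<=>b2
  def exceptDistinct(tab1: Seq[Row], tab2: Seq[Row]): Seq[Row] =
    tab1
      .filterNot(l => tab2.exists(r => nullSafeEq(l._1, r._1) && nullSafeEq(l._2, r._2)))
      .distinct
}
```

Swapping `filterNot` for `filter` gives the left-semi variant that corresponds to INTERSECT DISTINCT.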

  34. object ReplaceExpressions extends Rule[LogicalPlan]

    Finds all RuntimeReplaceable expressions and replaces them with expressions that can be evaluated. This is mainly used to provide compatibility with other databases. For example, we use this to support "nvl" by replacing it with "coalesce".

  35. object ReplaceIntersectWithSemiJoin extends Rule[LogicalPlan]

    Replaces logical Intersect operator with a left-semi Join operator.

    SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2
    ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2

    Note:

    1. This rule is only applicable to INTERSECT DISTINCT. Do not use it for INTERSECT ALL.
    2. This rule has to be done after de-duplicating the attributes; otherwise, the generated join conditions will be incorrect.

  36. object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan]

    This rule rewrites correlated ScalarSubquery expressions into LEFT OUTER joins.

  37. object RewriteDistinctAggregates extends Rule[LogicalPlan]

    This rule rewrites an aggregate query with distinct aggregations into an expanded double aggregation in which the regular aggregation expressions and every distinct clause are each aggregated in a separate group. The results are then combined in a second aggregate.

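    For example (in Scala):

    // Assumes a SparkSession named `spark` is in scope (as in spark-shell).
    import org.apache.spark.sql.functions.{countDistinct, sum}
    import spark.implicits._

    val data = Seq(
      ("a", "ca1", "cb1", 10),
      ("a", "ca1", "cb2", 5),
      ("b", "ca1", "cb1", 13))
      .toDF("key", "cat1", "cat2", "value")
    data.createOrReplaceTempView("data")

    // Two distinct aggregates plus one regular aggregate per key.
    val agg = data.groupBy($"key")
      .agg(
        countDistinct($"cat1").as("cat1_cnt"),
        countDistinct($"cat2").as("cat2_cnt"),
        sum($"value").as("total"))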

    This translates to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [COUNT(DISTINCT 'cat1),
                    COUNT(DISTINCT 'cat2),
                    sum('value)]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      LocalTableScan [...]

    This rule rewrites this logical plan to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [count(if (('gid = 1)) 'cat1 else null),
                    count(if (('gid = 2)) 'cat2 else null),
                    first(if (('gid = 0)) 'total else null) ignore nulls]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      Aggregate(
         key = ['key, 'cat1, 'cat2, 'gid]
         functions = [sum('value)]
         output = ['key, 'cat1, 'cat2, 'gid, 'total])
        Expand(
           projections = [('key, null, null, 0, cast('value as bigint)),
                          ('key, 'cat1, null, 1, null),
                          ('key, null, 'cat2, 2, null)]
           output = ['key, 'cat1, 'cat2, 'gid, 'value])
          LocalTableScan [...]

    The rule does the following things here:

    1. Expand the data. There are three aggregation groups in this query:
       i. the non-distinct group;
       ii. the distinct 'cat1 group;
       iii. the distinct 'cat2 group.
       An expand operator is inserted to expand the child data for each group. The expand will null out all unused columns for the given group; this must be done in order to ensure correctness later on. Groups can be identified by a group id (gid) column added by the expand operator.
    2. De-duplicate the distinct paths and aggregate the non-distinct path. The group by clause of this aggregate consists of the original group by clause, all the requested distinct columns and the group id. Both de-duplication of distinct columns and the aggregation of the non-distinct group take advantage of the fact that we group by the group id (gid) and that we have nulled out all non-relevant columns for the given group.
    3. Aggregate the distinct groups and combine this with the results of the non-distinct aggregation. In this step we use the group id to filter the inputs for the aggregate functions. The results of the non-distinct group are 'aggregated' by using the first operator; it might be more elegant to use the native UDAF merge mechanism for this in the future.

    This rule duplicates the input data two or more times (# distinct groups + an optional non-distinct group). This puts quite a bit of memory pressure on the aggregate and exchange operators used. Keeping the number of distinct groups as low as possible should be a priority; we could improve this in the current rule by applying more advanced expression canonicalization techniques.
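
The three steps of RewriteDistinctAggregates can be traced on plain Scala collections using the same sample rows as the example. This is only a sketch of the technique, not Spark's implementation: the case classes are hypothetical, `None` stands in for a nulled-out column, and an all-null sum is represented as 0 instead of null.

```scala
object DistinctAggregateRewriteSketch {
  case class In(key: String, cat1: String, cat2: String, value: Long)
  // One expanded row; columns unused by a group are nulled out (None).
  case class Expanded(key: String, cat1: Option[String], cat2: Option[String],
      gid: Int, value: Option[Long])
  case class Out(key: String, cat1Cnt: Long, cat2Cnt: Long, total: Long)

  def rewrite(data: Seq[In]): Seq[Out] = {
    // Step 1: Expand. One projection per aggregation group, tagged with a gid.
    val expanded = data.flatMap { r =>
      Seq(
        Expanded(r.key, None, None, 0, Some(r.value)), // non-distinct group
        Expanded(r.key, Some(r.cat1), None, 1, None),  // distinct cat1 group
        Expanded(r.key, None, Some(r.cat2), 2, None))  // distinct cat2 group
    }
    // Step 2: group by (key, cat1, cat2, gid). This de-duplicates the distinct
    // paths and sums the values of the non-distinct path.
    val first = expanded
      .groupBy(e => (e.key, e.cat1, e.cat2, e.gid))
      .map { case ((key, c1, c2, gid), rows) =>
        (key, c1, c2, gid, rows.flatMap(_.value).sum)
      }
      .toSeq
    // Step 3: group by key. The gid selects the inputs of each aggregate.
    first.groupBy(_._1).map { case (key, rows) =>
      Out(
        key,
        rows.count(r => r._4 == 1 && r._2.isDefined).toLong,
        rows.count(r => r._4 == 2 && r._3.isDefined).toLong,
        rows.collectFirst { case r if r._4 == 0 => r._5 }.getOrElse(0L))
    }.toSeq.sortBy(_.key)
  }
}
```

Note that every input row appears three times after step 1, which is the data duplication the description warns about.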

  38. object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper

    This rule rewrites predicate sub-queries into left semi/anti joins. The following predicates are supported:

    a. EXISTS/NOT EXISTS will be rewritten as semi/anti join; unresolved conditions in the Filter will be pulled out as the join conditions.
    b. IN/NOT IN will be rewritten as semi/anti join; unresolved conditions in the Filter will be pulled out as join conditions, and value = selected column will also be used as a join condition.

  39. object SimpleTestOptimizer extends SimpleTestOptimizer

    An optimizer used in test code.

    To ensure extensibility, we leave the standard rules in the abstract optimizer, while specific rules go to the subclasses.

  40. object SimplifyBinaryComparison extends Rule[LogicalPlan] with PredicateHelper

    Simplifies binary comparisons with semantically-equal expressions:

    1) Replace '<=>' with 'true' literal.
    2) Replace '=', '<=', and '>=' with 'true' literal if both operands are non-nullable.
    3) Replace '<' and '>' with 'false' literal if both operands are non-nullable.
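
The three SimplifyBinaryComparison cases can be written as a small decision function. This is a hedged sketch: `Col`, `Result` and the string-encoded operator are hypothetical simplifications, and "semantically equal" is approximated by plain equality of the two column descriptions.

```scala
object BinaryComparisonSketch {
  case class Col(name: String, nullable: Boolean)

  sealed trait Result
  case object AlwaysTrue extends Result
  case object AlwaysFalse extends Result
  case object Unchanged extends Result

  def simplify(op: String, left: Col, right: Col): Result = {
    val same = left == right // stand-in for a semantic-equality check
    val nonNull = !left.nullable && !right.nullable
    op match {
      case "<=>" if same => AlwaysTrue // null-safe, so nullability is irrelevant
      case "=" | "<=" | ">=" if same && nonNull => AlwaysTrue
      case "<" | ">" if same && nonNull => AlwaysFalse
      case _ => Unchanged
    }
  }
}
```

The nullability check matters for every operator except `<=>`, because `x = x` evaluates to null rather than true when `x` is null.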

  41. object SimplifyCaseConversionExpressions extends Rule[LogicalPlan]

    Removes the inner case conversion expressions that are unnecessary because the inner conversion is overwritten by the outer one.

  42. object SimplifyCasts extends Rule[LogicalPlan]

    Removes Casts that are unnecessary because the input is already the correct type.

  43. object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper

    Simplifies conditional expressions (if / case).

  44. object SimplifyCreateArrayOps extends Rule[LogicalPlan]

    Pushes down operations into CreateArray.

  45. object SimplifyCreateMapOps extends Rule[LogicalPlan]

    Pushes down operations into CreateMap.

  46. object SimplifyCreateStructOps extends Rule[LogicalPlan]

    Pushes down operations into CreateNamedStructLike.
