Replaces the expression of CurrentDatabase with the current database name.
Optimizes expressions by replacing according to CodeGen configuration.
Replaces (value, seq[Literal]) with optimized version(value, HashSet[Literal]) which is much faster
Abstract class all optimizers should inherit of, contains the standard batches (extending Optimizers can override this.
Simplifies boolean expressions: 1.
Combines two adjacent Project operators into one and perform alias substitution, merging the expressions into one single expression.
Combines adjacent Repartition operators by keeping only the last one.
Attempts to eliminate the reading of unneeded columns from the query plan.
Attempts to eliminate the reading of unneeded columns from the query plan.
Since adding Project before Filter conflicts with PushPredicatesThroughProject, this rule will remove the Project p2 in the following pattern:
p1 @ Project(_, Filter(_, p2 @ Project(_, child))) if p2.outputSet.subsetOf(p2.inputSet)
p2 is usually inserted by this rule and useless, p1 could prune the columns anyway.
Combines two adjacent Filter operators into one, merging the non-redundant conditions into one conjunctive predicate.
Combines two adjacent Limit operators into one, merging the expressions into one single expression.
Combines all adjacent Union operators into a single Union.
Computes the current date and time to make sure we return the same result in a single query.
Replaces Expressions that can be statically evaluated with equivalent Literal values.
Converts local operations (i.e.
Converts local operations (i.e. ones that don't require data exchange) on LocalRelation to another LocalRelation.
This is relatively simple as it currently handles only a single case: Project.
Speeds up aggregates on fixed-precision decimals by executing them on unscaled Long values.
Speeds up aggregates on fixed-precision decimals by executing them on unscaled Long values.
This uses the same rules for increasing the precision and scale of the output as org.apache.spark.sql.catalyst.analysis.DecimalPrecision.
Elimination of outer joins, if the predicates can restrict the result sets so that all null-supplying rows are eliminated
Elimination of outer joins, if the predicates can restrict the result sets so that all null-supplying rows are eliminated
- full outer -> inner if both sides have such predicates - left outer -> inner if the right side has such predicates - right outer -> inner if the left side has such predicates - full outer -> left outer if only the left side has such predicates - full outer -> right outer if only the right side has such predicates
This rule should be executed before pushing down the Filter
Removes cases where we are unnecessarily going between the object and serialized (InternalRow) representation of data item.
Removes cases where we are unnecessarily going between the object and serialized (InternalRow) representation of data item. For example back to back map operations.
Removes no-op SortOrder from Sort
Typed Filter is by default surrounded by a DeserializeToObject beneath it and a SerializeFromObject above it.
Typed Filter is by default surrounded by a DeserializeToObject beneath it and a SerializeFromObject above it. If these serializations can't be eliminated, we should embed the deserializer in filter condition to save the extra serialization at last.
Propagate foldable expressions: Replace attributes with aliases of the original foldable expressions if possible.
Propagate foldable expressions: Replace attributes with aliases of the original foldable expressions if possible. Other optimizations will take advantage of the propagated foldable expressions.
SELECT 1.0 x, 'abc' y, Now() z ORDER BY x, y, 3 ==> SELECT 1.0 x, 'abc' y, Now() z ORDER BY 1.0, 'abc', Now()
Generate a list of additional filters from an operator's existing constraint but remove those that are either already part of the operator's condition or are part of the operator's child constraints.
Generate a list of additional filters from an operator's existing constraint but remove those that are either already part of the operator's condition or are part of the operator's child constraints. These filters are currently inserted to the existing conditions in the Filter operators and on either side of Join operators.
Note: While this optimization is applicable to all types of join, it primarily benefits Inner and LeftSemi joins.
Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition.
Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition. For example, when the expression is just checking to see if a string starts with a given pattern.
Pushes down LocalLimit beneath UNION ALL and beneath the streamed inputs of outer joins.
Replaces Expressions that can be statically evaluated with equivalent Literal values.
Replaces Expressions that can be statically evaluated with equivalent Literal values. This rule is more specific with Null value propagation from bottom to top of the expression tree.
Removes filters that can be evaluated trivially.
Removes filters that can be evaluated trivially. This can be done through the following ways:
1) by eliding the filter for cases where it will always evaluate to true
.
2) by substituting a dummy empty relation when the filter will always evaluate to false
.
3) by eliminating the always-true conditions given the constraints on the child's output.
Pushes Filter operators through many operators iff: 1) the operator is deterministic 2) the predicate is deterministic and the operator will not change any of rows.
Pushes Filter operators through many operators iff: 1) the operator is deterministic 2) the predicate is deterministic and the operator will not change any of rows.
This heuristic is valid assuming the expression evaluation cost is minimal.
Pushes down Filter operators where the condition
can be
evaluated using only the attributes of the left or right side of a join.
Pushes down Filter operators where the condition
can be
evaluated using only the attributes of the left or right side of a join. Other
Filter conditions are moved into the condition
of the Join.
And also pushes down the join filter, where the condition
can be evaluated using only the
attributes of the left or right side of sub query when applicable.
Check https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior for more details
Pushes projects down beneath Sample to enable column pruning with sampling.
Pushes certain operations to both sides of a Union operator.
Pushes certain operations to both sides of a Union operator. Operations that are safe to pushdown are listed as follows. Union: Right now, Union means UNION ALL, which does not de-duplicate rows. So, it is safe to pushdown Filters and Projections through it. Once we add UNION DISTINCT, we will not be able to pushdown Projections.
Removes the Project only conducting Alias of its child node.
Removes the Project only conducting Alias of its child node. It is created mainly for removing extra Project added in EliminateSerialization rule, but can also benefit other operators.
Removes nodes that are not necessary.
Removes literals from group expressions in Aggregate, as they have no effect to the result but only makes the grouping key bigger.
Removes repetition from group expressions in Aggregate, as they have no effect to the result but only makes the grouping key bigger.
Reorder the joins and push all the conditions into join, so that the bottom ones have at least one condition.
Reorder the joins and push all the conditions into join, so that the bottom ones have at least one condition.
The order of joins will not be changed if all of them already have at least one condition.
Replaces logical Distinct operator with an Aggregate operator.
Replaces logical Distinct operator with an Aggregate operator.
SELECT DISTINCT f1, f2 FROM t ==> SELECT f1, f2 FROM t GROUP BY f1, f2
Replaces logical Except operator with a left-anti Join operator.
Replaces logical Except operator with a left-anti Join operator.
SELECT a1, a2 FROM Tab1 EXCEPT SELECT b1, b2 FROM Tab2 ==> SELECT DISTINCT a1, a2 FROM Tab1 LEFT ANTI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
Note: 1. This rule is only applicable to EXCEPT DISTINCT. Do not use it for EXCEPT ALL. 2. This rule has to be done after de-duplicating the attributes; otherwise, the generated join conditions will be incorrect.
Finds all RuntimeReplaceable expressions and replace them with the expressions that can be evaluated.
Finds all RuntimeReplaceable expressions and replace them with the expressions that can be evaluated. This is mainly used to provide compatibility with other databases. For example, we use this to support "nvl" by replacing it with "coalesce".
Replaces logical Intersect operator with a left-semi Join operator.
Replaces logical Intersect operator with a left-semi Join operator.
SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2 ==> SELECT DISTINCT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2
Note: 1. This rule is only applicable to INTERSECT DISTINCT. Do not use it for INTERSECT ALL. 2. This rule has to be done after de-duplicating the attributes; otherwise, the generated join conditions will be incorrect.
This rule rewrites correlated ScalarSubquery expressions into LEFT OUTER joins.
This rule rewrites predicate sub-queries into left semi/anti joins.
This rule rewrites predicate sub-queries into left semi/anti joins. The following predicates are supported: a. EXISTS/NOT EXISTS will be rewritten as semi/anti join, unresolved conditions in Filter will be pulled out as the join conditions. b. IN/NOT IN will be rewritten as semi/anti join, unresolved conditions in the Filter will be pulled out as join conditions, value = selected column will also be used as join condition.
An optimizer used in test code.
An optimizer used in test code.
To ensure extendability, we leave the standard rules in the abstract optimizer rules, while specific rules go to the subclasses
Simplifies binary comparisons with semantically-equal expressions: 1) Replace '<=>' with 'true' literal.
Simplifies binary comparisons with semantically-equal expressions: 1) Replace '<=>' with 'true' literal. 2) Replace '=', '<=', and '>=' with 'true' literal if both operands are non-nullable. 3) Replace '<' and '>' with 'false' literal if both operands are non-nullable.
Removes the inner case conversion expressions that are unnecessary because the inner conversion is overwritten by the outer one.
Removes Casts that are unnecessary because the input is already the correct type.
Simplifies conditional expressions (if / case).
Simplifies boolean expressions: 1. Simplifies expressions whose answer can be determined without evaluating both sides. 2. Eliminates / extracts common factors. 3. Merge same expressions 4. Removes
Not
operator.