Attempts to eliminate the reading of unneeded columns from the query plan using the following transformations:
Combines two adjacent Filter operators into one, merging the conditions into one conjunctive predicate.
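The merge of adjacent Filter operators can be sketched on a toy plan representation. The tuple-based nodes and the `combine_filters` name below are illustrative assumptions, not Catalyst's actual classes.

```python
# Toy plan nodes (hypothetical, not Catalyst's classes):
#   ("filter", condition, child) and ("scan", table_name)
# Conditions are opaque strings or ("and", left, right).

def combine_filters(plan):
    """Collapse Filter(outer, Filter(inner, child)) into one conjunctive Filter."""
    if plan[0] != "filter":
        return plan
    _, outer_cond, child = plan
    child = combine_filters(child)  # merge any deeper chain first
    if child[0] == "filter":
        _, inner_cond, grandchild = child
        return ("filter", ("and", inner_cond, outer_cond), grandchild)
    return ("filter", outer_cond, child)

plan = ("filter", "a > 1", ("filter", "b < 5", ("scan", "t")))
merged = combine_filters(plan)
```

The inner condition is placed first in the conjunction so that it is still checked before the outer one.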
Combines two adjacent Limit operators into one, merging the expressions into one single expression.
Replaces Expressions that can be statically evaluated with equivalent Literal values.
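As a rough illustration of static evaluation, the sketch below folds arithmetic over literals in a toy expression tree. The tuple encoding and the `fold` helper are assumptions made for this example only.

```python
import operator

# Toy expressions (hypothetical encoding):
#   ("lit", value), ("attr", name), or (op, left, right) with op in OPS.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr):
    """Replace any subtree whose operands are all literals with a single literal."""
    if expr[0] in ("lit", "attr"):
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    if left[0] == "lit" and right[0] == "lit":
        return ("lit", OPS[op](left[1], right[1]))
    return (op, left, right)

expr = ("+", ("attr", "x"), ("*", ("lit", 2), ("lit", 3)))
folded = fold(expr)  # the literal-only subtree 2 * 3 becomes a single literal
```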
Converts local operations (i.e. ones that don't require data exchange) on LocalRelation to another LocalRelation.
This is relatively simple as it currently handles only a single case: Project.
Speeds up aggregates on fixed-precision decimals by executing them on unscaled Long values.
This uses the same rules for increasing the precision and scale of the output as org.apache.spark.sql.catalyst.analysis.HiveTypeCoercion.DecimalPrecision.
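The idea of aggregating on unscaled values can be shown with plain integers. This is only a sketch: the fixed `SCALE` of 2 and the sample values are assumptions, and the real rule also adjusts the output precision and scale.

```python
from decimal import Decimal

SCALE = 2  # assumed: every input value has exactly two fractional digits

def to_unscaled(value):
    """Shift the decimal point right by SCALE digits and return an integer."""
    return int(value.scaleb(SCALE))

values = [Decimal("1.10"), Decimal("2.25"), Decimal("0.05")]

# Sum plain integers, then restore the scale once at the end.
unscaled_sum = sum(to_unscaled(v) for v in values)
result = Decimal(unscaled_sum).scaleb(-SCALE)
```

The integer sum gives the same answer as summing the decimals directly, while the per-row work is ordinary integer addition.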
Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition. For example, when the expression is just checking to see if a string starts with a given pattern.
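A minimal sketch of this kind of simplification follows. It assumes patterns that use only the `%` wildcard; the `simplify_like` helper is hypothetical, and anything it cannot handle returns `None` to signal that the general matcher is still needed.

```python
def simplify_like(pattern):
    """Return a plain string predicate for simple LIKE patterns, else None."""
    if "_" in pattern or "\\" in pattern:
        return None  # single-character wildcards and escapes are not handled here
    body = pattern.strip("%")
    if "%" in body:
        return None  # a wildcard in the middle needs the general matcher
    leading = pattern.startswith("%")
    trailing = pattern.endswith("%")
    if leading and trailing:
        return lambda s: body in s
    if trailing:
        return lambda s: s.startswith(body)
    if leading:
        return lambda s: s.endswith(body)
    return lambda s: s == body

starts_with_abc = simplify_like("abc%")
```

For example, `starts_with_abc("abcdef")` is true, while `simplify_like("a%c")` returns `None`.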
Replaces Expressions that can be statically evaluated with equivalent Literal values. This rule specifically handles Null value propagation from the bottom to the top of the expression tree.
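Bottom-up null propagation can be sketched as follows. The example assumes that every binary operator is arithmetic and therefore yields null when either operand is null; the tuple encoding and the `isnull` node are illustrative, not Catalyst's API.

```python
NULL = ("lit", None)

def propagate_nulls(expr):
    """Rewrite subtrees that must evaluate to null, working from the leaves up."""
    if expr[0] in ("lit", "attr"):
        return expr
    if expr[0] == "isnull":
        child = propagate_nulls(expr[1])
        if child[0] == "lit":
            return ("lit", child[1] is None)  # the answer is known statically
        return ("isnull", child)
    op, left, right = expr  # assumed arithmetic: null in, null out
    left, right = propagate_nulls(left), propagate_nulls(right)
    if NULL in (left, right):
        return NULL
    return (op, left, right)

expr = ("+", ("attr", "x"), ("*", NULL, ("lit", 2)))
simplified = propagate_nulls(expr)
```

The null literal in the inner multiplication makes the inner node null, which in turn makes the outer addition null.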
Replaces In (value, seq[Literal]) with the optimized version InSet (value, HashSet[Literal]), which is much faster.
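The gain comes from replacing a linear scan over the literals with a hash lookup. A small sketch, using hypothetical tuple nodes and Python's `frozenset` in place of a HashSet:

```python
def optimize_in(expr):
    """Turn a list-backed membership test into a set-backed one."""
    if expr[0] == "in":
        _, column, literals = expr
        return ("inset", column, frozenset(literals))
    return expr

def evaluate(expr, row):
    """Evaluate either node kind against a row given as a dict."""
    _, column, candidates = expr
    # list: each lookup scans the literals; frozenset: one hash probe
    return row[column] in candidates

original = ("in", "id", [1, 2, 3])
optimized = optimize_in(original)
```

Both forms return the same answer for every row; only the lookup cost differs.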
Combines two adjacent Project operators into one and performs alias substitution, merging the expressions into one single expression.
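Alias substitution can be illustrated by modelling each projection as a mapping from output name to expression. This encoding and the helper names are assumptions for the sketch.

```python
def substitute(expr, aliases):
    """Replace attribute references with the expressions they alias."""
    if expr[0] == "attr":
        return aliases.get(expr[1], expr)  # unaliased attributes pass through
    if expr[0] == "lit":
        return expr
    op, left, right = expr
    return (op, substitute(left, aliases), substitute(right, aliases))

def collapse(outer, inner):
    """Merge Project(outer, Project(inner, child)) into a single projection."""
    return {name: substitute(e, inner) for name, e in outer.items()}

inner = {"b": ("+", ("attr", "a"), ("lit", 1))}   # SELECT a + 1 AS b
outer = {"c": ("*", ("attr", "b"), ("lit", 2))}   # SELECT b * 2 AS c
collapsed = collapse(outer, inner)                # SELECT (a + 1) * 2 AS c
```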
Pushes Filter operators through Generate operators. Parts of the predicate that reference attributes generated in Generate remain above it, and the rest is pushed beneath.
Pushes down Filter operators where the condition can be evaluated using only the attributes of the left or right side of a join. Other Filter conditions are moved into the condition of the Join. Also pushes down the join filter, where the condition can be evaluated using only the attributes of the left or right side of the subquery, when applicable. See https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior for more details.
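The core of the pushdown is deciding, for each conjunct, which side of the join can evaluate it. The sketch below assumes an inner join, where every single-sided conjunct may be pushed; outer joins restrict which side is eligible. The `split_join_filter` helper and the attribute names are hypothetical.

```python
def split_join_filter(conjuncts, left_attrs, right_attrs):
    """Partition conjuncts by the join side that can evaluate them.

    Each conjunct is a (referenced_attributes, text) pair.
    """
    left_only, right_only, join_condition = [], [], []
    for refs, text in conjuncts:
        if refs <= left_attrs:
            left_only.append(text)        # push below the join, left side
        elif refs <= right_attrs:
            right_only.append(text)       # push below the join, right side
        else:
            join_condition.append(text)   # needs both sides: keep in the Join
    return left_only, right_only, join_condition

conjuncts = [
    ({"l.a"}, "l.a > 1"),
    ({"r.b"}, "r.b = 'x'"),
    ({"l.a", "r.c"}, "l.a = r.c"),
]
pushed = split_join_filter(conjuncts, {"l.a", "l.k"}, {"r.b", "r.c"})
```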
Pushes Filter operators through Project operators, in-lining any Aliases that were defined in the projection.
This heuristic is valid assuming the expression evaluation cost is minimal.
Removes literals from group expressions in Aggregate, as they have no effect on the result and only make the grouping key bigger.
Removes the UnaryPositive identity function.
Replaces logical Distinct operator with an Aggregate operator.
SELECT DISTINCT f1, f2 FROM t ==> SELECT f1, f2 FROM t GROUP BY f1, f2
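The equivalence of the two queries can be checked on a small table. The sketch uses Python's built-in `sqlite3` module rather than Spark, and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (f1 INTEGER, f2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (1, "b")],
)

# Sort both results because neither query guarantees an output order.
distinct_rows = sorted(conn.execute("SELECT DISTINCT f1, f2 FROM t").fetchall())
grouped_rows = sorted(conn.execute("SELECT f1, f2 FROM t GROUP BY f1, f2").fetchall())
conn.close()
```

Both queries return the same three rows, with the duplicate `(1, "a")` collapsed.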
Pushes operations down into a Sample.
Pushes certain operations to both sides of a Union, Intersect or Except operator. Operations that are safe to push down are listed as follows.
Union: Right now, Union means UNION ALL, which does not de-duplicate rows, so it is safe to push down Filters and Projections through it. Once we add UNION DISTINCT, we will not be able to push down Projections.
Intersect: It is not safe to push down Projections through it because the intersection is computed by comparing entire rows. It is fine to push down Filters with a deterministic condition.
Except: It is not safe to push down Projections through it because the difference is computed by comparing entire rows. It is fine to push down Filters with a deterministic condition.
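A small counterexample makes the safety argument concrete. The rows below are made up; lists stand in for UNION ALL inputs and sets for INTERSECT.

```python
left = [(1, "a"), (2, "b")]
right = [(1, "z"), (3, "c")]

def project_first(rows):
    """Keep only the first column of each row."""
    return [(row[0],) for row in rows]

def keep(row):
    """A deterministic filter condition."""
    return row[0] > 1

# UNION ALL: filtering after the union equals filtering each side first.
filter_after = [r for r in left + right if keep(r)]
filter_before = [r for r in left if keep(r)] + [r for r in right if keep(r)]

# INTERSECT: projecting first changes the answer, because rows that differ
# only in the dropped column start to match.
project_after = set(project_first(set(left) & set(right)))
project_before = set(project_first(left)) & set(project_first(right))
```

Here the two filter results agree, but `project_after` is empty while `project_before` contains `(1,)`.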
Removes the inner case conversion expressions that are unnecessary because the inner conversion is overwritten by the outer one.
Removes Casts that are unnecessary because the input is already the correct type.
Removes filters that can be evaluated trivially. This is done either by eliding the filter for cases where it will always evaluate to true, or substituting a dummy empty relation when the filter will always evaluate to false.
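The two cases can be sketched on a toy plan. The tuple nodes and the `empty_relation` marker are assumptions for this example.

```python
def simplify_filter(plan):
    """Drop always-true filters; replace always-false ones with an empty relation."""
    if plan[0] != "filter":
        return plan
    _, condition, child = plan
    if condition[0] == "lit" and condition[1] is True:
        return child                  # the filter keeps every row
    if condition[0] == "lit" and condition[1] is False:
        return ("empty_relation",)    # the filter keeps no rows
    return plan

always_true = simplify_filter(("filter", ("lit", True), ("scan", "t")))
always_false = simplify_filter(("filter", ("lit", False), ("scan", "t")))
```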
Simplifies boolean expressions: 1. Simplifies expressions whose answer can be determined without evaluating both sides. 2. Eliminates or extracts common factors. 3. Merges identical expressions. 4. Removes the Not operator.
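Three of these simplifications can be sketched on a toy expression tree: short-circuiting on literals, merging identical operands, and removing double negation. Common-factor extraction is left out, and the tuple encoding is an assumption for the example.

```python
TRUE, FALSE = ("lit", True), ("lit", False)

def simplify(expr):
    """Apply literal short-circuits, duplicate merging and Not elimination."""
    kind = expr[0]
    if kind == "not":
        inner = simplify(expr[1])
        if inner[0] == "not":
            return inner[1]           # NOT NOT x  ->  x
        if inner == TRUE:
            return FALSE
        if inner == FALSE:
            return TRUE
        return ("not", inner)
    if kind in ("and", "or"):
        left, right = simplify(expr[1]), simplify(expr[2])
        absorbing, neutral = (FALSE, TRUE) if kind == "and" else (TRUE, FALSE)
        if absorbing in (left, right):
            return absorbing          # false AND x -> false, true OR x -> true
        if left == neutral:
            return right              # true AND x -> x, false OR x -> x
        if right == neutral:
            return left
        if left == right:
            return left               # x AND x -> x, x OR x -> x
        return (kind, left, right)
    return expr

x = ("attr", "x")
result = simplify(("or", ("and", FALSE, ("attr", "y")), ("not", ("not", x))))
```

In this example the `AND` collapses to false, the double negation collapses to `x`, and the `OR` then reduces to `x` alone.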