A batch of rules.
A strategy that runs until fix point or maxIterations times, whichever comes first.
An execution strategy for rules that indicates the maximum number of executions. If the execution reaches a fix point (i.e., converges) before maxIterations, it stops.
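A minimal Python sketch of this strategy, assuming a rule is any function from a plan to a plan and that plans can be compared with `==`. The function name `run_batch` and the toy integer "plan" are hypothetical illustrations, not Catalyst's API.

```python
from typing import Callable, List, Tuple, TypeVar

Plan = TypeVar("Plan")


def run_batch(plan: Plan, rules: List[Callable[[Plan], Plan]],
              max_iterations: int) -> Tuple[Plan, int]:
    """Apply all rules in order, repeating the pass until a fix point
    (a pass that changes nothing) or until max_iterations passes have run."""
    iterations = 0
    while iterations < max_iterations:
        iterations += 1
        new_plan = plan
        for rule in rules:  # rules are applied serially within the pass
            new_plan = rule(new_plan)
        if new_plan == plan:  # converged: stop early
            return new_plan, iterations
        plan = new_plan
    return plan, iterations  # iteration limit reached first


def halve(n: int) -> int:
    # Toy rule: the "plan" is an int that is halved until it reaches 1.
    return n // 2 if n > 1 else n


print(run_batch(16, [halve], max_iterations=100))  # (1, 5)
print(run_batch(16, [halve], max_iterations=3))    # (2, 3)
```

The count includes the final pass that detects convergence; with a limit of 3 the loop stops before the fix point is reached.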
Casts to/from BooleanType are transformed into comparisons since the JVM does not consider Booleans to be numeric types.
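A minimal sketch of the rewrite on a toy expression tree. It assumes only numeric source/target types are rewritten (string casts are left alone), and the classes `Cast`, `NotEqual` and `If` are hypothetical stand-ins, not Catalyst's expression classes.

```python
from dataclasses import dataclass
from typing import Any

NUMERIC = {"byte", "short", "int", "long", "float", "double"}


@dataclass(frozen=True)
class Cast:
    child: Any
    from_type: str
    to_type: str


@dataclass(frozen=True)
class NotEqual:
    left: Any
    right: Any


@dataclass(frozen=True)
class If:
    predicate: Any
    if_true: Any
    if_false: Any


def rewrite_boolean_cast(e: Any) -> Any:
    """Replace numeric <-> boolean casts with a comparison or a conditional."""
    if isinstance(e, Cast) and e.from_type in NUMERIC and e.to_type == "boolean":
        # CAST(x AS BOOLEAN)  ==>  x != 0
        return NotEqual(e.child, 0)
    if isinstance(e, Cast) and e.from_type == "boolean" and e.to_type in NUMERIC:
        # CAST(b AS DOUBLE)  ==>  CAST(IF(b, 1, 0) AS DOUBLE)
        return Cast(If(e.child, 1, 0), "int", e.to_type)
    return e  # every other expression is left unchanged
```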
Changes Boolean values to Bytes so that expressions like true < false can be evaluated.
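A minimal sketch of the idea: map false to 0 and true to 1, then compare the resulting numbers. Python's `bool` is already an integer subtype, so the explicit conversion here only mirrors the rule; `boolean_to_byte` and `compare_booleans` are hypothetical names.

```python
import operator

COMPARISONS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def boolean_to_byte(b: bool) -> int:
    # false -> 0, true -> 1, giving booleans a numeric order
    return 1 if b else 0


def compare_booleans(op: str, left: bool, right: bool) -> bool:
    """Evaluate an ordering comparison on booleans via their byte values."""
    return COMPARISONS[op](boolean_to_byte(left), boolean_to_byte(right))


print(compare_booleans("<", True, False))  # False
print(compare_booleans("<", False, True))  # True
```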
Coerces the type of different branches of a CASE WHEN statement to a common type.
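A minimal sketch of choosing the common type, assuming a simple numeric widening order (narrowest first) and that a NULL branch fits any type. Each branch would then be cast to the returned type. The order list and `common_type` are hypothetical, not Catalyst's actual widening table.

```python
from typing import List

# Assumed widening order for this sketch, narrowest first.
NUMERIC_ORDER = ["byte", "short", "int", "long", "float", "double"]


def common_type(branch_types: List[str]) -> str:
    """Common result type for the THEN/ELSE branches of a CASE WHEN."""
    known = [t for t in branch_types if t != "null"]  # NULL fits any type
    if not known:
        return "null"
    if len(set(known)) == 1:
        return known[0]  # branches already agree
    if not all(t in NUMERIC_ORDER for t in known):
        raise TypeError(f"no common type for {branch_types}")
    return max(known, key=NUMERIC_ORDER.index)  # widest numeric type wins


print(common_type(["int", "long", "byte"]))  # long
```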
Converts string "NaN"s that appear in binary operators with NaN-able types (Float / Double) to the appropriate numeric equivalent.
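A minimal sketch at the level of operand values. It assumes an exact, case-sensitive match on the string `"NaN"`, and `convert_string_nan` is a hypothetical name.

```python
import math
from typing import Any

NAN_ABLE = {"float", "double"}


def convert_string_nan(value: Any, value_type: str, other_type: str) -> Any:
    """If a string literal "NaN" meets a float or double operand in a binary
    operator, replace it with the numeric NaN; otherwise leave it unchanged."""
    if value_type == "string" and value == "NaN" and other_type in NAN_ABLE:
        return float("nan")
    return value


print(math.isnan(convert_string_nan("NaN", "string", "double")))  # True
print(convert_string_nan("NaN", "string", "int"))                 # NaN (still a string)
```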
Calculates and propagates precision for fixed-precision decimals. Hive has a number of rules for this based on the SQL standard and MS SQL: https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf
In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2 respectively, then the following operations have the following result precision and scale:
Operation    Result Precision                         Result Scale
------------------------------------------------------------------------
e1 + e2      max(s1, s2) + max(p1-s1, p2-s2) + 1      max(s1, s2)
e1 - e2      max(s1, s2) + max(p1-s1, p2-s2) + 1      max(s1, s2)
e1 * e2      p1 + p2 + 1                              s1 + s2
e1 / e2      p1 - s1 + s2 + max(6, s1 + p2 + 1)       max(6, s1 + p2 + 1)
e1 % e2      min(p1-s1, p2-s2) + max(s1, s2)          max(s1, s2)
sum(e1)      p1 + 10                                  s1
avg(e1)      p1 + 4                                   s1 + 4
Catalyst also has unlimited-precision decimals. For those, all ops return unlimited precision.
To implement the rules for fixed-precision types, we introduce casts to turn them to unlimited precision, do the math on unlimited-precision numbers, then introduce casts back to the required fixed precision. This allows us to do all rounding and overflow handling in the cast-to-fixed-precision operator.
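A minimal sketch of this scheme using Python's `decimal` module as a stand-in for unlimited-precision decimals (a very large context precision plays the role of "unlimited"). `result_type` encodes the precision/scale table above; ROUND_HALF_UP rounding and returning `None` (NULL) on overflow are assumptions of this sketch, and no upper bound on precision is applied. All names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext
from typing import Optional, Tuple


def result_type(op: str, p1: int, s1: int, p2: int = 0, s2: int = 0) -> Tuple[int, int]:
    """(precision, scale) of the result for DECIMAL(p1, s1) op DECIMAL(p2, s2)."""
    if op in ("+", "-"):
        return max(s1, s2) + max(p1 - s1, p2 - s2) + 1, max(s1, s2)
    if op == "*":
        return p1 + p2 + 1, s1 + s2
    if op == "/":
        scale = max(6, s1 + p2 + 1)
        return p1 - s1 + s2 + scale, scale
    if op == "%":
        return min(p1 - s1, p2 - s2) + max(s1, s2), max(s1, s2)
    if op == "sum":
        return p1 + 10, s1
    if op == "avg":
        return p1 + 4, s1 + 4
    raise ValueError(f"unsupported operation: {op}")


def to_fixed(value: Decimal, precision: int, scale: int) -> Optional[Decimal]:
    """The 'cast to fixed precision' step: all rounding and overflow live here."""
    with localcontext() as ctx:
        ctx.prec = 1000  # stand-in for unlimited precision
        rounded = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
        if abs(rounded) >= Decimal(10) ** (precision - scale):
            return None  # too many integer digits: assumed to yield NULL
        return rounded


def fixed_divide(a: Decimal, p1: int, s1: int,
                 b: Decimal, p2: int, s2: int) -> Optional[Decimal]:
    p, s = result_type("/", p1, s1, p2, s2)
    with localcontext() as ctx:
        ctx.prec = 1000
        exact = a / b  # math on (effectively) unlimited-precision numbers
    return to_fixed(exact, p, s)  # cast back to DECIMAL(p, s)


print(result_type("+", 10, 2, 5, 3))                       # (12, 3)
print(fixed_divide(Decimal("1.00"), 3, 2, Decimal("3"), 1, 0))  # 0.333333
```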
In addition, when mixing non-decimal types with decimals, we use the following rules:
 - BYTE gets turned into DECIMAL(3, 0)
 - SHORT gets turned into DECIMAL(5, 0)
 - INT gets turned into DECIMAL(10, 0)
 - LONG gets turned into DECIMAL(20, 0)
 - FLOAT and DOUBLE cause fixed-length decimals to turn into DOUBLE (this is the same as Hive, but note that unlimited decimals are considered bigger than doubles in WidenTypes)
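A minimal sketch of these mixing rules, returning the operand types after coercion. Types are modeled as strings or `("decimal", precision, scale)` tuples, which is a hypothetical representation. The loop at the end checks that each chosen precision has enough digits for the largest magnitude of its integral type.

```python
from typing import Tuple, Union

# Integral type -> (precision, scale) of the decimal it becomes.
INTEGRAL_AS_DECIMAL = {"byte": (3, 0), "short": (5, 0), "int": (10, 0), "long": (20, 0)}

SqlType = Union[str, Tuple[str, int, int]]


def coerce_with_decimal(other: str, dec: Tuple[str, int, int]) -> Tuple[SqlType, SqlType]:
    """Return (other, dec) operand types after mixing `other` with a fixed decimal."""
    if other in INTEGRAL_AS_DECIMAL:
        p, s = INTEGRAL_AS_DECIMAL[other]
        return ("decimal", p, s), dec
    if other in ("float", "double"):
        return "double", "double"  # the fixed-precision decimal turns into DOUBLE
    raise ValueError(f"unexpected type: {other}")


# Largest magnitude of each integral type (two's complement minimum).
MAX_MAGNITUDE = {"byte": 2**7, "short": 2**15, "int": 2**31, "long": 2**63}
for t, (p, _) in INTEGRAL_AS_DECIMAL.items():
    assert len(str(MAX_MAGNITUDE[t])) <= p, t

print(coerce_with_decimal("int", ("decimal", 5, 2)))  # (('decimal', 10, 0), ('decimal', 5, 2))
```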
Hive only performs integral division with the DIV operator. The arguments to / are always converted to fractional types.
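A minimal value-level sketch of the two operators. It assumes the fractional type used for `/` is a double, and that DIV truncates toward zero like Java's integer division; `hive_divide` and `hive_div` are hypothetical names. Python's own `//` floors instead, which is why `hive_div` does not use it directly on signed operands.

```python
from typing import Union

Number = Union[int, float]


def hive_divide(a: Number, b: Number) -> float:
    """`/`: operands are converted to a fractional type (double) first."""
    return float(a) / float(b)


def hive_div(a: int, b: int) -> int:
    """DIV: integral division, truncating toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


print(hive_divide(7, 2))  # 3.5
print(hive_div(-7, 2))    # -3
```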
This ensures that the types for various functions are as expected.
Turns projections that contain aggregate expressions into aggregations.
When a SELECT clause has only a single expression and that expression is a Generator, we convert the Project to a Generate.
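A minimal sketch of the two projection rewrites above on a toy plan tree. The node classes and the function sets `AGGREGATE_FUNCTIONS` / `GENERATOR_FUNCTIONS` are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Tuple

AGGREGATE_FUNCTIONS = {"sum", "count", "avg", "min", "max"}  # assumed set
GENERATOR_FUNCTIONS = {"explode"}                            # assumed set


@dataclass(frozen=True)
class Call:  # function call expression, e.g. Call("sum", ("x",))
    name: str
    args: Tuple[Any, ...]


@dataclass(frozen=True)
class Project:
    exprs: Tuple[Any, ...]
    child: Any


@dataclass(frozen=True)
class Aggregate:
    grouping: Tuple[Any, ...]
    aggregates: Tuple[Any, ...]
    child: Any


@dataclass(frozen=True)
class Generate:
    generator: Any
    child: Any


def contains_aggregate(e: Any) -> bool:
    return isinstance(e, Call) and (
        e.name in AGGREGATE_FUNCTIONS or any(contains_aggregate(a) for a in e.args))


def rewrite_project(plan: Any) -> Any:
    if isinstance(plan, Project):
        # SELECT sum(x) FROM t  ==>  aggregation with no grouping expressions
        if any(contains_aggregate(e) for e in plan.exprs):
            return Aggregate((), plan.exprs, plan.child)
        # SELECT explode(arr) FROM t  ==>  Generate
        only = plan.exprs[0] if len(plan.exprs) == 1 else None
        if isinstance(only, Call) and only.name in GENERATOR_FUNCTIONS:
            return Generate(only, plan.child)
    return plan
```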
A strategy that only runs once.
Promotes strings that appear in arithmetic expressions.
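A minimal value-level sketch, assuming a string operand of an arithmetic expression is cast to a double. Unlike SQL, where an unparseable string would typically produce NULL, this sketch simply raises `ValueError`. `promote` and `add` are hypothetical names.

```python
from typing import Union

Value = Union[int, float, str]


def promote(v: Value) -> Union[int, float]:
    # String operands are cast to a double; numeric operands pass through.
    return float(v) if isinstance(v, str) else v


def add(left: Value, right: Value) -> Union[int, float]:
    return promote(left) + promote(right)


print(add("1.5", 2))  # 3.5
```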
Applies any changes to AttributeReference data types that are made by other rules to instances higher in the query tree.
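A minimal sketch of the propagation step. It assumes each attribute carries a stable id (here `expr_id`) so a reference higher in the tree can be matched to the re-typed attribute in its child's output. The `AttributeReference` class below is a hypothetical stand-in, not Catalyst's.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class AttributeReference:
    name: str
    expr_id: int    # identifies the attribute independently of its type
    data_type: str


def propagate_types(refs: List[AttributeReference],
                    child_output: List[AttributeReference]) -> List[AttributeReference]:
    """Give each reference the data type of the child attribute with the same id."""
    by_id = {a.expr_id: a for a in child_output}
    return [replace(r, data_type=by_id[r.expr_id].data_type) if r.expr_id in by_id else r
            for r in refs]


# Another rule changed attribute 1 from int to double in the child.
child = [AttributeReference("a", 1, "double")]
parent = [AttributeReference("a", 1, "int"), AttributeReference("b", 2, "string")]
print(propagate_types(parent, child)[0].data_type)  # double
```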
Replaces UnresolvedFunctions with concrete Expressions.
Replaces UnresolvedAttributes with concrete AttributeReferences from a logical plan node's children.
Replaces UnresolvedRelations with concrete relations from the catalog.
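A minimal sketch of the three resolution lookups above: relations from a catalog, attributes from the children's output, and functions from a registry. It assumes exact, case-sensitive name matching and plain dictionaries for the catalog and registry; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str


@dataclass(frozen=True)
class Relation:
    name: str
    output: Tuple[Attribute, ...]


def resolve_relation(name: str, catalog: Dict[str, Relation]) -> Relation:
    """Unresolved relation name -> concrete relation from the catalog."""
    if name not in catalog:
        raise LookupError(f"table not found: {name}")
    return catalog[name]


def resolve_attribute(name: str, children: Sequence[Relation]) -> Attribute:
    """Unresolved attribute name -> the single matching child output attribute."""
    matches = [a for child in children for a in child.output if a.name == name]
    if len(matches) != 1:
        raise LookupError(f"{'ambiguous' if matches else 'unknown'} attribute: {name}")
    return matches[0]


def resolve_function(name: str, registry: Dict[str, Callable]) -> Callable:
    """Unresolved function name -> concrete implementation from the registry."""
    if name not in registry:
        raise LookupError(f"undefined function: {name}")
    return registry[name]


catalog = {"people": Relation("people", (Attribute("name", "string"), Attribute("age", "int")))}
people = resolve_relation("people", catalog)
print(resolve_attribute("age", [people]))                      # Attribute(name='age', data_type='int')
print(resolve_function("upper", {"upper": str.upper})("ab"))   # AB
```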
In many dialects of SQL it is valid to sort by attributes that are not present in the SELECT clause. This rule detects such queries and adds the required attributes to the original projection, so that they will be available during sorting. Another projection is added to remove these attributes after sorting.
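A minimal sketch of this rewrite on a toy plan where columns are plain names. It turns `Sort(order, Project(cols, child))` into `Project(cols, Sort(order, Project(cols + missing, child)))`. The node classes are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Table:
    name: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: Any


@dataclass(frozen=True)
class Sort:
    order: Tuple[str, ...]
    child: Any


def resolve_sort_references(plan: Any) -> Any:
    if isinstance(plan, Sort) and isinstance(plan.child, Project):
        project = plan.child
        # Sort keys missing from the projection but available from its child
        # (dict.fromkeys drops duplicates while keeping order).
        missing = tuple(dict.fromkeys(
            c for c in plan.order
            if c not in project.columns and c in project.child.columns))
        if missing:
            widened = Project(project.columns + missing, project.child)
            # Sort on the widened projection, then project the extras away.
            return Project(project.columns, Sort(plan.order, widened))
    return plan


# SELECT name FROM people ORDER BY age
t = Table("people", ("name", "age"))
print(resolve_sort_references(Sort(("age",), Project(("name",), t))))
```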
When encountering a cast from a string representing a valid fractional number to an integral type, the JVM will throw a java.lang.NumberFormatException. Hive, in contrast, returns the truncated version of this number.
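A minimal sketch of the Hive-style behavior: parse the string as an exact decimal, then truncate toward zero. Returning `None` (NULL) for unparseable input is an assumption, and integer range overflow is ignored here; `string_to_int` is a hypothetical name.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def string_to_int(s: str) -> Optional[int]:
    """Cast a string to an integer, truncating any fractional part."""
    try:
        return int(Decimal(s.strip()))  # int() on a Decimal truncates toward zero
    except (InvalidOperation, ValueError, OverflowError):
        return None  # not a number (or NaN/Infinity): assumed to yield NULL


print(string_to_int("3.7"))   # 3
print(string_to_int("-3.7"))  # -3
```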
Removes no-op Alias expressions from the plan.
This rule finds expressions in HAVING clause filters that depend on unresolved attributes. It pushes these expressions down to the underlying aggregates and then projects them away above the filter.
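A simplified sketch for a query like `SELECT k, count(y) FROM t GROUP BY k HAVING sum(x) > 10`. Expressions are plain strings and the filter predicate is fixed to `column > threshold`; these simplifications and all class names are hypothetical. The filter's expression is pushed into the aggregate, and a projection above the filter restores the original output.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Aggregate:
    grouping: Tuple[str, ...]
    outputs: Tuple[str, ...]  # output expressions, e.g. ("k", "count(y)")
    child: Any


@dataclass(frozen=True)
class Filter:
    column: str     # expression the predicate depends on, e.g. "sum(x)"
    threshold: int  # predicate is `column > threshold` (simplified)
    child: Any


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: Any


def push_having_expression(plan: Any) -> Any:
    if isinstance(plan, Filter) and isinstance(plan.child, Aggregate):
        agg = plan.child
        if plan.column not in agg.outputs:
            # Compute the expression in the aggregate...
            pushed = Aggregate(agg.grouping, agg.outputs + (plan.column,), agg.child)
            # ...filter on it, then project it away above the filter.
            return Project(agg.outputs, Filter(plan.column, plan.threshold, pushed))
    return plan
```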
Widens numeric types and converts strings to numbers when appropriate.
Loosely based on rules from "Hadoop: The Definitive Guide" 2nd edition, by Tom White.
The implicit conversion rules can be summarized as follows:
Additionally, all types when UNION-ed with strings will be promoted to strings. Other string conversions are handled by PromoteStrings.
Widening types might result in loss of precision in the following cases:
 - IntegerType to FloatType
 - LongType to FloatType
 - LongType to DoubleType
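A short demonstration of why these widenings are lossy. A 32-bit float has a 24-bit significand and a 64-bit double has a 53-bit significand, so integers needing more significant bits get rounded. Python's `struct` module is used to round-trip a value through a 32-bit float; `to_float32` is a hypothetical helper.

```python
import struct


def to_float32(n: int) -> float:
    """Round-trip a value through a 32-bit IEEE float (like Java's float)."""
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]


# int -> float: 2**24 + 1 needs 25 significant bits; a float holds 24.
print(int(to_float32(2**24 + 1)))  # 16777216
# long -> double: 2**53 + 1 needs 54 significant bits; a double holds 53.
print(int(float(2**53 + 1)))       # 9007199254740992
```

The same limit explains LongType to FloatType, which loses even more bits than LongType to DoubleType.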
Executes the batches of rules defined by the subclass. The batches are executed serially using the defined execution strategy. Within each batch, rules are also executed serially.
Defines a sequence of rule batches, to be overridden by the implementation.
Override to provide additional rules for the "Resolution" batch.
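A minimal sketch of the executor described above. Batches run serially; within a batch, rules run serially and the batch repeats up to `max_iterations` times, stopping early at a fix point. A limit of 1 plays the role of the run-once strategy. `Batch`, `RuleExecutor`, `ToyAnalyzer` and `extended_resolution_rules` are hypothetical Python names mirroring the descriptions, and the toy "plan" is a string.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Rule = Callable[[object], object]


@dataclass(frozen=True)
class Batch:
    name: str
    max_iterations: int  # 1 acts like "once"; larger values allow a fix point
    rules: Sequence[Rule]


class RuleExecutor:
    def batches(self) -> List[Batch]:
        raise NotImplementedError  # supplied by subclasses

    def execute(self, plan: object) -> object:
        for batch in self.batches():  # batches run serially
            for _ in range(batch.max_iterations):
                new_plan = plan
                for rule in batch.rules:  # rules run serially
                    new_plan = rule(new_plan)
                if new_plan == plan:
                    break  # fix point reached
                plan = new_plan
        return plan


class ToyAnalyzer(RuleExecutor):
    # Hook for subclasses to add rules to the "Resolution" batch.
    extended_resolution_rules: Tuple[Rule, ...] = ()

    def batches(self) -> List[Batch]:
        return [
            Batch("Resolution", 100, [str.strip, *self.extended_resolution_rules]),
            Batch("Cleanup", 1, [str.lower]),
        ]


class BangStripping(ToyAnalyzer):
    extended_resolution_rules = (lambda s: s.rstrip("!"),)


print(ToyAnalyzer().execute("  Hello  "))  # hello
print(BangStripping().execute("  Hi!! "))  # hi
```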
A trivial Analyzer with an EmptyCatalog and EmptyFunctionRegistry, used for testing when all relations are already filled in and the analyzer needs only to resolve attribute references.