Wrap any TokenizedLiteral expression with this so that we can invoke literal
initialization code within the .init()
method of the generated class.
This pushes itself as a reference object and calls eval() on itself for the actual
evaluation, which avoids embedding any generated code. This keeps the
generated code identical regardless of the constant expression (in addition, the
DynamicReplacableConstant trait casts to itself rather than to the actual object type).
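The difference between embedding a constant and passing it as a reference object can be sketched as follows. This is a minimal illustration with no Spark dependency: Evaluable, ConstantHolder, ReferenceCodegenSketch and the emitted source strings are hypothetical stand-ins, not the actual classes or the actual generated code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for an expression that is evaluated at runtime.
trait Evaluable { def eval(): Any }

// Hypothetical holder whose value may differ from one execution to the next.
final class ConstantHolder(var value: Any) extends Evaluable {
  def eval(): Any = value
}

object ReferenceCodegenSketch {
  // Embedding the constant: the emitted source text changes with the value.
  def embeddedCode(value: Int): String = s"int lit = $value;"

  // Passing the holder as a reference object: the emitted source only names
  // a slot in the references array and calls eval() on it, so the text is
  // the same whatever value the holder carries.
  def referenceCode(holder: ConstantHolder, references: ArrayBuffer[AnyRef]): String = {
    references += holder
    val idx = references.length - 1
    s"Object lit = ((Evaluable) references[$idx]).eval();"
  }
}
```

With this shape, two queries that differ only in their constants produce the same source text, so a compiled class could be reused and only the references array would need to change.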
We try to locate the first foldable expression in a query tree such that all its children are foldable
but its parent isn't. That way we locate the exact point where an expression is safe to evaluate
once instead of evaluating it for every row.
Queries such as
select c from tab where case col2 when 1 then col3 else 'y' end = 22
do not convert literal evaluation into the init method.
The wrapped expression is the minimal expression tree that can be evaluated only once and turned into a constant.
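The search for a foldable expression whose parent is not foldable can be sketched with a toy expression tree. The types Expr, Lit, Col, Add and EqualTo and the helper topmostFoldable are hypothetical and only loosely modelled on Catalyst; the sketch assumes an expression is foldable when it is a literal or when all of its children are foldable.

```scala
// Hypothetical toy expression tree, not the Catalyst classes.
sealed trait Expr {
  def children: Seq[Expr]
  // Assumption: a non-leaf node is foldable when all its children are.
  def foldable: Boolean = children.nonEmpty && children.forall(_.foldable)
}
final case class Lit(value: Int) extends Expr {
  def children: Seq[Expr] = Nil
  override def foldable: Boolean = true
}
final case class Col(name: String) extends Expr {
  def children: Seq[Expr] = Nil
  override def foldable: Boolean = false
}
final case class Add(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}
final case class EqualTo(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}

object FoldableSearchSketch {
  // Walk top-down and stop at the first foldable node on each path. Every
  // result is a subtree that can be evaluated once, and its parent (if any)
  // is not foldable.
  def topmostFoldable(e: Expr): Seq[Expr] =
    if (e.foldable) Seq(e) else e.children.flatMap(topmostFoldable)
}
```

For a predicate shaped like `a = 1 + 2`, the walk skips the comparison (it depends on a column) and returns the `1 + 2` subtree as the point that is safe to evaluate once.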
Unlike Spark's InSet expression, this allows for TokenizedLiterals that can change dynamically in executions.
In addition to TokenLiteral, this class can also be used in plan caching, so it allows the internal value to be updated in subsequent runs when the plan is re-used with different constants. For that reason it does not extend Literal (to prevent the Analyzer, Optimizer, etc. from doing constant propagation, for example), and its hash/equals ignore the value and use only the position of the literal in the plan together with the data type.
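A minimal sketch of why ignoring the value in hash/equals helps plan caching is shown here. Slot and PlanCacheSketch are hypothetical names and the "plan" is just a string. The sketch relies on a Scala language rule: case-class equals and hashCode use only the first parameter list, so a value placed in a second list does not take part in them.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a tokenized constant. Equality and hashing use
// only (pos, dataType) because the value is in a second parameter list.
final case class Slot(pos: Int, dataType: String)(var value: Any)

object PlanCacheSketch {
  private val cache = mutable.HashMap.empty[Seq[Slot], String]
  var compilations = 0

  // Returns the cached "plan" for this shape of constants, building it only
  // the first time the shape is seen.
  def planFor(slots: Seq[Slot]): String =
    cache.getOrElseUpdate(slots, {
      compilations += 1
      s"plan-$compilations"
    })
}
```

Two lookups that differ only in the constant value hit the same cache entry, while a different position or data type produces a new one.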
Wherever ParamLiteral case matching is required, it must match on DynamicReplacableConstant and use .eval(..) for code generation; see SNAP-1597 for more details. For cases of common-subexpression elimination that depend on constant values being equal in different parts of the tree, a new RefParamLiteral has been added that points to a ParamLiteral and is always equal to it; see SNAP-2462 for more details.
This class is used as a substitution for ParamLiteral when two ParamLiterals have the same constant value during parsing. It behaves as equal to the ParamLiteral it points to in all respects but is different from other ParamLiterals. Two RefParamLiterals are equal iff their respective ParamLiterals are.
The above policy allows an expression like "a = 4 and b = 4" to be equal to "a = 5 and b = 5" after tokenization while being different from "a = 5 and b = 6". This distinction is required because the former can lead to a different execution plan after common-subexpression processing and similar optimizations, which apply only because the actual values of the two tokenized literals are equal in this instance. The plan can therefore differ when the actual constants are different, so after tokenization the two cases should act as different expressions. See TPCH Q19 for an example where equal values in two different positions lead to an optimized plan: the common subexpression is pulled out of the OR conditions as a separate AND condition, which enables further filter push down that is not possible if the actual values are different.
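The equality policy above can be illustrated with a toy model. Token, Param, RefParam and tokenize are hypothetical names. The sketch assumes constants are tokenized left to right, that a repeated value becomes a reference to the first occurrence, and it leaves out the data type for brevity.

```scala
import scala.collection.mutable

// Hypothetical model: a token is identified by the slot it resolves to,
// never by its value.
sealed trait Token {
  def slot: Int
  override def equals(other: Any): Boolean = other match {
    case t: Token => t.slot == slot
    case _ => false
  }
  override def hashCode(): Int = slot
}

// A constant seen for the first time; slot is its position in the query.
final class Param(val value: Int, val slot: Int) extends Token

// A repeat of an earlier constant; it resolves to the slot of the Param it
// points to, so it equals that Param and differs from all others.
final class RefParam(val value: Int, val ref: Param) extends Token {
  def slot: Int = ref.slot
}

object TokenizeSketch {
  def tokenize(constants: Seq[Int]): Seq[Token] = {
    val seen = mutable.HashMap.empty[Int, Param]
    constants.zipWithIndex.map { case (v, i) =>
      seen.get(v) match {
        case Some(p) => new RefParam(v, p)
        case None =>
          val p = new Param(v, i)
          seen(v) = p
          p
      }
    }
  }
}
```

Under these assumptions, the constants of "a = 4 and b = 4" and "a = 5 and b = 5" tokenize to equal sequences, while "a = 5 and b = 6" yields two distinct Params and so compares as different.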
Note: This class maintains its own copy of the value since it can change during execution (e.g. ROUND can change the precision of the underlying Decimal value), which should not lead to a change in the value of the referenced ParamLiteral or vice versa. However, during planning, code generation and other phases before runJob, the value and dataType should match exactly, which is checked by referenceEquals. After deserialization on a remote executor, the class no longer maintains a reference and falls back to behaving like a regular ParamLiteral, since the required analysis and other phases are already done and final code generation requires a copy of the values.
A Literal that passes its value as a reference object in generated code, instead of embedding it as a constant, to allow reuse of the generated code.