A function that gets the absolute value of a numeric value.
Adds an item to a set. For performance, this expression mutates its input during evaluation.
A specific implementation of an aggregate function. Used to wrap a generic AggregateExpression with an algorithm that will be used to compute one specific result.
Used to assign a new name to a computation. For example, the SQL expression "1 + 1 AS a" could be represented as follows: Alias(Add(Literal(1), Literal(1)), "a")()
Note that exprId and qualifiers are in a separate parameter list because we only pattern match on child and name.
the computation being performed
the name to be associated with the result of computing child.
A globally unique id used to check if an AttributeReference refers to this alias. Auto-assigned if left blank.
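The sketch below makes the Alias example concrete using simplified stand-in classes. Lit, Plus, and MiniAlias are hypothetical and are not Catalyst's Literal, Add, and Alias. The sketch shows how a second parameter list keeps the id out of pattern matching and case-class equality. It assumes ids come from a simple counter.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical, simplified expression tree; not Catalyst's actual classes.
sealed trait Expr { def eval(): Any }

final case class Lit(value: Int) extends Expr {
  def eval(): Any = value
}

final case class Plus(left: Expr, right: Expr) extends Expr {
  def eval(): Any = left.eval().asInstanceOf[Int] + right.eval().asInstanceOf[Int]
}

// Only the first parameter list (child, name) takes part in pattern matching,
// equals and hashCode; exprId sits in a second list and is auto-assigned.
final case class MiniAlias(child: Expr, name: String)(val exprId: Long = MiniAlias.nextId())
    extends Expr {
  def eval(): Any = child.eval()
}

object MiniAlias {
  private val counter = new AtomicLong(0L)
  def nextId(): Long = counter.getAndIncrement()
}
```

For instance, MiniAlias(Plus(Lit(1), Lit(1)), "a")() evaluates to 2. Two such aliases compare equal even though their exprId values differ.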
Returns the array of values of a field in the Array of Structs child.
A reference to an attribute produced by another operator in the tree.
The name of this attribute, should only be used during analysis or for debugging.
The DataType of this attribute.
True if null is a valid value for this attribute.
The metadata of this attribute.
A globally unique id used to check if different AttributeReferences refer to the same attribute.
a list of strings that can be used to refer to this attribute in a fully qualified way. Consider the examples tableName.name and subQueryAlias.name, where tableName and subQueryAlias are possible qualifiers.
A Set designed to hold AttributeReference objects that performs equality checking using expression id instead of standard Java equality. Using expression id means that these sets will correctly test for membership, even when the AttributeReferences in question differ cosmetically (e.g., the names have different capitalizations).
Note that we do not override equality for Attribute references, as it is really weird when AttributeReference("a", ...) == AttributeReference("b", ...). This tactic leads to broken tests and also makes doing transformations hard (we always try to keep older trees instead of new ones when the transformation was a no-op).
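The sketch below keys set membership on expression ids, using a set backed by a Map from id to attribute. Attr and IdSet are hypothetical stand-ins and are not Catalyst's AttributeReference or AttributeSet. Attr's own case-class equality still compares names, which is why the set does not rely on it.

```scala
// Hypothetical stand-in for an attribute reference.
final case class Attr(name: String, exprId: Long)

// A set that tests membership by expression id rather than by case-class equality.
final class IdSet private (private val byId: Map[Long, Attr]) {
  def +(a: Attr): IdSet = new IdSet(byId + (a.exprId -> a))
  def contains(a: Attr): Boolean = byId.contains(a.exprId)
  def size: Int = byId.size
}

object IdSet {
  def apply(attrs: Attr*): IdSet = attrs.foldLeft(new IdSet(Map.empty))(_ + _)
}
```

A lookup for Attr("A", 1L) therefore succeeds in IdSet(Attr("a", 1L)), even though the two Attr values are not equal as case classes.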
A function that calculates bitwise and(&) of two numbers.
A function that calculates bitwise not(~) of a number.
A function that calculates bitwise or(|) of two numbers.
A function that calculates bitwise xor(^) of two numbers.
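The four bitwise functions map directly onto Scala's integer operators. This sketch assumes that a null input, modeled here as None, produces a null result. The Bitwise object and its method names are hypothetical.

```scala
// Hypothetical helpers; None stands in for SQL NULL.
object Bitwise {
  // Applies op only when both inputs are present.
  private def binary(a: Option[Int], b: Option[Int])(op: (Int, Int) => Int): Option[Int] =
    for (x <- a; y <- b) yield op(x, y)

  def and(a: Option[Int], b: Option[Int]): Option[Int] = binary(a, b)(_ & _)
  def or(a: Option[Int], b: Option[Int]): Option[Int] = binary(a, b)(_ | _)
  def xor(a: Option[Int], b: Option[Int]): Option[Int] = binary(a, b)(_ ^ _)
  def not(a: Option[Int]): Option[Int] = a.map(x => ~x)
}
```

For example, and(Some(12), Some(10)) is Some(8), since 1100 & 1010 = 1000 in binary. not(Some(12)) is Some(-13) in two's complement.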
A bound reference points to a specific slot in the input tuple, allowing the actual value to be retrieved more efficiently. However, since operations like column pruning can change the layout of intermediate tuples, BindReferences should be run after all such transformations.
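The sketch below binds a reference to its slot in a given input schema and then reads that slot directly. It assumes attributes are matched by expression id. AttrRef, BoundRef, and Binder are hypothetical names.

```scala
// Hypothetical simplified attribute, identified by exprId.
final case class AttrRef(name: String, exprId: Long)

// A bound reference reads a fixed slot from the input row.
final case class BoundRef(ordinal: Int) {
  def eval(row: IndexedSeq[Any]): Any = row(ordinal)
}

object Binder {
  // Resolves the attribute's position in the input schema once, so that
  // evaluation becomes a direct index instead of a search by id.
  def bind(ref: AttrRef, inputSchema: Seq[AttrRef]): BoundRef = {
    val ordinal = inputSchema.indexWhere(_.exprId == ref.exprId)
    require(ordinal >= 0, s"Couldn't find $ref in $inputSchema")
    BoundRef(ordinal)
  }
}
```

Binding b against the schema (a, b) yields ordinal 1. Against the pruned schema (b) it yields ordinal 0, so a reference bound before pruning would read the wrong slot.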
Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END". Refer to this link for the corresponding semantics: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ConditionalFunctions
The other form of case statements "CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END" gets translated to this form at parsing time. Namely, such a statement gets translated to "CASE WHEN a=b THEN c [WHEN a=d THEN e]* [ELSE f] END".
Note that branches are considered in consecutive pairs (cond, val), and the optional last element is the value for the default catch-all case (if provided). Hence, branches consists of at least two elements and can have an odd or even length.
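The flat branches layout can be evaluated by walking the (cond, val) pairs and treating an odd trailing element as the ELSE value. In this sketch, expressions are modeled as functions from a row (a Map) to a value. The simple form CASE x WHEN 1 THEN ... is encoded as an equality condition x = 1, as the text describes. CaseWhenSketch and its helpers (lit, col, equalTo) are hypothetical.

```scala
object CaseWhenSketch {
  // Hypothetical expression model: a function from a row to a value.
  type Expr = Map[String, Any] => Any

  def lit(v: Any): Expr = _ => v
  def col(name: String): Expr = row => row(name)
  def equalTo(l: Expr, r: Expr): Expr = row => l(row) == r(row)

  // branches = Seq(cond1, val1, cond2, val2, ..., [elseValue])
  def eval(branches: Seq[Expr], row: Map[String, Any]): Any = {
    require(branches.length >= 2, "CASE needs at least one (cond, val) pair")
    val hasElse = branches.length % 2 == 1
    // Only conditions that evaluate to true select a branch; null or false fall through.
    val whenPairs = (if (hasElse) branches.init else branches).grouped(2)
    whenPairs
      .collectFirst { case Seq(cond, value) if cond(row) == true => value(row) }
      .getOrElse(if (hasElse) branches.last.apply(row) else null)
  }
}
```

With branches (x = 1, "one", x = 2, "two", "other"), a row where x is 2 yields "two" and a row where x is 5 yields "other". Without the trailing ELSE element, the second row yields null.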
Cast the child expression to the target data type.
Combines the elements of two sets. For performance, this expression mutates its left input set during evaluation.
A function that returns true if the string left contains the string right.
Returns the number of elements in the input set.
Returns an Array containing the evaluation of all children expressions.
A function that returns true if the string left ends with the string right.
Given an input array, produces a sequence of rows, one for each value in the array.
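The sketch below shows explode-style generation, with rows modeled as Vector[Any]. ExplodeSketch is hypothetical. Treating a null array as producing no rows is an assumption of this sketch.

```scala
object ExplodeSketch {
  // Produces one single-column output row per element of the input array.
  // Assumption: a null array produces no rows.
  def explode(array: Seq[Any]): Seq[Vector[Any]] =
    Option(array).getOrElse(Seq.empty).map(element => Vector(element))
}
```

An input of Seq(1, 2, 3) yields three rows, Vector(1), Vector(2), and Vector(3).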
A globally unique (within this JVM) id for a given named expression. Used to identify which attribute output by a relation is being referenced in a subsequent computation.
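A shared atomic counter is one simple way to obtain ids that are unique within a JVM. This sketch assumes that approach. ExprIdSketch and newId are hypothetical names.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical id wrapper; uniqueness holds only within one JVM.
final case class ExprIdSketch(id: Long)

object ExprIdSketch {
  private val counter = new AtomicLong(0L)
  // getAndIncrement is atomic, so concurrent callers never receive the same id.
  def newId(): ExprIdSketch = ExprIdSketch(counter.getAndIncrement())
}
```

Because equality compares only the numeric id, two references to the same attribute can be recognized no matter what names they carry.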
An expression that produces zero or more rows given a single input row.
Generators produce multiple output rows instead of a single value like other expressions, and thus they must have a schema to associate with the rows that are output.
However, unlike row-producing relational operators, which are either leaves or determine their output schema functionally from their input, generators can contain other expressions that might be modified by rules. This structure means that a generator might be copied multiple times after its output schema is first determined. If a new output schema were created for each copy, references further up the tree could be rendered invalid. As a result, generators must instead define a function makeOutput, which is called only once, when the schema is first requested. The attributes produced by this function are automatically copied any time rules result in changes to the Generator or its children.
A row implementation that uses an array of objects as the underlying storage. Note that, while the array is not copied, and thus could technically be mutated after creation, this is not allowed.
Returns the item at ordinal in the Array child, or the value for the key ordinal in the Map child.
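The sketch below shows the dispatch on the child's type. GetItemSketch is hypothetical. Returning null for a null input, an out-of-range index, or a missing key is an assumption chosen to mirror SQL null semantics.

```scala
object GetItemSketch {
  // Array child: element at the index. Map child: value stored under the key.
  // Assumption: null inputs, out-of-range indexes and missing keys yield null.
  def getItem(child: Any, ordinal: Any): Any = (child, ordinal) match {
    case (null, _) | (_, null) => null
    case (arr: Seq[_], i: Int) => if (i >= 0 && i < arr.length) arr(i) else null
    case (map: Map[_, _], key) => map.asInstanceOf[Map[Any, Any]].getOrElse(key, null)
    case _ => throw new IllegalArgumentException(s"Unsupported child: $child")
  }
}
```

For example, getItem(Seq("a", "b"), 1) returns "b" and getItem(Map("k" -> 42), "k") returns 42.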
Evaluates to true if list contains value.
An optimized version of the In clause, used when all filter values of the In clause are static.
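The generic clause and its optimized form differ essentially as a linear search differs from a hash lookup over a set built once from the static values. InSketch and its members are hypothetical. The sketch ignores SQL null semantics for brevity.

```scala
object InSketch {
  // Generic IN: compares against each candidate in turn (linear in list size).
  def in(value: Any, list: Seq[Any]): Boolean = list.exists(_ == value)

  // Static-list variant: the hash set is built once, so each probe is a hash lookup.
  final class InSet(values: Set[Any]) {
    def eval(value: Any): Boolean = values.contains(value)
  }
}
```

Building the InSet once and reusing it across rows is what makes the static case cheaper when the value list is long.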
A MutableProjection that is calculated by calling eval on each of the specified expressions.
a sequence of expressions that determine the value of each column of the output row.
A Projection that is calculated by calling the eval of each of the specified expressions.
A mutable wrapper that makes two rows appear as a single concatenated row. Designed to be instantiated once per thread and reused.
JIT HACK: Replace with macros
The JoinedRow class is used in many performance-critical situations. Unfortunately, since there are multiple different types of Rows that could be stored as row1 and row2, most of the calls in the critical path are polymorphic. By creating special versions of this class that are used in only a single location of the code, we increase the chance that only a single type of Row will be referenced, increasing the opportunity for the JIT to play tricks. This sounds crazy, but in benchmarks it had noticeable effects.
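The sketch below shows the concatenated-row idea on top of a tiny row trait. SimpleRow, ArrayRow, and JoinedRowSketch are hypothetical names and are not Catalyst's Row or JoinedRow API. The sketch shows how indexes are translated across the two halves. It also shows the reuse pattern: one instance is re-pointed at each new pair instead of being allocated for every output row.

```scala
// Hypothetical minimal row abstraction for this sketch.
trait SimpleRow {
  def length: Int
  def get(i: Int): Any
}

final class ArrayRow(values: Array[Any]) extends SimpleRow {
  def length: Int = values.length
  def get(i: Int): Any = values(i)
}

// Presents two rows as one concatenated row without copying any values.
// `apply` re-points the same instance at a new pair, so one object per thread
// can serve every output row.
final class JoinedRowSketch extends SimpleRow {
  private var row1: SimpleRow = null
  private var row2: SimpleRow = null

  def apply(left: SimpleRow, right: SimpleRow): this.type = {
    row1 = left
    row2 = right
    this
  }

  def length: Int = row1.length + row2.length
  def get(i: Int): Any =
    if (i < row1.length) row1.get(i) else row2.get(i - row1.length)
}
```

Since the wrapper is mutated in place, any reference to it reflects the most recent pair it was pointed at.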
A simple regex pattern-matching function.
A function that converts the characters of a string to lowercase.
Create a Decimal from an unscaled Long value
Converts a Row to another Row given a sequence of expressions that define each column of the new row. If the schema of the input row is specified, then the given expressions will be bound to that schema.
In contrast to a normal projection, a MutableProjection reuses the same underlying row object each time an input row is added. This significantly reduces the cost of calculating the projection, but means that it is not safe to hold on to a reference to a Row after next() has been called on the Iterator that produced it. Instead, the user must call Row.copy() and hold on to the returned Row before calling next().
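The hazard described above can be reproduced with a tiny projection that writes every result into one shared array. ReusingProjection is a hypothetical stand-in for a MutableProjection, and Array.clone plays the role of Row.copy(). Holding the returned array across calls observes later results. Copying it first does not.

```scala
// Hypothetical projection that writes every result into one reused buffer.
final class ReusingProjection(exprs: Seq[Vector[Int] => Int]) {
  private val buffer = new Array[Int](exprs.length)

  // Returns the shared buffer; its contents change on the next call.
  def apply(input: Vector[Int]): Array[Int] = {
    var i = 0
    while (i < exprs.length) {
      buffer(i) = exprs(i)(input)
      i += 1
    }
    buffer
  }
}
```

Mapping proj over two inputs without copying leaves both results pointing at the same array, which holds only the second row's values. Mapping with proj(in).clone() keeps each row's values intact.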
An extended interface to Row that allows the values for each column to be updated. Setting a value through a primitive function implicitly marks that column as not null.
A parent class for mutable container objects that are reused when the values are changed, resulting in less garbage. These values are held by a SpecificMutableRow.
The following code was roughly used to generate these objects:
    val types = "Int,Float,Boolean,Double,Short,Long,Byte,Any".split(",")

    types.map { tpe =>
    s"""
      final class Mutable$tpe extends MutableValue {
        var value: $tpe = 0
        def boxed = if (isNull) null else value
        def update(v: Any) = value = {
          isNull = false
          v.asInstanceOf[$tpe]
        }
        def copy() = {
          val newCopy = new Mutable$tpe
          newCopy.isNull = isNull
          newCopy.value = value
          newCopy.asInstanceOf[this.type]
        }
      }"""
    }.foreach(println)

    types.map { tpe =>
    s"""
      override def set$tpe(ordinal: Int, value: $tpe): Unit = {
        val currentValue = values(ordinal).asInstanceOf[Mutable$tpe]
        currentValue.isNull = false
        currentValue.value = value
      }

      override def get$tpe(i: Int): $tpe = {
        values(i).asInstanceOf[Mutable$tpe].value
      }"""
    }.foreach(println)
Creates a new set of the specified type
An AggregateExpression that can be partially computed without seeing all relevant tuples. These partial evaluations can then be combined to compute the actual answer.
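Average is a common example of an aggregate that can be split this way. Each partition computes a partial (sum, count) pair, the partials are merged, and the final value is the merged sum divided by the merged count. The PartialAvg object below is a hypothetical sketch of that pattern and is not Catalyst's Average implementation.

```scala
object PartialAvg {
  // Partial state computed independently on each subset of the data.
  final case class Partial(sum: Double, count: Long)

  def partial(values: Seq[Double]): Partial = Partial(values.sum, values.length.toLong)

  // Merging only adds sums and counts, so partials can be combined in any
  // grouping (up to floating-point rounding in the sums).
  def merge(a: Partial, b: Partial): Partial = Partial(a.sum + b.sum, a.count + b.count)

  // Final evaluation over the merged state; None when no rows were seen.
  def finish(p: Partial): Option[Double] =
    if (p.count == 0) None else Some(p.sum / p.count)
}
```

The subsets Seq(1.0, 2.0), Seq(3.0), and an empty one merge to Partial(6.0, 3), whose final value is 2.0. This matches the average over all rows.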
A placeholder used when printing expressions without debugging information such as the expression id or the unresolved indicator.
Converts a Row to another Row given a sequence of expressions that define each column of the new row. If the schema of the input row is specified, then the given expressions will be bound to that schema.
User-defined function.
Return type of function.
An expression that can be used to sort a tuple. This class extends Expression primarily so that transformations over expressions will descend into its child.
A row type that holds an array of specialized container objects, of type MutableValue, chosen based on the dataTypes of each column. The intent is to decrease garbage when modifying the values of primitive columns.
Represents an aggregation that has been rewritten to be performed in two steps.
an aggregate expression that evaluates to the same final result as the original aggregation.
A sequence of NamedExpressions that can be computed on partial data sets and are required to compute the finalEvaluation.
A function that returns true if the string left starts with the string right.
A base trait for functions that compare two strings, returning a boolean.
Returns the value of a field in the Struct child.
A function that takes a substring of its first argument starting at a given position. Defined for String and Binary types.
Returns the unscaled Long value of a Decimal, assuming it fits in a Long.
A function that converts the characters of a string to uppercase.
A generator that produces its output using the provided lambda function.
Builds a map that is keyed by an Attribute's expression id. Using the expression id allows values to be looked up even when the attributes used differ cosmetically (i.e., the capitalization of the name, or the expected nullability).
A row with no data. Calling any methods will result in an error. Can be used as a placeholder.
Extractor for retrieving Int literals.
An extractor that matches non-null literal values
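In Scala, extractors like these are written as objects with an unapply method, so that a single pattern match both tests for a literal and destructures it. Lit, IntLit, NonNullLit, and ExtractorDemo are hypothetical simplified names for this sketch.

```scala
// Hypothetical simplified literal; not Catalyst's Literal class.
final case class Lit(value: Any)

// Matches only literals whose value is an Int.
object IntLit {
  def unapply(e: Any): Option[Int] = e match {
    case Lit(i: Int) => Some(i)
    case _ => None
  }
}

// Matches any literal whose value is not null.
object NonNullLit {
  def unapply(e: Any): Option[Any] = e match {
    case Lit(v) if v != null => Some(v)
    case _ => None
  }
}

object ExtractorDemo {
  // Cases are tried in order, so Int literals are caught before the general case.
  def describe(e: Any): String = e match {
    case IntLit(i) => s"int $i"
    case NonNullLit(v) => s"non-null $v"
    case _ => "other"
  }
}
```

For example, describe(Lit(3)) returns "int 3", while describe(Lit(null)) falls through to "other".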
A collection of generators that build custom bytecode at runtime for performing the evaluation of Catalyst expressions.
A set of classes that can be used to represent trees of relational expressions. A key goal of the expression library is to hide the details of naming and scoping from developers who want to manipulate trees of relational operators. As such, the library defines a special type of expression, a NamedExpression in addition to the standard collection of expressions.
Standard Expressions
A library of standard expressions (e.g., Add, EqualTo), aggregates (e.g., SUM, COUNT), and other computations (e.g. UDFs). Each expression type is capable of determining its output schema as a function of its children's output schema.
Named Expressions
Some expressions are named and thus can be referenced by later operators in the dataflow graph. The two types of named expressions are AttributeReferences and Aliases. AttributeReferences refer to attributes of the input tuple for a given operator and form the leaves of some expression trees. Aliases assign a name to intermediate computations. For example, in the SQL statement SELECT a+b AS c FROM ..., the expressions a and b would be represented by AttributeReferences and c would be represented by an Alias.
During analysis, all named expressions are assigned a globally unique expression id, which can be used for equality comparisons. While the original names are kept around for debugging purposes, they should never be used to check if two attributes refer to the same value, as plan transformations can result in the introduction of naming ambiguity. For example, consider a plan that contains two subqueries, both of which are reading from the same table. If an optimization removes the subqueries, scoping information would be destroyed, eliminating the ability to reason about which subquery produced a given attribute.
Evaluation
The result of expressions can be evaluated using the Expression.apply(Row) method.