A specific implementation of an aggregate function.
Used to assign a new name to a computation.
A reference to an attribute produced by another operator in the tree.
A bound reference points to a specific slot in the input tuple, allowing the actual value to be retrieved more efficiently.
Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
Cast the child expression to the target data type.
Given an input array, produces a sequence of rows for each value in the array.
A globally unique (within this JVM) id for a given named expression.
An expression that produces zero or more rows given a single input row.
A row implementation that uses an array of objects as the underlying storage.
Returns the value of fields in the Struct child.
Returns the item at ordinal in the Array child or the Key ordinal in Map child.
Evaluates to true if list contains value.
A mutable wrapper that makes two rows appear as a single concatenated row.
Simple RegEx pattern matching function.
A function that converts the characters of a string to lowercase.
Converts a Row to another Row given a sequence of expressions that define each column of the new row.
An extended interface to Row that allows the values for each column to be updated.
Used to denote operators that do their own binding of attributes internally.
An AggregateExpression that can be partially computed without seeing all relevant tuples.
Converts a Row to another Row given a sequence of expressions that define each column of the new row.
Represents one row of output from a relational operator.
An expression that can be used to sort a tuple.
Represents an aggregation that has been rewritten to be performed in two steps.
A function that converts the characters of a string to uppercase.
A row with no data.
Extractor for retrieving Int literals.
A set of classes that can be used to represent trees of relational expressions. A key goal of the expression library is to hide the details of naming and scoping from developers who want to manipulate trees of relational operators. As such, the library defines a special type of expression, a NamedExpression, in addition to the standard collection of expressions.
Standard Expressions
A library of standard expressions (e.g., Add, EqualTo), aggregates (e.g., SUM, COUNT), and other computations (e.g., UDFs). Each expression type is capable of determining its output schema as a function of its children's output schema.
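As a rough illustration of how an expression can derive its output type from its children, here is a minimal, self-contained sketch. It is a toy model and not the library's actual classes: DataType, IntType, DoubleType, BoolType, Expr and Lit are hypothetical names, and the widening rule in Add is an assumption made for the example.

```scala
// Toy model only; none of these definitions are the library's real API.
sealed trait DataType
case object IntType extends DataType
case object DoubleType extends DataType
case object BoolType extends DataType

sealed trait Expr {
  def children: Seq[Expr]
  def dataType: DataType // computed from the children where needed
}

// A leaf expression: its type is given directly.
case class Lit(value: Any, dataType: DataType) extends Expr {
  val children: Seq[Expr] = Nil
}

// Assumed rule: the sum is a Double if either child is a Double.
case class Add(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
  def dataType: DataType =
    if (children.exists(_.dataType == DoubleType)) DoubleType else IntType
}

// A comparison yields a boolean regardless of the children's types.
case class EqualTo(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
  val dataType: DataType = BoolType
}
```

In this sketch, nesting expressions propagates types upward: an Add over an Int literal and a Double literal reports DoubleType without any caller having to specify it.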
Named Expressions
Some expressions are named and thus can be referenced by later operators in the dataflow graph. The two types of named expressions are AttributeReferences and Aliases. AttributeReferences refer to attributes of the input tuple for a given operator and form the leaves of some expression trees. Aliases assign a name to intermediate computations. For example, in the SQL statement SELECT a+b AS c FROM ..., the expressions a and b would be represented by AttributeReferences and c would be represented by an Alias.
During analysis, all named expressions are assigned a globally unique expression id, which can be used for equality comparisons. While the original names are kept around for debugging purposes, they should never be used to check if two attributes refer to the same value, as plan transformations can result in the introduction of naming ambiguity. For example, consider a plan that contains subqueries, both of which are reading from the same table. If an optimization removes the subqueries, scoping information would be destroyed, eliminating the ability to reason about which subquery produced a given attribute.
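The point about comparing expression ids rather than names can be sketched as follows. This is a simplified stand-in, not the library's implementation: ExprId.fresh, Named, AttributeRef, NamedSketch and sameAttribute are hypothetical names, and the counter-based id allocation is an assumption that only mirrors the "unique within this JVM" description.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy model only; none of these definitions are the library's real API.
final case class ExprId(id: Long)

object ExprId {
  private val counter = new AtomicLong(0L)
  // Assumed allocation scheme: a process-wide counter.
  def fresh(): ExprId = ExprId(counter.getAndIncrement())
}

sealed trait Expr
sealed trait Named extends Expr {
  def name: String
  def exprId: ExprId
}

final case class AttributeRef(name: String, exprId: ExprId) extends Named
final case class Add(left: Expr, right: Expr) extends Expr
final case class Alias(child: Expr, name: String, exprId: ExprId) extends Named

object NamedSketch {
  // Two subqueries read the same table, so each produces its own attribute "a".
  val aFromFirst: AttributeRef = AttributeRef("a", ExprId.fresh())
  val aFromSecond: AttributeRef = AttributeRef("a", ExprId.fresh())
  val b: AttributeRef = AttributeRef("b", ExprId.fresh())

  // SELECT a+b AS c: the alias names the intermediate computation.
  val c: Alias = Alias(Add(aFromFirst, b), "c", ExprId.fresh())

  // Identity is decided by id; names are kept only for debugging.
  def sameAttribute(x: Named, y: Named): Boolean = x.exprId == y.exprId
}
```

Here aFromFirst and aFromSecond share the name "a" but carry different ids, so sameAttribute still tells them apart once the subquery scoping is gone.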
Evaluation
The result of expressions can be evaluated using the Expression.apply(Row) method.
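A minimal sketch of what evaluating an expression tree against a row might look like is shown below. Only the apply(Row) shape is taken from the text above; Row as a Vector wrapper, BoundRef, Literal, Add and EvalSketch are hypothetical simplifications, and the Int-only addition is an assumption made to keep the example short.

```scala
// Toy model only; none of these definitions are the library's real API.
final case class Row(values: Vector[Any])

sealed trait Expr {
  // Evaluate this expression against a single input row.
  def apply(input: Row): Any
}

// A constant that ignores the input row.
final case class Literal(value: Any) extends Expr {
  def apply(input: Row): Any = value
}

// Reads a fixed slot of the input row, in the spirit of a bound reference.
final case class BoundRef(ordinal: Int) extends Expr {
  def apply(input: Row): Any = input.values(ordinal)
}

// Evaluates both children, then combines the results (Ints only here).
final case class Add(left: Expr, right: Expr) extends Expr {
  def apply(input: Row): Any = (left(input), right(input)) match {
    case (l: Int, r: Int) => l + r
    case other => throw new IllegalArgumentException(s"cannot add $other")
  }
}

object EvalSketch {
  // slot0 + (slot1 + 10)
  val expr: Expr = Add(BoundRef(0), Add(BoundRef(1), Literal(10)))
  val result: Any = expr(Row(Vector(1, 2))) // 13
}
```

Evaluation is recursive: each node evaluates its children on the same row and combines the results, so the same tree can be reused for every row of an operator's input.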