A function that gets the absolute value of a numeric value.
Returns the date that is num_months after start_date.
Used to assign a new name to a computation.
Checks if the array (left) has the element (right).
Returns the numeric value of the first character of str.
Asserts that input values of a non-nullable child expression are not null.
A predicate that is evaluated to be true if there are at least n non-null and non-NaN values.
A reference to an attribute produced by another operator in the tree.
Helper functions for working with Seq[Attribute].
A Set designed to hold AttributeReference objects that performs equality checking using the expression id instead of standard Java equality.
Converts the argument from binary to a base 64 string.
An extended version of InternalRow that implements all special getters, toString and equals/hashCode by genericGet.
An expression with two inputs and one output.
A binary expression specifically for math functions that take two Doubles as input and return a Double.
A BinaryExpression that is an operator, with two properties:
A function that calculates bitwise and(&) of two numbers.
A function that calculates bitwise not(~) of a number.
A function that calculates bitwise or(|) of two numbers.
A function that calculates bitwise xor of two numbers.
A bound reference points to a specific slot in the input tuple, allowing the actual value to be retrieved more efficiently.
Case statements of the form "CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END".
Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
Cast the child expression to the target data type.
Rounds the decimal to the given scale and checks whether the decimal can fit in the provided precision; returns null if it cannot.
An expression that is evaluated to the first non-null input.
An expression that concatenates multiple input strings into a single string.
An expression that concatenates multiple input strings or array of strings into a single string, using a given separator (the first child).
A function that returns true if the string left contains the string right.
Converts num from one base to another.
A function that computes a cyclic redundancy check value and returns it as a bigint, for input of type BinaryType.
Returns an Array containing the evaluation of all children expressions.
Constructs a new external row, using the result of evaluating the specified expressions as content.
Creates a struct with the given field names and values.
Creates a struct with the given field names and values.
Returns a Row containing the evaluation of all children expressions.
Returns a Row containing the evaluation of all children expressions.
Returns the current date at the start of query evaluation.
Returns the current timestamp at the start of query evaluation.
Adds a number of days to startdate.
Returns the number of days from startDate to endDate.
Subtracts a number of days from startdate.
Decodes the first argument into a String using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
Serializes an input object using a generic serializer (Kryo or Java).
Encodes the first argument into a BINARY using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
Serializes an input object using a generic serializer (Kryo or Java).
A function that returns true if the string left ends with the string right.
This class is used to compute equality of (sub)expression trees.
Euler's number.
A trait that is mixed in to define the expected input types of an expression.
Given an input array produces a sequence of rows for each value in the array.
A globally unique id for a given named expression.
An expression in Catalyst.
A function that returns the index (1-based) of the given string (left) in the comma-delimited list (right).
Formats the number X to a format like '#,###,###.
Returns the input formatted according to printf-style format strings.
The trait used to represent the type of a Window Frame Boundary.
The trait used to represent the type of a Window Frame.
Assumes given timestamp is UTC and converts to given timezone.
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
An expression that produces zero or more rows given a single input row.
An internal row implementation that uses an array of objects as the underlying storage.
This is used for serialization of Python DataFrame.
A row implementation that uses an array of objects as the underlying storage.
Returns the field at ordinal in the Array child.
Returns the array of values of fields in the Array of Struct child.
Extracts a JSON object from a JSON string based on the specified JSON path, and returns the extracted JSON object as a JSON string.
Returns the value of key key in Map child.
Returns the value of fields in the Struct child.
A function that returns the greatest value of all parameters, skipping null values.
If the argument is an INT or binary, hex returns the number as a STRING in hexadecimal format.
A mixin for the analyzer to perform implicit type casting using ImplicitTypeCasts.
Evaluates to true if list contains value.
Optimized version of In clause, when all filter values of In clause are static.
Returns string, with the first letter of each word in uppercase.
Initialize a Java Bean instance by setting its field values via setters.
Expression that returns the name of the current file being read in using SqlNewHadoopRDD.
A MutableProjection that is calculated by calling eval on each of the specified expressions.
An interpreted row ordering comparator.
A Projection that is calculated by calling the eval of each of the specified expressions.
Calls the specified function on an object, optionally passing arguments.
Evaluates to true iff it's NaN.
An expression that is evaluated to true if the input is not null.
An expression that is evaluated to true if the input is null.
A mutable wrapper that makes two rows appear as a single concatenated row.
A place holder for the loop variable used in MapObjects.
Returns the last day of the month which the date belongs to.
A leaf expression, i.
A leaf expression specifically for math constants.
A function that returns the least value of all parameters, skipping null values.
A function that returns the length of the given string or binary expression.
A function that returns the Levenshtein distance between the two given strings.
Simple RegEx pattern matching function.
In order to do type checking, use Literal.
Computes the logarithm of a number.
A function that converts the characters of a string to lowercase.
Create a Decimal from an unscaled Long value.
Applies the given expression to every element of a collection of items, returning the result as an ArrayType.
A function that calculates an MD5 128-bit checksum and returns it as a hex string, for input of type BinaryType.
Returns number of months between dates date1 and date2.
Converts an InternalRow to another Row given a sequence of expressions that define each column of the new row.
An extended interface to InternalRow that allows the values for each column to be updated.
A parent class for mutable container objects that are reused when the values are changed, resulting in less garbage.
An Expression that evaluates to left iff it's not NaN, or evaluates to right otherwise.
An Expression that is named.
Constructs a new instance of the given class, using the result of evaluating the specified expressions as arguments.
Returns the first date which is later than startDate and named as dayOfWeek.
An expression that is nondeterministic.
Pi.
An Expression that returns a boolean value.
A place holder used when printing expressions without debugging information such as the expression id or the unresolved indicator.
Converts an InternalRow to another Row given a sequence of expressions that define each column of the new row.
An expression used to wrap the children when promoting the precision of DecimalType, to avoid promoting multiple times.
A Random distribution generating expression.
Generate a random column with i.
Generate a random column with i.
Extracts a specific (idx) group identified by a Java regex.
Replaces all substrings of str that match regexp with rep.
Rounds the child's result to scale decimal places when scale >= 0, or rounds at the integral part when scale < 0.
User-defined function.
A function that calculates a SHA-1 hash value and returns it as a hex string, for input of type BinaryType or StringType.
A function that calculates the SHA-2 family of functions (SHA-224, SHA-256, SHA-384, and SHA-512) and returns it as a hex string.
Bitwise unsigned left shift.
Bitwise unsigned left shift.
Bitwise unsigned right shift, for integer and long data type.
Given an array or map, returns its size.
Sorts the input array in ascending / descending order according to the natural ordering of the array elements and returns it.
An expression that can be used to sort a tuple.
An expression to generate a 64-bit long prefix used in sorting.
A function that returns the soundex code of the given string expression.
A row type that holds an array of specialized container objects, of type MutableValue, chosen based on the dataTypes of each column.
A specified Window Frame.
A function that returns true if the string left starts with the string right.
Invokes a static function, returning the result.
A function that returns the position of the first occurrence of substr in the given string.
Returns str, left-padded with pad to a length of len.
A function that returns the position of the first occurrence of substr in given string after position pos.
A base trait for functions that compare two strings, returning a boolean.
Returns str, right-padded with pad to a length of len.
Returns the string which repeats the given string value n times.
Returns the reversed given string.
Returns a string of n spaces.
Splits str around pat (pattern is a regular expression).
A function that replaces any character in the srcExpr with a character in replaceExpr.
A function that trims the spaces from both ends of the specified string.
A function that trims the spaces from the left end of the given string.
A function that trims the spaces from the right end of the given string.
A function that takes a substring of its first argument starting at a given position.
Returns the substring from string str before count occurrences of the delimiter delim.
An expression with three inputs and one output.
Adds an interval to timestamp.
Subtracts an interval from timestamp.
Returns the date part of a timestamp or string.
Assumes given timestamp is in given timezone and converts to UTC.
Converts time string with given pattern.
Returns date truncated to the unit specified by the format.
Converts the argument from a base 64 string to BINARY.
An expression with one input and one output.
A unary expression specifically for math functions.
An expression that cannot be evaluated.
Performs the inverse operation of HEX.
Converts time string with given pattern.
A projection that returns UnsafeRow.
Return the unscaled Long value of a Decimal, assuming it fits in a Long.
Given an expression that returns an object of type Option[_], this expression unwraps the option into the specified Spark SQL datatype.
Cast the child expression to the target data type, but will throw error if the cast might truncate, e.
A function that converts the characters of a string to uppercase.
A generator that produces its output using the provided lambda function.
<value> FOLLOWING boundary.
<value> PRECEDING boundary.
The trait used to represent a Window Frame.
Every window function needs to maintain an output buffer for its output.
The trait of the Window Specification (specified in the OVER clause or WINDOW clause) for Window Functions.
The specification for a window function.
A Window specification reference that refers to the WindowSpecDefinition defined under the name name.
Converts the result of evaluating child into an option, checking both the isNull bit and (in the case of reference types) equality with null.
Builds a map that is keyed by an Attribute's expression id.
CURRENT ROW boundary.
Used as input into expressions whose output does not depend on any input value.
Extractor for making working with frame boundaries easier.
A projection that could turn UnsafeRow into GenericInternalRow.
Extractor for retrieving Int literals.
An extractor that matches non-null literal values.
RangeFrame treats rows in a partition as groups of peers.
RowFrame treats rows in a partition individually.
UNBOUNDED FOLLOWING boundary.
UNBOUNDED PRECEDING boundary.
Used as a place holder when a frame specification is not defined.
A collection of generators that build custom bytecode at runtime for performing the evaluation of catalyst expression.
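Several summaries in the list above describe how null and NaN inputs are treated: the first-non-null expression, the greatest and least functions that skip nulls, the NaN fallback expression, and the "at least n non-null and non-NaN values" predicate. The sketch below restates those rules in plain Python. It is not Catalyst code: the function names are chosen for this example, None stands in for SQL null, and returning None when every argument is null is an assumption rather than something the summaries state.

```python
import math


def _is_nan(value):
    """True only for a float NaN; None and other values are not NaN."""
    return isinstance(value, float) and math.isnan(value)


def coalesce(*values):
    """Return the first input that is not None (null)."""
    for value in values:
        if value is not None:
            return value
    return None  # assumption: all-null input yields null


def greatest(*values):
    """Largest input, skipping None values."""
    present = [v for v in values if v is not None]
    return max(present) if present else None  # assumption for all-null input


def least(*values):
    """Smallest input, skipping None values."""
    present = [v for v in values if v is not None]
    return min(present) if present else None  # assumption for all-null input


def nanvl(left, right):
    """Return left unless it is NaN, in which case return right."""
    return right if _is_nan(left) else left


def at_least_n_non_nulls(n, *values):
    """True if at least n inputs are neither None nor NaN."""
    usable = sum(1 for v in values if v is not None and not _is_nan(v))
    return usable >= n


print(coalesce(None, 2, 3))                                # 2
print(greatest(1, None, 5))                                # 5
print(least(1, None, 5))                                   # 1
print(nanvl(float("nan"), 0.0))                            # 0.0
print(at_least_n_non_nulls(2, 1, None, float("nan"), 4))   # True
```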
A set of classes that can be used to represent trees of relational expressions. A key goal of the expression library is to hide the details of naming and scoping from developers who want to manipulate trees of relational operators. As such, the library defines a special type of expression, a NamedExpression in addition to the standard collection of expressions.
Standard Expressions
A library of standard expressions (e.g., Add, EqualTo), aggregates (e.g., SUM, COUNT), and other computations (e.g. UDFs). Each expression type is capable of determining its output schema as a function of its children's output schema.
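To make the idea of an expression deriving its output type from its children concrete, here is a minimal sketch in plain Python. It is not Catalyst code: the class names Literal, Add and EqualTo echo the examples above, but the data_type method, the string type names, and the rule that both sides of an addition must share a type are assumptions made for illustration.

```python
from dataclasses import dataclass


class Expression:
    """Hypothetical base class: every node can report its output type."""

    def data_type(self) -> str:
        raise NotImplementedError


@dataclass(frozen=True)
class Literal(Expression):
    value: object
    dtype: str  # e.g. "int" or "double"; type names are illustrative

    def data_type(self) -> str:
        return self.dtype


@dataclass(frozen=True)
class Add(Expression):
    left: Expression
    right: Expression

    def data_type(self) -> str:
        # Assumed rule for this sketch: both sides must share a type,
        # and the sum has that same type.
        lt, rt = self.left.data_type(), self.right.data_type()
        if lt != rt:
            raise TypeError(f"cannot add {lt} and {rt}")
        return lt


@dataclass(frozen=True)
class EqualTo(Expression):
    left: Expression
    right: Expression

    def data_type(self) -> str:
        # A comparison yields a boolean regardless of its children's types.
        return "boolean"


tree = EqualTo(Add(Literal(1, "int"), Literal(2, "int")), Literal(3, "int"))
print(tree.data_type())       # boolean
print(tree.left.data_type())  # int
```

Each node asks only its direct children for their types, so the type of the whole tree falls out of a single bottom-up walk.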
Named Expressions
Some expressions are named and thus can be referenced by later operators in the dataflow graph. The two types of named expressions are AttributeReferences and Aliases. AttributeReferences refer to attributes of the input tuple for a given operator and form the leaves of some expression trees. Aliases assign a name to intermediate computations. For example, in the SQL statement SELECT a+b AS c FROM ..., the expressions a and b would be represented by AttributeReferences and c would be represented by an Alias.
During analysis, all named expressions are assigned a globally unique expression id, which can be used for equality comparisons. While the original names are kept around for debugging purposes, they should never be used to check if two attributes refer to the same value, as plan transformations can result in the introduction of naming ambiguity. For example, consider a plan that contains subqueries, both of which are reading from the same table. If an optimization removes the subqueries, scoping information would be destroyed, eliminating the ability to reason about which subquery produced a given attribute.
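The subquery scenario can be sketched in a few lines of plain Python. This is only an illustration of why identity should come from the expression id rather than the name: the AttributeReference class here is a stand-in written for this example, and the expr_id field, the counter that generates ids, and the same_attribute method are all hypothetical.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical global id generator; every call to next() yields a fresh id.
_next_id = itertools.count()


@dataclass(frozen=True)
class AttributeReference:
    name: str
    # Each new attribute gets a fresh id; the id, not the name, is its identity.
    expr_id: int = field(default_factory=lambda: next(_next_id))

    def same_attribute(self, other: "AttributeReference") -> bool:
        return self.expr_id == other.expr_id


# Two subqueries read the same table, so both expose a column called "key".
left_key = AttributeReference("key")
right_key = AttributeReference("key")

print(left_key.name == right_key.name)     # True: the names collide
print(left_key.same_attribute(right_key))  # False: different attributes
print(left_key.same_attribute(left_key))   # True
```

Once the subqueries are removed from the plan, the two "key" attributes sit side by side with nothing but their ids to tell them apart.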
Evaluation
The result of expressions can be evaluated using the Expression.apply(Row) method.
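As a rough picture of what evaluating an expression against a row involves, the following plain-Python sketch evaluates a small tree over a tuple. It does not use the Catalyst API: BoundReference and Add borrow names from the class list, while the eval method, the use of a tuple as the row, and the rule that a null input makes the sum null are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


class Expression:
    """Hypothetical base class: an expression is evaluated against one row."""

    def eval(self, row: Sequence[object]) -> object:
        raise NotImplementedError


@dataclass(frozen=True)
class BoundReference(Expression):
    ordinal: int  # slot in the input row that this reference reads

    def eval(self, row: Sequence[object]) -> object:
        return row[self.ordinal]


@dataclass(frozen=True)
class Add(Expression):
    left: Expression
    right: Expression

    def eval(self, row: Sequence[object]) -> object:
        lhs = self.left.eval(row)
        rhs = self.right.eval(row)
        # Assumed null handling: the sum is null if either input is null.
        if lhs is None or rhs is None:
            return None
        return lhs + rhs


# Represents a + b where a is slot 0 and b is slot 1 of the input row.
a_plus_b = Add(BoundReference(0), BoundReference(1))

print(a_plus_b.eval((1, 2)))     # 3
print(a_plus_b.eval((1, None)))  # None
```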