A relation produced by applying func
to each element of the child
, concatenating the
resulting columns at the end of the input row.
An optimized version of AppendColumns, that can be executed on deserialized object directly.
A logical plan node with a left and right child.
A hint for the optimizer that we should broadcast the child
if used in a join operator.
A relation produced by applying func
to each grouping key and associated values from left and
right children.
A logical node that represents a non-query command to be executed by the system.
A logical node that represents a non-query command to be executed by the system. For example, commands can be used by parsers to represent DDL operations. Commands, unlike queries, are eagerly executed.
Takes the input row from child and turns it into object using the given deserializer expression.
Returns a new logical plan that dedups input rows.
Apply a number of projections to every input row, hence we will get multiple output rows for an input row.
Apply a number of projections to every input row, hence we will get multiple output rows for an input row.
to apply
of all projections.
operator.
Applies a Generator to a stream of input rows, combining the output of each into a new stream of rows.
Applies a Generator to a stream of input rows, combining the
output of each into a new stream of rows. This operation is similar to a flatMap
in functional
programming with one important additional feature, which allows the input rows to be joined with
their output.
the generator expression
when true, each output row is implicitly joined with the input tuple that produced it.
when true, each input row will be output at least once, even if the output of the
given generator
is empty. outer
has no effect when join
is false.
Qualifier for the attributes of generator(UDTF)
The output schema of the Generator.
Children logical plan node
A GROUP BY clause with GROUPING SETS can generate a result set equivalent to generated by a UNION ALL of multiple simple GROUP BY clauses.
A GROUP BY clause with GROUPING SETS can generate a result set equivalent to generated by a UNION ALL of multiple simple GROUP BY clauses.
We will transform GROUPING SETS into logical plan Aggregate(.., Expand) in Analyzer
A list of bitmasks, each of the bitmask indicates the selected GroupBy expressions
The Group By expressions candidates, take effective only if the associated bit in the bitmask set to 1.
Child operator
The Aggregation expressions, those non selected group by expressions will be considered as constant null if it appears in the expressions
A logical plan node with no children.
A relation produced by applying func
to each element of the child
.
Applies func to each unique group in child
, based on the evaluation of groupingAttributes
.
Applies func to each unique group in child
, based on the evaluation of groupingAttributes
.
Func is invoked with an object representation of the grouping key an iterator containing the
object representation of all the rows with that key.
used to extract the key object for each group.
used to extract the items in the iterator from an input row.
A relation produced by applying func
to each partition of the child
.
A relation produced by applying a serialized R function func
to each partition of the child
.
A trait for logical operators that consumes domain objects as input.
A trait for logical operators that consumes domain objects as input. The output of its child must be a single-field row containing the input object.
A trait for logical operators that produces domain objects as output.
A trait for logical operators that produces domain objects as output. The output of this operator is a single-field safe row containing the produced object.
Performs a physical redistribution of the data.
Performs a physical redistribution of the data. Used when the consumer of the query result have expectations about the distribution and ordering of partitioned input data.
Returns a new RDD that has exactly numPartitions
partitions.
Returns a new RDD that has exactly numPartitions
partitions. Differs from
RepartitionByExpression as this method is called directly by DataFrame's, because the user
asked for coalesce
or repartition
. RepartitionByExpression is used when the consumer
of the output requires some specific ordering or distribution of the data.
This method repartitions data using Expressions into numPartitions
, and receives
information about the number of partitions during execution.
This method repartitions data using Expressions into numPartitions
, and receives
information about the number of partitions during execution. Used when a specific ordering or
distribution is expected by the consumer of the query result. Use Repartition for RDD-like
coalesce
and repartition
.
If numPartitions
is not specified, the number of partitions will be the number set by
spark.sql.shuffle.partitions
.
When planning take() or collect() operations, this special node that is inserted at the top of the logical plan before invoking the query planner.
When planning take() or collect() operations, this special node that is inserted at the top of the logical plan before invoking the query planner.
Rules can pattern-match on this node in order to apply transformations that only take effect at the top of the logical query plan.
Sample the dataset.
Sample the dataset.
Lower-bound of the sampling probability (usually 0.0)
Upper-bound of the sampling probability. The expected fraction sampled will be ub - lb.
Whether to sample with replacement.
the random seed
the LogicalPlan
Is created from TABLESAMPLE in the parser.
Input and output properties when passing data to a script.
Input and output properties when passing data to a script. For example, in Hive this would specify which SerDes to use.
Transforms the input by forking and running the specified script.
Transforms the input by forking and running the specified script.
the set of expression that should be passed to the script.
the command that should be executed.
the attributes that are produced by the script.
the input and output schema applied in the execution of the script.
Takes the input object from child and turns it into unsafe row using the given serializer expression.
The ordering expressions
True means global sorting apply for entire data set, False means sorting only apply within the partition.
Child logical plan
A logical plan node with single child.
A container for holding named common table expressions (CTEs) and a query plan.
A container for holding named common table expressions (CTEs) and a query plan. This operator will be removed during analysis and the relations will be substituted into child.
The final query of this CTE.
Queries that this CTE defined, key is the alias of the CTE definition, value is the CTE definition.
Factory for constructing new AppendColumn
nodes.
Factory for constructing new CoGroup
nodes.
Factory for constructing new FlatMapGroupsInR
nodes.
Factory for constructing new MapGroups
nodes.
A relation with one row.
A relation with one row. This is used in "SELECT ..." without a from clause.
Factory for constructing new Range
nodes.
Factory for constructing new Union
nodes.
A relation produced by applying
func
to each element of thechild
, concatenating the resulting columns at the end of the input row.used to extract the input to
func
from an input row.use to serialize the output of
func
.