A logical plan node with a left and right child.
A logical node that represents a non-query command to be executed by the system.
Cube is syntactic sugar for GROUPING SETS. It is transformed to GroupingSets and eventually to Aggregate(.., Expand) by the Analyzer.
The candidate GROUP BY expressions.
Child operator
The aggregation expressions. Any GROUP BY expression that is not selected is treated as a constant null wherever it appears in these expressions.
The attribute that represents the virtual column GROUPINGID. It is also the bitmask that indicates which GROUP BY expressions are selected for each aggregated output row.
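As a rough illustration of how a cube expands into grouping sets, the sketch below enumerates every bitmask over the candidate GROUP BY expressions. It is plain Python, not Catalyst code; the helper names `cube_bitmasks` and `selected_exprs` are hypothetical, and the bit order (bit i selects expression i, least significant bit first) is an assumption made for the example.

```python
def cube_bitmasks(num_group_by_exprs):
    """Return every bitmask over the candidate GROUP BY expressions."""
    return list(range(2 ** num_group_by_exprs))


def selected_exprs(bitmask, group_by_exprs):
    """Return the expressions whose bit is set to 1 in the bitmask.

    Assumption: bit i (least significant first) corresponds to expression i.
    """
    return [e for i, e in enumerate(group_by_exprs) if (bitmask >> i) & 1]


exprs = ["a", "b"]
# CUBE(a, b) covers every subset of the candidates: (), (a), (b), (a, b)
grouping_sets = [selected_exprs(m, exprs) for m in cube_bitmasks(len(exprs))]
```

With n candidate expressions this yields 2^n grouping sets, which is why a cube over many columns can be expensive.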
Applies all of the group expressions to every input row, so each input row yields multiple output rows.
The groups of expressions. Every group must produce the same schema, as specified by the output parameter.
The output schema
Child operator
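The behaviour described for Expand can be sketched in plain Python as follows. This is only a model of the semantics, not the Catalyst operator: the function `expand`, the row representation, and the sample projections (including the trailing grouping id values) are assumptions chosen for illustration.

```python
def expand(rows, projections):
    """Apply every projection to every input row, in order."""
    out = []
    for row in rows:
        for project in projections:
            out.append(project(row))
    return out


# Hypothetical projections that all produce the same schema (a, b, gid).
projections = [
    lambda r: (r["a"], r["b"], 3),  # both a and b kept
    lambda r: (r["a"], None, 1),    # only a kept, b replaced by null
    lambda r: (None, None, 0),      # neither kept
]

rows = [{"a": 1, "b": 2}]
expanded = expand(rows, projections)
```

Each input row produces one output row per projection, so the output has len(rows) * len(projections) rows.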
Applies a Generator to a stream of input rows, combining the output of each into a new stream of rows. This operation is similar to a flatMap in functional programming, with one important additional feature: the input rows can be joined with their output.
the generator expression
when true, each output row is implicitly joined with the input tuple that produced it.
when true, each input row will be output at least once, even if the output of the given generator is empty. outer has no effect when join is false.
Qualifier for the attributes of the generator (UDTF)
The output schema of the Generator.
The child logical plan node
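The interaction of the join and outer flags can be modelled with a small flatMap-style function. This is a sketch under stated assumptions, not the Catalyst implementation: rows are tuples, the generator returns a list of tuples, and the `width` parameter (the number of generator output columns used for null padding) is hypothetical.

```python
def generate(rows, generator, join=False, outer=False, width=1):
    """Model of Generate: flatMap, optionally joined with the input row."""
    out = []
    for row in rows:
        produced = list(generator(row))
        if not join:
            # Without join, only the generator output is emitted;
            # outer has no effect in this mode.
            out.extend(produced)
            continue
        if not produced and outer:
            # Emit the input row once, padded with nulls.
            out.append(row + (None,) * width)
        for generated in produced:
            out.append(row + generated)
    return out


# An explode-like generator over the second column of each row.
def explode(row):
    return [(x,) for x in row[1]]


rows = [(1, [10, 20]), (2, [])]
plain = generate(rows, explode)
joined = generate(rows, explode, join=True)
joined_outer = generate(rows, explode, join=True, outer=True)
```

Here the row with an empty list disappears from `joined` but survives in `joined_outer` with a null in the generated column.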
A GROUP BY clause with GROUPING SETS can generate a result set equivalent to that generated by a UNION ALL of multiple simple GROUP BY clauses. GROUPING SETS is transformed into the logical plan Aggregate(.., Expand) by the Analyzer.
A list of bitmasks, each of which indicates the selected GROUP BY expressions.
The candidate GROUP BY expressions. Each one takes effect only if its associated bit in the bitmask is set to 1.
Child operator
The aggregation expressions. Any GROUP BY expression that is not selected is treated as a constant null wherever it appears in these expressions.
The attribute that represents the virtual column GROUPINGID. It is also the bitmask that indicates which GROUP BY expressions are selected for each aggregated output row. Its value will be one of the values in bitmasks.
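The UNION ALL equivalence can be illustrated with a small in-memory model: one simple group-by per bitmask, with non-selected expressions replaced by null, and the results concatenated. The helper names, the SUM aggregate, and the bit order (bit i selects the i-th candidate, least significant bit first) are assumptions for this sketch, which does not reflect how the Analyzer actually rewrites the plan.

```python
from collections import defaultdict


def group_by_sum(rows, keys, all_keys, value):
    """One simple GROUP BY; non-selected candidates become None (null)."""
    totals = defaultdict(int)
    for r in rows:
        k = tuple(r[c] if c in keys else None for c in all_keys)
        totals[k] += r[value]
    return [k + (v,) for k, v in totals.items()]


def grouping_sets_sum(rows, all_keys, bitmasks, value):
    """UNION ALL of one simple GROUP BY per bitmask."""
    out = []
    for mask in bitmasks:
        keys = [c for i, c in enumerate(all_keys) if (mask >> i) & 1]
        out.extend(group_by_sum(rows, keys, all_keys, value))
    return out


rows = [
    {"a": "x", "b": 1, "v": 10},
    {"a": "x", "b": 2, "v": 5},
    {"a": "y", "b": 1, "v": 7},
]
# Grouping sets (a) and (a, b), written as bitmasks over ["a", "b"].
result = grouping_sets_sum(rows, ["a", "b"], [0b01, 0b11], "v")
```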
A logical plan node with no children.
Performs a physical redistribution of the data. Used when the consumer of the query result has expectations about the distribution and ordering of partitioned input data.
Returns a new RDD that has exactly numPartitions partitions. Differs from RepartitionByExpression in that this method is called directly by DataFrame, because the user asked for coalesce or repartition. RepartitionByExpression is used when the consumer of the output requires some specific ordering or distribution of the data.
This method repartitions data using Expressions, and receives information about the number of partitions during execution. Used when a specific ordering or distribution is expected by the consumer of the query result. Use Repartition for RDD-like coalesce and repartition.
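To make the idea of repartitioning by an expression concrete, here is a minimal sketch in plain Python. The source does not say which partitioning scheme is used, so hash partitioning is an assumption here, as are the function name `repartition_by_expression` and the representation of partitions as lists.

```python
def repartition_by_expression(rows, key_expr, num_partitions):
    """Route each row to a partition chosen from the value of key_expr.

    Assumption: hash partitioning; the number of partitions is supplied
    at execution time rather than fixed in the plan.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        index = hash(key_expr(row)) % num_partitions
        partitions[index].append(row)
    return partitions


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
parts = repartition_by_expression(rows, lambda r: r[0], 2)
```

Rows with equal key values always land in the same partition, which is the kind of distribution guarantee a downstream consumer may rely on.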
Rollup is syntactic sugar for GROUPING SETS. It is transformed to GroupingSets and eventually to Aggregate(.., Expand) by the Analyzer.
The candidate GROUP BY expressions. Each one takes effect only if its associated bit in the bitmask is set to 1.
Child operator
The aggregation expressions. Any GROUP BY expression that is not selected is treated as a constant null wherever it appears in these expressions.
The attribute that represents the virtual column GROUPINGID. It is also the bitmask that indicates which GROUP BY expressions are selected for each aggregated output row.
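A rollup can be pictured as the grouping sets formed by successively dropping the last candidate expression. The sketch below derives those sets as bitmasks; the helper names and the bit order (bit i selects expression i, least significant bit first) are assumptions for illustration, not the Analyzer's actual encoding.

```python
def rollup_bitmasks(num_group_by_exprs):
    """Bitmasks for the prefixes of the candidate list, longest first."""
    return [(1 << k) - 1 for k in range(num_group_by_exprs, -1, -1)]


def selected_exprs(bitmask, group_by_exprs):
    """Return the expressions whose bit is set to 1 in the bitmask."""
    return [e for i, e in enumerate(group_by_exprs) if (bitmask >> i) & 1]


exprs = ["a", "b", "c"]
# ROLLUP(a, b, c) covers (a, b, c), (a, b), (a) and the grand total ().
rollup_sets = [selected_exprs(m, exprs) for m in rollup_bitmasks(len(exprs))]
```

With n candidate expressions a rollup yields n + 1 grouping sets.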
Sample the dataset.
Lower-bound of the sampling probability (usually 0.0)
Upper-bound of the sampling probability. The expected fraction sampled will be ub - lb.
Whether to sample with replacement.
the random seed
the LogicalPlan
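One way to read the lower and upper bound parameters is as a window on a uniform random draw: a row is kept when its draw falls inside [lb, ub). The sketch below models only sampling without replacement and is an assumption about the mechanism, not the engine's sampler; the function name `sample` and its signature are hypothetical.

```python
import random


def sample(rows, lower_bound, upper_bound, seed):
    """Keep a row when its uniform draw falls in [lower_bound, upper_bound)."""
    rng = random.Random(seed)
    kept = []
    for row in rows:
        draw = rng.random()
        if lower_bound <= draw < upper_bound:
            kept.append(row)
    return kept


rows = list(range(10000))
first = sample(rows, 0.0, 0.3, seed=42)
rest = sample(rows, 0.3, 1.0, seed=42)
```

Because both calls use the same seed, every row receives the same draw in each call, so adjacent windows such as [0.0, 0.3) and [0.3, 1.0) split the data into disjoint pieces. That may be one reason for expressing the fraction as a pair of bounds.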
A placeholder for implementation specific input and output properties when passing data to a script. For example, in Hive this would specify which SerDes to use.
Transforms the input by forking and running the specified script.
the set of expressions that should be passed to the script.
the command that should be executed.
the attributes that are produced by the script.
the input and output schema applied in the execution of the script.
The ordering expressions
True means the sort applies globally to the entire data set; false means the sort applies only within each partition.
Child logical plan
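The difference between a global sort and a per-partition sort can be seen on a toy data set split into lists. This is an illustrative model only: the function `sort_partitions` is hypothetical, and collapsing to a single partition for the global case is a simplification (a distributed engine would more likely range-partition the data first).

```python
def sort_partitions(partitions, key, is_global):
    """Sort either the whole data set or each partition independently."""
    if is_global:
        # Simplification: gather everything into one totally ordered run.
        all_rows = [row for part in partitions for row in part]
        return [sorted(all_rows, key=key)]
    return [sorted(part, key=key) for part in partitions]


partitions = [[3, 1], [2, 0]]
global_sorted = sort_partitions(partitions, key=lambda x: x, is_global=True)
local_sorted = sort_partitions(partitions, key=lambda x: x, is_global=False)
```

After the per-partition sort each list is ordered, but reading the partitions one after another does not give a totally ordered sequence.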
A logical plan node with a single child.
A container for holding named common table expressions (CTEs) and a query plan. This operator will be removed during analysis and the relations will be substituted into child.
The final query of this CTE.
The queries that this CTE defines. The key is the alias of the CTE definition, and the value is the CTE definition.
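The substitution of CTE relations into the child plan can be sketched on a toy plan representation. Everything here is an assumption made for illustration: plans are nested tuples of the form (operator, child, ...), leaf relations are plain strings, and the function `substitute_ctes` is hypothetical rather than part of the analyzer.

```python
def substitute_ctes(plan, cte_relations):
    """Replace references to CTE aliases with their definitions."""
    if isinstance(plan, str):
        # A leaf relation: swap it out if it names a CTE, else keep it.
        return cte_relations.get(plan, plan)
    op, *children = plan
    return (op, *[substitute_ctes(c, cte_relations) for c in children])


# WITH t1 AS (a filter over base) ... final query joins t1 with base.
cte_relations = {"t1": ("Filter", "base")}
child = ("Project", ("Join", "t1", "base"))
resolved = substitute_ctes(child, cte_relations)
```

Once every alias has been replaced, the container node carries no further information and can be dropped, leaving only the resolved child.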
A relation with one row. This is used in "SELECT ..." without a FROM clause.
A logical node that represents a non-query command to be executed by the system. For example, commands can be used by parsers to represent DDL operations. Commands, unlike queries, are eagerly executed.