execution

Type Members

case class AppendColumns[T, U](func: (T) ⇒ U, tEncoder: ExpressionEncoder[T], uEncoder: ExpressionEncoder[U], newColumns: Seq[Attribute], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

Applies the given function to each input row, appending the encoded result at the end of the row.
case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan) extends SparkPlan with Product with Serializable

Uses PythonRDD to evaluate a PythonUDF, one partition of tuples at a time.
Uses PythonRDD to evaluate a PythonUDF, one partition of tuples at a time.
Python evaluation works by sending the necessary (projected) input data via a socket to an external Python process, and combine the result from the Python process with the original row.
For each row we send to Python, we also put it in a queue. For each output row from Python, we drain the queue to find the original input row. Note that if the Python process is way too slow, this could lead to the queue growing unbounded and eventually run out of memory.
case class CacheTableCommand(tableName: String, plan: Option[LogicalPlan], isLazy: Boolean) extends LogicalPlan with RunnableCommand with Product with Serializable
case class CoGroup[Key, Left, Right, Result](func: (Key, Iterator[Left], Iterator[Right]) ⇒ TraversableOnce[Result], keyEnc: ExpressionEncoder[Key], leftEnc: ExpressionEncoder[Left], rightEnc: ExpressionEncoder[Right], resultEnc: ExpressionEncoder[Result], output: Seq[Attribute], leftGroup: Seq[Attribute], rightGroup: Seq[Attribute], left: SparkPlan, right: SparkPlan) extends SparkPlan with BinaryNode with Product with Serializable

Co-groups the data from left and right children, and calls the function with each group and 2 iterators containing all elements in the group from left and right side.
Co-groups the data from left and right children, and calls the function with each group and 2 iterators containing all elements in the group from left and right side. The result of this function is encoded and flattened before being output.
class CoGroupedIterator extends Iterator[(InternalRow, Iterator[InternalRow], Iterator[InternalRow])]

Iterates over GroupedIterators and returns the cogrouped data, i.e.
Iterates over GroupedIterators and returns the cogrouped data, i.e. each record is a grouping key with its associated values from all GroupedIterators. Note: we assume the output of each GroupedIterator is ordered by the grouping key.
case class Coalesce(numPartitions: Int, child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

Return a new RDD that has exactly numPartitions partitions.
Return a new RDD that has exactly numPartitions partitions. Similar to coalesce defined on an RDD, this operation results in a narrow dependency, e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle, instead each of the 100 new partitions will claim 10 of the current partitions.
class CoalescedPartitioner extends Partitioner

A Partitioner that might group together one or more partitions from the parent.
case class ConvertToSafe(child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

Converts UnsafeRows back into Java-object-based rows.
case class ConvertToUnsafe(child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

Converts Java-object-based rows into UnsafeRows.
case class DescribeCommand(child: SparkPlan, output: Seq[Attribute], isExtended: Boolean) extends LogicalPlan with RunnableCommand with Product with Serializable
case class DescribeFunction(functionName: String, isExtended: Boolean) extends LogicalPlan with RunnableCommand with Product with Serializable

A command for users to get the usage of a registered function.
A command for users to get the usage of a registered function. The syntax of using this command in SQL is
```
DESCRIBE FUNCTION [EXTENDED] upper;
```
case class EvaluatePython(udf: PythonUDF, child: LogicalPlan, resultAttribute: AttributeReference) extends catalyst.plans.logical.UnaryNode with Product with Serializable

Evaluates a PythonUDF, appending the result to the end of the input tuple.
case class Except(left: SparkPlan, right: SparkPlan) extends SparkPlan with BinaryNode with Product with Serializable

Returns a table with the elements from left that are not in right using the built-in spark subtract function.
case class Exchange(newPartitioning: Partitioning, child: SparkPlan, coordinator: Option[ExchangeCoordinator]) extends SparkPlan with UnaryNode with Product with Serializable

Performs a shuffle that will result in the desired newPartitioning.
case class Expand(projections: Seq[Seq[Expression]], output: Seq[Attribute], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

Apply the all of the GroupExpressions to every input row, hence we will get multiple output rows for a input row.
Apply the all of the GroupExpressions to every input row, hence we will get multiple output rows for a input row.
projections
The group of expressions, all of the group expressions should output the same schema specified bye the parameter output
output
The output Schema
child
Child operator
case class ExplainCommand(logicalPlan: LogicalPlan, output: Seq[Attribute] = ..., extended: Boolean = false) extends LogicalPlan with RunnableCommand with Product with Serializable

An explain command for users to see how a command will be executed.
An explain command for users to see how a command will be executed.
Note that this command takes in a logical plan, runs the optimizer on the logical plan (but do NOT actually execute it).
case class Filter(condition: Expression, child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable
case class Generate(generator: Generator, join: Boolean, outer: Boolean, output: Seq[Attribute], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

Applies a Generator to a stream of input rows, combining the output of each into a new stream of rows.
Applies a Generator to a stream of input rows, combining the output of each into a new stream of rows. This operation is similar to a flatMap in functional programming with one important additional feature, which allows the input rows to be joined with their output.
generator
the generator expression
join
when true, each output row is implicitly joined with the input tuple that produced it.
outer
when true, each input row will be output at least once, even if the output of the given generator is empty. outer has no effect when join is false.
output
the output attributes of this node, which constructed in analysis phase, and we can not change it, as the parent node bound with it already.
class GroupedIterator extends Iterator[(InternalRow, Iterator[InternalRow])]

Iterates over a presorted set of rows, chunking it up by the grouping expression.
Iterates over a presorted set of rows, chunking it up by the grouping expression. Each call to next will return a pair containing the current group and an iterator that will return all the elements of that group. Iterators for each group are lazily constructed by extracting rows from the input iterator. As such, full groups are never materialized by this class.
Example input:
```
Input: [a, 1], [b, 2], [b, 3]
Grouping: x#1
InputSchema: x#1, y#2
```
Result:
```
First call to next():  ([a], Iterator([a, 1])
Second call to next(): ([b], Iterator([b, 2], [b, 3])
```
Note, the class does not handle the case of an empty input for simplicity of implementation. Use the factory to construct a new instance.
case class Intersect(left: SparkPlan, right: SparkPlan) extends SparkPlan with BinaryNode with Product with Serializable

Returns the rows in left that also appear in right using the built in spark intersection function.
case class Limit(limit: Int, child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

Take the first limit elements.
Take the first limit elements. Note that the implementation is different depending on whether this is a terminal operator or not. If it is terminal and is invoked using executeCollect, this operator uses something similar to Spark's take method on the Spark driver. If it is not terminal or is invoked using execute, we first take the limit on each partition, and then repartition all the data to a single partition to compute the global limit.
case class MapGroups[K, T, U](func: (K, Iterator[T]) ⇒ TraversableOnce[U], kEncoder: ExpressionEncoder[K], tEncoder: ExpressionEncoder[T], uEncoder: ExpressionEncoder[U], groupingAttributes: Seq[Attribute], output: Seq[Attribute], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

Groups the input rows together and calls the function with each group and an iterator containing all elements in the group.
Groups the input rows together and calls the function with each group and an iterator containing all elements in the group. The result of this function is encoded and flattened before being output.
case class MapPartitions[T, U](func: (Iterator[T]) ⇒ Iterator[U], tEncoder: ExpressionEncoder[T], uEncoder: ExpressionEncoder[U], output: Seq[Attribute], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

Applies the given function to each input row and encodes the result.
case class OutputFaker(output: Seq[Attribute], child: SparkPlan) extends SparkPlan with Product with Serializable

A plan node that does nothing but lie about the output of its child.
A plan node that does nothing but lie about the output of its child. Used to spice a (hopefully structurally equivalent) tree from a different optimization sequence into an already resolved tree.
case class Project(projectList: Seq[NamedExpression], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable
class QueryExecution extends AnyRef

The primary workflow for executing relational queries using Spark.
The primary workflow for executing relational queries using Spark. Designed to allow easy access to the intermediate phases of query execution for developers.
While this is not a public class, we should avoid changing the function names for the sake of changing them, because a lot of developers use the feature for debugging.
class QueryExecutionException extends Exception
case class Sample(lowerBound: Double, upperBound: Double, withReplacement: Boolean, seed: Long, child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

Sample the dataset.
Sample the dataset.
lowerBound
Lower-bound of the sampling probability (usually 0.0)
upperBound
Upper-bound of the sampling probability. The expected fraction sampled will be ub - lb.
withReplacement
Whether to sample with replacement.
seed
the random seed
child
the SparkPlan
case class SetCommand(kv: Option[(String, Option[String])]) extends LogicalPlan with RunnableCommand with Logging with Product with Serializable
case class ShowFunctions(db: Option[String], pattern: Option[String]) extends LogicalPlan with RunnableCommand with Product with Serializable

A command for users to list all of the registered functions.
A command for users to list all of the registered functions. The syntax of using this command in SQL is:
```
SHOW FUNCTIONS
```
TODO currently we are simply ignore the db
case class ShowTablesCommand(databaseName: Option[String]) extends LogicalPlan with RunnableCommand with Product with Serializable

A command for users to get tables in the given database.
A command for users to get tables in the given database. If a databaseName is not given, the current database will be used. The syntax of using this command in SQL is:
```
SHOW TABLES [IN databaseName]
```
class ShuffledRowRDD extends RDD[InternalRow]

This is a specialized version of org.apache.spark.rdd.ShuffledRDD that is optimized for shuffling rows instead of Java key-value pairs.
This is a specialized version of org.apache.spark.rdd.ShuffledRDD that is optimized for shuffling rows instead of Java key-value pairs. Note that something like this should eventually be implemented in Spark core, but that is blocked by some more general refactorings to shuffle interfaces / internals.
This RDD takes a ShuffleDependency (dependency), and a optional array of partition start indices as input arguments (specifiedPartitionStartIndices).
The dependency has the parent RDD of this RDD, which represents the dataset before shuffle (i.e. map output). Elements of this RDD are (partitionId, Row) pairs. Partition ids should be in the range [0, numPartitions - 1]. dependency.partitioner is the original partitioner used to partition map output, and dependency.partitioner.numPartitions is the number of pre-shuffle partitions (i.e. the number of partitions of the map output).
When specifiedPartitionStartIndices is defined, specifiedPartitionStartIndices.length will be the number of post-shuffle partitions. For this case, the ith post-shuffle partition includes specifiedPartitionStartIndices[i] to specifiedPartitionStartIndices[i+1] - 1 (inclusive).
When specifiedPartitionStartIndices is not defined, there will be dependency.partitioner.numPartitions post-shuffle partitions. For this case, a post-shuffle partition is created for every pre-shuffle partition.
case class Sort(sortOrder: Seq[SortOrder], global: Boolean, child: SparkPlan, testSpillFrequency: Int = 0) extends SparkPlan with UnaryNode with Product with Serializable

Performs (external) sorting.
Performs (external) sorting.
global
when true performs a global sort of all partitions by shuffling the data first if necessary.
testSpillFrequency
Method for configuring periodic spilling in unit tests. If set, will spill every frequency records.
abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializable

The base class for physical operators.
class SparkPlanner extends SparkStrategies
class SparkSQLParser extends AbstractSparkSQLParser

The top level Spark SQL parser.
The top level Spark SQL parser. This parser recognizes syntaxes that are available for all SQL dialects supported by Spark SQL, and delegates all the other syntaxes to the fallback parser.
case class TakeOrderedAndProject(limit: Int, sortOrder: Seq[SortOrder], projectList: Option[Seq[NamedExpression]], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

Take the first limit elements as defined by the sortOrder, and do projection if needed.
Take the first limit elements as defined by the sortOrder, and do projection if needed. This is logically equivalent to having a Limit operator after a Sort operator, or having a Project operator between them. This could have been named TopK, but Spark's top operator does the opposite in ordering so we name it TakeOrdered to avoid confusion.
case class UncacheTableCommand(tableName: String) extends LogicalPlan with RunnableCommand with Product with Serializable
case class Union(children: Seq[SparkPlan]) extends SparkPlan with Product with Serializable

Union two plans, without a distinct.
Union two plans, without a distinct. This is UNION ALL in SQL.
final class UnsafeFixedWidthAggregationMap extends AnyRef
final class UnsafeKVExternalSorter extends AnyRef
case class Window(projectList: Seq[Attribute], windowExpression: Seq[NamedExpression], partitionSpec: Seq[Expression], orderSpec: Seq[SortOrder], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

This class calculates and outputs (windowed) aggregates over the rows in a single (sorted) partition.
This class calculates and outputs (windowed) aggregates over the rows in a single (sorted) partition. The aggregates are calculated for each row in the group. Special processing instructions, frames, are used to calculate these aggregates. Frames are processed in the order specified in the window specification (the ORDER BY ... clause). There are four different frame types: - Entire partition: The frame is the entire partition, i.e. UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. For this case, window function will take all rows as inputs and be evaluated once. - Growing frame: We only add new rows into the frame, i.e. UNBOUNDED PRECEDING AND .... Every time we move to a new row to process, we add some rows to the frame. We do not remove rows from this frame. - Shrinking frame: We only remove rows from the frame, i.e. ... AND UNBOUNDED FOLLOWING. Every time we move to a new row to process, we remove some rows from the frame. We do not add rows to this frame. - Moving frame: Every time we move to a new row to process, we remove some rows from the frame and we add some rows to the frame. Examples are: 1 PRECEDING AND CURRENT ROW and 1 FOLLOWING AND 2 FOLLOWING.
Different frame boundaries can be used in Growing, Shrinking and Moving frames. A frame boundary can be either Row or Range based: - Row Based: A row based boundary is based on the position of the row within the partition. An offset indicates the number of rows above or below the current row, the frame for the current row starts or ends. For instance, given a row based sliding frame with a lower bound offset of -1 and a upper bound offset of +2. The frame for row with index 5 would range from index 4 to index 6. - Range based: A range based boundary is based on the actual value of the ORDER BY expression(s). An offset is used to alter the value of the ORDER BY expression, for instance if the current order by expression has a value of 10 and the lower bound offset is -3, the resulting lower bound for the current row will be 10 - 3 = 7. This however puts a number of constraints on the ORDER BY expressions: there can be only one expression and this expression must have a numerical data type. An exception can be made when the offset is 0, because no value modification is needed, in this case multiple and non-numeric ORDER BY expression are allowed.
This is quite an expensive operator because every row for a single group must be in the same partition and partitions must be sorted according to the grouping and sort order. The operator requires the planner to take care of the partitioning and sorting.
The operator is semi-blocking. The window functions and aggregates are calculated one group at a time, the result will only be made available after the processing for the entire group has finished. The operator is able to process different frame configurations at the same time. This is done by delegating the actual frame processing (i.e. calculation of the window functions) to specialized classes, see WindowFunctionFrame, which take care of their own frame type: Entire Partition, Sliding, Growing & Shrinking. Boundary evaluation is also delegated to a pair of specialized classes: RowBoundOrdering & RangeBoundOrdering.

Value Members

object ClearCacheCommand extends LogicalPlan with RunnableCommand with Product with Serializable

Clear all cached data from the in-memory cache.
object EvaluatePython extends Serializable
object Exchange extends Serializable
object GroupedIterator
object RDDConversions
object RowIterator
object SortPrefixUtils
package aggregate
package columnar
package datasources
package debug

Contains methods for debugging query execution.
Contains methods for debugging query execution.
Usage:
```
import org.apache.spark.sql.execution.debug._
sql("SELECT key FROM src").debug()
dataFrame.typeCheck()
```
package joins

Physical execution operators for join operations.
package local

package execution

Type Members

case class AppendColumns[T, U](func: (T) ⇒ U, tEncoder: ExpressionEncoder[T], uEncoder: ExpressionEncoder[U], newColumns: Seq[Attribute], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

case class BatchPythonEvaluation(udf: PythonUDF, output: Seq[Attribute], child: SparkPlan) extends SparkPlan with Product with Serializable

case class CacheTableCommand(tableName: String, plan: Option[LogicalPlan], isLazy: Boolean) extends LogicalPlan with RunnableCommand with Product with Serializable

class CoGroupedIterator extends Iterator[(InternalRow, Iterator[InternalRow], Iterator[InternalRow])]

case class Coalesce(numPartitions: Int, child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

class CoalescedPartitioner extends Partitioner

case class ConvertToSafe(child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

case class ConvertToUnsafe(child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

case class DescribeCommand(child: SparkPlan, output: Seq[Attribute], isExtended: Boolean) extends LogicalPlan with RunnableCommand with Product with Serializable

case class DescribeFunction(functionName: String, isExtended: Boolean) extends LogicalPlan with RunnableCommand with Product with Serializable

case class EvaluatePython(udf: PythonUDF, child: LogicalPlan, resultAttribute: AttributeReference) extends catalyst.plans.logical.UnaryNode with Product with Serializable

case class Except(left: SparkPlan, right: SparkPlan) extends SparkPlan with BinaryNode with Product with Serializable

case class Exchange(newPartitioning: Partitioning, child: SparkPlan, coordinator: Option[ExchangeCoordinator]) extends SparkPlan with UnaryNode with Product with Serializable

case class Expand(projections: Seq[Seq[Expression]], output: Seq[Attribute], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

case class ExplainCommand(logicalPlan: LogicalPlan, output: Seq[Attribute] = ..., extended: Boolean = false) extends LogicalPlan with RunnableCommand with Product with Serializable

case class Filter(condition: Expression, child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

case class Generate(generator: Generator, join: Boolean, outer: Boolean, output: Seq[Attribute], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

class GroupedIterator extends Iterator[(InternalRow, Iterator[InternalRow])]

case class Intersect(left: SparkPlan, right: SparkPlan) extends SparkPlan with BinaryNode with Product with Serializable

case class Limit(limit: Int, child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

case class MapPartitions[T, U](func: (Iterator[T]) ⇒ Iterator[U], tEncoder: ExpressionEncoder[T], uEncoder: ExpressionEncoder[U], output: Seq[Attribute], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

case class OutputFaker(output: Seq[Attribute], child: SparkPlan) extends SparkPlan with Product with Serializable

case class Project(projectList: Seq[NamedExpression], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

class QueryExecution extends AnyRef

class QueryExecutionException extends Exception

case class Sample(lowerBound: Double, upperBound: Double, withReplacement: Boolean, seed: Long, child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

case class SetCommand(kv: Option[(String, Option[String])]) extends LogicalPlan with RunnableCommand with Logging with Product with Serializable

case class ShowFunctions(db: Option[String], pattern: Option[String]) extends LogicalPlan with RunnableCommand with Product with Serializable

case class ShowTablesCommand(databaseName: Option[String]) extends LogicalPlan with RunnableCommand with Product with Serializable

class ShuffledRowRDD extends RDD[InternalRow]

case class Sort(sortOrder: Seq[SortOrder], global: Boolean, child: SparkPlan, testSpillFrequency: Int = 0) extends SparkPlan with UnaryNode with Product with Serializable

abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializable

class SparkPlanner extends SparkStrategies

class SparkSQLParser extends AbstractSparkSQLParser

case class TakeOrderedAndProject(limit: Int, sortOrder: Seq[SortOrder], projectList: Option[Seq[NamedExpression]], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

case class UncacheTableCommand(tableName: String) extends LogicalPlan with RunnableCommand with Product with Serializable

case class Union(children: Seq[SparkPlan]) extends SparkPlan with Product with Serializable

final class UnsafeFixedWidthAggregationMap extends AnyRef

final class UnsafeKVExternalSorter extends AnyRef

case class Window(projectList: Seq[Attribute], windowExpression: Seq[NamedExpression], partitionSpec: Seq[Expression], orderSpec: Seq[SortOrder], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable

Value Members

object ClearCacheCommand extends LogicalPlan with RunnableCommand with Product with Serializable

object EvaluatePython extends Serializable

object Exchange extends Serializable

object GroupedIterator

object RDDConversions

object RowIterator

object SortPrefixUtils

package aggregate

package columnar

package datasources

package debug

package joins

package local

Inherited from AnyRef

Inherited from Any

Ungrouped