Package org.apache.spark.sql.execution

package execution

The physical execution component of Spark SQL. Note that this is a private package. All classes in catalyst are considered an internal API to Spark SQL and are subject to change between minor releases.

Linear Supertypes
AnyRef, Any

Type Members

  1. case class AppendColumnsExec(func: (Any) ⇒ Any, deserializer: Expression, serializer: Seq[NamedExpression], child: SparkPlan) extends SparkPlan with UnaryExecNode with Product with Serializable

    Applies the given function to each input row, appending the encoded result at the end of the row.

  2. case class AppendColumnsWithObjectExec(func: (Any) ⇒ Any, inputSerializer: Seq[NamedExpression], newColumnsSerializer: Seq[NamedExpression], child: SparkPlan) extends SparkPlan with ObjectConsumerExec with Product with Serializable

    An optimized version of AppendColumnsExec that can be executed on deserialized objects directly.

  3. trait BaseLimitExec extends SparkPlan with UnaryExecNode with CodegenSupport

    Helper trait which defines methods that are shared by both LocalLimitExec and GlobalLimitExec.

  4. abstract class BufferedRowIterator extends AnyRef

  5. case class CoGroupExec(func: (Any, Iterator[Any], Iterator[Any]) ⇒ TraversableOnce[Any], keyDeserializer: Expression, leftDeserializer: Expression, rightDeserializer: Expression, leftGroup: Seq[Attribute], rightGroup: Seq[Attribute], leftAttr: Seq[Attribute], rightAttr: Seq[Attribute], outputObjAttr: Attribute, left: SparkPlan, right: SparkPlan) extends SparkPlan with BinaryExecNode with ObjectProducerExec with Product with Serializable

    Co-groups the data from left and right children, and calls the function with each group and 2 iterators containing all elements in the group from left and right side. The result of this function is flattened before being output.
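
    For example, KeyValueGroupedDataset.cogroup is typically planned with this operator. A minimal sketch, assuming a SparkSession named spark with its implicits imported and the two hypothetical case classes below:

    case class Click(user: String, url: String)
    case class Purchase(user: String, amount: Double)

    val clicks = Seq(Click("a", "/home"), Click("a", "/cart")).toDS()
    val purchases = Seq(Purchase("a", 9.99)).toDS()

    // The function receives each key plus two iterators: that key's clicks and purchases.
    val perUser = clicks.groupByKey(_.user)
      .cogroup(purchases.groupByKey(_.user)) { (user, cs, ps) =>
        Iterator((user, cs.size, ps.size))
      }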

  6. class CoGroupedIterator extends Iterator[(InternalRow, Iterator[InternalRow], Iterator[InternalRow])]

    Iterates over GroupedIterators and returns the cogrouped data, i.e. each record is a grouping key with its associated values from all GroupedIterators. Note: we assume the output of each GroupedIterator is ordered by the grouping key.

  7. case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends SparkPlan with UnaryExecNode with Product with Serializable

    Physical plan for returning a new RDD that has exactly numPartitions partitions. Similar to coalesce defined on an RDD, this operation results in a narrow dependency, e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle, instead each of the 100 new partitions will claim 10 of the current partitions.
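
    A minimal illustration through the public Dataset API, assuming a SparkSession named spark:

    // Narrow dependency: each of the 100 resulting partitions reads several of the
    // original 1000 partitions, so no shuffle is performed.
    val coalesced = spark.range(0, 1000000, 1, 1000).coalesce(100)
    coalesced.rdd.getNumPartitions   // 100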

  8. class CoalescedPartitioner extends Partitioner

    A Partitioner that might group together one or more partitions from the parent.

  9. trait CodegenSupport extends SparkPlan

    An interface for those physical operators that support codegen.

  10. case class CollapseCodegenStages(conf: SQLConf) extends Rule[SparkPlan] with Product with Serializable

    Finds chained plans that support codegen and collapses them together into WholeStageCodegen.

  11. case class CollectLimitExec(limit: Int, child: SparkPlan) extends SparkPlan with UnaryExecNode with Product with Serializable

    Take the first limit elements and collect them to a single partition.

    This operator will be used when a logical Limit operation is the final operator in a logical plan, which happens when the user is collecting results back to the driver.
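
    For example, assuming a SparkSession named spark, a query that ends in a logical Limit is typically planned with this operator:

    val df = spark.range(0, 1000).toDF("id")
    df.limit(5).collect()   // only the first 5 rows are gathered to the driver
    df.limit(5).explain()   // the physical plan typically starts with CollectLimit 5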

  12. case class DeserializeToObjectExec(deserializer: Expression, outputObjAttr: Attribute, child: SparkPlan) extends SparkPlan with UnaryExecNode with ObjectProducerExec with CodegenSupport with Product with Serializable

    Takes the input row from child and turns it into an object using the given deserializer expression. The output of this operator is a single-field safe row containing the deserialized object.
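
    Typed Dataset transformations typically introduce this operator (together with SerializeFromObjectExec) around the object-level operation. A minimal sketch, assuming a SparkSession named spark with its implicits imported:

    val ds = spark.range(0, 10).as[Long]
    // The physical plan typically contains DeserializeToObject -> MapElements -> SerializeFromObject.
    ds.map(_ * 2).explain()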

  13. case class ExpandExec(projections: Seq[Seq[Expression]], output: Seq[Attribute], child: SparkPlan) extends SparkPlan with UnaryExecNode with CodegenSupport with Product with Serializable

    Apply all of the GroupExpressions to every input row, hence we will get multiple output rows for an input row.

    projections

    The group of expressions; all of the group expressions should output the same schema, specified by the parameter output

    output

    The output Schema

    child

    Child operator
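
    Grouping-set style aggregations such as cube and rollup are typically planned with Expand. A minimal sketch, assuming a SparkSession named spark with its implicits imported and a hypothetical sales DataFrame:

    val sales = Seq(("US", "web", 10), ("US", "store", 20)).toDF("country", "channel", "qty")
    // Each input row is expanded into one row per grouping set before the aggregation runs.
    sales.cube($"country", $"channel").sum("qty").explain()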

  14. case class FilterExec(condition: Expression, child: SparkPlan) extends SparkPlan with UnaryExecNode with CodegenSupport with PredicateHelper with Product with Serializable

    Physical plan for Filter.

  15. case class FlatMapGroupsInRExec(func: Array[Byte], packageNames: Array[Byte], broadcastVars: Array[Broadcast[AnyRef]], inputSchema: StructType, outputSchema: StructType, keyDeserializer: Expression, valueDeserializer: Expression, groupingAttributes: Seq[Attribute], dataAttributes: Seq[Attribute], outputObjAttr: Attribute, child: SparkPlan) extends SparkPlan with UnaryExecNode with ObjectProducerExec with Product with Serializable

    Groups the input rows together and calls the R function with each group and an iterator containing all elements in the group. The result of this function is flattened before being output.

  16. case class GenerateExec(generator: Generator, join: Boolean, outer: Boolean, output: Seq[Attribute], child: SparkPlan) extends SparkPlan with UnaryExecNode with Product with Serializable

    Applies a Generator to a stream of input rows, combining the output of each into a new stream of rows. This operation is similar to a flatMap in functional programming with one important additional feature, which allows the input rows to be joined with their output.

    generator

    the generator expression

    join

    when true, each output row is implicitly joined with the input tuple that produced it.

    outer

    when true, each input row will be output at least once, even if the output of the given generator is empty. outer has no effect when join is false.

    output

    the output attributes of this node, which are constructed during the analysis phase and cannot be changed, as the parent node is already bound to them.
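
    Generator expressions such as explode are typically planned with this operator. A minimal sketch, assuming a SparkSession named spark with its implicits imported:

    import org.apache.spark.sql.functions.explode

    val df = Seq((1, Seq("a", "b")), (2, Seq.empty[String])).toDF("id", "letters")
    // Selecting id alongside the generator corresponds to join = true; with outer = true,
    // id 2 would still be output once even though its array is empty.
    df.select($"id", explode($"letters").as("letter")).show()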

  17. case class GlobalLimitExec(limit: Int, child: SparkPlan) extends SparkPlan with BaseLimitExec with Product with Serializable

    Take the first limit elements of the child's single output partition.

  18. class GroupedIterator extends Iterator[(InternalRow, Iterator[InternalRow])]

    Iterates over a presorted set of rows, chunking it up by the grouping expression. Each call to next will return a pair containing the current group and an iterator that will return all the elements of that group. Iterators for each group are lazily constructed by extracting rows from the input iterator. As such, full groups are never materialized by this class.

    Example input:

    Input: [a, 1], [b, 2], [b, 3]
    Grouping: x#1
    InputSchema: x#1, y#2

    Result:

    First call to next():  ([a], Iterator([a, 1]))
    Second call to next(): ([b], Iterator([b, 2], [b, 3]))

    Note, the class does not handle the case of an empty input for simplicity of implementation. Use the factory to construct a new instance.
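
    A simplified, self-contained sketch of the same contract over an ordinary presorted Scala iterator (unlike GroupedIterator, this version materializes each group eagerly; the real class is InternalRow-based and lazy):

    def groupSorted[K, V](it: Iterator[(K, V)]): Iterator[(K, Seq[V])] =
      new Iterator[(K, Seq[V])] {
        private val buffered = it.buffered
        def hasNext: Boolean = buffered.hasNext
        def next(): (K, Seq[V]) = {
          val key = buffered.head._1
          val group = scala.collection.mutable.ArrayBuffer.empty[V]
          while (buffered.hasNext && buffered.head._1 == key) group += buffered.next()._2
          (key, group)
        }
      }

    groupSorted(Iterator("a" -> 1, "b" -> 2, "b" -> 3)).toList
    // List((a, ArrayBuffer(1)), (b, ArrayBuffer(2, 3)))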

  19. case class InputAdapter(child: SparkPlan) extends SparkPlan with UnaryExecNode with CodegenSupport with Product with Serializable

    InputAdapter is used to hide a SparkPlan from a subtree that supports codegen.

    This is the leaf node of a tree with WholeStageCodegen that is used to generate code that consumes an RDD iterator of InternalRow.

  20. case class LocalLimitExec(limit: Int, child: SparkPlan) extends SparkPlan with BaseLimitExec with Product with Serializable

    Take the first limit elements of each child partition, but do not collect or shuffle them.

  21. case class MapElementsExec(func: AnyRef, outputObjAttr: Attribute, child: SparkPlan) extends SparkPlan with ObjectConsumerExec with ObjectProducerExec with CodegenSupport with Product with Serializable

    Applies the given function to each input object. The output of its child must be a single-field row containing the input object.

    This operator is a "safe" counterpart of ProjectExec: because its output is a custom object, a safe row is needed to contain it.

  22. case class MapGroupsExec(func: (Any, Iterator[Any]) ⇒ TraversableOnce[Any], keyDeserializer: Expression, valueDeserializer: Expression, groupingAttributes: Seq[Attribute], dataAttributes: Seq[Attribute], outputObjAttr: Attribute, child: SparkPlan) extends SparkPlan with UnaryExecNode with ObjectProducerExec with Product with Serializable

    Groups the input rows together and calls the function with each group and an iterator containing all elements in the group. The result of this function is flattened before being output.
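
    The Dataset API's groupByKey followed by mapGroups or flatMapGroups is typically planned with this operator. A minimal sketch, assuming a SparkSession named spark with its implicits imported:

    val words = Seq("spark", "sql", "scala").toDS()
    // The function receives each key together with an iterator over that key's values.
    val countsByPrefix = words.groupByKey(_.substring(0, 1))
      .mapGroups((prefix, ws) => (prefix, ws.size))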

  23. case class MapPartitionsExec(func: (Iterator[Any]) ⇒ Iterator[Any], outputObjAttr: Attribute, child: SparkPlan) extends SparkPlan with ObjectConsumerExec with ObjectProducerExec with Product with Serializable

    Applies the given function to input object iterator. The output of its child must be a single-field row containing the input object.

  24. trait ObjectConsumerExec extends SparkPlan with UnaryExecNode

    Physical version of ObjectConsumer.

  25. trait ObjectProducerExec extends SparkPlan

    Physical version of ObjectProducer.

  26. case class OutputFakerExec(output: Seq[Attribute], child: SparkPlan) extends SparkPlan with Product with Serializable

    A plan node that does nothing but lie about the output of its child. Used to splice a (hopefully structurally equivalent) tree from a different optimization sequence into an already resolved tree.

  27. case class PlanSubqueries(sparkSession: SparkSession) extends Rule[SparkPlan] with Product with Serializable

    Plans scalar subqueries that are present in the given SparkPlan.

  28. case class ProjectExec(projectList: Seq[NamedExpression], child: SparkPlan) extends SparkPlan with UnaryExecNode with CodegenSupport with Product with Serializable

    Physical plan for Project.

  29. class QueryExecution extends AnyRef

    The primary workflow for executing relational queries using Spark. Designed to allow easy access to the intermediate phases of query execution for developers.

    While this is not a public class, we should avoid changing the function names for the sake of changing them, because a lot of developers use the feature for debugging.
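
    The intermediate phases are reachable from any Dataset, for example (assuming a SparkSession named spark):

    val qe = spark.range(0, 10).filter("id > 5").queryExecution
    qe.logical        // logical plan as created from the Dataset operations
    qe.analyzed       // logical plan after analysis
    qe.optimizedPlan  // logical plan after optimization
    qe.sparkPlan      // physical plan chosen by the planner
    qe.executedPlan   // physical plan after preparations such as whole-stage codegen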

  30. class QueryExecutionException extends Exception

  31. case class RangeExec(range: Range) extends SparkPlan with LeafExecNode with CodegenSupport with Product with Serializable

    Physical plan for range (generating a range of 64 bit numbers).
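
    This is the operator behind SparkSession.range, for example (assuming a SparkSession named spark):

    // Generates 0, 2, 4, ..., 98 across 4 partitions without reading any input data;
    // the physical plan shows a Range operator.
    spark.range(0, 100, 2, 4).explain()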

  32. case class SampleExec(lowerBound: Double, upperBound: Double, withReplacement: Boolean, seed: Long, child: SparkPlan) extends SparkPlan with UnaryExecNode with CodegenSupport with Product with Serializable

    Physical plan for sampling the dataset.

    lowerBound

    Lower-bound of the sampling probability (usually 0.0)

    upperBound

    Upper-bound of the sampling probability. The expected fraction sampled will be ub - lb.

    withReplacement

    Whether to sample with replacement.

    seed

    the random seed

    child

    the SparkPlan
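
    Dataset.sample is typically planned with this operator, for example (assuming a SparkSession named spark):

    // Without replacement the bounds are [0, fraction); here roughly 10% of the rows are kept.
    spark.range(0, 1000).sample(withReplacement = false, fraction = 0.1, seed = 42).count()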

  33. case class ScalarSubquery(executedPlan: SparkPlan, exprId: ExprId) extends SubqueryExpression with Product with Serializable

    A subquery that will return only one row and one column.

    This is the physical copy of ScalarSubquery to be used inside SparkPlan.

  34. case class SerializeFromObjectExec(serializer: Seq[NamedExpression], child: SparkPlan) extends SparkPlan with ObjectConsumerExec with CodegenSupport with Product with Serializable

    Takes the input object from child and turns it into an unsafe row using the given serializer expression. The output of its child must be a single-field row containing the input object.

  35. class ShuffledRowRDD extends RDD[InternalRow]

    This is a specialized version of org.apache.spark.rdd.ShuffledRDD that is optimized for shuffling rows instead of Java key-value pairs. Note that something like this should eventually be implemented in Spark core, but that is blocked by some more general refactorings to shuffle interfaces / internals.

    This RDD takes a ShuffleDependency (dependency), and an optional array of partition start indices as input arguments (specifiedPartitionStartIndices).

    The dependency has the parent RDD of this RDD, which represents the dataset before shuffle (i.e. map output). Elements of this RDD are (partitionId, Row) pairs. Partition ids should be in the range [0, numPartitions - 1]. dependency.partitioner is the original partitioner used to partition map output, and dependency.partitioner.numPartitions is the number of pre-shuffle partitions (i.e. the number of partitions of the map output).

    When specifiedPartitionStartIndices is defined, specifiedPartitionStartIndices.length will be the number of post-shuffle partitions. For this case, the ith post-shuffle partition includes specifiedPartitionStartIndices[i] to specifiedPartitionStartIndices[i+1] - 1 (inclusive).

    When specifiedPartitionStartIndices is not defined, there will be dependency.partitioner.numPartitions post-shuffle partitions. For this case, a post-shuffle partition is created for every pre-shuffle partition.
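
    For example, with 5 pre-shuffle partitions and specifiedPartitionStartIndices = [0, 3] there are 2 post-shuffle partitions: the first reads pre-shuffle partitions 0-2 and the second reads pre-shuffle partitions 3-4. A plain-Scala sketch of that mapping (a hypothetical helper, not part of the API):

    def postShuffleRanges(startIndices: Array[Int], numPreShufflePartitions: Int): Array[Range] =
      startIndices.indices.map { i =>
        val end = if (i + 1 < startIndices.length) startIndices(i + 1) else numPreShufflePartitions
        startIndices(i) until end
      }.toArray

    postShuffleRanges(Array(0, 3), 5)   // Array(Range(0, 1, 2), Range(3, 4))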

  36. case class SortExec(sortOrder: Seq[SortOrder], global: Boolean, child: SparkPlan, testSpillFrequency: Int = 0) extends SparkPlan with UnaryExecNode with CodegenSupport with Product with Serializable

    Performs (external) sorting.

    global

    when true performs a global sort of all partitions by shuffling the data first if necessary.

    testSpillFrequency

    Method for configuring periodic spilling in unit tests. If set, will spill every frequency records.
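
    Both variants are reachable from the Dataset API, for example (assuming a SparkSession named spark with its implicits imported):

    val df = spark.range(0, 100).toDF("id")
    df.orderBy($"id".desc).explain()           // Sort with global = true, preceded by an exchange
    df.sortWithinPartitions($"id").explain()   // Sort with global = false, no shuffle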

  37. class SparkOptimizer extends Optimizer

  38. abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializable

    The base class for physical operators.

    The naming convention is that physical operators end with the "Exec" suffix, e.g. ProjectExec.

  39. class SparkPlanInfo extends AnyRef

    :: DeveloperApi :: Stores information about a SQL SparkPlan.

    Annotations
    @DeveloperApi()
  40. class SparkPlanner extends SparkStrategies

  41. class SparkSqlAstBuilder extends AstBuilder

    Builder that converts an ANTLR ParseTree into a LogicalPlan/Expression/TableIdentifier.

  42. class SparkSqlParser extends AbstractSqlParser

    Concrete parser for Spark SQL statements.

  43. abstract class SparkStrategy extends GenericStrategy[SparkPlan]

    Converts a logical plan into zero or more SparkPlans. This API is exposed for experimenting with the query planner and is not designed to be stable across Spark releases. Developers writing libraries should instead consider using the stable APIs provided in org.apache.spark.sql.sources.

    Annotations
    @DeveloperApi()
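
    A custom strategy can be registered through the experimental methods on SparkSession. A minimal sketch, assuming a SparkSession named spark; the hypothetical strategy below simply declines to plan anything:

    import org.apache.spark.sql.Strategy
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.execution.SparkPlan

    object DoNothingStrategy extends Strategy {
      // Returning Nil tells the planner to fall back to the other strategies.
      def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
    }

    spark.experimental.extraStrategies = Seq(DoNothingStrategy)
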
  44. case class SubqueryExec(name: String, child: SparkPlan) extends SparkPlan with UnaryExecNode with Product with Serializable

    Physical plan for a subquery.

    This is used to generate tree string for SparkScalarSubquery.

  45. case class TakeOrderedAndProjectExec(limit: Int, sortOrder: Seq[SortOrder], projectList: Option[Seq[NamedExpression]], child: SparkPlan) extends SparkPlan with UnaryExecNode with Product with Serializable

    Take the first limit elements as defined by the sortOrder, and do projection if needed. This is logically equivalent to having a Limit operator after a SortExec operator, or having a ProjectExec operator between them. This could have been named TopK, but Spark's top operator does the opposite in ordering so we name it TakeOrdered to avoid confusion.
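
    For example, assuming a SparkSession named spark with its implicits imported:

    val df = spark.range(0, 1000).toDF("id")
    // A sort followed by a limit is typically collapsed into a single TakeOrderedAndProject.
    df.orderBy($"id".desc).limit(10).explain()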

  46. case class UnionExec(children: Seq[SparkPlan]) extends SparkPlan with Product with Serializable

    Physical plan for unioning two plans, without a distinct. This is UNION ALL in SQL.
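
    For example, assuming a SparkSession named spark:

    val a = spark.range(0, 3).toDF("id")
    val b = spark.range(2, 5).toDF("id")
    a.union(b).count()   // 6: the duplicate row is kept, as in UNION ALL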

  47. final class UnsafeFixedWidthAggregationMap extends AnyRef

  48. final class UnsafeKVExternalSorter extends AnyRef

  49. case class WholeStageCodegenExec(child: SparkPlan) extends SparkPlan with UnaryExecNode with CodegenSupport with Product with Serializable

    WholeStageCodegen compiles a subtree of plans that support codegen together into a single Java function.

    Here is the call graph for generating the Java source (plan A supports codegen, but plan B does not):

      WholeStageCodegen       Plan A               FakeInput        Plan B

    -> execute()
        |
     doExecute() --------->  inputRDDs() -------> inputRDDs() ------> execute()
        |
        +----------------->  produce()
                               |
                            doProduce() -------> produce()
                                                    |
                                                 doProduce()
                                                    |
                            doConsume() <--------- consume()
                               |
     doConsume() <--------  consume()

    SparkPlan A should override doProduce() and doConsume().

    doCodeGen() will create a CodegenContext, which will hold a list of variables for input, used to generate code for BoundReference.
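
    Operators compiled into a whole-stage-generated function are typically marked with a leading "*" in explain output, and the generated Java source can be inspected through the debug package (assuming a SparkSession named spark):

    import org.apache.spark.sql.execution.debug._

    val df = spark.range(0, 100).filter("id > 10").selectExpr("id * 2 AS doubled")
    df.explain()        // e.g. *Project, *Filter and *Range inside one whole-stage codegen stage
    df.debugCodegen()   // prints the generated Java source for each codegen stage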

  50. case class WindowExec(windowExpression: Seq[NamedExpression], partitionSpec: Seq[Expression], orderSpec: Seq[SortOrder], child: SparkPlan) extends SparkPlan with UnaryExecNode with Product with Serializable

    This class calculates and outputs (windowed) aggregates over the rows in a single (sorted) partition. The aggregates are calculated for each row in the group. Special processing instructions, frames, are used to calculate these aggregates. Frames are processed in the order specified in the window specification (the ORDER BY ... clause). There are five different frame types:

    - Entire partition: The frame is the entire partition, i.e. UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. For this case, the window function will take all rows as inputs and be evaluated once.
    - Growing frame: We only add new rows into the frame, i.e. UNBOUNDED PRECEDING AND .... Every time we move to a new row to process, we add some rows to the frame. We do not remove rows from this frame.
    - Shrinking frame: We only remove rows from the frame, i.e. ... AND UNBOUNDED FOLLOWING. Every time we move to a new row to process, we remove some rows from the frame. We do not add rows to this frame.
    - Moving frame: Every time we move to a new row to process, we remove some rows from the frame and we add some rows to the frame. Examples are: 1 PRECEDING AND CURRENT ROW and 1 FOLLOWING AND 2 FOLLOWING.
    - Offset frame: The frame consists of one row, which is an offset number of rows away from the current row. Only OffsetWindowFunctions can be processed in an offset frame.

    Different frame boundaries can be used in Growing, Shrinking and Moving frames. A frame boundary can be either Row or Range based:

    - Row based: A row based boundary is based on the position of the row within the partition. An offset indicates the number of rows above or below the current row at which the frame for the current row starts or ends. For instance, given a row based sliding frame with a lower bound offset of -1 and an upper bound offset of +2, the frame for the row with index 5 would range from index 4 to index 7.
    - Range based: A range based boundary is based on the actual value of the ORDER BY expression(s). An offset is used to alter the value of the ORDER BY expression; for instance, if the current ORDER BY expression has a value of 10 and the lower bound offset is -3, the resulting lower bound for the current row will be 10 - 3 = 7. This however puts a number of constraints on the ORDER BY expressions: there can be only one expression, and this expression must have a numerical data type. An exception can be made when the offset is 0, because no value modification is needed; in this case multiple and non-numeric ORDER BY expressions are allowed.

    This is quite an expensive operator because every row for a single group must be in the same partition and partitions must be sorted according to the grouping and sort order. The operator requires the planner to take care of the partitioning and sorting.

    The operator is semi-blocking: the window functions and aggregates are calculated one group at a time, and the result is only made available after the processing for the entire group has finished. The operator is able to process different frame configurations at the same time. This is done by delegating the actual frame processing (i.e. calculation of the window functions) to specialized classes (see WindowFunctionFrame) that take care of their own frame type: Entire Partition, Sliding, Growing & Shrinking. Boundary evaluation is also delegated to a pair of specialized classes: RowBoundOrdering & RangeBoundOrdering.
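
    Window aggregates are expressed through the Column API; a minimal sketch, assuming a SparkSession named spark with its implicits imported and a hypothetical sales DataFrame, using a moving frame of the previous and current row:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.avg

    val sales = Seq(("a", 1, 10.0), ("a", 2, 20.0), ("b", 1, 5.0)).toDF("shop", "day", "amount")
    val w = Window.partitionBy($"shop").orderBy($"day").rowsBetween(-1, 0)
    sales.withColumn("movingAvg", avg($"amount").over(w)).show()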

Value Members

  1. object GroupedIterator

  2. object ObjectOperator

    Helper functions for physical operators that work with user defined objects.

  3. object RDDConversions

  4. object RowIterator

  5. object SortPrefixUtils

  6. object SparkPlan extends Serializable

  7. object UnaryExecNode extends Serializable

  8. object WholeStageCodegenExec extends Serializable
  9. package aggregate

  10. package columnar

  11. package command

  12. package datasources

  13. package debug

    Contains methods for debugging query execution.

    Usage:

    import org.apache.spark.sql.execution.debug._
    sql("SELECT 1").debug()
    sql("SELECT 1").debugCodegen()

  14. package exchange

  15. package joins

    Physical execution operators for join operations.

  16. package metric

  17. package python

  18. package streaming

  19. package ui

  20. package vectorized
