Package

org.apache.spark.sql.catalyst.plans

physical

package physical

Type Members

  1. case class ClusteredDistribution(clustering: Seq[Expression]) extends Distribution with Product with Serializable

    Represents data where tuples that share the same values for the clustering Expressions will be co-located. Based on the context, this can mean such tuples are either co-located in the same partition or they will be contiguous within a single partition.

  2. sealed trait Distribution extends AnyRef

    Specifies how tuples that share common expressions will be distributed when a query is executed in parallel on many machines. Distribution can be used to refer to two distinct physical properties:

    • Inter-node partitioning of data: In this case the distribution describes how tuples are partitioned across physical machines in a cluster. Knowing this property allows some operators (e.g., Aggregate) to perform partition local operations instead of global ones.
    • Intra-partition ordering of data: In this case the distribution describes guarantees made about how tuples are distributed within a single partition.
  3. case class HashPartitioning(expressions: Seq[Expression], numPartitions: Int) extends Expression with Partitioning with Unevaluable with Product with Serializable

    Represents a partitioning where rows are split up across partitions based on the hash of expressions. All rows where expressions evaluate to the same values are guaranteed to be in the same partition.
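
    The co-location guarantee can be illustrated with a minimal sketch in plain Scala. It does not use Spark: the object HashPartitioningSketch and its helpers are hypothetical, and the hash function (Scala's structural hashCode) is an assumption made for illustration, not the hash Spark applies.

```scala
object HashPartitioningSketch {
  // Hypothetical helper: maps the evaluated key values of a row to a
  // partition id in [0, numPartitions). floorMod keeps the id non-negative
  // even when hashCode is negative.
  def partitionId(key: Seq[Any], numPartitions: Int): Int =
    java.lang.Math.floorMod(key.hashCode(), numPartitions)

  // Groups rows by the partition their key hashes to. Rows whose keys are
  // equal always receive the same partition id, so they end up together.
  def partition[R](rows: Seq[R], numPartitions: Int)(key: R => Seq[Any]): Map[Int, Seq[R]] =
    rows.groupBy(r => partitionId(key(r), numPartitions))
}
```

    For example, partitioning Seq(("a", 1), ("b", 2), ("a", 3)) on the first field places both "a" rows in one partition, whichever partition that turns out to be.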

  4. case class OrderedDistribution(ordering: Seq[SortOrder]) extends Distribution with Product with Serializable

    Represents data where tuples have been ordered according to the ordering Expressions. This is a strictly stronger guarantee than ClusteredDistribution as an ordering will ensure that tuples that share the same value for the ordering expressions are contiguous and will never be split across partitions.

  5. sealed trait Partitioning extends AnyRef

    Describes how an operator's output is split across partitions. The compatibleWith, guarantees, and satisfies methods describe relationships between child partitionings, target partitionings, and Distributions. These relations are described more precisely in their individual method docs, but at a high level:

    • satisfies is a relationship between partitionings and distributions.
    • compatibleWith is a relationship between the output partitionings of an operator's children.
    • guarantees is a relationship between a child's existing output partitioning and a target output partitioning.

    Diagrammatically:

                +--------------+
                | Distribution |
                +--------------+
                        ^
                        |
                   satisfies
                        |
                +--------------+                  +--------------+
                |    Child     |                  |    Target    |
           +----| Partitioning |----guarantees--->| Partitioning |
           |    +--------------+                  +--------------+
           |            ^
           |            |
           |     compatibleWith
           |            |
           +------------+
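
    The satisfies relation can be sketched with a toy model in plain Scala. Everything here is a hypothetical simplification: columns are plain strings rather than Expressions, only a few partitionings and distributions are modelled, and the rules are derived from the descriptions on this page rather than copied from Spark's implementation.

```scala
object SatisfiesSketch {
  // Simplified stand-ins for the Distribution hierarchy.
  sealed trait Dist
  case object Unspecified extends Dist
  case object All extends Dist
  final case class Clustered(keys: Set[String]) extends Dist

  // Simplified stand-ins for the Partitioning hierarchy.
  sealed trait Part
  final case class Unknown(numPartitions: Int) extends Part
  case object Single extends Part
  final case class Hashed(keys: Set[String], numPartitions: Int) extends Part

  def satisfies(p: Part, d: Dist): Boolean = (p, d) match {
    // No promise is required, so any partitioning will do.
    case (_, Unspecified) => true
    // One partition co-locates every tuple, which meets any requirement here.
    case (Single, _) => true
    // Rows equal on all clustering keys are also equal on any subset of them,
    // so hashing on a subset still puts them in the same partition.
    case (Hashed(hashKeys, _), Clustered(clusterKeys)) => hashKeys.subsetOf(clusterKeys)
    case _ => false
  }
}
```

    Under this model, Hashed(Set("a"), 8) satisfies Clustered(Set("a", "b")), while Hashed(Set("a", "c"), 8) does not, because rows that agree on a and b may still differ on c.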

  6. case class PartitioningCollection(partitionings: Seq[Partitioning]) extends Expression with Partitioning with Unevaluable with Product with Serializable

    A collection of Partitionings that can be used to describe the partitioning scheme of the output of a physical operator. It is usually used for an operator that has multiple children. In this case, each Partitioning in the collection describes how the operator's output is partitioned based on expressions from one child. For example, for a Join operator on two tables A and B with the join condition A.key1 = B.key2, and assuming a HashPartitioning scheme, two Partitionings can describe how the output of the Join is partitioned: HashPartitioning(A.key1) and HashPartitioning(B.key2). The partitionings in this collection do not need to be equivalent, which is useful for Outer Join operators.
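
    One way to read the Join example is that the output counts as clustered on a set of columns if any member of the collection is. The sketch below models that reading in plain Scala. The object PartitioningCollectionSketch, the Hashed case class, and the subset rule are hypothetical simplifications, not Spark's classes.

```scala
object PartitioningCollectionSketch {
  // A hash partitioning is modelled only by the column names it hashes on.
  final case class Hashed(keys: Set[String])

  // The collection describes the output in several alternative ways, so the
  // output is treated as clustered on `required` if any single member hashes
  // on a subset of the required columns.
  def clusteredOn(outputs: Seq[Hashed], required: Set[String]): Boolean =
    outputs.exists(_.keys.subsetOf(required))
}
```

    With outputs Seq(Hashed(Set("A.key1")), Hashed(Set("B.key2"))), a requirement on either A.key1 or B.key2 is met, while a requirement on an unrelated column is not.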

  7. case class RangePartitioning(ordering: Seq[SortOrder], numPartitions: Int) extends Expression with Partitioning with Unevaluable with Product with Serializable

    Represents a partitioning where rows are split across partitions based on some total ordering of the expressions specified in ordering. When data is partitioned in this manner the following two conditions are guaranteed to hold:

    • All rows where the expressions in ordering evaluate to the same values will be in the same partition.
    • Each partition will have a min and max row, relative to the given ordering. All rows that are in between min and max in this ordering will reside in this partition.

    This class extends Expression primarily so that transformations over expressions will descend into its children.
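
    The two guarantees can be sketched with a single integer key and hand-picked partition bounds. This is plain Scala, not Spark code: the object RangePartitioningSketch and the upperBounds parameter are hypothetical, and how bounds are chosen in practice is outside what this page describes.

```scala
object RangePartitioningSketch {
  // `upperBounds` holds the inclusive upper bound of every partition except
  // the last, in ascending order. A key goes to the first partition whose
  // bound is not smaller than the key, or to the last partition otherwise.
  def partitionId(key: Int, upperBounds: Seq[Int]): Int = {
    val i = upperBounds.indexWhere(bound => key <= bound)
    if (i >= 0) i else upperBounds.length
  }
}
```

    With upperBounds = Seq(10, 20), keys up to 10 map to partition 0, keys from 11 to 20 map to partition 1, and larger keys map to partition 2. Equal keys always share a partition, and each partition covers one contiguous range of the ordering.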

  8. case class RoundRobinPartitioning(numPartitions: Int) extends Partitioning with Product with Serializable

    Represents a partitioning where rows are distributed evenly across output partitions by starting from a random target partition number and distributing rows in a round-robin fashion. This partitioning is used when implementing the DataFrame.repartition() operator.
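
    The round-robin assignment described above can be sketched as follows. This is plain Scala with a hypothetical helper, not Spark's implementation. The starting partition is passed in explicitly so the result is reproducible; a caller could supply scala.util.Random.nextInt(numPartitions) to mimic the random start.

```scala
object RoundRobinSketch {
  // Returns the target partition for each of `numRows` rows, beginning at
  // partition `start` and cycling through all partitions in order.
  def assign(numRows: Int, numPartitions: Int, start: Int): Seq[Int] =
    (0 until numRows).map(i => (start + i) % numPartitions)
}
```

    Because consecutive rows go to consecutive partitions, partition sizes differ by at most one row regardless of the starting partition.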

  9. case class UnknownPartitioning(numPartitions: Int) extends Partitioning with Product with Serializable

Value Members

  1. object AllTuples extends Distribution with Product with Serializable

    Represents a distribution that only has a single partition and all tuples of the dataset are co-located.

  2. object Partitioning

  3. object SinglePartition extends Partitioning with Product with Serializable

  4. object UnspecifiedDistribution extends Distribution with Product with Serializable

    Represents a distribution where no promises are made about co-location of data.