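Specifies how tuples that share common expressions will be distributed when a query is executed in parallel on many machines. Distribution can refer to two distinct physical properties: how tuples are partitioned across the machines of a cluster, and how tuples are arranged within a single partition.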
Represents a partitioning where rows are split up across partitions based on the hash of expressions. All rows where expressions evaluate to the same values are guaranteed to be in the same partition.
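The guarantee above can be illustrated with a minimal sketch. It is not the engine's implementation: Python's built-in `hash` stands in for whatever hash function is actually used, and `hash_partition`, `rows`, and `owners` are hypothetical names chosen for the example.

```python
from collections import defaultdict

def hash_partition(rows, key, num_partitions):
    """Group rows by hash(key(row)) modulo num_partitions (illustrative only)."""
    partitions = defaultdict(list)
    for row in rows:
        # For a positive modulus, Python's % never returns a negative index.
        partitions[hash(key(row)) % num_partitions].append(row)
    return dict(partitions)

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
parts = hash_partition(rows, key=lambda r: r[0], num_partitions=3)

# Record which partition ids hold each key value.
owners = defaultdict(set)
for pid, bucket in parts.items():
    for row in bucket:
        owners[row[0]].add(pid)
```

Every key value ends up with exactly one owning partition. The converse does not hold: two different key values may hash to the same partition.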
Represents data where tuples have been ordered according to the ordering expressions. This is a strictly stronger guarantee than ClusteredDistribution, as an ordering ensures that tuples sharing the same value for the ordering expressions are contiguous and will never be split across partitions.
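To see why ordering is the stronger guarantee, the sketch below checks both properties on a single sequence of rows. The helper names `is_ordered` and `is_clustered` are hypothetical, and a single in-memory list is assumed to stand in for the data.

```python
from itertools import groupby

def is_ordered(rows, key):
    """True if key values never decrease from one row to the next."""
    keys = [key(r) for r in rows]
    return all(a <= b for a, b in zip(keys, keys[1:]))

def is_clustered(rows, key):
    """True if rows sharing a key value form a single contiguous run."""
    run_keys = [k for k, _ in groupby(rows, key=key)]
    return len(run_keys) == len(set(run_keys))

by_letter = lambda r: r[0]
shuffled = [("b", 2), ("a", 1), ("b", 5), ("a", 3)]
clustered_only = [("b", 2), ("b", 5), ("a", 1), ("a", 3)]
ordered = sorted(shuffled, key=by_letter)
```

Here `ordered` satisfies both checks, while `clustered_only` is clustered without being ordered, so ordered data always meets the clustering requirement but not the other way around.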
Represents a partitioning where rows are split across partitions based on some total ordering of the expressions specified in ordering. When data is partitioned in this manner the following two conditions are guaranteed to hold:
- All rows where the expressions in ordering evaluate to the same values will be in the same partition.
- Each partition will have a min and max row, relative to the given ordering. All rows that are in between min and max in this ordering will reside in this partition.
This class extends expression primarily so that transformations over expression will descend into its child.
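The two conditions above can be sketched with a small range partitioner. This is an illustration under stated assumptions, not the engine's algorithm: the boundaries in `upper_bounds` are hand-picked here, and `range_partition` is a hypothetical helper.

```python
from bisect import bisect_left

def range_partition(rows, key, upper_bounds):
    """Place each row in the first partition whose upper bound is >= its key.

    upper_bounds must be sorted ascending; keys above the last bound
    fall into one extra, final partition.
    """
    partitions = [[] for _ in range(len(upper_bounds) + 1)]
    for row in rows:
        partitions[bisect_left(upper_bounds, key(row))].append(row)
    return partitions

rows = [17, 3, 42, 8, 3, 25, 17, 99]
parts = range_partition(rows, key=lambda r: r, upper_bounds=[10, 30])
# parts == [[3, 8, 3], [17, 25, 17], [42, 99]]
```

Equal keys (the two 3s, the two 17s) land in the same partition, and every key in one partition is smaller than every key in the next, so each partition covers a contiguous range between its min and max row.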
Represents a distribution that only has a single partition and all tuples of the dataset are co-located.
Represents a distribution where no promises are made about co-location of data.
Represents data where tuples that share the same values for the clustering expressions will be co-located. Depending on the context, this can mean that such tuples are either co-located in the same partition or contiguous within a single partition.
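For the first reading (co-location in the same partition), a check might look like the sketch below. The function `colocated` and the sample partitions are hypothetical, and partitions are modeled as plain lists of rows.

```python
def colocated(partitions, key):
    """True if every clustering-key value lives in exactly one partition."""
    owner = {}
    for pid, rows in enumerate(partitions):
        for row in rows:
            # setdefault records the first partition seen for this key value.
            if owner.setdefault(key(row), pid) != pid:
                return False
    return True

by_letter = lambda r: r[0]
good = [[("a", 1), ("a", 3)], [("b", 2), ("b", 5)]]
bad = [[("a", 1), ("b", 2)], [("a", 3), ("b", 5)]]
```

`good` satisfies the requirement because each letter appears in only one partition, whereas `bad` splits both "a" and "b" across partitions.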