aggregate

Type Members

abstract class AggregationIterator extends Iterator[InternalRow] with Logging

The base class of SortBasedAggregationIterator. It mainly contains two parts: 1. It initializes aggregate functions. 2. It creates two functions, processRow and generateOutput, based on the AggregateMode of its aggregate functions. processRow handles an input row, and generateOutput generates a result row.
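The idea of picking processRow and generateOutput once, from the aggregate mode, can be sketched without any Spark dependency. Everything below is an illustrative assumption rather than Spark's code: Mode, AvgBuffer, Row, makeProcessRow and makeGenerateOutput are hypothetical names, and only two modes are modelled for a single average aggregate.

```scala
object ModeDispatchSketch {
  // Hypothetical stand-ins for two of Spark's aggregate modes.
  sealed trait Mode
  case object Partial extends Mode // consumes raw input values, emits a buffer
  case object Final extends Mode   // consumes partial buffers, emits the result

  // Aggregation buffer for an average: running sum and count.
  final case class AvgBuffer(sum: Double, count: Long)

  // Rows are plain vectors here: Partial input is Vector(value),
  // Final input is Vector(partialSum, partialCount).
  type Row = Vector[Double]

  // Chosen once per iterator, so no mode check happens per row.
  def makeProcessRow(mode: Mode): (AvgBuffer, Row) => AvgBuffer = mode match {
    case Partial => (buf, row) => AvgBuffer(buf.sum + row(0), buf.count + 1)
    case Final   => (buf, row) => AvgBuffer(buf.sum + row(0), buf.count + row(1).toLong)
  }

  def makeGenerateOutput(mode: Mode): AvgBuffer => Row = mode match {
    case Partial => buf => Vector(buf.sum, buf.count.toDouble)
    case Final   => buf => Vector(if (buf.count == 0) Double.NaN else buf.sum / buf.count)
  }

  // Folds all rows into one buffer, then turns the buffer into an output row.
  def run(mode: Mode, rows: Seq[Row]): Row = {
    val processRow = makeProcessRow(mode)
    val generateOutput = makeGenerateOutput(mode)
    generateOutput(rows.foldLeft(AvgBuffer(0.0, 0L))(processRow))
  }
}
```

Running the sketch in Partial mode over two partitions and feeding both outputs to Final mode yields the overall average, which mirrors how partial and final aggregation stages are chained.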
sealed trait BufferSetterGetterUtils extends AnyRef

A helper trait used to create specialized setters and getters for the types supported by the buffer of org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap (see UnsafeFixedWidthAggregationMap.supportsAggregationBufferSchema).
case class SortBasedAggregate(requiredChildDistributionExpressions: Option[Seq[Expression]], groupingExpressions: Seq[NamedExpression], nonCompleteAggregateExpressions: Seq[AggregateExpression], nonCompleteAggregateAttributes: Seq[Attribute], completeAggregateExpressions: Seq[AggregateExpression], completeAggregateAttributes: Seq[Attribute], initialInputBufferOffset: Int, resultExpressions: Seq[NamedExpression], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable
class SortBasedAggregationIterator extends AggregationIterator

An iterator used to evaluate AggregateFunction. It assumes the input rows have been sorted by values of groupingKeyAttributes.
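Why sorted input matters can be shown with a minimal sketch. It is not Spark's implementation: sumByKey is a hypothetical helper, rows are simplified to (key, value) pairs, and the only aggregate is a sum. Because equal keys arrive consecutively, a single buffer is enough and a group can be emitted as soon as the key changes.

```scala
object SortBasedSketch {
  // Sums values per key, assuming `sorted` yields equal keys consecutively.
  def sumByKey(sorted: Iterator[(String, Long)]): Iterator[(String, Long)] =
    new Iterator[(String, Long)] {
      private val input = sorted.buffered

      def hasNext: Boolean = input.hasNext

      def next(): (String, Long) = {
        val key = input.head._1 // key of the group being built
        var sum = 0L
        // Consume rows until the key changes or the input ends.
        while (input.hasNext && input.head._1 == key) {
          sum += input.next()._2
        }
        (key, sum)
      }
    }
}
```

If the input is not sorted, the same key is emitted more than once, each time with only a partial sum, which is why the sorted-input assumption is essential.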
case class TungstenAggregate(requiredChildDistributionExpressions: Option[Seq[Expression]], groupingExpressions: Seq[NamedExpression], nonCompleteAggregateExpressions: Seq[AggregateExpression], nonCompleteAggregateAttributes: Seq[Attribute], completeAggregateExpressions: Seq[AggregateExpression], completeAggregateAttributes: Seq[Attribute], initialInputBufferOffset: Int, resultExpressions: Seq[NamedExpression], child: SparkPlan) extends SparkPlan with UnaryNode with Product with Serializable
class TungstenAggregationIterator extends Iterator[UnsafeRow] with Logging

An iterator used to evaluate aggregate functions. It operates on UnsafeRows.
This iterator first uses hash-based aggregation to process input rows. It uses a hash map to store groups and their corresponding aggregation buffers. If this map cannot allocate memory from the memory manager, the iterator spills the map to disk and creates a new one. After all the input has been processed, it merges all the spills together using an external sorter and does sort-based aggregation.
The process has the following steps:
- Step 0: Do hash-based aggregation.
- Step 1: Sort all entries of the hash map based on values of grouping expressions and spill them to disk.
- Step 2: Create an external sorter based on the spilled sorted map entries and reset the map.
- Step 3: Get a sorted KVIterator from the external sorter.
- Step 4: Repeat step 0 until no more input.
- Step 5: Initialize sort-based aggregation on the sorted iterator. Then, this iterator works in the way of sort-based aggregation.
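The steps above can be approximated by a small in-memory sketch. It makes several assumptions that are not in the original: sumByKey and maxGroups are hypothetical, with maxGroups standing in for the memory manager refusing an allocation; spills are kept as in-memory sorted runs rather than written to disk; and the runs are merged by re-sorting instead of by an external sorter. The hash-only path also sorts its output, purely to make the result deterministic.

```scala
import scala.collection.mutable

object HashThenSortSketch {
  def sumByKey(input: Iterator[(String, Long)], maxGroups: Int): List[(String, Long)] = {
    require(maxGroups > 0, "maxGroups must be positive")
    val map = mutable.HashMap.empty[String, Long]
    val spills = mutable.ArrayBuffer.empty[Vector[(String, Long)]]

    // Steps 1 and 2: sort the map entries by key, keep them as one run, reset the map.
    def spill(): Unit = {
      spills += map.toVector.sortBy(_._1)
      map.clear()
    }

    // Step 0, repeated until the input is exhausted (step 4).
    input.foreach { case (key, value) =>
      if (!map.contains(key) && map.size >= maxGroups) spill()
      map(key) = map.getOrElse(key, 0L) + value
    }

    if (spills.isEmpty) {
      // The map never filled up, so hash-based aggregation alone was enough.
      map.toList.sortBy(_._1)
    } else {
      spill() // flush what is left in the map as a final run
      // Steps 3 and 5: one sorted sequence, then merge adjacent equal keys.
      val merged = spills.flatten.sortBy(_._1)
      val out = mutable.ArrayBuffer.empty[(String, Long)]
      merged.foreach { case (key, partial) =>
        if (out.nonEmpty && out.last._1 == key) {
          out(out.size - 1) = (key, out.last._2 + partial)
        } else {
          out += ((key, partial))
        }
      }
      out.toList
    }
  }
}
```

Both paths return the same totals; only the route differs. With a small maxGroups the same key can land in several runs as partial sums, and the sort-based pass is what combines them.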
The code of this class is organized as follows:
- Part 1: Initializing aggregate functions.
- Part 2: Methods and fields used by setting aggregation buffer values, processing input rows from inputIter, and generating output rows.
- Part 3: Methods and fields used by hash-based aggregation.
- Part 4: Methods and fields used when we switch to sort-based aggregation.
- Part 5: Methods and fields used by sort-based aggregation.
- Part 6: Loading input and processing input rows.
- Part 7: Public methods of this iterator.
- Part 8: A utility function used to generate a result when there is no input and there is no grouping expression.
case class TypedAggregateExpression(aggregator: expressions.Aggregator[Any, Any, Any], aEncoder: Option[ExpressionEncoder[Any]], unresolvedBEncoder: ExpressionEncoder[Any], cEncoder: ExpressionEncoder[Any], children: Seq[Attribute], mutableAggBufferOffset: Int, inputAggBufferOffset: Int) extends ImperativeAggregate with Logging with Product with Serializable

This class is a rough sketch of how to hook Aggregator into the Aggregation system. It has the following limitations:
- It assumes the aggregator has a zero, 0.
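To illustrate what relying on a zero means, here is a minimal sketch of an aggregator contract built around a zero value. SimpleAggregator is a hypothetical, simplified stand-in defined locally, not Spark's expressions.Aggregator class, and LongAverage and aggregate are likewise illustrative names.

```scala
object AggregatorSketch {
  // Hypothetical contract: A = input type, B = buffer type, C = output type.
  trait SimpleAggregator[A, B, C] {
    def zero: B                 // initial buffer; must be neutral for merge
    def reduce(b: B, a: A): B   // fold one input value into a buffer
    def merge(b1: B, b2: B): B  // combine two partial buffers
    def finish(reduction: B): C // turn the final buffer into the result
  }

  // Average of Longs, buffered as (sum, count).
  object LongAverage extends SimpleAggregator[Long, (Long, Long), Double] {
    def zero: (Long, Long) = (0L, 0L)
    def reduce(b: (Long, Long), a: Long): (Long, Long) = (b._1 + a, b._2 + 1)
    def merge(b1: (Long, Long), b2: (Long, Long)): (Long, Long) =
      (b1._1 + b2._1, b1._2 + b2._2)
    def finish(r: (Long, Long)): Double =
      if (r._2 == 0) 0.0 else r._1.toDouble / r._2
  }

  // Reduce each partition starting from zero, then merge the partial buffers.
  def aggregate[A, B, C](agg: SimpleAggregator[A, B, C], partitions: Seq[Seq[A]]): C = {
    val partials = partitions.map(_.foldLeft(agg.zero)(agg.reduce))
    agg.finish(partials.foldLeft(agg.zero)(agg.merge))
  }
}
```

Every partition, including an empty one, starts from zero, so the scheme only gives correct results when merging zero into any buffer leaves that buffer unchanged.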

Value Members

object TungstenAggregate extends Serializable
object TypedAggregateExpression extends Serializable
object Utils

Utility functions used by the query planner to convert our plan to the new aggregation code path.
