Spark Project SQL 3.1.1-hadoop-2.7 API

Packages

package root

Definition Classes
root
package org

Definition Classes
root
package apache

Definition Classes
org
package spark

Definition Classes
apache
package sql
Allows the execution of relational queries, including those expressed in SQL using Spark.

Definition Classes
spark
package execution
The physical execution component of Spark SQL. Note that this is a private package. All classes in catalyst are considered an internal API to Spark SQL and are subject to change between minor releases.

Definition Classes
sql
package adaptive

Definition Classes
execution
package aggregate

Definition Classes
execution
package analysis

Definition Classes
execution
package arrow

Definition Classes
execution
package bucketing

Definition Classes
execution
package columnar

Definition Classes
execution
package command

Definition Classes
execution
package datasources

Definition Classes
execution
package debug
Contains methods for debugging query execution.
Usage:
```
import org.apache.spark.sql.execution.debug._
sql("SELECT 1").debug()
sql("SELECT 1").debugCodegen()
```
or, for Structured Streaming:
```
import org.apache.spark.sql.execution.debug._
val query = df.writeStream.<...>.start()
query.debugCodegen()
```
Note that debug() is not supported in Structured Streaming, because it makes no sense for a streaming query to execute a single batch while the main query is running concurrently.
Definition Classes
execution
package dynamicpruning

Definition Classes
execution
package exchange

Definition Classes
execution
package joins

Definition Classes
execution
package metric

Definition Classes
execution
package python

Definition Classes
execution
package r

Definition Classes
execution
package stat

Definition Classes
execution
package streaming

Definition Classes
execution
package ui

Definition Classes
execution
package vectorized

Definition Classes
execution
AggregateHashMap
ColumnVectorUtils
Dictionary
MutableColumnarRow
OffHeapColumnVector
OnHeapColumnVector
WritableColumnVector
package window

Definition Classes
execution

org.apache.spark.sql.execution

vectorized

package vectorized

Type Members

class AggregateHashMap extends AnyRef
This is an illustrative implementation of an append-only, single-key/single-value aggregate hash map that can act as a 'cache' for extremely fast key-value lookups while evaluating aggregates (and fall back to the BytesToBytesMap if a given key isn't found). This can potentially be 'codegened' in HashAggregate to speed up aggregates with keys.
It is backed by a power-of-2-sized array for index lookups and a columnar batch that stores the key-value pairs. The index lookups in the array rely on linear probing (with a small maximum number of tries) and use an inexpensive hash function, which makes them really efficient for the majority of lookups. However, linear probing combined with an inexpensive hash function also makes the map less robust than the BytesToBytesMap (especially for a large number of keys, or even for certain distributions of keys), so it must fall back on the latter for correctness.
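The probing-with-fallback scheme described above can be sketched in plain Scala. This is a simplified model under stated assumptions, not Spark's implementation: the names `FastAggregateCache`, `findOrInsert`, and `sumByKey` are hypothetical, keys and values are restricted to `Long`, and a `mutable.HashMap` stands in for the BytesToBytesMap.

```scala
import scala.collection.mutable

object AggregateCacheSketch {
  // Hypothetical, simplified model of an append-only aggregate cache.
  final class FastAggregateCache(capacity: Int, maxSteps: Int = 2) {
    require(capacity > 0, "capacity must be positive")

    // Power-of-2-sized index array, so (hash & mask) replaces a modulo.
    private val numBuckets: Int = {
      var n = 1
      while (n < capacity * 2) n <<= 1
      n
    }
    private val mask = numBuckets - 1
    private val buckets = Array.fill(numBuckets)(-1) // -1 marks an empty bucket

    // Append-only columnar storage: one array per column.
    private val keys = new Array[Long](capacity)
    private val values = new Array[Long](capacity)
    private var numRows = 0

    // Inexpensive hash: fold the upper 32 bits into the lower ones.
    private def hash(key: Long): Int = (key ^ (key >>> 32)).toInt

    // Returns the row index for `key`, appending a zero-valued row if needed,
    // or -1 when the cache is full or probing exceeds `maxSteps`.
    def findOrInsert(key: Long): Int = {
      var idx = hash(key) & mask
      var step = 0
      while (step < maxSteps) {
        val row = buckets(idx)
        if (row == -1) {
          if (numRows == capacity) return -1
          keys(numRows) = key
          values(numRows) = 0L
          buckets(idx) = numRows
          numRows += 1
          return numRows - 1
        } else if (keys(row) == key) {
          return row
        }
        idx = (idx + 1) & mask // linear probing
        step += 1
      }
      -1
    }

    def add(row: Int, delta: Long): Unit = values(row) += delta

    def rows: Iterator[(Long, Long)] =
      Iterator.range(0, numRows).map(i => (keys(i), values(i)))
  }

  // Sums values per key, using the general-purpose map whenever the cache gives up.
  def sumByKey(input: Seq[(Long, Long)], capacity: Int): Map[Long, Long] = {
    val fast = new FastAggregateCache(capacity)
    val fallback = mutable.HashMap.empty[Long, Long] // stands in for BytesToBytesMap
    for ((k, v) <- input) {
      val row = fast.findOrInsert(k)
      if (row >= 0) fast.add(row, v)
      else fallback(k) = fallback.getOrElse(k, 0L) + v
    }
    // Merge both sides into one result.
    fast.rows.foldLeft(fallback.toMap) { case (m, (k, v)) =>
      m.updated(k, m.getOrElse(k, 0L) + v)
    }
  }
}
```

Because the cache is append-only, a key that fails to fit once will keep failing, so each key ends up aggregated in exactly one of the two maps; the final merge is still written as a sum to stay safe.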
class ColumnVectorUtils extends AnyRef
Utilities to help manipulate data associated with ColumnVectors. These should be used mostly for debugging or other non-performance-critical paths. These utilities are mostly used to convert ColumnVectors into other formats.
trait Dictionary extends AnyRef
The interface that a ColumnVector uses to decode dictionary-encoded values.
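As a rough illustration of dictionary decoding, the sketch below stores small integer ids per row and resolves them through a lookup table on read. The trait and classes here (`IntDictionary`, `ArrayIntDictionary`, `EncodedIntColumn`) are hypothetical stand-ins written for this example and are not Spark's own types.

```scala
object DictionarySketch {
  // Hypothetical stand-in for a dictionary interface; not Spark's trait.
  trait IntDictionary {
    def decodeToInt(id: Int): Int
  }

  // Dictionary backed by a plain array of distinct values.
  final class ArrayIntDictionary(distinct: Array[Int]) extends IntDictionary {
    def decodeToInt(id: Int): Int = distinct(id)
  }

  // A dictionary-encoded column stores ids instead of the values themselves.
  final class EncodedIntColumn(ids: Array[Int], dict: IntDictionary) {
    def length: Int = ids.length
    def getInt(rowId: Int): Int = dict.decodeToInt(ids(rowId))
  }

  // Materializes the decoded values for every row.
  def decodeAll(col: EncodedIntColumn): Array[Int] =
    Array.tabulate(col.length)(col.getInt)
}
```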
final class MutableColumnarRow extends InternalRow
A mutable version of ColumnarRow, which is used in the vectorized hash map for hash aggregate, and ColumnarBatch to save object creation.
Note that this class intentionally duplicates a lot of code from ColumnarRow, to avoid Java polymorphism overhead by keeping both ColumnarRow and this class final.
final class OffHeapColumnVector extends WritableColumnVector
Column data backed by off-heap memory.
final class OnHeapColumnVector extends WritableColumnVector
A column backed by an in-memory JVM array. This stores the NULLs as a byte per value and a Java array for the values.
abstract class WritableColumnVector extends ColumnVector
This class adds write APIs to ColumnVector. It supports all the types and contains put APIs as well as their batched versions. The batched versions are preferable whenever possible.
Capacity: The data stored is dense, but the arrays do not have a fixed capacity. It is the responsibility of the caller to call reserve() to ensure there is enough room before adding elements. Consequently, the put() APIs do not check capacity themselves, because in the common case (i.e. flat schemas) the lengths are known up front.
A WritableColumnVector should be considered immutable once it has been populated. In other words, it is not valid to call put APIs after reads until reset() is called.
WritableColumnVector instances are intended to be reused.
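The reserve/put/reset contract described above can be modelled with a small self-contained class. This is a sketch under stated assumptions rather than Spark's class: `IntVector` and its growth policy are hypothetical, it handles only `Int` values, and it borrows the byte-per-value null layout mentioned for the on-heap vector.

```scala
object WritableVectorSketch {
  // Hypothetical, simplified model of a writable column vector for Int values.
  final class IntVector(initialCapacity: Int) {
    private var data = new Array[Int](initialCapacity)
    private var nulls = new Array[Byte](initialCapacity) // one byte per value, 1 = NULL
    private var count = 0

    def capacity: Int = data.length
    def size: Int = count

    // Grows the backing arrays if needed; callers invoke this before any put.
    def reserve(required: Int): Unit =
      if (required > data.length) {
        val newCapacity = math.max(required, data.length * 2)
        data = java.util.Arrays.copyOf(data, newCapacity)
        nulls = java.util.Arrays.copyOf(nulls, newCapacity)
      }

    // Never grows the arrays: the caller must have reserved enough room.
    def putInt(rowId: Int, value: Int): Unit = {
      data(rowId) = value
      nulls(rowId) = 0
      count = math.max(count, rowId + 1)
    }

    // Batched put: copies `n` values from `src` in a single call.
    def putInts(rowId: Int, n: Int, src: Array[Int], srcIndex: Int): Unit = {
      System.arraycopy(src, srcIndex, data, rowId, n)
      java.util.Arrays.fill(nulls, rowId, rowId + n, 0.toByte)
      count = math.max(count, rowId + n)
    }

    def putNull(rowId: Int): Unit = {
      nulls(rowId) = 1
      count = math.max(count, rowId + 1)
    }

    def isNullAt(rowId: Int): Boolean = nulls(rowId) == 1
    def getInt(rowId: Int): Int = data(rowId)

    // Clears the logical contents but keeps the allocated arrays for reuse.
    def reset(): Unit = {
      java.util.Arrays.fill(nulls, 0, count, 0.toByte)
      count = 0
    }
  }
}
```

In this model a typical cycle is reserve, write (preferably with the batched put), read, then reset before the next batch, which keeps the allocated arrays alive across batches.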
