package vectorized
Type Members
class AggregateHashMap extends AnyRef
This is an illustrative implementation of an append-only single-key/single-value aggregate hash map that can act as a 'cache' for extremely fast key-value lookups while evaluating aggregates (and fall back to the BytesToBytesMap if a given key isn't found). It can potentially be 'codegened' in HashAggregate to speed up aggregates with keys.
It is backed by a power-of-2-sized array for index lookups and a columnar batch that stores the key-value pairs. The index lookups in the array rely on linear probing (with a small maximum number of tries) and use an inexpensive hash function, which makes the majority of lookups very efficient. However, linear probing and an inexpensive hash function also make it less robust than the BytesToBytesMap (especially for a large number of keys or for certain key distributions), so it must fall back on the latter for correctness.
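
To make the AggregateHashMap lookup scheme concrete, here is a minimal Java sketch of an append-only map with a power-of-2 bucket array, bounded linear probing, and a cheap multiplicative hash that reports a miss so the caller can fall back. It is not Spark's implementation: the class name `ProbingAggregateMap`, the probe limit of 2, the hash constant, the single `long` key with a `long` sum, and the `java.util.Map` standing in for BytesToBytesMap are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical sketch, not Spark's AggregateHashMap: an append-only
// long -> long (sum) map with a power-of-2 bucket array, bounded linear
// probing and a cheap hash. Parallel arrays stand in for the columnar batch.
final class ProbingAggregateMap {
    private static final int MAX_PROBES = 2;  // assumed small probe limit
    private final int[] buckets;              // -1 = empty, otherwise a row index
    private final long[] keys;                // stand-in for the key column
    private final long[] sums;                // stand-in for the value column
    private int numRows = 0;

    ProbingAggregateMap(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of 2");
        }
        buckets = new int[capacity];
        Arrays.fill(buckets, -1);
        keys = new long[capacity];
        sums = new long[capacity];
    }

    // Inexpensive multiplicative hash, masked to the power-of-2 table size.
    private int slot(long key) {
        int h = (int) ((key * 0x9E3779B97F4A7C15L) >>> 32);
        return h & (buckets.length - 1);
    }

    // Returns the row index for key, appending a new row if an empty bucket is
    // found within MAX_PROBES tries; returns -1 if the caller must fall back.
    int findOrInsert(long key) {
        int mask = buckets.length - 1;
        int idx = slot(key);
        for (int probe = 0; probe < MAX_PROBES; probe++) {
            int row = buckets[idx];
            if (row == -1) {
                row = numRows++;
                keys[row] = key;
                sums[row] = 0L;
                buckets[idx] = row;
                return row;
            }
            if (keys[row] == key) {
                return row;
            }
            idx = (idx + 1) & mask;  // linear probing: try the next bucket
        }
        return -1;
    }

    // Adds value to key's sum, using the fallback map on a miss.
    void add(long key, long value, Map<Long, Long> fallback) {
        int row = findOrInsert(key);
        if (row >= 0) {
            sums[row] += value;
        } else {
            fallback.merge(key, value, Long::sum);
        }
    }

    long sumAt(int row) { return sums[row]; }

    int size() { return numRows; }
}
```

Because this map is append-only, a bucket never becomes free again, so a key that once missed will keep missing; each key's aggregate therefore lives in exactly one of the two structures.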

class ColumnVectorUtils extends AnyRef
Utilities to help manipulate data associated with ColumnVectors. These should be used mostly for debugging or other non-performance-critical paths. They are mostly used to convert ColumnVectors into other formats.

trait Dictionary extends AnyRef
The interface for a dictionary in ColumnVector, used to decode dictionary-encoded values.
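
A dictionary-encoded column stores a small integer id per row and decodes it through a dictionary on read. The Java sketch below shows that idea; `IntDictionary`, `decodeToInt`, and `DictionaryEncodedIntColumn` are hypothetical names chosen for illustration, not the exact Dictionary API.

```java
// Hypothetical dictionary interface for illustration: maps an id to a value.
interface IntDictionary {
    int decodeToInt(int id);
}

// A column that stores one dictionary id per row instead of the value itself.
final class DictionaryEncodedIntColumn {
    private final int[] ids;
    private final IntDictionary dictionary;

    DictionaryEncodedIntColumn(int[] ids, IntDictionary dictionary) {
        this.ids = ids;
        this.dictionary = dictionary;
    }

    // Decodes on read: look up the row's id, then ask the dictionary for the value.
    int getInt(int rowId) {
        return dictionary.decodeToInt(ids[rowId]);
    }
}
```

A repeated value is stored once in the dictionary, and each row only carries its id.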

final class MutableColumnarRow extends InternalRow
A mutable version of ColumnarRow, used in the vectorized hash map for hash aggregate and in ColumnarBatch to reduce object creation.
Note that this class intentionally duplicates a lot of code from ColumnarRow, to avoid Java polymorphism overhead by keeping both ColumnarRow and this class final.
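
The object-reuse idea behind MutableColumnarRow can be sketched as a single final row object that is repositioned over columnar storage by changing its row index, instead of allocating one row per record. `ReusableRow` and its `int[][]` column layout are hypothetical simplifications, not MutableColumnarRow's actual fields or methods.

```java
// Hypothetical sketch of the reuse pattern: one final row object is moved over
// columnar int data by updating rowId, so no object is allocated per row.
final class ReusableRow {
    private final int[][] columns;  // columns[ordinal][rowId]
    int rowId;

    ReusableRow(int[][] columns) {
        this.columns = columns;
    }

    int getInt(int ordinal) {
        return columns[ordinal][rowId];
    }

    // Mutable: writes go straight through to the underlying column.
    void setInt(int ordinal, int value) {
        columns[ordinal][rowId] = value;
    }
}
```

Iterating a batch then means updating `rowId` on the same instance rather than constructing a new row for every record.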

final class OffHeapColumnVector extends WritableColumnVector
Column data backed by off-heap memory.
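
As a rough illustration of keeping column values outside the Java heap, the sketch below backs an int column with a direct `java.nio.ByteBuffer`. This is an assumption for illustration only: `DirectIntColumn` is a hypothetical class, and the direct buffer stands in for whatever allocation mechanism OffHeapColumnVector actually uses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical off-heap int column: values live in a direct ByteBuffer, whose
// contents typically reside outside the garbage-collected Java heap.
final class DirectIntColumn {
    private final ByteBuffer data;

    DirectIntColumn(int capacity) {
        data = ByteBuffer.allocateDirect(capacity * Integer.BYTES)
                         .order(ByteOrder.nativeOrder());
    }

    // Absolute put/get: row i occupies bytes [4*i, 4*i + 4).
    void putInt(int rowId, int value) {
        data.putInt(rowId * Integer.BYTES, value);
    }

    int getInt(int rowId) {
        return data.getInt(rowId * Integer.BYTES);
    }

    boolean isOffHeap() {
        return data.isDirect();
    }
}
```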

final class OnHeapColumnVector extends WritableColumnVector
A column backed by an in-memory JVM array. It stores NULLs as one byte per value and uses a Java array for the values.
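
The on-heap layout described for OnHeapColumnVector (one null byte per value plus a plain Java array for the values) can be sketched for an int column as follows; `OnHeapIntColumn` and its methods are hypothetical names, not OnHeapColumnVector's API.

```java
// Hypothetical on-heap int column following the described layout:
// one byte per value for NULLs plus a plain Java int[] for the values.
final class OnHeapIntColumn {
    private final byte[] nulls;  // 1 = NULL, 0 = not NULL
    private final int[] data;

    OnHeapIntColumn(int capacity) {
        nulls = new byte[capacity];
        data = new int[capacity];
    }

    void putInt(int rowId, int value) {
        data[rowId] = value;
    }

    void putNull(int rowId) {
        nulls[rowId] = 1;
    }

    boolean isNullAt(int rowId) {
        return nulls[rowId] == 1;
    }

    int getInt(int rowId) {
        return data[rowId];
    }
}
```

A byte per value uses more memory than a bitmap would, but each null check is a single array read with no bit masking.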

abstract class WritableColumnVector extends ColumnVector
This class adds write APIs to ColumnVector. It supports all types and provides put APIs as well as their batched versions. The batched versions are preferable whenever possible.
Capacity: The data stored is dense, but the arrays do not have a fixed capacity. It is the caller's responsibility to call reserve() to ensure there is enough room before adding elements. The put() APIs therefore do not check capacity, because in common cases (i.e. flat schemas) the lengths are known up front.
A WritableColumnVector should be considered immutable once originally created. In other words, it is not valid to call put APIs after reads until reset() is called.
WritableColumnVectors are intended to be reused.
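
The reserve-then-put contract of WritableColumnVector can be sketched with a hypothetical int vector: `reserve()` grows the backing array, `putInt`/`putInts` perform no capacity check of their own, and `reset()` prepares the same instance for reuse. `SimpleWritableIntVector` and its doubling growth policy are assumptions for illustration; the method names only loosely mirror the description above.

```java
import java.util.Arrays;

// Hypothetical int vector illustrating the described contract: callers reserve
// capacity first, put APIs do no capacity check of their own, and reset()
// prepares the same instance for reuse.
final class SimpleWritableIntVector {
    private int[] data;

    SimpleWritableIntVector(int initialCapacity) {
        data = new int[initialCapacity];
    }

    // Ensures room for at least requiredCapacity values (doubling policy assumed).
    void reserve(int requiredCapacity) {
        if (requiredCapacity > data.length) {
            data = Arrays.copyOf(data, Math.max(requiredCapacity, data.length * 2));
        }
    }

    // Single put: relies on a prior reserve() call for capacity.
    void putInt(int rowId, int value) {
        data[rowId] = value;
    }

    // Batched put: copies count values starting at src[srcIndex] in one call.
    void putInts(int rowId, int count, int[] src, int srcIndex) {
        System.arraycopy(src, srcIndex, data, rowId, count);
    }

    int getInt(int rowId) {
        return data[rowId];
    }

    int capacity() {
        return data.length;
    }

    // Clears the values but keeps the allocated array so the vector can be reused.
    void reset() {
        Arrays.fill(data, 0);
    }
}
```

The batched `putInts` replaces `count` separate calls with one `System.arraycopy`, which illustrates why batched versions are preferable where they apply.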