Packages

package vectorized

Type Members

  1. class AggregateHashMap extends AnyRef

    This is an illustrative implementation of an append-only single-key/single-value aggregate hash map that can act as a 'cache' for extremely fast key-value lookups while evaluating aggregates (and fall back to the BytesToBytesMap if a given key isn't found). This can potentially be 'codegened' in HashAggregate to speed up aggregates with keys.

    It is backed by a power-of-2-sized array for index lookups and a columnar batch that stores the key-value pairs. The index lookups in the array rely on linear probing (with a small maximum number of tries) and use an inexpensive hash function, which makes them very efficient for the majority of lookups. However, linear probing combined with an inexpensive hash function also makes this map less robust than the BytesToBytesMap (especially for a large number of keys, or even for certain key distributions), so it must fall back on the latter for correctness.
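
    The lookup scheme described above can be sketched in a few lines. This is a minimal, self-contained illustration and not Spark's implementation: the class name ProbingAggCacheSketch, the probe limit MAX_STEPS, the 0.5 load factor, and the use of plain long arrays in place of a columnar batch are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch: long keys mapped to running long sums.
final class ProbingAggCacheSketch {
    private static final int MAX_STEPS = 3; // assumed small probe limit
    private final int[] buckets;            // slot -> row index, -1 when empty
    private final long[] keys;              // append-only key storage
    private final long[] values;            // append-only aggregate buffers
    private final int mask;
    private int numRows = 0;

    ProbingAggCacheSketch(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of 2");
        }
        buckets = new int[capacity * 2];    // assumed load factor of 0.5
        Arrays.fill(buckets, -1);
        keys = new long[capacity];
        values = new long[capacity];
        mask = buckets.length - 1;          // valid because the length is a power of 2
    }

    // Returns the row for the key (inserting it if absent), or -1 when the
    // caller has to fall back to a more robust map.
    int findOrInsert(long key) {
        int idx = (int) key & mask;         // inexpensive hash: low bits of the key
        for (int step = 0; step < MAX_STEPS; step++) {
            int row = buckets[idx];
            if (row == -1) {
                if (numRows == keys.length) return -1; // storage is full
                keys[numRows] = key;
                values[numRows] = 0L;
                buckets[idx] = numRows;
                return numRows++;
            }
            if (keys[row] == key) return row;
            idx = (idx + 1) & mask;         // linear probing
        }
        return -1;                          // too many collisions: fall back
    }

    void add(int row, long delta) { values[row] += delta; }

    long get(int row) { return values[row]; }

    public static void main(String[] args) {
        ProbingAggCacheSketch map = new ProbingAggCacheSketch(4);
        // Keys 1, 9, 17 and 25 all land in the same initial slot.
        for (long key : new long[] {1L, 9L, 17L, 25L}) {
            System.out.println(key + " -> row " + map.findOrInsert(key));
        }
    }
}
```

    In this sketch the fourth colliding key exhausts the probe limit and yields -1, which is the point at which a caller would consult the slower but more robust map instead.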

  2. class ColumnVectorUtils extends AnyRef

    Utilities to help manipulate data associated with ColumnVectors. These should be used mostly for debugging or other non-performance-critical paths. These utilities are mostly used to convert ColumnVectors into other formats.

  3. trait Dictionary extends AnyRef

    The interface that a ColumnVector uses to decode dictionary-encoded values.
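
    As a rough illustration of what decoding dictionary-encoded values involves, the sketch below stores small integer ids and resolves each one through a lookup. The names DictionaryDecodeSketch, IntDictionary, and decodeAll are hypothetical stand-ins for this example and are not the trait's actual members.

```java
import java.util.Arrays;

final class DictionaryDecodeSketch {
    // Hypothetical stand-in for a dictionary: maps an integer id to a value.
    interface IntDictionary {
        int decodeToInt(int id);
    }

    // Resolves every stored id to its value through the dictionary.
    static int[] decodeAll(int[] ids, IntDictionary dict) {
        int[] out = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            out[i] = dict.decodeToInt(ids[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] distinct = {100, 200, 300};        // the distinct values, stored once
        IntDictionary dict = id -> distinct[id];
        int[] decoded = decodeAll(new int[] {2, 0, 0, 1}, dict);
        System.out.println(Arrays.toString(decoded)); // prints [300, 100, 100, 200]
    }
}
```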

  4. final class MutableColumnarRow extends InternalRow

    A mutable version of ColumnarRow, which is used in the vectorized hash map for hash aggregation, and in ColumnarBatch to avoid object creation.

    Note that this class intentionally duplicates a lot of code from ColumnarRow, so that both ColumnarRow and this class can remain final and avoid Java polymorphism overhead.

  5. final class OffHeapColumnVector extends WritableColumnVector

    Column data backed by off-heap memory.

  6. final class OnHeapColumnVector extends WritableColumnVector

    A column backed by an in-memory JVM array. This stores the NULLs as a byte per value and a Java array for the values.
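
    The layout described above (one null byte per value next to a plain array of values) might look like the following for an int column. This is a simplified sketch under assumed names (OnHeapIntColumnSketch and its methods) and a fixed capacity, not the class's actual code.

```java
// Hypothetical sketch of a fixed-capacity nullable int column.
final class OnHeapIntColumnSketch {
    private final byte[] nulls; // one byte per value: 1 = NULL, 0 = present
    private final int[] data;   // values, indexed by row id

    OnHeapIntColumnSketch(int capacity) {
        nulls = new byte[capacity];
        data = new int[capacity];
    }

    void putInt(int rowId, int value) { data[rowId] = value; }

    void putNull(int rowId) { nulls[rowId] = (byte) 1; }

    boolean isNullAt(int rowId) { return nulls[rowId] == 1; }

    int getInt(int rowId) { return data[rowId]; }

    public static void main(String[] args) {
        OnHeapIntColumnSketch col = new OnHeapIntColumnSketch(3);
        col.putInt(0, 7);
        col.putNull(1);
        col.putInt(2, 9);
        for (int i = 0; i < 3; i++) {
            System.out.println(col.isNullAt(i) ? "NULL" : String.valueOf(col.getInt(i)));
        }
    }
}
```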

  7. abstract class WritableColumnVector extends ColumnVector

    This class adds write APIs to ColumnVector. It supports all the types and contains put APIs as well as their batched versions. The batched versions are preferable whenever possible.

    Capacity: The data stored is dense but the arrays are not fixed capacity. It is the responsibility of the caller to call reserve() to ensure there is enough room before adding elements. The put() APIs therefore do not check capacity, because in common cases (i.e. flat schemas) the lengths are known up front.

    A WritableColumnVector should be considered immutable once originally created. In other words, it is not valid to call put APIs after reads until reset() is called.

    WritableColumnVectors are intended to be reused.
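
    The reserve-then-put contract and the reuse pattern can be sketched as follows. The class ReusableLongVectorSketch, its doubling growth policy, and its method signatures are assumptions for illustration only; they mimic the described behaviour rather than reproduce the real API.

```java
import java.util.Arrays;

// Hypothetical sketch of a growable long column that is reused across batches.
final class ReusableLongVectorSketch {
    private long[] data;

    ReusableLongVectorSketch(int capacity) { data = new long[capacity]; }

    // The caller must call this before writing past the current capacity.
    void reserve(int requiredCapacity) {
        if (requiredCapacity > data.length) {
            // Assumed policy: grow to at least double the current size.
            data = Arrays.copyOf(data, Math.max(requiredCapacity, data.length * 2));
        }
    }

    // No capacity check: writing past capacity is the caller's error.
    void putLong(int rowId, long value) { data[rowId] = value; }

    // Batched put: copies count values in a single call.
    void putLongs(int rowId, int count, long[] src, int srcIndex) {
        System.arraycopy(src, srcIndex, data, rowId, count);
    }

    long getLong(int rowId) { return data[rowId]; }

    int capacity() { return data.length; }

    // Prepares the vector for the next batch; the allocated array is kept.
    void reset() { /* nothing to release: reuse means keeping the storage */ }

    public static void main(String[] args) {
        ReusableLongVectorSketch vec = new ReusableLongVectorSketch(2);
        long[] batch = {1L, 2L, 3L, 4L, 5L};
        vec.reserve(batch.length);              // make room first
        vec.putLongs(0, batch.length, batch, 0);
        System.out.println(vec.getLong(4));     // read phase
        vec.reset();                            // writes are valid again
        vec.putLong(0, 42L);
        System.out.println(vec.getLong(0));
    }
}
```

    The batched putLongs call does the same work as five putLong calls with a single array copy, which is the reason the batched variants are preferable when the data is already available as an array.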
