Package

org.apache.spark.sql.execution.columnar

encoding

package encoding

Type Members

  1. final class BigDictionaryDecoder extends BigDictionaryDecoderBase with NotNullDecoder

  2. abstract class BigDictionaryDecoderBase extends DictionaryDecoderBase

  3. final class BigDictionaryDecoderNullable extends BigDictionaryDecoderBase with NullableDecoder

  4. final class BooleanBitSetDecoder extends BooleanBitSetDecoderBase with NotNullDecoder

  5. abstract class BooleanBitSetDecoderBase extends ColumnDecoder with BooleanBitSetEncoding

  6. final class BooleanBitSetDecoderNullable extends BooleanBitSetDecoderBase with NullableDecoder

  7. final class BooleanBitSetEncoder extends NotNullEncoder with BooleanBitSetEncoderBase

  8. trait BooleanBitSetEncoderBase extends ColumnEncoder with BooleanBitSetEncoding

  9. final class BooleanBitSetEncoderNullable extends NullableEncoder with BooleanBitSetEncoderBase

  10. trait BooleanBitSetEncoding extends ColumnEncoding

  11. trait BooleanRunLengthDecoder extends RunLengthDecoding

    Run-length encoding optimized for booleans. Each short value stores both the run length and the boolean value. An even stored value indicates a run of false values whose length is half the stored value, while an odd stored value indicates a run of true values of length run / 2 + 1.
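
    The even/odd rule above can be sketched as follows. This is a minimal illustration, not the library's code: the object name BooleanRunSketch and both method names are hypothetical, an Int stands in for the stored short, and encodeRun is simply the inverse derived from the stated rule.

```scala
object BooleanRunSketch {
  /** Decode one stored run value into (booleanValue, runLength). */
  def decodeRun(run: Int): (Boolean, Int) = {
    require(run > 0, "stored run value must be positive")
    if ((run & 1) == 0) (false, run / 2) // even: run of false, half the stored value
    else (true, run / 2 + 1)             // odd: run of true, length = run / 2 + 1
  }

  /** Inverse mapping derived from the rule above (an assumption, not the real encoder). */
  def encodeRun(value: Boolean, length: Int): Int = {
    require(length > 0, "run length must be positive")
    if (value) 2 * length - 1 else 2 * length
  }
}
```

    Under this reading, a stored value of 4 decodes to two false values and a stored value of 3 decodes to two true values.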

  12. trait BooleanRunLengthEncoder extends AnyRef

  13. abstract class ColumnDecoder extends ColumnEncoding

  14. final class ColumnDeleteDecoder extends AnyRef

    Decodes the deleted positions of a batch that has seen some deletes.

  15. final class ColumnDeleteDelta extends ColumnFormatValue with Delta

    Simple delta that merges the deleted positions.

  16. final class ColumnDeleteEncoder extends ColumnEncoder

    Currently just stores the deleted positions, assuming they are sorted by position at the plan level. This can be optimized to use more efficient storage when the number of positions is large, such as a boolean bitset, or a more comprehensive compression scheme like PFOR (https://github.com/lemire/JavaFastPFOR).

  17. final class ColumnDeltaDecoder extends AnyRef

    Internal class to decode values from a single delta as obtained from ColumnDeltaEncoder. It should not be used directly; use the combined decoder UpdatedColumnDecoder instead.

  18. final class ColumnDeltaEncoder extends ColumnEncoder

    Encodes a delta value for a ColumnFormatValue obtained after an update operation that can change one or more values. This applies the update in an optimized batch manner as far as possible.

    The format of delta encoding is straightforward and adds the positions in the full column in addition to the normal column encoding. So the layout looks like below:

     .----------------------- Base encoding scheme (4 bytes)
    |    .------------------- Null bitset size as number of longs N (4 bytes)
    |   |
    |   |   .---------------- Null bitset longs (8 x N bytes,
    |   |   |                                    empty if null count is zero)
    |   |   |  .------------- Positions in full column value
    |   |   |  |
    |   |   |  |    .-------- Encoded non-null elements
    |   |   |  |    |
    V   V   V  V    V
    +---+---+--+----+--------------+
    |   |   |  |    |   ...   ...  |
    +---+---+--+----+--------------+
     \-----/ \--------------------/
      header           body

    Whether the value type is a delta or not is determined by the "deltaHierarchy" field in ColumnFormatValue and the negative columnIndex in ColumnFormatKey. Encoding typeId itself does not store anything for it separately.

    An alternative would be to store the position before each encoded element, but that does not work properly for schemes like run-length encoding, which write nothing for an element that falls within the current run.

    A set of new updated column values is merged with the existing encoded values held in the current delta with the smallest hierarchy depth (i.e. the one that has a maximum size of 100). Each delta can grow up to a limit, after which it is subsumed into a larger delta, thus creating a hierarchy of deltas. So the base delta holds up to 100 entries or so, the next higher level holds up to, say, 1000 entries, and so on until the full ColumnFormatValue size is reached. This design attempts to minimize disk writes at the cost of some scan overhead for columns that see a large number of updates. The hierarchy is expected to be small, not more than 3-4 levels, to strike a good balance between write overhead and scan overhead.
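
    The spill-over behaviour of the delta hierarchy can be sketched as below. This is only an illustration under stated assumptions: the object DeltaHierarchySketch, the capacities 100/1000/10000, and the use of a SortedMap of position to String value are all hypothetical stand-ins for the real encoded deltas.

```scala
import scala.collection.immutable.SortedMap

object DeltaHierarchySketch {
  // Hypothetical capacity of each level; the text only gives "100 or so" and "say 1000".
  val capacities: Vector[Int] = Vector(100, 1000, 10000)

  // Position in the full column -> updated value (String is a stand-in type).
  type Delta = SortedMap[Int, String]

  /** Merge new updates into the smallest delta, spilling into larger levels on overflow. */
  def merge(levels: Vector[Delta], updates: Delta): Vector[Delta] = {
    var carry = updates
    var result = levels
    var level = 0
    while (level < result.length && carry.nonEmpty) {
      // Entries in `carry` are newer, so they win on equal positions.
      val merged = result(level) ++ carry
      if (merged.size <= capacities(level) || level == result.length - 1) {
        result = result.updated(level, merged)
        carry = SortedMap.empty[Int, String]
      } else {
        // Level is over its limit: empty it and push everything one level up.
        result = result.updated(level, SortedMap.empty[Int, String])
        carry = merged
      }
      level += 1
    }
    result
  }
}
```

    In this sketch, merging 60 updates twice leaves the first level empty and 120 entries in the second level, so only the small delta is rewritten for most updates.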

  19. trait ColumnEncoder extends ColumnEncoding

  20. trait ColumnEncoding extends AnyRef

    Base class for encoding and decoding in columnar form. Memory layout of the bytes for a set of column values is:

     .----------------------- Encoding scheme (4 bytes)
    |    .------------------- Null bitset size as number of longs N (4 bytes)
    |   |
    |   |   .---------------- Null bitset longs (8 x N bytes,
    |   |   |                                    empty if null count is zero)
    |   |   |     .---------- Encoded non-null elements
    V   V   V     V
    +---+---+-----+---------+
    |   |   | ... | ... ... |
    +---+---+-----+---------+
     \-----/ \-------------/
      header      body
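
    The header and body layout in the diagram can be sketched with java.nio.ByteBuffer as below. This is a minimal sketch, not the library's reader or writer: ColumnLayoutSketch, Header, write and read are hypothetical names, and native byte order is an assumption made for illustration.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object ColumnLayoutSketch {
  final case class Header(encodingScheme: Int, nullBitsetLongs: Int)

  /** Lay out the bytes in the documented order: scheme, N, N bitset longs, body. */
  def write(scheme: Int, nullWords: Array[Long], body: Array[Byte]): Array[Byte] = {
    val buf = ByteBuffer.allocate(8 + 8 * nullWords.length + body.length)
      .order(ByteOrder.nativeOrder()) // assumption: platform byte order
    buf.putInt(scheme)                // encoding scheme (4 bytes)
    buf.putInt(nullWords.length)      // null bitset size as number of longs N (4 bytes)
    nullWords.foreach(w => buf.putLong(w)) // 8 x N bytes, nothing when N is zero
    buf.put(body)                     // encoded non-null elements
    buf.array()
  }

  /** Split a byte array written by `write` back into header, null bitset and body. */
  def read(bytes: Array[Byte]): (Header, Array[Long], Array[Byte]) = {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.nativeOrder())
    val scheme = buf.getInt()
    val n = buf.getInt()
    val nullWords = Array.fill(n)(buf.getLong())
    val body = new Array[Byte](buf.remaining())
    buf.get(body)
    (Header(scheme, n), nullWords, body)
  }
}
```

    With one bitset long and a three-byte body, the total size is 4 + 4 + 8 + 3 = 19 bytes.
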
  21. case class ColumnStatsSchema(fieldName: String, dataType: DataType) extends Product with Serializable

  22. abstract class DeltaWriter extends AnyRef

    Base class to read column values from a delta encoded column and write them to the target delta column. The reads may not be sequential and could be random-access reads, while writes will be sequential, sorted by position in the full column value. Implementations should not do any null value handling.

    This uses a separate base class rather than a closure to avoid the overhead of boxing/unboxing with multi-argument closures (>2 arguments).

  23. trait DeltaWriterFactory extends AnyRef

    Factory for DeltaWriter used by code generation to enable using the simpler Janino "createFastEvaluator" API.

  24. final class DictionaryDecoder extends DictionaryDecoderBase with NotNullDecoder

  25. abstract class DictionaryDecoderBase extends ColumnDecoder with DictionaryEncoding

  26. final class DictionaryDecoderNullable extends DictionaryDecoderBase with NullableDecoder

  27. final class DictionaryEncoder extends NotNullEncoder with DictionaryEncoderBase

  28. trait DictionaryEncoderBase extends ColumnEncoder with DictionaryEncoding

  29. final class DictionaryEncoderNullable extends NullableEncoder with DictionaryEncoderBase

  30. trait DictionaryEncoding extends ColumnEncoding

  31. trait NotNullDecoder extends ColumnDecoder

  32. trait NotNullEncoder extends ColumnEncoder

  33. trait NullableDecoder extends ColumnDecoder

    Nulls are stored either as a bitset or as a sequence of positions (or an inverse sequence of missing positions) if the number of nulls is small (or the number of non-nulls is small). The latter case is indicated by a negative count field. The decoder for the latter case keeps track of the next null position and compares against it.
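
    The "track the next null position" approach can be sketched as below. This is a hedged illustration only: NullPositionsSketch and NullTracker are hypothetical names, the positions are assumed to be sorted in ascending order, and every row is assumed to be visited in order.

```scala
object NullPositionsSketch {
  /** Reports nulls by comparing each visited row against the next stored null position. */
  final class NullTracker(nullPositions: Array[Int]) {
    private var cursor = 0
    private var nextNull = if (nullPositions.isEmpty) -1 else nullPositions(0)

    /** Must be called for every row position in increasing order. */
    def isNull(position: Int): Boolean = {
      if (position == nextNull) {
        // Advance to the following null position, or -1 when none remain.
        cursor += 1
        nextNull = if (cursor < nullPositions.length) nullPositions(cursor) else -1
        true
      } else false
    }
  }
}
```

    The appeal of this scheme is that each row costs a single integer comparison, with no bitset lookup, when nulls are rare.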

  34. trait NullableEncoder extends NotNullEncoder

  35. final class RunLengthDecoder extends RunLengthDecoderBase with NotNullDecoder

  36. abstract class RunLengthDecoderBase extends ColumnDecoder with RunLengthEncoding

  37. final class RunLengthDecoderNullable extends RunLengthDecoderBase with NullableDecoder

  38. trait RunLengthDecoding extends AnyRef

  39. trait RunLengthEncoding extends ColumnEncoding

  40. final class StringDictionary extends AnyRef

  41. trait Uncompressed extends ColumnEncoding

  42. final class UncompressedDecoder extends UncompressedDecoderBase with NotNullDecoder

  43. abstract class UncompressedDecoderBase extends ColumnDecoder with Uncompressed

  44. final class UncompressedDecoderNullable extends UncompressedDecoderBase with NullableDecoder

  45. final class UncompressedEncoder extends NotNullEncoder with UncompressedEncoderBase

  46. trait UncompressedEncoderBase extends ColumnEncoder with Uncompressed

  47. final class UncompressedEncoderNullable extends NullableEncoder with UncompressedEncoderBase

  48. final class UpdatedColumnDecoder extends UpdatedColumnDecoderBase

    Decodes a column of a batch that has seen some updates by combining all delta values and the full column value, as obtained from ColumnDeltaEncoders and column encoders. Callers should provide this with the set of all deltas for the column in addition to the full column value.

    To create an instance, use the apply method of the companion object, which will create a nullable or non-nullable version as appropriate.
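
    The idea of combining deltas with the full column value can be sketched as below. This is not the actual decoder: UpdatedColumnSketch and its methods are hypothetical, plain maps and a Vector stand in for the encoded buffers, and the rule that the first delta in the given sequence takes precedence is an assumption made for illustration.

```scala
object UpdatedColumnSketch {
  /**
   * Value at `position`: the first delta (in the given order) that holds the
   * position wins; otherwise fall back to the full column value.
   */
  def valueAt(position: Int, deltas: Seq[Map[Int, String]],
      full: IndexedSeq[String]): String =
    deltas.collectFirst { case d if d.contains(position) => d(position) }
      .getOrElse(full(position))

  /** Decode every position of the column by consulting deltas before the full value. */
  def decodeAll(deltas: Seq[Map[Int, String]],
      full: IndexedSeq[String]): IndexedSeq[String] =
    full.indices.map(i => valueAt(i, deltas, full))
}
```

    A real decoder would presumably walk sorted positions in step with the scan rather than probing a map per row; the sketch only shows the lookup order.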

  49. abstract class UpdatedColumnDecoderBase extends AnyRef

  50. final class UpdatedColumnDecoderNullable extends UpdatedColumnDecoderBase

    Nullable version of UpdatedColumnDecoder.

Value Members

  1. object BitSet

    Static methods for working with fixed-size bitsets stored elsewhere in a byte array or long array. Similar to Spark's BitSetMethods but respects platform endianness, so it is suitable for storage.
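
    A fixed-size bitset held in an externally owned long array can be sketched as below. This is a simplified illustration and not this object's API: LongBitSetSketch and its methods are hypothetical, and the sketch operates on in-memory longs only, so it does not address the endianness handling mentioned above.

```scala
object LongBitSetSketch {
  /** Number of longs needed to hold `numBits` bits. */
  def wordsFor(numBits: Int): Int = (numBits + 63) >> 6

  /** Set bit `index` in a bitset stored in the caller's long array. */
  def set(words: Array[Long], index: Int): Unit =
    words(index >> 6) |= 1L << (index & 63)

  /** Test bit `index` in a bitset stored in the caller's long array. */
  def isSet(words: Array[Long], index: Int): Boolean =
    (words(index >> 6) & (1L << (index & 63))) != 0L
}
```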

  2. object ColumnEncoding

  3. object ColumnStatsSchema extends Serializable

  4. object DeltaWriter

  5. object UpdatedColumnDecoder
