Class ES87TSDBDocValuesEncoder

java.lang.Object
org.elasticsearch.index.codec.tsdb.ES87TSDBDocValuesEncoder

public class ES87TSDBDocValuesEncoder extends Object
This class provides encoding and decoding of doc values using the following schemes:
  • delta encoding: encodes numeric fields by storing the initial value and the differences between consecutive values. Delta values normally require far fewer bits than the original 32 or 64 bits.
  • offset encoding: encodes numeric fields by storing values in the range [0, max - min] instead of [min, max]. Reducing the range makes delta encoding much more effective, since numbers in the range [0, max - min] require fewer bits than values in the range [min, max].
  • gcd encoding: encodes numeric fields by storing values divided by their Greatest Common Divisor. Dividing values by their GCD reduces their magnitude and therefore the number of bits required to represent them, which makes delta encoding much more effective.
  • (f)or encoding: encodes numeric fields by storing the initial value and then the XOR between each value and the previous one, making delta encoding much more effective. Values that share their higher-order bits require fewer bits when delta encoded. This is expected to be especially effective with floating point values sharing a common exponent and sign bit.
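The effect of each scheme on the number of bits per value can be illustrated with a small standalone sketch. It is not the code of this class: the class name DocValuesSchemesSketch and all of its methods are hypothetical, blocks are assumed to be non-empty, and values are assumed small enough that the subtractions do not overflow.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
final class DocValuesSchemesSketch {

    /** Bits needed for the largest value in the block (values assumed non-negative). */
    static int bitsRequired(long[] values) {
        long or = 0;
        for (long v : values) {
            or |= v;
        }
        return 64 - Long.numberOfLeadingZeros(or);
    }

    /** Delta: differences between consecutive values (the first value would be stored separately). */
    static long[] deltas(long[] in) {
        long[] out = new long[in.length - 1];
        for (int i = 1; i < in.length; i++) {
            out[i - 1] = in[i] - in[i - 1];
        }
        return out;
    }

    /** Offset: subtract the minimum so values fall in [0, max - min]. */
    static long[] removeOffset(long[] in) {
        long min = Arrays.stream(in).min().orElse(0L);
        return Arrays.stream(in).map(v -> v - min).toArray();
    }

    static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** GCD: divide every value by the greatest common divisor of the block. */
    static long[] divideByGcd(long[] in) {
        long g = 0;
        for (long v : in) {
            g = gcd(g, v);
        }
        final long divisor = g > 1 ? g : 1;
        return Arrays.stream(in).map(v -> v / divisor).toArray();
    }

    /** XOR: keep the first value, replace each later value with its XOR with the previous one. */
    static long[] xorWithPrevious(long[] in) {
        long[] out = in.clone();
        for (int i = out.length - 1; i > 0; i--) {
            out[i] ^= out[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] raw = {1000, 1030, 1060, 1120};
        long[] d = deltas(raw);        // {30, 30, 60}
        long[] o = removeOffset(d);    // {0, 0, 30}
        long[] g = divideByGcd(o);     // {0, 0, 1}
        // prints "11 -> 6 -> 5 -> 1"
        System.out.println(bitsRequired(raw) + " -> " + bitsRequired(d)
            + " -> " + bitsRequired(o) + " -> " + bitsRequired(g));
    }
}
```

In this example the four raw values need 11 bits each, while the fully transformed block needs 1 bit per value plus the stored parameters (first value, minimum, and GCD).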
Note that encoding and decoding are written in a nested way: for instance, deltaEncode(int, int, long[], org.apache.lucene.store.DataOutput) calls removeOffset(int, int, long[], org.apache.lucene.store.DataOutput), and so on. This allows us to easily introduce new encoding schemes or remove existing (ineffective) ones in a backward-compatible way. A token is used as a bitmask recording which encodings were applied, which allows the applied schemes to be detected at decoding time. This encoding and decoding scheme is meant to work on blocks of 128 values. Larger block sizes incur a decoding penalty when random access to doc values is required, since a full block must be decoded to read a single value. Decoding applies the schemes in the reverse order of encoding.
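The token idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of this class: the name TokenBitmaskSketch, the bit assignments DELTA, OFFSET and GCD, and the output layout (a plain long[] instead of a DataOutput) are all hypothetical, and overflow handling is omitted.

```java
import java.util.Arrays;

// Hypothetical illustration only; the real token layout and output format may differ.
final class TokenBitmaskSketch {
    // Assumed bit assignments, one bit per encoding scheme.
    static final int DELTA = 1, OFFSET = 2, GCD = 4;

    static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Encodes a block as [token, parameters..., transformed values...]. */
    static long[] encode(long[] block) {
        long[] v = block.clone();
        long[] params = new long[3];
        int token = 0;
        int p = 0;

        // 1. Delta, applied only to non-decreasing blocks of at least two values.
        boolean sorted = v.length > 1;
        for (int i = 1; i < v.length && sorted; i++) {
            sorted = v[i] >= v[i - 1];
        }
        if (sorted) {
            for (int i = v.length - 1; i > 0; i--) {
                v[i] -= v[i - 1];
            }
            params[p++] = v[0]; // the first value is kept as a parameter
            v[0] = v[1];        // reuse the first delta so slot 0 stays small
            token |= DELTA;
        }

        // 2. Offset, applied when the minimum is not already zero.
        long min = Arrays.stream(v).min().orElse(0L);
        if (min != 0) {
            for (int i = 0; i < v.length; i++) {
                v[i] -= min;
            }
            params[p++] = min;
            token |= OFFSET;
        }

        // 3. GCD, applied when the values share a divisor greater than one.
        long g = 0;
        for (long x : v) {
            g = gcd(g, x);
        }
        if (g > 1) {
            for (int i = 0; i < v.length; i++) {
                v[i] /= g;
            }
            params[p++] = g;
            token |= GCD;
        }

        long[] out = new long[1 + p + v.length];
        out[0] = token;
        System.arraycopy(params, 0, out, 1, p);
        System.arraycopy(v, 0, out, 1 + p, v.length);
        return out;
    }

    /** Reads the token, then undoes the applied schemes in reverse order. */
    static long[] decode(long[] encoded) {
        int token = (int) encoded[0];
        int p = 1;
        long first = (token & DELTA) != 0 ? encoded[p++] : 0;
        long min = (token & OFFSET) != 0 ? encoded[p++] : 0;
        long g = (token & GCD) != 0 ? encoded[p++] : 1;
        long[] v = Arrays.copyOfRange(encoded, p, encoded.length);

        if ((token & GCD) != 0) {
            for (int i = 0; i < v.length; i++) v[i] *= g;
        }
        if ((token & OFFSET) != 0) {
            for (int i = 0; i < v.length; i++) v[i] += min;
        }
        if ((token & DELTA) != 0) {
            v[0] = first;
            for (int i = 1; i < v.length; i++) v[i] += v[i - 1];
        }
        return v;
    }
}
```

For the block {1000, 1030, 1060, 1120}, this sketch sets all three bits (token 7) and stores the values as {0, 0, 0, 1}. Because each scheme only adds a bit to the token, a decoder can skip any scheme whose bit is unset, which is what makes adding or dropping schemes backward compatible in a design of this kind.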
  • Constructor Details

    • ES87TSDBDocValuesEncoder

      public ES87TSDBDocValuesEncoder()