A simple, fixed-size bit set implementation. This implementation is fast because it avoids safety/bound checking.
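To make the idea concrete, here is a minimal sketch of a fixed-size bit set backed by 64-bit words. The class name `FixedBitSet` and its methods are hypothetical and chosen for illustration only; they are not the API of the class described above. Like the description, the sketch performs no explicit bounds checks.

```python
class FixedBitSet:
    """Illustrative fixed-size bit set stored as a list of 64-bit words."""

    WORD_BITS = 64

    def __init__(self, num_bits):
        # Round the requested size up to a whole number of 64-bit words.
        num_words = (num_bits + self.WORD_BITS - 1) // self.WORD_BITS
        self.words = [0] * num_words

    def set(self, index):
        # No explicit range check: the caller must pass 0 <= index < capacity.
        self.words[index >> 6] |= 1 << (index & 63)

    def get(self, index):
        # Shift the target bit down to position 0 and test it.
        return ((self.words[index >> 6] >> (index & 63)) & 1) == 1

    def cardinality(self):
        # Count the set bits across all words.
        return sum(bin(word).count("1") for word in self.words)
```

The word index is `index >> 6` and the bit within the word is `index & 63`, so each operation costs one shift, one mask, and one array access.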
:: DeveloperApi :: An append-only map that spills sorted content to disk when there is insufficient space for it to grow.
This map takes two passes over the data:
(1) Values are merged into combiners, which are sorted and spilled to disk as necessary.
(2) Combiners are read from disk and merged together.
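The two passes can be sketched roughly as follows. This is a simplified illustration, not the actual implementation: the function name `combine_with_spills` is hypothetical, spills are kept as in-memory lists rather than files, the spill trigger is a plain key count instead of a memory estimate, entries are sorted by key, and a single `merge_value` function is assumed to serve for both merging values and merging combiners.

```python
import heapq
from functools import reduce
from itertools import groupby
from operator import itemgetter


def combine_with_spills(records, merge_value, max_in_memory_keys):
    """Aggregate (key, value) records, spilling sorted runs when the map is full."""
    spills = []    # each entry stands in for one sorted file on disk
    current = {}   # in-memory map: key -> combiner

    # Pass 1: merge values into combiners; spill sorted content when full.
    for key, value in records:
        if key in current:
            current[key] = merge_value(current[key], value)
        else:
            if len(current) >= max_in_memory_keys:
                spills.append(sorted(current.items(), key=itemgetter(0)))
                current = {}
            current[key] = value

    # Pass 2: merge all sorted runs and combine entries that share a key.
    runs = spills + [sorted(current.items(), key=itemgetter(0))]
    merged = heapq.merge(*runs, key=itemgetter(0))
    return [
        (key, reduce(merge_value, (combiner for _, combiner in group)))
        for key, group in groupby(merged, key=itemgetter(0))
    ]
```

Because every run is sorted, the second pass only needs to hold one entry per run in memory at a time while it merges.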
Choosing the spill threshold involves a trade-off. If the threshold is too high, the in-memory map may occupy more memory than is available, resulting in an out-of-memory (OOM) error. If the threshold is too low, we spill frequently and incur unnecessary disk writes, which may lead to a performance regression compared to the normal case of using the non-spilling AppendOnlyMap.
Two parameters control the memory threshold:
spark.shuffle.memoryFraction specifies the collective amount of memory used for storing these maps as a fraction of the executor's total memory. Since each concurrently running task maintains one map, the actual threshold for each map is this quantity divided by the number of running tasks.
spark.shuffle.safetyFraction specifies an additional margin of safety as a fraction of this threshold, in case map size estimation is not sufficiently accurate.
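As a worked example of how the two parameters combine, the following sketch computes a per-map threshold from the description above. The function name `per_map_spill_threshold` and the numbers used are hypothetical illustration values, not defaults taken from any Spark release.

```python
def per_map_spill_threshold(executor_memory_bytes, memory_fraction,
                            safety_fraction, running_tasks):
    """Per-map spill threshold in bytes, following the description above."""
    # Collective budget for all maps on this executor.
    collective = executor_memory_bytes * memory_fraction
    # Each concurrently running task maintains one map.
    per_map = collective / running_tasks
    # Keep a safety margin in case size estimation is inaccurate.
    return per_map * safety_fraction


# Illustrative values only: 8 GiB executor, fraction 0.25, safety 0.8, 4 tasks.
threshold = per_map_spill_threshold(8 * 1024**3, 0.25, 0.8, 4)
print(round(threshold / 1024**2, 1), "MiB per map")
```

With these values the collective budget is 2 GiB, each of the 4 maps gets 512 MiB, and the safety margin reduces that to about 409.6 MiB.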
:: DeveloperApi :: A simple open hash table optimized for the append-only use case, where keys are never removed, but the value for each key may be changed.
This implementation uses quadratic probing with a power-of-2 hash table size, which is guaranteed to visit every slot in the table for each key (see http://en.wikipedia.org/wiki/Quadratic_probing).
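The full-coverage guarantee can be checked with a small sketch. It assumes the common variant of quadratic probing in which the step grows by one on each collision (offsets 0, 1, 3, 6, ..., the triangular numbers); whether the class described above uses exactly this increment is an assumption, and the function name `probe_sequence` is hypothetical.

```python
def probe_sequence(hash_value, capacity):
    """Slots visited by triangular-number quadratic probing, in order."""
    assert capacity > 0 and (capacity & (capacity - 1)) == 0, \
        "capacity must be a power of 2"
    mask = capacity - 1          # cheap modulo for power-of-2 sizes
    pos = hash_value & mask      # starting slot
    slots = []
    for step in range(1, capacity + 1):
        slots.append(pos)
        pos = (pos + step) & mask  # the step grows by 1 after every probe
    return slots


print(probe_sequence(0, 8))  # [0, 1, 3, 6, 2, 7, 5, 4]
```

After `capacity` probes every slot has been visited exactly once, so an insert into a table that still has a free slot always finds it.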
TODO: Cache the hash values of each key? java.util.HashMap does that.