Class StreamingTombstoneHistogramBuilder
- java.lang.Object
- org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder
public class StreamingTombstoneHistogramBuilder extends java.lang.Object
A histogram that can be built from a stream of data. It is used to retrieve the number of droppable tombstones, for example via SSTableReader.getDroppableTombstonesBefore(long).
When an sstable is written (or streamed), this histogram builder receives the "local deletion timestamp" as a long via update(long). Negative values are not supported.
Algorithm: the histogram is represented as a collection of {point, weight} pairs. When a new point p with weight m is added:
- If point p already exists in the collection, add m to the recorded weight of point p
- If there is no point p in the collection, add point p with weight m
- If a point was added and the collection size became larger than maxBinSize:
- Find the two nearest points p1 and p2 (with weights m1 and m2) in the collection
- Replace these two points with one weighted point p3 = (p1*m1+p2*m2)/(m1+m2), carrying the combined weight m1+m2
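The update-and-merge rule above can be sketched in a few lines. This is an illustrative sketch only, not the Cassandra implementation: the class name HistogramSketch and its use of a TreeMap are assumptions made for readability, the merged point is the weight-averaged position (p1*m1+p2*m2)/(m1+m2) rounded down by integer division, and overflow of p*m for very large inputs is ignored.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the {point, weight} histogram; not the real builder. */
final class HistogramSketch {
    // Sorted map from point to accumulated weight.
    private final TreeMap<Long, Long> bins = new TreeMap<>();
    private final int maxBinSize;

    HistogramSketch(int maxBinSize) {
        this.maxBinSize = maxBinSize;
    }

    /** Adds point p with weight m; points are assumed to be non-negative. */
    void update(long p, long m) {
        bins.merge(p, m, Long::sum);      // existing point: add weight; new point: insert
        if (bins.size() > maxBinSize) {
            mergeNearest();
        }
    }

    /** Replaces the two closest neighbouring points with their weighted mean. */
    private void mergeNearest() {
        long bestP1 = 0, bestP2 = 0, bestGap = Long.MAX_VALUE;
        Long prev = null;
        for (long p : bins.keySet()) {    // keys are visited in ascending order
            if (prev != null && p - prev < bestGap) {
                bestGap = p - prev;
                bestP1 = prev;
                bestP2 = p;
            }
            prev = p;
        }
        long m1 = bins.remove(bestP1);
        long m2 = bins.remove(bestP2);
        long p3 = (bestP1 * m1 + bestP2 * m2) / (m1 + m2);
        bins.merge(p3, m1 + m2, Long::sum);
    }

    Map<Long, Long> bins() {
        return bins;
    }
}
```

With maxBinSize = 2, adding (10, 1), (20, 3) and (100, 1) triggers one merge: points 10 and 20 are the nearest pair and collapse into point 70/4 = 17 with weight 4, leaving {17: 4, 100: 1}.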
There are some optimizations to make the histogram builder faster:
- Spool: a large map that avoids excessive merging of small bins. This map can contain up to maxSpoolSize points and accumulates the weight of identical points. For example, if spoolSize=100, binSize=10 and there are only 50 distinct points, there will be only 40 merges regardless of how many points are added.
- The spool is organized as an open-addressing primitive hash map where odd elements are points and even elements are values. The spool cannot resize, so when the number of collisions exceeds a threshold or the size becomes larger than array_size/2, the spool is drained into the bins
- The bins are organized as sorted arrays. This reduces garbage collection pressure and allows elements to be found in log(binSize) time via binary search
- To use the existing Arrays.binarySearch, each {point, value} pair in the bins is packed into one long
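The packing idea in the last item can be illustrated as follows. This is a minimal sketch under stated assumptions, not the actual bin layout: the class PackedBinSketch and its helpers are hypothetical, the point is assumed to fit in the upper 32 bits and the value in the lower 32 bits, and both are assumed to be non-negative so that the natural ordering of the packed longs matches the ordering of the points.

```java
import java.util.Arrays;

/** Hypothetical sketch: {point, value} pairs packed into a sorted long[]. */
final class PackedBinSketch {
    // Assumed layout: point in the high 32 bits, value in the low 32 bits.
    static long pack(int point, int value) {
        return ((long) point << 32) | (value & 0xFFFFFFFFL);
    }

    static int point(long packed) {
        return (int) (packed >> 32);
    }

    static int value(long packed) {
        return (int) packed;
    }

    /** Returns the index of the entry holding the given point, or -1 if absent. */
    static int find(long[] sortedBins, int size, int point) {
        // Search for the smallest possible key with this point (value 0).
        int idx = Arrays.binarySearch(sortedBins, 0, size, pack(point, 0));
        if (idx < 0) {
            idx = -idx - 1;               // convert to the insertion point
        }
        // The entry for this point, if any, sits exactly at the insertion point.
        return (idx < size && point(sortedBins[idx]) == point) ? idx : -1;
    }
}
```

Because each point occurs at most once, searching for pack(point, 0) either hits the entry directly or yields an insertion point that lands on it, so a single primitive array supports lookup without allocating pair objects.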
The original algorithm is taken from the following paper: Yael Ben-Haim and Elad Tom-Tov, "A Streaming Parallel Decision Tree Algorithm" (2010) http://jmlr.csail.mit.edu/papers/volume11/ben-haim10a/ben-haim10a.pdf
-
-
Constructor Summary
Constructor
StreamingTombstoneHistogramBuilder(int maxBinSize, int maxSpoolSize, int roundSeconds)
-
Method Summary
TombstoneHistogram build()
Creates a 'finished' snapshot of the current state of the histogram, but leaves this builder instance open for subsequent additions to the histogram.
void flushHistogram()
Drains the temporary spool into the final bins.
void releaseBuffers()
Releases the inner spool buffers.
static int saturatingCastToInt(long value)
static long saturatingCastToLong(long value)
static long saturatingCastToMaxDeletionTime(long value)
Casts to a long with a maximum value of Cell.MAX_DELETION_TIME, to avoid representing values that aren't a tombstone.
void update(long point)
Adds a new point to this histogram with a default value of 1.
void update(long point, int value)
Adds a new point with the given value to this histogram.
-
-
-
Method Detail
-
update
public void update(long point)
Adds a new point to this histogram with a default value of 1.
Parameters:
point - the point to be added
-
update
public void update(long point, int value)
Adds a new point with the given value to this histogram.
-
flushHistogram
public void flushHistogram()
Drain the temporary spool into the final bins
-
releaseBuffers
public void releaseBuffers()
Releases the inner spool buffers. The histogram remains readable and writable, but with reduced performance. Not intended for use before finalization.
-
build
public TombstoneHistogram build()
Creates a 'finished' snapshot of the current state of the histogram, but leaves this builder instance open for subsequent additions to the histogram. This gives us some degree of sanity with respect to sstable early open.
-
saturatingCastToInt
public static int saturatingCastToInt(long value)
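This page does not describe the method's behaviour. For orientation only, the general meaning of a saturating narrowing cast is sketched below. The class SaturatingCastSketch and the helper clampToInt are hypothetical, and the real method may treat out-of-range or negative inputs differently.

```java
/** Hypothetical sketch of a saturating long-to-int cast; not the Cassandra source. */
final class SaturatingCastSketch {
    static int clampToInt(long value) {
        if (value > Integer.MAX_VALUE) {
            return Integer.MAX_VALUE;     // too large: saturate at the upper bound
        }
        if (value < Integer.MIN_VALUE) {
            return Integer.MIN_VALUE;     // too small: saturate at the lower bound
        }
        return (int) value;               // in range: plain cast is exact
    }
}
```

Unlike a plain (int) cast, which wraps around on overflow, a saturating cast pins out-of-range inputs to the nearest representable bound.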
-
saturatingCastToLong
public static long saturatingCastToLong(long value)
-
saturatingCastToMaxDeletionTime
public static long saturatingCastToMaxDeletionTime(long value)
Casts to a long with a maximum value of Cell.MAX_DELETION_TIME, to avoid representing values that aren't a tombstone.
-
-