A QTree provides an approximate Map[Double,A:Monoid] suitable for range queries, quantile queries,
and combinations of these (for example, if you use a numeric A, you can derive the inter-quartile mean).
It is loosely related to the Q-Digest data structure from http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf,
but uses an immutable tree structure and carries a generalized sum (of type A) at each node instead of just a count.
The basic idea is to keep a binary tree, where the root represents the entire range of the input keys,
and each child node represents either the lower or upper half of its parent's range. Ranges are constrained to be
dyadic intervals (https://en.wikipedia.org/wiki/Interval_(mathematics)#Dyadic_intervals) for ease of merging.
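As a rough illustration of why dyadic intervals merge easily, the sketch below assumes a node can be named by an (offset, level) pair covering [offset * 2^level, (offset + 1) * 2^level). That representation and the function names are assumptions made for this example, not something stated above.

```python
import math
from typing import Tuple

def dyadic_interval(x: float, level: int) -> Tuple[float, float]:
    """Return the dyadic interval of width 2**level that contains x (half-open)."""
    width = 2.0 ** level
    offset = math.floor(x / width)  # index of the interval at this level
    return (offset * width, (offset + 1) * width)

def parent_interval(interval: Tuple[float, float]) -> Tuple[float, float]:
    """Return the dyadic interval one level up that contains the given one."""
    start, end = interval
    width = 2.0 * (end - start)
    offset = math.floor(start / width)
    return (offset * width, (offset + 1) * width)

def nested_or_disjoint(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Check that two half-open intervals never partially overlap."""
    a_in_b = b[0] <= a[0] and a[1] <= b[1]
    b_in_a = a[0] <= b[0] and b[1] <= a[1]
    disjoint = a[1] <= b[0] or b[1] <= a[0]
    return a_in_b or b_in_a or disjoint
```

Because any two dyadic intervals are either nested or disjoint, two trees built over the same key space always line up node for node, which is what makes merging simple.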
To keep the size bounded, the total count carried by any sub-tree must be at least 1/(2^k) of the total
count at the root, for a chosen parameter k. Any sub-tree that does not meet this criterion has its children pruned and becomes a leaf.
(It's important that such sub-trees not be pruned away entirely: we keep a fringe of low-count leaves that can
gain weight over time and ultimately split again when warranted.)
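The pruning rule can be sketched as follows. This is a simplified model rather than the actual implementation: it tracks only an integer count per node instead of a generalized sum of type A, and the `Node`, `total`, and `prune` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    count: int                      # weight stored at this node itself
    lower: Optional["Node"] = None  # child covering the lower half of the range
    upper: Optional["Node"] = None  # child covering the upper half of the range

def total(node: Optional[Node]) -> int:
    """Total count carried by a sub-tree (its own count plus its descendants')."""
    if node is None:
        return 0
    return node.count + total(node.lower) + total(node.upper)

def prune(node: Optional[Node], root_total: int, k: int) -> Optional[Node]:
    """Collapse every sub-tree lighter than root_total / 2**k into a single leaf."""
    if node is None:
        return None
    subtotal = total(node)
    if subtotal < root_total / 2 ** k:
        # Fold the children's weight into this node; it stays in the tree as a leaf.
        return Node(count=subtotal)
    return Node(node.count,
                prune(node.lower, root_total, k),
                prune(node.upper, root_total, k))
```

For instance, with a root total of 100 and k = 4, any sub-tree carrying fewer than 6.25 items is collapsed into one leaf holding the same total, so no weight is lost and the leaf can split again later if it grows.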
Quantile and range queries both give hard upper and lower bounds; the true result will be somewhere in the range given.
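To make the bounds concrete, here is a minimal sketch that assumes all weight sits on disjoint leaves, each given as a half-open interval with a count. Internal nodes may also carry weight in the real structure, so this is a simplification, and the function names are hypothetical.

```python
from typing import List, Tuple

Leaf = Tuple[Tuple[float, float], int]  # ((start, end), count), interval is [start, end)

def range_count_bounds(leaves: List[Leaf], lo: float, hi: float) -> Tuple[int, int]:
    """Bound the number of keys in [lo, hi).

    Leaves fully inside the query must be counted (lower bound);
    leaves that merely overlap it might be counted (upper bound).
    """
    lower = sum(c for (s, e), c in leaves if lo <= s and e <= hi)
    upper = sum(c for (s, e), c in leaves if s < hi and lo < e)
    return lower, upper

def quantile_bounds(leaves: List[Leaf], p: float) -> Tuple[float, float]:
    """Return the interval of the leaf in which the p-quantile must fall."""
    target = p * sum(c for _, c in leaves)
    seen = 0
    for (s, e), c in sorted(leaves):
        seen += c
        if seen >= target:
            return s, e
    raise ValueError("no leaves to query")
```

The gap between the two bounds comes only from leaves that straddle a query boundary, so finer leaves near the boundary give tighter answers.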
Keys must be >= 0.