Returns an estimate of the total number of times this item has been seen in the stream so far. This estimate is an upper bound.
It is always true that trueFrequency <= estimatedFrequency. With probability p >= 1 - delta, it also holds that estimatedFrequency <= trueFrequency + eps * totalCount.
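The one-sided guarantee above can be illustrated with a minimal Python sketch. It assumes the standard Count-Min sizing (width = ceil(e / eps), depth = ceil(ln(1 / delta))) and uses salted BLAKE2 hashes as a stand-in for a pairwise-independent hash family. The class and method names are hypothetical and are not this library's API.

```python
import hashlib
import math


class CountMinSketch:
    """Toy Count-Min sketch for illustration; not the library's implementation."""

    def __init__(self, eps, delta):
        # Assumed standard sizing: width = ceil(e / eps), depth = ceil(ln(1 / delta)).
        self.width = math.ceil(math.e / eps)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total_count = 0

    def _bucket(self, row, item):
        # One salted hash per row stands in for a pairwise-independent family.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count
        self.total_count += count

    def frequency(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._bucket(row, item)] for row in range(self.depth)
        )


stream = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
cms = CountMinSketch(eps=0.01, delta=0.01)
for x in stream:
    cms.add(x)
estimate = cms.frequency("a")  # never below the true count of 50
```

Taking the minimum across rows is what makes the estimate an upper bound: every row counts all true occurrences plus whatever collides with them.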
Finds all heavy hitters, i.e., elements in the stream that appear at least (heavyHittersPct * totalCount) times.
Every item that appears at least (heavyHittersPct * totalCount) times is output, and with probability p >= 1 - delta, no item whose count is less than (heavyHittersPct - eps) * totalCount is output.
Note that the set of heavy hitters contains at most 1 / heavyHittersPct elements, so keeping track of all elements that appear more than (say) 1% of the time requires tracking at most 100 items.
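One possible way to track heavy hitters alongside the sketch is shown below in Python. This is a sketch under assumptions, not the library's method: it keeps a candidate set, adds an item whenever its estimate reaches the current threshold, and filters the set against the final threshold. The function names and the sizing formulas (width = ceil(e / eps), depth = ceil(ln(1 / delta))) are assumptions made for the example.

```python
import hashlib
import math


def bucket(row, item, width):
    # Salted hash per row; a stand-in for a pairwise-independent hash family.
    digest = hashlib.blake2b(
        str(item).encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % width


def heavy_hitters(stream, heavy_hitters_pct, eps, delta):
    width = math.ceil(math.e / eps)
    depth = math.ceil(math.log(1.0 / delta))
    table = [[0] * width for _ in range(depth)]
    total_count = 0
    candidates = set()

    def estimate(item):
        return min(table[row][bucket(row, item, width)] for row in range(depth))

    for item in stream:
        for row in range(depth):
            table[row][bucket(row, item, width)] += 1
        total_count += 1
        # Estimates never undercount and the threshold only grows, so a true
        # heavy hitter always passes this check at its last occurrence.
        if estimate(item) >= heavy_hitters_pct * total_count:
            candidates.add(item)

    # Drop candidates whose estimate fell below the final threshold.
    threshold = heavy_hitters_pct * total_count
    return {item for item in candidates if estimate(item) >= threshold}


stream = ["a"] * 60 + ["b"] * 30 + [f"rare-{i}" for i in range(10)]
hitters = heavy_hitters(stream, heavy_hitters_pct=0.2, eps=0.01, delta=0.01)
```

Here "a" and "b" each exceed 20% of the 100 items, so both are guaranteed to be in the result; a rare item could only appear if it collided with a frequent one in every row.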
Returns an estimate of the inner product against another data stream.
In other words, let a_i denote the number of times element i has been seen in the data stream summarized by this CMS, and let b_i denote the same for the other CMS. Then this returns an estimate of <a, b> = \sum_i a_i b_i.
Note: this can also be viewed as the join size between two relations.
It is always true that actualInnerProduct <= estimatedInnerProduct. With probability p >= 1 - delta, it also holds that estimatedInnerProduct <= actualInnerProduct + eps * thisTotalCount * otherTotalCount.
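The inner product estimate can be sketched in Python as follows, assuming both sketches share the same dimensions and hash functions (otherwise their tables are not comparable). The helper names are hypothetical, and the width and depth values are chosen for illustration only.

```python
import hashlib


def bucket(row, item, width):
    # Salted hash per row; both tables must use the same hashes.
    digest = hashlib.blake2b(
        str(item).encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % width


def build_table(stream, width, depth):
    table = [[0] * width for _ in range(depth)]
    for item in stream:
        for row in range(depth):
            table[row][bucket(row, item, width)] += 1
    return table


def inner_product(table_a, table_b):
    # In each row, the sum of bucket-wise products contains every true
    # a_i * b_i term plus nonnegative collision terms, so the minimum
    # over rows is still an upper bound on the exact inner product.
    return min(
        sum(x * y for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(table_a, table_b)
    )


left = ["x"] * 3 + ["y"] * 2 + ["z"]
right = ["x"] * 4 + ["y"] + ["w"] * 5
width, depth = 272, 5  # illustrative sizes, roughly eps = 0.01 and delta = 0.01
estimate = inner_product(
    build_table(left, width, depth), build_table(right, width, depth)
)
exact = 3 * 4 + 2 * 1  # only "x" and "y" occur in both streams
```

Read as a join, `exact` is the number of pairs produced by joining the two streams on the element value, and `estimate` never falls below it.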
Used for holding a single element, to avoid repeatedly adding elements from sparse counts tables.