Updates the sketch with a new element from the data stream.
Returns an estimate of the total number of times this item has been seen in the stream so far. This estimate is an upper bound.
It is always true that trueFrequency <= estimatedFrequency. With probability p >= 1 - delta, it also holds that estimatedFrequency <= trueFrequency + eps * totalCount.
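The update and frequency-estimate behaviour described above can be illustrated with a minimal sketch in Python. This is not the library's implementation: the class name, the `blake2b`-based row hashes, and the sizing rule (width = ceil(e / eps), depth = ceil(ln(1 / delta)), the usual textbook choice) are assumptions made for illustration.

```python
import hashlib
import math


class CountMinSketch:
    """Illustrative Count-Min sketch (hypothetical names, not the library API)."""

    def __init__(self, eps, delta):
        # Assumed textbook sizing: wider tables lower the error term eps,
        # more rows lower the failure probability delta.
        self.width = math.ceil(math.e / eps)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total_count = 0

    def _column(self, row, item):
        # One hash function per row, derived by salting blake2b with the row index.
        digest = hashlib.blake2b(
            repr(item).encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        # Add the item's count to exactly one cell in every row.
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += count
        self.total_count += count

    def frequency(self, item):
        # Collisions can only inflate a cell, so the minimum over rows
        # is never below the true frequency.
        return min(
            self.table[row][self._column(row, item)] for row in range(self.depth)
        )


cms = CountMinSketch(eps=0.01, delta=0.01)
for word in ["a", "b", "a", "c", "a"]:
    cms.update(word)
print(cms.frequency("a"), cms.total_count)
```

Because every cell that an item hashes to includes that item's own count, the estimate is an upper bound regardless of how the hash functions behave; only the `eps * totalCount` overshoot bound is probabilistic.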
Finds all heavy hitters, i.e., elements in the stream that appear at least (heavyHittersPct * totalCount) times.
Every item that appears at least (heavyHittersPct * totalCount) times is output, and with probability p >= 1 - delta, no item whose count is less than (heavyHittersPct - eps) * totalCount is output.
Note that the set of heavy hitters contains at most 1 / heavyHittersPct elements, so keeping track of all elements that appear more than (say) 1% of the time requires tracking at most 100 items.
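One way heavy-hitter tracking could be layered on a Count-Min sketch is shown below. This is a sketch under assumptions, not the library's code: the class name, the `candidates` set, the pruning strategy, and the hashing scheme are all hypothetical.

```python
import hashlib
import math


class HeavyHitterSketch:
    """Illustrative Count-Min sketch with heavy-hitter tracking (hypothetical names)."""

    def __init__(self, eps, delta, heavy_hitters_pct):
        self.width = math.ceil(math.e / eps)            # assumed textbook sizing
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.heavy_hitters_pct = heavy_hitters_pct
        self.total_count = 0
        self.candidates = set()

    def _column(self, row, item):
        digest = hashlib.blake2b(
            repr(item).encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def frequency(self, item):
        return min(
            self.table[row][self._column(row, item)] for row in range(self.depth)
        )

    def update(self, item):
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += 1
        self.total_count += 1
        # Consider the new item, then drop every candidate whose estimate
        # has fallen below the current threshold.
        self.candidates.add(item)
        self.candidates = self.heavy_hitters()

    def heavy_hitters(self):
        threshold = self.heavy_hitters_pct * self.total_count
        return {x for x in self.candidates if self.frequency(x) >= threshold}


hh = HeavyHitterSketch(eps=0.001, delta=0.01, heavy_hitters_pct=0.1)
for item in ["a"] * 500 + ["b"] * 300 + [f"rare-{i}" for i in range(200)]:
    hh.update(item)
print(sorted(hh.heavy_hitters()))
```

Pruning on every update keeps the candidate set small. It is safe in this sketch because an item that truly reaches the threshold also meets it at the moment of its last update, when its estimate is at least its final count and the total is no larger than the final total.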
Let X be a CMS, and let count_X[j, k] denote the value in X's 2-dimensional count table at row j and column k. Then the Count-Min sketch estimate of the inner product between two sketches A and B is the minimum, over all rows j, of the inner product between their j-th rows: estimatedInnerProduct = min_j (sum_k count_A[j, k] * count_B[j, k])
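The row-wise minimum can be written out directly. The following is a minimal sketch, assuming both count tables have the same dimensions and were filled using the same per-row hash functions; the helper names `column`, `build_table`, and `estimated_inner_product` are hypothetical and not part of any library API.

```python
import hashlib


def column(row, item, width):
    # Hypothetical per-row hash: blake2b salted with the row index.
    digest = hashlib.blake2b(
        repr(item).encode(), digest_size=8, salt=row.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % width


def build_table(stream, width, depth):
    # Fill a depth x width count table, one increment per row per item.
    table = [[0] * width for _ in range(depth)]
    for item in stream:
        for row in range(depth):
            table[row][column(row, item, width)] += 1
    return table


def estimated_inner_product(count_a, count_b):
    # min over rows j of sum over columns k of count_A[j, k] * count_B[j, k]
    return min(
        sum(a * b for a, b in zip(row_a, row_b))
        for row_a, row_b in zip(count_a, count_b)
    )


count_a = build_table(["x"] * 3 + ["y"] * 2, width=272, depth=5)
count_b = build_table(["x"] * 4 + ["z"] * 5, width=272, depth=5)
print(estimated_inner_product(count_a, count_b))
```

In this example the true inner product of the two frequency vectors is 3 * 4 = 12, since only "x" occurs in both streams. With non-negative counts, collisions can only add to each row's product sum, so the estimate is never below the true value.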
The general Count-Min sketch structure, used for holding any number of elements.