For use from Java, or to minimize code bloat in Scala.
This is a wrapper around SummingCache that attempts to grow the capacity by up to some maximum, as long as there's enough RAM. It determines that there's enough RAM to grow by maintaining a SentinelCache which keeps caching and summing the evicted values. Once the SentinelCache has grown to the same size as the current cache, plus some margin, without running out of RAM, then this indicates that we have enough headroom to double the capacity.
An IndexedSeq that automatically switches representation between dense and sparse depending on sparsity. It should be an efficient representation for all sizes, so it should not be necessary to special-case immutable algebras based on the sparsity of the vectors.
This is for the case where your Ring[T] is a Rng (i.e. there is no unit).
http://en.wikipedia.org/wiki/Pseudo-ring#Adjoining_an_identity_element
Represents functions of the kind: f(x) = slope * x + intercept
This feeds the value in on the LEFT! This may seem counterintuitive, but with this approach, a stream/iterator which is summed will have the same output as applying the functions one at a time, in order, to the input. If we did the "lexicographically correct" thing, which might be (f + g)(x) = f(g(x)), then we would wind up reversing the list in the sum. Instead:
(f1 + f2)(x) = f2(f1(x))
so that:
listOfFn.foldLeft(x) { (v, fn) => fn(v) } == (Monoid.sum(listOfFn))(x)
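The left-feeding rule can be checked with a small, self-contained sketch. The names here (FunctionSumSketch, plus, zero, sum) are illustrative stand-ins, not the library's API; the sketch only assumes that plus is "apply the left function first".

```scala
object FunctionSumSketch {
  // (f1 + f2)(x) = f2(f1(x)): the left function is applied first.
  def plus[T](f1: T => T, f2: T => T): T => T = f1.andThen(f2)

  // The identity function acts as zero.
  def zero[T]: T => T = (t: T) => t

  // Sum a list of functions left to right.
  def sum[T](fns: Seq[T => T]): T => T =
    fns.foldLeft(zero[T])((f, g) => plus(f, g))

  val fns: List[Int => Int] = List(_ + 1, _ * 2, _ - 3)

  // Applying the functions one at a time, in order...
  val oneAtATime: Int = fns.foldLeft(5) { (v, fn) => fn(v) }
  // ...gives the same result as applying their sum.
  val summed: Int = sum(fns)(5)
}
```

With composition in the other direction, sum(fns) would apply the last function first, which is the list reversal the text warns about.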
This is a type that models map/reduce(map). First each item is mapped, then we reduce with a semigroup, then finally we present the results.
Unlike Fold, Aggregator keeps its middle aggregation type externally visible. This is because Aggregators are useful in parallel map/reduce systems where there may be some additional types needed to cross the map/reduce boundary (such as serialization and intermediate storage). If you don't care about the middle type, an _ may be used and the main utility of the instance is still preserved (e.g. def operate[T](ag: Aggregator[T, _, Int]): Int)
Note, join is very useful to combine multiple aggregations with one pass. Also GeneratedTupleAggregator.fromN((agg1, agg2, ... aggN)) can glue these together well.
This type is the Fold.M from Haskell's folds package: https://hackage.haskell.org/package/folds-0.6.2/docs/Data-Fold-M.html
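As a rough illustration of the map, reduce, present shape described above, here is a minimal sketch. The trait Agg and its method names are hypothetical and only mirror the three phases; they are not the library's Aggregator definition.

```scala
object AggregatorSketch {
  // Hypothetical stand-in: A is the input, B the visible middle type, C the output.
  trait Agg[A, B, C] {
    def prepare(a: A): B          // map each item
    def reduce(l: B, r: B): B     // combine with an associative operation
    def present(b: B): C          // present the final result

    // Throws on empty input, since there is no zero for B in this sketch.
    def apply(items: Iterable[A]): C =
      present(items.iterator.map(a => prepare(a)).reduce((l, r) => reduce(l, r)))
  }

  // Mean of doubles: the middle type (sum, count) is what would cross
  // a map/reduce boundary.
  val mean: Agg[Double, (Double, Long), Double] =
    new Agg[Double, (Double, Long), Double] {
      def prepare(a: Double): (Double, Long) = (a, 1L)
      def reduce(l: (Double, Long), r: (Double, Long)): (Double, Long) =
        (l._1 + r._1, l._2 + r._2)
      def present(b: (Double, Long)): Double = b._1 / b._2
    }

  val result: Double = mean(List(1.0, 2.0, 3.0, 6.0))
}
```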
Aggregators are Applicatives, but this hides the middle type. If you need a join that does not hide the middle type use join on the trait, or GeneratedTupleAggregator.fromN
Simple implementation of an Applicative type-class. There are many choices for the canonical second operation (join, sequence, joinWith, ap), all equivalent. For a Functor modeling concurrent computations with failure, like Future, combining results with join can save a lot of time over combining with flatMap. (Given two operations, if the second fails before the first completes, one can fail the entire computation right then. With flatMap, one would have to wait for the first operation to complete before failing it.)
Laws Applicatives must follow:
map(apply(x))(f) == apply(f(x))
join(apply(x), apply(y)) == apply((x, y))
(sequence and joinWith specialize join; they should behave appropriately)
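The two laws can be spot-checked for Option with a small sketch. The trait below is a cut-down, hypothetical version of an Applicative type class with only apply, map, and join; it is not the library's definition.

```scala
object ApplicativeLawsSketch {
  // Hypothetical minimal type class.
  trait Applicative[M[_]] {
    def apply[T](v: T): M[T]
    def map[T, U](m: M[T])(fn: T => U): M[U]
    def join[T, U](mt: M[T], mu: M[U]): M[(T, U)]
  }

  val A: Applicative[Option] = new Applicative[Option] {
    def apply[T](v: T): Option[T] = Some(v)
    def map[T, U](m: Option[T])(fn: T => U): Option[U] = m.map(fn)
    def join[T, U](mt: Option[T], mu: Option[U]): Option[(T, U)] =
      for { t <- mt; u <- mu } yield (t, u)
  }

  val f: Int => String = i => "n" + i

  // map(apply(x))(f) == apply(f(x))
  val mapLaw: Boolean = A.map(A.apply(3))(f) == A.apply(f(3))
  // join(apply(x), apply(y)) == apply((x, y))
  val joinLaw: Boolean = A.join(A.apply(1), A.apply("a")) == A.apply((1, "a"))
}
```

A check on a few values is not a proof; property-based testing over many inputs is the usual way to gain confidence in such laws.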
Group and Ring ARE NOT AUTOMATIC. You have to check that the laws hold for your Applicative. If your M[_] is a wrapper type (Option[_], Some[_], Try[_], Future[_], etc...) this generally works.
This is a Monoid, for all Applicatives.
This enrichment allows us to use our Applicative instances in for expressions, provided that import Applicative._ has been done.
Group and Ring ARE NOT AUTOMATIC. You have to check that the laws hold for your Applicative. If your M[_] is a wrapper type (Option[_], Some[_], Try[_], Future[_], etc...) this generally works.
This is a Semigroup, for all Applicatives.
Extends the pair-wise sum Array monoid into a Group. negate is defined as the negation of each element of the array.
Pair-wise sum Array monoid.
plus returns left[i] + right[i] for all array elements. The resulting array will be as long as the longest array (with its elements duplicated). zero is an empty array.
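A minimal sketch of the pair-wise sum, specialised to Array[Int] for brevity. The object and method names are illustrative, and reading "with its elements duplicated" as "the tail of the longer array is copied over" is an assumption.

```scala
object ArraySumSketch {
  // Element-wise sum; the unmatched tail of the longer array is carried over.
  def plus(left: Array[Int], right: Array[Int]): Array[Int] = {
    val (longer, shorter) =
      if (left.length >= right.length) (left, right) else (right, left)
    val out = longer.clone() // copy so neither input is mutated
    var i = 0
    while (i < shorter.length) {
      out(i) = longer(i) + shorter(i)
      i += 1
    }
    out
  }

  // zero is the empty array.
  val zero: Array[Int] = Array.empty[Int]

  // Group extension: negate every element.
  def negate(a: Array[Int]): Array[Int] = a.map(x => -x)
}
```

Results are compared via toList in any check, because Array equality in Scala is reference equality.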
Tracks the count and mean value of Doubles in a data stream.
Adding two instances of AveragedValue with + is equivalent to taking an average of the two streams, with each stream weighted by its count.
The mean calculation uses a numerically stable online algorithm suitable for large numbers of records, similar to Chan et al.'s parallel variance algorithm on Wikipedia. As long as your count doesn't overflow a Long, the mean calculation won't overflow.
the number of aggregated items
the average value of all aggregated items
See MomentsGroup.getCombinedMean for the implementation of +.
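The count-weighted combination can be sketched as follows. Averaged is a hypothetical stand-in for AveragedValue, and the delta-based update shown is the textbook form of a stable combined mean; the library's getCombinedMean may differ in detail.

```scala
object AveragedSketch {
  // Hypothetical stand-in: count of items and their mean.
  final case class Averaged(count: Long, value: Double) {
    // Combine two means, each weighted by its count, without forming a large sum.
    def +(that: Averaged): Averaged = {
      val n = count + that.count
      if (n == 0L) Averaged(0L, 0.0)
      else {
        val delta = that.value - value
        Averaged(n, value + delta * that.count.toDouble / n.toDouble)
      }
    }
  }

  val a: Averaged = Averaged(2L, 1.0) // two items with mean 1.0
  val b: Averaged = Averaged(6L, 5.0) // six items with mean 5.0
  val combined: Averaged = a + b      // same as (2 * 1.0 + 6 * 5.0) / 8
}
```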
Bloom Filter data structure
Bloom Filter with 1 value.
Empty bloom filter.
Batched: the free semigroup.
For any type T, Batched[T] represents a way to lazily combine T values as a semigroup would (i.e. associatively). A Semigroup[T] instance can be used to recover a T value from a Batched[T].
Like other free structures, Batched trades space for time. A sum of batched values defers the underlying semigroup action, instead storing all values in memory (in a tree structure). If an underlying semigroup is available, Batched.semigroup and Batched.monoid can be configured to periodically sum the tree to keep the overall size below batchSize.
Batched[T] values are guaranteed not to be empty; that is, they will contain at least one T value.
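The idea of deferring the semigroup action can be sketched with a tiny tree type. The names Item, Items, combine, and sum below are illustrative, not necessarily those of the library, and this version is not stack-safe for very deep trees.

```scala
object BatchedSketch {
  // A non-empty tree of T values; combining only records structure.
  sealed trait Batched[T] {
    def combine(that: Batched[T]): Batched[T] = Items(this, that)

    // Recover a T by running the deferred plus over the tree.
    def sum(plus: (T, T) => T): T = this match {
      case Item(t)     => t
      case Items(l, r) => plus(l.sum(plus), r.sum(plus))
    }
  }
  final case class Item[T](t: T) extends Batched[T]
  final case class Items[T](left: Batched[T], right: Batched[T]) extends Batched[T]

  // Nothing is concatenated until sum is called.
  val batch: Batched[String] = Item("a").combine(Item("b")).combine(Item("c"))
  val result: String = batch.sum(_ + _)
}
```

Because the structure always contains at least one Item, sum never needs a zero, which matches the non-emptiness guarantee above.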
Compacting monoid for batched values.
This monoid ensures that the batch's tree structure has fewer than batchSize values in it. When more values are added, the tree is compacted using m.
Compacting semigroup for batched values.
This semigroup ensures that the batch's tree structure has fewer than batchSize values in it. When more values are added, the tree is compacted using s.
Bloom Filter - a probabilistic data structure to test presence of an element.
Operations:
1) insert: hash the value k times, updating the bitfield at the index equal to each hashed value.
2) query: hash the value k times. If there are k collisions, then return true; otherwise false.
http://en.wikipedia.org/wiki/Bloom_filter
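The insert and query operations can be sketched in a few lines. This is a mutable toy version using the standard library's MurmurHash3 with k different seeds; the class name, the choice of hash, and the String element type are assumptions made for illustration only.

```scala
import scala.util.hashing.MurmurHash3

object BloomSketch {
  final class Bloom(width: Int, numHashes: Int) {
    private val bits = new java.util.BitSet(width)

    // One index per seed: "hash the value k times".
    private def indices(s: String): Seq[Int] =
      (0 until numHashes).map { seed =>
        java.lang.Math.floorMod(MurmurHash3.stringHash(s, seed), width)
      }

    // insert: set the bit at each hashed index.
    def insert(s: String): Unit = indices(s).foreach(i => bits.set(i))

    // query: true only if all k bits are set (false positives are possible,
    // false negatives are not).
    def mightContain(s: String): Boolean = indices(s).forall(i => bits.get(i))
  }

  val bf = new Bloom(1024, 4)
  bf.insert("apple")
  bf.insert("pear")
}
```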
Represents something that consumes I and may emit O. Has some internal state that may be used to improve performance. Generally used to model folds or reduces (see BufferedReduce)
This never emits on put; you must call flush. It is designed to be used in the stackable pattern with ArrayBufferedOperation.
A wrapper for Array[Byte] that provides sane implementations of hashCode, equals, and toString.
The wrapped array of bytes is assumed to be never modified.
Note: Unfortunately we cannot make Bytes a value class because a value class may not override the hashCode and equals methods (cf. SIP-15, criterion 4).
Instead of wrapping an Array[Byte] with this class you can also convert an Array[Byte] to a Seq[Byte] via Scala's toSeq method:
val arrayByte: Array[Byte] = Array(1.toByte)
val seqByte: Seq[Byte] = arrayByte.toSeq
Like Bytes, a Seq[Byte] has sane hashCode, equals, and toString implementations.
Performance-wise we found that a Seq[Byte] is comparable to Bytes. For example, a CMS[Seq[Byte]] was measured to be only slightly slower than CMS[Bytes] (think: single-digit percentages).
the wrapped array of bytes
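The motivation for such a wrapper can be seen in a short sketch. BytesLike is a hypothetical stand-in written for this example, not the library's Bytes class; it delegates to java.util.Arrays for content-based equality and hashing.

```scala
object BytesSketch {
  // Hypothetical wrapper giving an array value semantics.
  final class BytesLike(val array: Array[Byte]) {
    override def hashCode: Int = java.util.Arrays.hashCode(array)
    override def equals(that: Any): Boolean = that match {
      case b: BytesLike => java.util.Arrays.equals(array, b.array)
      case _            => false
    }
    override def toString: String = array.mkString("BytesLike(", ",", ")")
  }

  val a1: Array[Byte] = Array[Byte](1, 2)
  val a2: Array[Byte] = Array[Byte](1, 2)

  val rawEqual: Boolean = a1 == a2                                   // reference equality
  val wrappedEqual: Boolean = new BytesLike(a1) == new BytesLike(a2) // content equality
  val seqEqual: Boolean = a1.toSeq == a2.toSeq                       // content equality
  val sameHash: Boolean =
    new BytesLike(a1).hashCode == new BytesLike(a2).hashCode
}
```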
A Count-Min sketch data structure that allows for counting and frequency estimation of elements in a data stream.
Tip: If you also need to track heavy hitters ("Top N" problems), take a look at TopCMS.
This example demonstrates how to count Long elements with CMS, i.e. K=Long.
Note that the actual counting is always performed with a Long, regardless of your choice of K. That is, the counting table behind the scenes is backed by Long values (at least in the current implementation), and thus the returned frequency estimates are always instances of Approximate[Long].
The type used to identify the elements to be counted.
// Creates a monoid for a CMS that can count `Long` elements.
val cmsMonoid: CMSMonoid[Long] = {
  val eps = 0.001
  val delta = 1E-10
  val seed = 1
  CMS.monoid[Long](eps, delta, seed)
}
// Creates a CMS instance that has counted the element `1L`.
val cms: CMS[Long] = cmsMonoid.create(1L)
// Estimates the frequency of `1L`
val estimate: Approximate[Long] = cms.frequency(1L)
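To make the counting table concrete, here is a library-independent toy sketch of the Count-Min idea: one row per hash function, increment one cell per row on update, and take the minimum across rows on query. The class Cms, its mutable table, and the use of seeded MurmurHash3 as the row hash are all assumptions for illustration; this is not how the library's CMS is implemented.

```scala
import scala.util.hashing.MurmurHash3

object CountMinSketch {
  final class Cms(depth: Int, width: Int) {
    // depth rows of width Long counters.
    private val table: Array[Array[Long]] = Array.fill(depth, width)(0L)

    // Column for item x in a given row; the row index doubles as the hash seed.
    private def col(row: Int, x: Long): Int =
      java.lang.Math.floorMod(MurmurHash3.stringHash(x.toString, row), width)

    def add(x: Long): Unit =
      for (r <- 0 until depth) table(r)(col(r, x)) += 1L

    // Collisions only ever inflate a cell, so the minimum never underestimates.
    def frequency(x: Long): Long =
      (0 until depth).map(r => table(r)(col(r, x))).min
  }

  val cms = new Cms(4, 256)
  Seq(1L, 1L, 1L, 2L).foreach(x => cms.add(x))
}
```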
An Aggregator for CMS. Can be created using CMS.aggregator.
A trait for CMS implementations that can count elements in a data stream and that can answer point queries (i.e. frequency estimates).
The Count-Min sketch uses d (aka depth) pair-wise independent hash functions drawn from a universal hashing family of the form:
h(x) = [a * x + b (mod p)] (mod m)
As a requirement for using CMS you must provide an implicit CMSHasher[K] for the type K of the items you want to count. Algebird ships with several such implicits for commonly used types K such as Long and BigInt.
If your type K is not supported out of the box, you have two options:
1) You provide a "translation" function to convert items of your (unsupported) type K to a supported type such as Double, and then use the contramap function of CMSHasher to create the required CMSHasher[K] for your type (see the documentation of contramap for an example);
2) You implement a CMSHasher[K] from scratch, using the existing CMSHasher implementations as a starting point.
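The hash family and the contramap option can be sketched together. Hasher below is a hypothetical type class written for this example; its hash signature, the prime p = 2^61 - 1, and the UserId type are all assumptions and should not be read as the library's CMSHasher API.

```scala
object CmsHashSketch {
  // A prime larger than the inputs used here (2^61 - 1, a Mersenne prime).
  val p: BigInt = (BigInt(1) << 61) - 1

  // Hypothetical hasher: h(x) = [a * x + b (mod p)] (mod m).
  trait Hasher[K] { self =>
    def hash(a: Int, b: Int, m: Int)(x: K): Int

    // Option 1 from the text: translate an unsupported type L into a supported K.
    def contramap[L](f: L => K): Hasher[L] = new Hasher[L] {
      def hash(a: Int, b: Int, m: Int)(x: L): Int = self.hash(a, b, m)(f(x))
    }
  }

  val longHasher: Hasher[Long] = new Hasher[Long] {
    def hash(a: Int, b: Int, m: Int)(x: Long): Int =
      (BigInt(a) * BigInt(x) + BigInt(b)).mod(p).mod(BigInt(m)).toInt
  }

  // A domain type with no hasher of its own.
  final case class UserId(id: Long)
  val userHasher: Hasher[UserId] = longHasher.contramap[UserId](u => u.id)

  val direct: Int = longHasher.hash(3, 7, 100)(42L)
  val viaContramap: Int = userHasher.hash(3, 7, 100)(UserId(42L))
}
```

Drawing d different (a, b) pairs gives the d rows of the sketch; contramap simply reuses an existing hasher after translating the key.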
A trait for CMS implementations that can track heavy hitters in a data stream.
It is up to the implementation how the semantics of tracking heavy hitters are defined. For instance, one implementation could track the "top %" heavy hitters whereas another implementation could track the "top N" heavy hitters.
Known implementations: TopCMS.
The type used to identify the elements to be counted.
The general Count-Min sketch structure, used for holding any number of elements.
Used for holding a single element, to avoid repeatedly adding elements from sparse counts tables.
Monoid for adding CMS sketches.
eps and delta are parameters that bound the error of each query estimate. For example, errors in answering point queries (e.g., how often has element x appeared in the stream described by the sketch?) are often of the form: "with probability p >= 1 - delta, the estimate is close to the truth by some factor depending on eps."
The type K is the type of items you want to count. You must provide an implicit CMSHasher[K] for K, and Algebird ships with several such implicits for commonly used types such as Long and BigInt.
If your type K is not supported out of the box, you have two options:
1) You provide a "translation" function to convert items of your (unsupported) type K to a supported type such as Double, and then use the contramap function of CMSHasher to create the required CMSHasher[K] for your type (see the documentation of CMSHasher for an example);
2) You implement a CMSHasher[K] from scratch, using the existing CMSHasher implementations as a starting point.
Note: Because Arrays in Scala/Java do not have sane equals and hashCode implementations, you cannot safely use types such as Array[Byte]. Extra work is required for Arrays. For example, you may opt to convert Array[T] to a Seq[T] via toSeq, or you can provide appropriate wrapper classes. Algebird provides one such wrapper class, Bytes, to safely wrap an Array[Byte] for use with CMS.
The type used to identify the elements to be counted. For example, if you want to count the occurrence of user names, you could map each username to a unique numeric ID expressed as a Long, and then count the occurrences of those Longs with a CMS of type K=Long. Note that this mapping between the elements of your problem domain and their identifiers used for counting via CMS should be bijective. We require a CMSHasher context bound for K; see CMSHasherImplicits for available implicits that can be imported.
Which type K should you pick in practice? For domains that have fewer than 2^64 unique elements, you'd typically use Long. For larger domains you can try BigInt, for example. Other possibilities include Spire's SafeLong and Numerical data types (https://github.com/non/spire), though Algebird does not include the required implicits for CMS-hashing (cf. CMSHasherImplicits).
Configuration parameters for CMS.
The type used to identify the elements to be counted.
Pair-wise independent hash functions. We need N=depth such functions (depth can be derived from delta).
One-sided error bound on the error of each point query, i.e. frequency estimate.
A bound on the probability that a query estimate does not lie within some small interval (an interval that depends on eps) around the truth.
An Option parameter specifying how many exact counts a sparse CMS keeps.
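For orientation, the standard Count-Min parameterisation from the literature derives the table dimensions as width = ceil(e / eps) and depth = ceil(ln(1 / delta)). The sketch below computes these; that the library uses exactly these formulas is an assumption, and the function names are illustrative.

```scala
object CmsParamsSketch {
  // Number of counters per row, from the error bound eps.
  def width(eps: Double): Int = math.ceil(math.E / eps).toInt

  // Number of rows (hash functions), from the failure probability delta.
  def depth(delta: Double): Int = math.ceil(math.log(1.0 / delta)).toInt

  // The values used in the CMS example in this document.
  val exampleWidth: Int = width(0.001)
  val exampleDepth: Int = depth(1e-10)
}
```

Under these formulas, halving eps doubles the memory per row, while tightening delta only adds rows logarithmically.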
This mutable builder can be used when speed is essential and you can be sure the scope of the mutability cannot escape in an unsafe way. The intended use is to allocate and call result in one method without letting a reference to the instance escape into a closure.
Zero element. Used for initialization.
These are the individual instances which the Monoid knows how to add
Either semigroup is useful for error handling. If everything is correct, use Right (it's right, get it?); if something goes wrong, use Left. plus does the normal thing for plus(Right, Right) or plus(Left, Left), but if exactly one is Left, we return that value (to keep the error condition). Typically, the left value will be a string representing the errors.
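The three cases of plus can be sketched directly, here fixed to Either[String, Int] with string concatenation on the left and integer addition on the right. The object and function names are illustrative only.

```scala
object EitherSemigroupSketch {
  def plus(l: Either[String, Int], r: Either[String, Int]): Either[String, Int] =
    (l, r) match {
      case (Right(a), Right(b)) => Right(a + b) // both fine: sum the values
      case (Left(a), Left(b))   => Left(a + b)  // both failed: combine the errors
      case (Left(a), _)         => Left(a)      // exactly one Left: keep the error
      case (_, Left(b))         => Left(b)
    }
}
```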
EventuallySemigroup
Classes that support algebraic structures with dynamic switching between two representations, the original type O and the eventual type E. In the case of Semigroup, we specify:
- Two Semigroups, eventualSemigroup and originalSemigroup
- A Semigroup homomorphism convert: O => E
- A conditional mustConvert: O => Boolean
Then we get a Semigroup[Either[E,O]], where:
Left(x) + Left(y) = Left(x+y)
Left(x) + Right(y) = Left(x+convert(y))
Right(x) + Left(y) = Left(convert(x)+y)
Right(x) + Right(y) = Left(convert(x+y)) if mustConvert(x+y), Right(x+y) otherwise.
EventuallyMonoid, EventuallyGroup, and EventuallyRing are defined analogously, with the contract that convert respect the appropriate structure.
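The four rules can be sketched with concrete types. In this hypothetical example the original type O is List[Int] under concatenation, the eventual type E is Int under addition, and convert is the list sum, which is a homomorphism because (x ++ y).sum == x.sum + y.sum. The threshold in mustConvert is arbitrary.

```scala
object EventuallySketch {
  type O = List[Int] // original representation: keep every element
  type E = Int       // eventual representation: keep only the total

  def convert(o: O): E = o.sum
  def mustConvert(o: O): Boolean = o.size > 3 // arbitrary switch-over point

  def plus(a: Either[E, O], b: Either[E, O]): Either[E, O] = (a, b) match {
    case (Left(x), Left(y))   => Left(x + y)
    case (Left(x), Right(y))  => Left(x + convert(y))
    case (Right(x), Left(y))  => Left(convert(x) + y)
    case (Right(x), Right(y)) =>
      val xy = x ++ y
      if (mustConvert(xy)) Left(convert(xy)) else Right(xy)
  }
}
```

Once a value has switched to Left it never switches back, which is what makes the representation "eventual".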
Exponential Histogram algorithm from http://www-cs-students.stanford.edu/~datar/papers/sicomp_streams.pdf
An Exponential Histogram is a sliding window counter that can guarantee a bounded relative error. You configure the data structure with:
- epsilon, the relative error you're willing to tolerate
- windowSize, the number of time ticks that you want to track
You interact with the data structure by adding (number, timestamp) pairs into the exponential histogram and querying it for an approximate count with guess.
The approximate count is guaranteed to be within conf.epsilon relative error of the true count seen across the supplied windowSize.
Next steps:
- efficient serialization
- Query EH with a shorter window than the configured window
- Discussion of epsilon vs memory tradeoffs
the config values for this instance.
Vector of timestamps of each (powers of 2) ticks. This is the key to the exponential histogram representation. See ExpHist.Canonical for more info.
total ticks tracked. total == buckets.map(_.size).sum
current timestamp of this instance.
To keep code using algebird.Field compiling, we export algebra Field
Tracks the "least recent", or earliest, wrapped instance of T by the order in which items are seen.
wrapped instance of T
Aggregator that selects the first instance of T in the aggregated stream.
A Preparer that has had one or more flatMap operations applied. It can only accept MonoidAggregators.
Folds are first-class representations of "Traversable.foldLeft." They have the nice property that they can be fused to work in parallel over an input sequence.
A Fold accumulates inputs (I) into some internal type (X), converting to a defined output type (O) when done. We use existential types to hide internal details and to allow for internal and external (X and O) types to differ for "map" and "join."
In discussing this type we draw parallels to Function1 and related types. You can think of a fold as a function "Seq[I] => O" but in reality we do not have to materialize the input sequence at once to "run" the fold.
The traversal of the input data structure is NOT done by Fold itself. Instead we expose some methods like "overTraversable" that know how to iterate through various sequence types and drive the fold. We also expose some internal state so library authors can fold over their own types.
See the companion object for constructors.
Folds are Applicatives!
A FoldState defines a left fold with a "hidden" accumulator type. It is exposed so library authors can run Folds over their own sequence types.
The fold can be executed correctly according to the properties of "add" and your traversed data structure. For example, the "add" function of a monoidal fold will be associative. A FoldState is valid for only one iteration because the accumulator (seeded by "start") may be mutable.
The three components of a fold are:
add: (X, I) => X - updates and returns internal state for every input I
start: X - the initial state
end: X => O - transforms internal state to a final result
Folding over Seq(x, y) would produce the result end(add(add(start, x), y))
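The add/start/end decomposition can be sketched as a small case class. MiniFold and its over method are hypothetical names for this example; unlike the description above, the accumulator type X is left visible here to keep the sketch short.

```scala
object FoldSketch {
  // Hypothetical minimal fold: X is the accumulator, I the input, O the output.
  final case class MiniFold[X, I, O](add: (X, I) => X, start: X, end: X => O) {
    // Drives the fold over a sequence: end(add(...add(start, x1)..., xn)).
    def over(items: Iterable[I]): O = end(items.foldLeft(start)(add))
  }

  // Mean of doubles, accumulating (sum, count).
  val mean: MiniFold[(Double, Int), Double, Double] =
    MiniFold[(Double, Int), Double, Double](
      (acc, x) => (acc._1 + x, acc._2 + 1),
      (0.0, 0),
      acc => acc._1 / acc._2
    )

  // end(add(add(start, 2.0), 4.0))
  val result: Double = mean.over(Seq(2.0, 4.0))
}
```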
Function1 monoid. plus means function composition, zero is the identity function
Simple implementation of a Functor type-class.
Laws Functors must follow:
map(m)(id) == m
map(m)(f andThen g) == map(map(m)(f))(g)
This enrichment allows us to use our Functor instances in for expressions, provided that import Functor._ has been done.
You can think of this as a Sparse vector ring
Group: this is a monoid that also has subtraction (and negation): So, you can do (a-b), or -a (which is equal to 0 - a).
HLLSeries can produce a HyperLogLog counter for any window into the past, using a constant factor more space than HyperLogLog.
For each hash bucket, rather than keeping a single max RhoW value, it keeps every RhoW value it has seen, and the max timestamp where it saw that value. This allows it to reconstruct an HLL as it would be had it started at zero at any given point in the past, and seen the same updates this structure has seen.
The number of bits to use
Vector of maps of RhoW -> max timestamp where it was seen
New HLLSeries
A typeclass to represent hashing to 128 bits. Used for HLL, but possibly other applications
Containers for holding heavy hitter items and their associated counts.
Controls how a CMS that implements CMSHeavyHitters tracks heavy hitters.
val hllSeriesMonoid = new HyperLogLogSeriesMonoid(bits)

// (bytes, timestamp) pairs to be counted
val examples: Seq[(Array[Byte], Long)]
val series = examples
  .map { case (bytes, timestamp) => hllSeriesMonoid.create(bytes, timestamp) }
  .reduce { hllSeriesMonoid.plus(_, _) }

// Estimated distinct counts for two different windows into the past
val estimate1 = series.since(timestamp1.toLong).toHLL.estimatedSize
val estimate2 = series.since(timestamp2.toLong).toHLL.estimatedSize
Note that this works similarly to Semigroup[Map[Int,T]], not like Semigroup[List[T]]. This does element-wise operations, like standard vector math, not concatenation, like Semigroup[String] or Semigroup[List[T]].
If l.size != r.size, then only sums the elements up to the index min(l.size, r.size); appends the remainder to the result.
Represents a single interval on a T with an Ordering
Since Lists are mutable, this always makes a full copy. Prefer Scala immutable Lists: if you use Scala immutable lists, the tail of the result of plus is always the right argument.
Since maps are mutable, this always makes a full copy. Prefer Scala immutable maps: if you use Scala immutable maps, this operation is much faster. TODO: extend this to Group, Ring.
Tracks the "most recent", or last, wrapped instance of T by the order in which items are seen.
wrapped instance of T
Aggregator that selects the last instance of T in the aggregated stream.
List concatenation monoid. plus means concatenation, zero is empty list
You can think of this as a Sparse vector group
A Preparer that has had zero or more map transformations applied, but no flatMaps. This can produce any type of Aggregator.
Tracks the maximum wrapped instance of some ordered type T.
Aggregator that selects the maximum instance of T in the aggregated stream.
Tracks the minimum wrapped instance of some ordered type T.
Aggregator that selects the minimum instance of T in the aggregated stream.
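Wrapping values so that "plus" means "keep the larger" or "keep the smaller" can be sketched as below. The case classes and the maxPlus/minPlus helpers are written for this example and only assume an Ordering on T; they are not the library's definitions.

```scala
object MaxMinSketch {
  final case class Max[T](get: T)
  final case class Min[T](get: T)

  // plus on Max keeps the larger wrapped value.
  def maxPlus[T](l: Max[T], r: Max[T])(implicit ord: Ordering[T]): Max[T] =
    if (ord.gteq(l.get, r.get)) l else r

  // plus on Min keeps the smaller wrapped value.
  def minPlus[T](l: Min[T], r: Min[T])(implicit ord: Ordering[T]): Min[T] =
    if (ord.lteq(l.get, r.get)) l else r

  val xs: List[Int] = List(3, 9, 4)
  val largest: Int = xs.map(Max(_)).reduce((l, r) => maxPlus(l, r)).get
  val smallest: Int = xs.map(Min(_)).reduce((l, r) => minPlus(l, r)).get
}
```

Both operations are associative, so they fit the semigroup contract and can be reduced in any grouping.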
MinHasher as a Monoid operates on this class to avoid the too generic Array[Byte]. The bytes are assumed to be never modified. The only reason we use Array[Byte] rather than IndexedSeq[Byte] is that a ByteBuffer is used internally in MinHasher and it can wrap Array[Byte].
Instances of MinHasher can create, combine, and compare fixed-sized signatures of arbitrarily sized sets.
A signature is represented by a byte array of approx maxBytes size. You can initialize a signature with a single element, usually a Long or String. You can combine any two sets' signatures to produce the signature of their union. You can compare any two sets' signatures to estimate their Jaccard similarity. You can use a set's signature to estimate the number of distinct values in the set. You can also use a combination of the above to estimate the size of the intersection of two sets from their signatures. The more bytes in the signature, the more accurate all of the above will be.
You can also use these signatures to quickly find similar sets without doing n^2 comparisons. Each signature is assigned to several buckets; sets whose signatures end up in the same bucket are likely to be similar. The targetThreshold controls the desired level of similarity - the higher the threshold, the more efficiently you can find all the similar sets.
This abstract superclass is generic with regard to the size of the hash used. Depending on the number of unique values in the domain of the sets, you may want a MinHasher16, a MinHasher32, or a new custom subclass.
This implementation is modeled after Chapter 3 of Ullman and Rajaraman's Mining of Massive Datasets: http://infolab.stanford.edu/~ullman/mmds/ch3a.pdf
A class to calculate the first five central moments over a sequence of Doubles. Given the first five central moments, we can then calculate metrics like skewness and kurtosis.
m{i} denotes the ith central moment.
Simple implementation of a Monad type-class. Subclasses only need to override apply and flatMap, but they should override map, join, joinWith, and sequence if there are better implementations.
Laws Monads must follow:
identities:
flatMap(apply(x))(fn) == fn(x)
flatMap(m)(apply _) == m
associativity on flatMap (you can either flatMap f first, or f to g):
flatMap(flatMap(m)(f))(g) == flatMap(m) { x => flatMap(f(x))(g) }
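The three laws can be spot-checked for Option with plain functions. The helper names unit and flatMap below are local to this sketch (unit plays the role of apply), and a check on sample values is an illustration, not a proof.

```scala
object MonadLawsSketch {
  def unit[T](x: T): Option[T] = Some(x)
  def flatMap[T, U](m: Option[T])(fn: T => Option[U]): Option[U] = m.flatMap(fn)

  val f: Int => Option[Int] = x => if (x > 0) Some(x * 2) else None
  val g: Int => Option[String] = x => Some("v" + x)
  val m: Option[Int] = Some(4)

  // flatMap(apply(x))(fn) == fn(x)
  val leftIdentity: Boolean = flatMap(unit(3))(f) == f(3)
  // flatMap(m)(apply _) == m
  val rightIdentity: Boolean = flatMap(m)(x => unit(x)) == m
  // flatMap(flatMap(m)(f))(g) == flatMap(m) { x => flatMap(f(x))(g) }
  val associativity: Boolean =
    flatMap(flatMap(m)(f))(g) == flatMap(m)(x => flatMap(f(x))(g))
}
```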
This enrichment allows us to use our Monad instances in for expressions, provided that import Monad._ has been done.
Monoid (take a deep breath, and relax about the weird name): This is a semigroup that has an additive identity (called zero), such that a+0=a, 0+a=a, for every a
Some(5) - Some(3) == Some(2)
Some(5) - Some(5) == None
negate Some(5) == Some(-5)
Note: Some(0) and None are equivalent under this Group
Some(5) + Some(3) == Some(8)
Some(5) + None == Some(5)
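The identities above can be reproduced with a small sketch over Option[Int]. Normalising a zero result of minus to None is an assumption made here so that Some(5) - Some(5) comes out as None; how the library itself treats Some(0) internally is not stated in this document.

```scala
object OptionGroupSketch {
  // None is the identity; two Somes add their contents.
  def plus(l: Option[Int], r: Option[Int]): Option[Int] = (l, r) match {
    case (Some(a), Some(b)) => Some(a + b)
    case (None, x)          => x
    case (x, None)          => x
  }

  def negate(o: Option[Int]): Option[Int] = o.map(x => -x)

  // a - b = a + (-b); a zero result is reported as None (assumed normalisation).
  def minus(l: Option[Int], r: Option[Int]): Option[Int] =
    plus(l, negate(r)).filter(_ != 0)
}
```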
This is a typeclass to represent things which are countable down. Note that it is important that a value prev(t) is always less than t. Note that prev returns Option because this class comes with the notion that some items may reach a minimum key, which is None.
Preparer is a way to build up an Aggregator through composition using a more natural API: it allows you to start with the input type and describe a series of transformations and aggregations from there, rather than starting from the aggregation and composing "outwards" in both directions.
Uses of Preparer will always start with a call to Preparer[A], and end with a call to monoidAggregate or a related method, to produce an Aggregator instance.
Priority is a type class for prioritized implicit search.
This type class will attempt to provide an implicit instance of P (the preferred type). If that type is not available it will fall back to F (the fallback type). If neither type is available then a Priority[P, F] instance will not be available.
This type can be useful for problems where multiple algorithms can be used, depending on the type classes available.
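One common way to get this behaviour is to put the fallback implicit in a parent trait of the companion object, so that the preferred implicit wins whenever both apply. The sketch below follows that pattern; the names Preferred, Fallback, LowPriority, Fast, Slow, and which are chosen for this example and are assumptions about structure, not a copy of the library source.

```scala
object PrioritySketch {
  sealed trait Priority[+P, +F]
  final case class Preferred[P](get: P) extends Priority[P, Nothing]
  final case class Fallback[F](get: F) extends Priority[Nothing, F]

  // Lower priority: only chosen when no P is available.
  trait LowPriority {
    implicit def fallback[P, F](implicit ev: F): Priority[P, F] = Fallback(ev)
  }
  // Higher priority: defined in the companion itself.
  object Priority extends LowPriority {
    implicit def preferred[P, F](implicit ev: P): Priority[P, F] = Preferred(ev)
  }

  // Two hypothetical algorithm type classes.
  trait Fast[T]
  trait Slow[T]
  implicit val slowInt: Slow[Int] = new Slow[Int] {}
  implicit val fastString: Fast[String] = new Fast[String] {}
  implicit val fastLong: Fast[Long] = new Fast[Long] {}
  implicit val slowLong: Slow[Long] = new Slow[Long] {}

  def which[T](implicit p: Priority[Fast[T], Slow[T]]): String = p match {
    case Preferred(_) => "preferred"
    case Fallback(_)  => "fallback"
  }

  val intChoice: String = which[Int]       // only Slow[Int] exists
  val stringChoice: String = which[String] // only Fast[String] exists
  val longChoice: String = which[Long]     // both exist; Fast is preferred
}
```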
taken from non/algebra until we make algebird depend on non/algebra
Combine 10 groups into a product group
Combine 10 monoids into a product monoid
Combine 10 rings into a product ring
Combine 10 semigroups into a product semigroup
Combine 11 groups into a product group
Combine 11 monoids into a product monoid
Combine 11 rings into a product ring
Combine 11 semigroups into a product semigroup
Combine 12 groups into a product group
Combine 12 monoids into a product monoid
Combine 12 rings into a product ring
Combine 12 semigroups into a product semigroup
Combine 13 groups into a product group
Combine 13 monoids into a product monoid
Combine 13 rings into a product ring
Combine 13 semigroups into a product semigroup
Combine 14 groups into a product group
Combine 14 monoids into a product monoid
Combine 14 rings into a product ring
Combine 14 semigroups into a product semigroup
Combine 15 groups into a product group
Combine 15 monoids into a product monoid
Combine 15 rings into a product ring
Combine 15 semigroups into a product semigroup
Combine 16 groups into a product group
Combine 16 monoids into a product monoid
Combine 16 rings into a product ring
Combine 16 semigroups into a product semigroup
Combine 17 groups into a product group
Combine 17 monoids into a product monoid
Combine 17 rings into a product ring
Combine 17 semigroups into a product semigroup
Combine 18 groups into a product group
Combine 18 monoids into a product monoid
Combine 18 rings into a product ring
Combine 18 semigroups into a product semigroup
Combine 19 groups into a product group
Combine 19 monoids into a product monoid
Combine 19 rings into a product ring
Combine 19 semigroups into a product semigroup
Combine 20 groups into a product group
Combine 20 monoids into a product monoid
Combine 20 rings into a product ring
Combine 20 semigroups into a product semigroup
Combine 21 groups into a product group
Combine 21 monoids into a product monoid
Combine 21 rings into a product ring
Combine 21 semigroups into a product semigroup
Combine 22 groups into a product group
Combine 22 monoids into a product monoid
Combine 22 rings into a product ring
Combine 22 semigroups into a product semigroup
Combine 2 groups into a product group
Combine 2 monoids into a product monoid
Combine 2 rings into a product ring
Combine 2 semigroups into a product semigroup
Combine 3 groups into a product group
Combine 3 monoids into a product monoid
Combine 3 rings into a product ring
Combine 3 semigroups into a product semigroup
Combine 4 groups into a product group
Combine 4 monoids into a product monoid
Combine 4 rings into a product ring
Combine 4 semigroups into a product semigroup
Combine 5 groups into a product group
Combine 5 monoids into a product monoid
Combine 5 rings into a product ring
Combine 5 semigroups into a product semigroup
Combine 6 groups into a product group
Combine 6 monoids into a product monoid
Combine 6 rings into a product ring
Combine 6 semigroups into a product semigroup
Combine 7 groups into a product group
Combine 7 monoids into a product monoid
Combine 7 rings into a product ring
Combine 7 semigroups into a product semigroup
Combine 8 groups into a product group
Combine 8 monoids into a product monoid
Combine 8 rings into a product ring
Combine 8 semigroups into a product semigroup
Combine 9 groups into a product group
Combine 9 monoids into a product monoid
Combine 9 rings into a product ring
Combine 9 semigroups into a product semigroup
QTree aggregator is an aggregator that can be used to find the approximate percentile bounds. The items that are iterated over to produce this approximation cannot be negative. Returns an Intersection which represents the bounded approximation.
QTreeAggregatorLowerBound is an aggregator that is used to find an approximate percentile. This is similar to a QTreeAggregator, but is a convenience because instead of returning an Intersection, it returns the lower bound of the percentile. Like a QTreeAggregator, the items that are iterated over to produce this approximation cannot be negative.
Used to represent cases where we need to periodically reset:
a + b = a + b
|a + b = |(a + b)
a + |b = |b
|a + |b = |b
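The four rules can be read as a pattern match in which `|x` marks a reset. A minimal sketch specialised to `Int` addition; the case-class names here are illustrative and the library's generic version may differ:

```scala
object ResetSketch {
  sealed trait ResetState { def get: Int }
  final case class SetValue(get: Int) extends ResetState   // plain value "a"
  final case class ResetValue(get: Int) extends ResetState // reset marker "|a"

  def plus(l: ResetState, r: ResetState): ResetState = (l, r) match {
    case (SetValue(a), SetValue(b))   => SetValue(a + b)   // a + b = a + b
    case (ResetValue(a), SetValue(b)) => ResetValue(a + b) // |a + b = |(a + b)
    case (_, ResetValue(b))           => ResetValue(b)     // a + |b = |b and |a + |b = |b
  }
}
```

A reset on the right discards everything to its left, which is what lets a running sum restart periodically.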
Ring: Group + multiplication (see: http://en.wikipedia.org/wiki/Ring_%28mathematics%29) and the three elements it defines:
Note, if you have distributive property, additive inverses, and multiplicative identity you can prove you have a commutative group under the ring:
Basically a specific implementation of the RightFoldedMonoid. gradient is the gradient of the function to be minimized. To use this, you need to insert an initial weight SGDWeights before you start adding SGDPos objects. Otherwise you will just be doing list concatenation.
K1 defines a scope for the CMS. For each k1, keep the top heavyHittersN associated k2 values.
A semigroup is any type T with an associative operation (plus):
a plus (b plus c) = (a plus b) plus c
Example instances:
Semigroup[Int]: plus is Int#+
Semigroup[List[T]]: plus is List#++
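The definition and the two example instances can be sketched as follows. The `Semigroup` trait below is a stand-in written for this illustration, and `isAssociative` is a hypothetical helper that checks the law for a single triple of values only:

```scala
object SemigroupSketch {
  // Stand-in type class for illustration; not the library's own trait.
  trait Semigroup[T] { def plus(a: T, b: T): T }

  // plus is Int#+
  val intSemigroup: Semigroup[Int] = new Semigroup[Int] {
    def plus(a: Int, b: Int): Int = a + b
  }

  // plus is List#++
  def listSemigroup[T]: Semigroup[List[T]] = new Semigroup[List[T]] {
    def plus(a: List[T], b: List[T]): List[T] = a ++ b
  }

  // Checks a plus (b plus c) == (a plus b) plus c for one triple of values.
  def isAssociative[T](sg: Semigroup[T])(a: T, b: T, c: T): Boolean =
    sg.plus(a, sg.plus(b, c)) == sg.plus(sg.plus(a, b), c)
}
```

Passing this check for a few triples is evidence, not proof; associativity has to hold for all values.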
This is a combinator on semigroups: after you do the plus, you transform B with a fold function. This will not be valid for all fold functions. You need to prove that it is still associative.
Clearly only values of (a,b) are valid if fold(a,b) == b, so keep that in mind.
I have not yet found a sufficient condition on (A,B) => B that makes it correct. Clearly a (trivial) constant function {(l,r) => r} works. Also, if B is List[T], and (l:A,r:List[T]) = r.sortBy(fn(l)) this works as well (due to the associativity on A, and the fact that the list never loses data).
For approximate lists (like top-K applications) this might work (or be close enough to associative that for approximation algorithms it is fine), and in fact, that is the main motivation of this code: Produce some ordering in A, and use it to do sorted-topK on the list in B.
Seems like an open topic here.... you are obliged to think on your own about this.
This is a summing cache whose goal is to grow until we run out of memory, at which point it clears itself and stops growing. Note that we can lose the values in this cache at any point; we don't put anything here we care about.
SetDiff is a class that represents changes applied to a set. It is in fact a Set[T] => Set[T], but doesn't extend Function1 since that brings in a pack of methods that we don't necessarily want.
Set union monoid. plus means union, zero is the empty set.
convert is not implemented here
Use a Hash128 when converting to HLL, rather than an implicit conversion to Array[Byte]. Unifying with SetSizeAggregator would be nice, but since they only differ in an implicit parameter, scala seems to be giving me errors.
An Aggregator for the SketchMap. Can be created using SketchMap.aggregator
Hashes an arbitrary key type to one that the Sketch Map can use.
Responsible for creating instances of SketchMap.
Convenience class for holding constant parameters of a Sketch Map.
Data structure used in the Space-Saving Algorithm to find the approximate most frequent and top-k elements. The algorithm is described in "Efficient Computation of Frequent and Top-k Elements in Data Streams". See here: www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf In the paper the data structure is called StreamSummary but we chose to call it SpaceSaver instead. Note that the adaptation to hadoop and parallelization were not described in the article and have not been proven to be mathematically correct or preserve the guarantees or benefits of the algorithm.
A sparse Count-Min sketch structure, used for situations where the key is highly skewed.
A Stateful summer is something that is potentially more efficient (a buffer, a cache, etc...) that has the same result as a sum:
Law 1: Semigroup.sumOption(items) == (Monoid.plus(items.map { stateful.put(_) }.filter { _.isDefined }, stateful.flush) && stateful.isFlushed)
Law 2: isFlushed == flush.isEmpty
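Law 1 says that whatever `put` emits along the way, plus the final `flush`, must add up to the plain sum. A minimal sketch with a hypothetical buffered summer over `Int` (the class and helper names are made up for this illustration and are not the library's API):

```scala
object SummerSketch {
  // Buffers up to `size` Ints; `put` emits a partial sum once the buffer is full.
  final class BufferedIntSummer(size: Int) {
    private var buffer: List[Int] = Nil

    def put(item: Int): Option[Int] = {
      buffer = item :: buffer
      if (buffer.size >= size) flush else None
    }

    // Emits the sum of whatever is buffered and empties the buffer.
    def flush: Option[Int] = {
      val out = buffer.reduceOption(_ + _)
      buffer = Nil
      out
    }

    def isFlushed: Boolean = buffer.isEmpty
  }

  // Sums the items through the summer: emitted partial sums plus the final flush.
  def sumVia(items: List[Int], summer: BufferedIntSummer): Option[Int] = {
    val emitted = items.flatMap(i => summer.put(i).toList)
    (emitted ++ summer.flush.toList).reduceOption(_ + _)
  }
}
```

For any input list, `sumVia` should agree with `items.reduceOption(_ + _)`, and the summer should report `isFlushed` afterwards.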
This is a typeclass to represent things which increase. Note that it is important that a value after being incremented is always larger than it was before. Note that next returns Option because this class comes with the notion of the "greatest" key, which is None. Ints, for example, will cycle if next(java.lang.Integer.MAX_VALUE) is called, therefore we need a notion of what happens when we hit the bounds at which our ordering is violated. This is also useful for closed sets which have a fixed progression.
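For `Int`, the point about the greatest key can be sketched like this. Both functions are hypothetical stand-ins written for this illustration, not the typeclass itself:

```scala
object SuccessibleSketch {
  // Returns None at the greatest value instead of wrapping around to Int.MinValue.
  def next(i: Int): Option[Int] =
    if (i == Int.MaxValue) None else Some(i + 1)

  // Walks upward from `start` until `next` reports that there is no successor.
  def iterateNext(start: Int): Iterator[Int] =
    Iterator
      .iterate(Option(start))(_.flatMap(next))
      .takeWhile(_.isDefined)
      .map(_.get)
}
```

Because `next` never wraps, every element produced by `iterateNext` is strictly larger than the one before it.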
Sum the entire iterator one item at a time. Only emits on flush; you should probably prefer BufferedSumAll.
A Stateful Summer on Map[K,V] that keeps a cache of recent keys
A SummingCache that also tracks the number of key hits
A Count-Min sketch data structure that allows for (a) counting and frequency estimation of elements in a data stream and (b) tracking the heavy hitters among these elements.
The logic of how heavy hitters are computed is pluggable, see HeavyHittersLogic.
Tip: If you do not need to track heavy hitters, take a look at CMS, which is more efficient in this case.
This example demonstrates how to count Long elements with TopCMS, i.e. K=Long.
Note that the actual counting is always performed with a Long, regardless of your choice of K. That is, the counting table behind the scenes is backed by Long values (at least in the current implementation), and thus the returned frequency estimates are always instances of Approximate[Long].
The type used to identify the elements to be counted.
    // Creates a monoid for a CMS that can count `Long` elements.
    val topPctCMSMonoid: TopPctCMSMonoid[Long] = {
      val eps = 0.001
      val delta = 1E-10
      val seed = 1
      val heavyHittersPct = 0.1
      TopPctCMS.monoid[Long](eps, delta, seed, heavyHittersPct)
    }

    // Creates a TopCMS instance that has counted the element `1L`.
    val topCMS: TopCMS[Long] = topPctCMSMonoid.create(1L)

    // Estimates the frequency of `1L`
    val estimate: Approximate[Long] = topCMS.frequency(1L)

    // What are the heavy hitters so far?
    val heavyHitters: Set[Long] = topCMS.heavyHitters
Used for holding a single element, to avoid repeatedly adding elements from sparse counts tables.
Zero element. Used for initialization.
A top-k monoid that is much faster than SortedListTake. Equivalent to (left ++ right).sorted.take(k), but doesn't do a total sort. If you can handle the mutability, mutable.PriorityQueueMonoid is even faster.
NOTE: this assumes the inputs are already sorted! Re-sorting each time kills speed.
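Why no total sort is needed can be seen from a plain merge of two sorted lists that stops after k items. This is a minimal sketch of that idea with a hypothetical `mergeTake` helper, not the monoid's actual implementation:

```scala
object TopKSketch {
  // Merges two already-sorted lists and keeps only the k smallest items.
  def mergeTake[T](left: List[T], right: List[T], k: Int)(implicit ord: Ordering[T]): List[T] = {
    @annotation.tailrec
    def loop(l: List[T], r: List[T], remaining: Int, acc: List[T]): List[T] =
      if (remaining <= 0) acc.reverse
      else
        (l, r) match {
          case (Nil, Nil)      => acc.reverse
          case (lh :: lt, Nil) => loop(lt, Nil, remaining - 1, lh :: acc)
          case (Nil, rh :: rt) => loop(Nil, rt, remaining - 1, rh :: acc)
          case (lh :: lt, rh :: rt) =>
            if (ord.lteq(lh, rh)) loop(lt, r, remaining - 1, lh :: acc)
            else loop(l, rt, remaining - 1, rh :: acc)
        }
    loop(left, right, k, Nil)
  }
}
```

The merge touches at most k elements, so its cost does not depend on the combined length of the inputs. If either input is unsorted the result is no longer equal to `(left ++ right).sorted.take(k)`.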
An Aggregator for TopNCMS. Can be created using TopNCMS.aggregator.
Monoid for top-N based TopCMS sketches. Use with care! (see warning below)
Warning: adding top-N CMS instances (++) is an unsafe operation
Top-N computations are not associative. The effect is that a top-N CMS has an ordering bias (with regard to heavy hitters) when merging CMS instances (e.g. via ++). This means merging heavy hitters across CMS instances may lead to incorrect, biased results: the outcome is biased by the order in which CMS instances / heavy hitters are being merged, with the rule of thumb being that the earlier a set of heavy hitters is being merged, the more likely the end result is biased towards these heavy hitters.
The warning above only applies when adding CMS instances (think: cms1 ++ cms2). In comparison, heavy hitters are correctly computed when a top-N CMS instance is created from a Seq[K], or when items are added individually, i.e. cms + item or cms + (item, count).
See the discussion in Algebird issue 353 for further details.
The following, alternative data structures may be better picks than a top-N based CMS given the warning above:
The type K is the type of items you want to count. You must provide an implicit CMSHasher[K] for K, and Algebird ships with several such implicits for commonly used types such as Long and BigInt.
If your type K is not supported out of the box, you have two options: 1) You provide a "translation" function to convert items of your (unsupported) type K to a supported type such as Double, and then use the contramap function of CMSHasher to create the required CMSHasher[K] for your type (see the documentation of CMSHasher for an example); 2) You implement a CMSHasher[K] from scratch, using the existing CMSHasher implementations as a starting point.
Note: Because Arrays in Scala/Java do not have sane equals and hashCode implementations, you cannot safely use types such as Array[Byte]. Extra work is required for Arrays. For example, you may opt to convert Array[T] to a Seq[T] via toSeq, or you can provide appropriate wrapper classes. Algebird provides one such wrapper class, Bytes, to safely wrap an Array[Byte] for use with CMS.
The type used to identify the elements to be counted. For example, if you want to count the occurrence of user names, you could map each username to a unique numeric ID expressed as a Long, and then count the occurrences of those Longs with a CMS of type K=Long. Note that this mapping between the elements of your problem domain and their identifiers used for counting via CMS should be bijective. We require a CMSHasher context bound for K, see CMSHasher for available implicits that can be imported. Which type K should you pick in practice? For domains that have fewer than 2^64 unique elements, you'd typically use Long. For larger domains you can try BigInt, for example.
Tracks the top N heavy hitters, where N is defined by heavyHittersN.
Warning: top-N computations are not associative. The effect is that a top-N CMS has an ordering bias (with regard to heavy hitters) when merging instances. This means merging heavy hitters across CMS instances may lead to incorrect, biased results: the outcome is biased by the order in which CMS instances / heavy hitters are being merged, with the rule of thumb being that the earlier a set of heavy hitters is being merged, the more likely is the end result biased towards these heavy hitters.
Discussion in Algebird issue 353
An Aggregator for TopPctCMS. Can be created using TopPctCMS.aggregator.
Monoid for Top-% based TopCMS sketches.
The type K is the type of items you want to count. You must provide an implicit CMSHasher[K] for K, and Algebird ships with several such implicits for commonly used types such as Long and BigInt.
If your type K is not supported out of the box, you have two options: 1) You provide a "translation" function to convert items of your (unsupported) type K to a supported type such as Double, and then use the contramap function of CMSHasher to create the required CMSHasher[K] for your type (see the documentation of CMSHasher for an example); 2) You implement a CMSHasher[K] from scratch, using the existing CMSHasher implementations as a starting point.
Note: Because Arrays in Scala/Java do not have sane equals and hashCode implementations, you cannot safely use types such as Array[Byte]. Extra work is required for Arrays. For example, you may opt to convert Array[T] to a Seq[T] via toSeq, or you can provide appropriate wrapper classes. Algebird provides one such wrapper class, Bytes, to safely wrap an Array[Byte] for use with CMS.
The type used to identify the elements to be counted. For example, if you want to count the occurrence of user names, you could map each username to a unique numeric ID expressed as a Long, and then count the occurrences of those Longs with a CMS of type K=Long. Note that this mapping between the elements of your problem domain and their identifiers used for counting via CMS should be bijective. We require a CMSHasher context bound for K, see CMSHasher for available implicits that can be imported. Which type K should you pick in practice? For domains that have fewer than 2^64 unique elements, you'd typically use Long. For larger domains you can try BigInt, for example.
Finds all heavy hitters, i.e., elements in the stream that appear at least (heavyHittersPct * totalCount) times.
Every item that appears at least (heavyHittersPct * totalCount) times is output, and with probability p >= 1 - delta, no item whose count is less than (heavyHittersPct - eps) * totalCount is output.
This also means that this parameter is an upper bound on the number of heavy hitters that will be tracked: the set of heavy hitters contains at most 1 / heavyHittersPct elements. For example, if heavyHittersPct=0.01 (or 0.25), then at most 1 / 0.01 = 100 items (or 1 / 0.25 = 4 items) will be tracked/returned as heavy hitters. This parameter can thus control the memory footprint required for tracking heavy hitters.
Combine 10 groups into a product group
Combine 10 monoids into a product monoid
Combine 10 rings into a product ring
Combine 10 semigroups into a product semigroup
Combine 11 groups into a product group
Combine 11 monoids into a product monoid
Combine 11 rings into a product ring
Combine 11 semigroups into a product semigroup
Combine 12 groups into a product group
Combine 12 monoids into a product monoid
Combine 12 rings into a product ring
Combine 12 semigroups into a product semigroup
Combine 13 groups into a product group
Combine 13 monoids into a product monoid
Combine 13 rings into a product ring
Combine 13 semigroups into a product semigroup
Combine 14 groups into a product group
Combine 14 monoids into a product monoid
Combine 14 rings into a product ring
Combine 14 semigroups into a product semigroup
Combine 15 groups into a product group
Combine 15 monoids into a product monoid
Combine 15 rings into a product ring
Combine 15 semigroups into a product semigroup
Combine 16 groups into a product group
Combine 16 monoids into a product monoid
Combine 16 rings into a product ring
Combine 16 semigroups into a product semigroup
Combine 17 groups into a product group
Combine 17 monoids into a product monoid
Combine 17 rings into a product ring
Combine 17 semigroups into a product semigroup
Combine 18 groups into a product group
Combine 18 monoids into a product monoid
Combine 18 rings into a product ring
Combine 18 semigroups into a product semigroup
Combine 19 groups into a product group
Combine 19 monoids into a product monoid
Combine 19 rings into a product ring
Combine 19 semigroups into a product semigroup
Combine 20 groups into a product group
Combine 20 monoids into a product monoid
Combine 20 rings into a product ring
Combine 20 semigroups into a product semigroup
Combine 21 groups into a product group
Combine 21 monoids into a product monoid
Combine 21 rings into a product ring
Combine 21 semigroups into a product semigroup
Combine 22 groups into a product group
Combine 22 monoids into a product monoid
Combine 22 rings into a product ring
Combine 22 semigroups into a product semigroup
Combine 2 groups into a product group
Combine 2 monoids into a product monoid
Combine 2 rings into a product ring
Combine 2 semigroups into a product semigroup
Combine 3 groups into a product group
Combine 3 monoids into a product monoid
Combine 3 rings into a product ring
Combine 3 semigroups into a product semigroup
Combine 4 groups into a product group
Combine 4 monoids into a product monoid
Combine 4 rings into a product ring
Combine 4 semigroups into a product semigroup
Combine 5 groups into a product group
Combine 5 monoids into a product monoid
Combine 5 rings into a product ring
Combine 5 semigroups into a product semigroup
Combine 6 groups into a product group
Combine 6 monoids into a product monoid
Combine 6 rings into a product ring
Combine 6 semigroups into a product semigroup
Combine 7 groups into a product group
Combine 7 monoids into a product monoid
Combine 7 rings into a product ring
Combine 7 semigroups into a product semigroup
Combine 8 groups into a product group
Combine 8 monoids into a product monoid
Combine 8 rings into a product ring
Combine 8 semigroups into a product semigroup
Combine 9 groups into a product group
Combine 9 monoids into a product monoid
Combine 9 rings into a product ring
Combine 9 semigroups into a product semigroup
In some legacy cases, we have implemented Rings where we lacked the full laws. This allows you to be precise (only implement the structure you have), but unsafely use it as a Ring in legacy code that is expecting a Ring.
A super lightweight (hopefully) version of BitSet
(Since version 0.12.3) This is no longer used.
Some functions to create or convert AdaptiveVectors
Aggregators compose well.
To create a parallel aggregator that operates on a single input in parallel, use: GeneratedTupleAggregator.from2((agg1, agg2))
Boolean AND monoid. plus means logical AND, zero is true.
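As a minimal sketch of what "plus means logical AND, zero is true" implies when summing, with names invented for this illustration:

```scala
object AndSketch {
  // zero is true: combining it with any value leaves that value unchanged.
  val zero: Boolean = true

  // plus is logical AND.
  def plus(l: Boolean, r: Boolean): Boolean = l && r

  // Summing a collection answers "are all items true?"; an empty input yields zero.
  def sum(items: Seq[Boolean]): Boolean = items.foldLeft(zero)(plus)
}
```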
Follows the type-class pattern for the Applicative trait
Group implementation for AveragedValue.
Provides a set of operations needed to create and use AveragedValue instances.
Aggregator that uses AveragedValue to calculate the mean of all Double values in the stream. Each Double value receives a count of 1 during aggregation.
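The idea of giving each value a count of 1 and then merging (count, mean) pairs can be sketched as below. The `Avg` case class and the weighted-mean formula are assumptions made for this illustration; the library's own implementation may use a more numerically careful update:

```scala
object AverageSketch {
  // A running average: how many values were seen and their mean.
  final case class Avg(count: Long, mean: Double)

  val zero: Avg = Avg(0L, 0.0)

  // Merges two averages by weighting each mean with its count.
  def plus(l: Avg, r: Avg): Avg = {
    val n = l.count + r.count
    if (n == 0L) zero
    else Avg(n, (l.count * l.mean + r.count * r.mean) / n)
  }

  // Each Double enters with a count of 1, then everything is summed.
  def meanOf(xs: Seq[Double]): Double =
    xs.map(x => Avg(1L, x)).foldLeft(zero)(plus).mean
}
```

Because the merge only needs counts and means, partial averages computed on separate chunks of a stream can be combined later without revisiting the raw values.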
Helper functions to generate or to translate between various CMS parameters (cf. CMSParams).
This formerly held the instances that moved to object CMSHasher
These instances are slow, but here for compatibility with old serialized data. For new code, avoid these and instead use the implicits found in the CMSHasher companion object.
Represents a container class together with time. Its monoid consists of exponentially scaling the older value and summing with the newer one.
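"Exponentially scaling the older value and summing with the newer one" can be sketched as follows. The `Decayed` case class, the explicit `halfLife` parameter and the base-0.5 scaling are assumptions chosen for this illustration; the library's internal representation may differ:

```scala
object DecaySketch {
  // A value together with the time at which it was observed.
  final case class Decayed(value: Double, time: Double)

  // Scales the older value down by how many half-lives separate the two
  // timestamps, then adds it to the newer value.
  def plus(halfLife: Double)(l: Decayed, r: Decayed): Decayed = {
    val (older, newer) = if (l.time <= r.time) (l, r) else (r, l)
    val scale = math.pow(0.5, (newer.time - older.time) / halfLife)
    Decayed(newer.value + older.value * scale, newer.time)
  }
}
```

With a half-life of 10, a value of 8.0 at time 0 contributes 4.0 when merged into a value observed at time 10.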
Provides a set of operations and typeclass instances needed to use First instances.
Methods to create and run Folds.
The Folds defined here are immutable and serializable, which we expect by default. It is important that you as a user indicate mutability or non-serializability when defining new Folds. Additionally, it is recommended that "end" functions not mutate the accumulator in order to support scans (producing a stream of intermediate outputs by calling "end" at each step).
Follows the type-class pattern for the Functor trait
This gives default hashes using Murmur128 with a seed of 12345678 (for no good reason, but it should not be changed lest we break serialized HLLs)
Implementation of the HyperLogLog approximate counting as a Monoid
Philippe Flajolet, Éric Fusy, Olivier Gandouet and Frédéric Meunier, "HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm": http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
This object makes it easier to create Aggregator instances that use HLL
Provides a set of operations and typeclass instances needed to use Last instances.
Provides a set of operations and typeclass instances needed to use Max instances.
A Metric[V] m is a function (V, V) => Double that satisfies the following properties:
1. m(v1, v2) >= 0
2. m(v1, v2) == 0 iff v1 == v2
3. m(v1, v2) == m(v2, v1)
4. m(v1, v3) <= m(v1, v2) + m(v2, v3)
If you implement this trait, make sure that you follow these rules.
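As one concrete function that satisfies the four rules, here is a sketch of the L1 (Manhattan) distance on equal-length vectors. The object and function names are hypothetical and the function is not wired into the Metric trait:

```scala
object MetricSketch {
  // L1 distance: the sum of absolute coordinate differences.
  def l1(a: Vector[Double], b: Vector[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => math.abs(x - y) }.sum
  }
}
```

Spot-checking the rules on a few points (non-negativity, zero distance to itself, symmetry, triangle inequality) is a cheap sanity test for any candidate metric, though it does not replace a proof.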
Provides a set of operations and typeclass instances needed to use Min instances.
A monoid to perform moment calculations.
Follows the type-class pattern for the Monad trait
Boolean OR monoid. plus means logical OR, zero is false.
A QTree provides an approximate Map[Double,A:Monoid] suitable for range queries, quantile queries, and combinations of these (for example, if you use a numeric A, you can derive the inter-quartile mean).
It is loosely related to the Q-Digest data structure from http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf, but using an immutable tree structure, and carrying a generalized sum (of type A) at each node instead of just a count.
The basic idea is to keep a binary tree, where the root represents the entire range of the input keys, and each child node represents either the lower or upper half of its parent's range. Ranges are constrained to be dyadic intervals (https://en.wikipedia.org/wiki/Interval_(mathematics)#Dyadic_intervals) for ease of merging.
To keep the size bounded, the total count carried by any sub-tree must be at least 1/(2^k) of the total count at the root. Any sub-trees that do not meet this criterion have their children pruned and become leaves. (It's important that they not be pruned away entirely, but that we keep a fringe of low-count leaves that can gain weight over time and ultimately split again when warranted.)
Quantile and range queries both give hard upper and lower bounds; the true result will be somewhere in the range given.
Keys must be >= 0.
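The dyadic-interval constraint mentioned above means every node covers a range of the form [offset * 2^level, (offset + 1) * 2^level). A minimal sketch of finding the interval that contains a key at a given level, using a hypothetical `interval` helper that is not part of the QTree API:

```scala
object DyadicSketch {
  // Returns the half-open dyadic interval of width 2^level that contains x.
  def interval(x: Double, level: Int): (Double, Double) = {
    require(x >= 0, "keys must be >= 0")
    val width = math.pow(2.0, level)
    val offset = math.floor(x / width).toLong
    (offset * width, (offset + 1) * width)
  }
}
```

Going up one level doubles the width, and the wider interval always contains the narrower one, which is what makes two trees easy to merge node by node.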
This is an associative, but not commutative monoid. Also, you must start on the right, with a value, and all subsequent RightFolded must be RightFoldedToFold objects or zero.
If you add two Folded values together, you always get the one on the left, so this forms a kind of reset of the fold.
This monoid takes a list of values of type In or Out, and folds to the right all the Ins into Out values, leaving you with a list of Out values, then finally, maps those outs onto Acc, where there is a group, and adds all the Accs up. So, if you have a list: I I I O I O O I O I O the monoid is equivalent to the computation:
map(fold(List(I,I,I),O)) + map(fold(List(I),O)) + map(fold(List(),O)) + map(fold(List(I),O)) + map(fold(List(I),O))
This models a version of the map/reduce paradigm, where the fold happens on the mappers for each group on Ins, and then they are mapped to Accs, sent to a single reducer and all the Accs are added up.
Data structure representing an approximation of Map[K, V], where V has an implicit ordering and monoid. This is a more generic version of CountMinSketch.
Values are stored in valuesTable, a 2D vector containing aggregated sums of values inserted to the Sketch Map.
The data structure stores top non-zero values, called Heavy Hitters. The values are sorted by an implicit reverse ordering for the value, and the number of heavy hitters stored is based on the heavyHittersCount set in params.
Use SketchMapMonoid to create instances of this class.
Creates an Iterator that emits partial sums of an input Iterator[V]. Generally this is useful to change from processing individual Vs to processing blocks of V (see SummingQueue), or a cache of recent keys in the V=Map[K,W] case (see SummingCache).
This class represents a vector space. For the required properties see:
http://en.wikipedia.org/wiki/Vector_space#Definition
This is here to ease transition to using algebra.Field as the field type. Intended use is to do:
    import com.twitter.algebird.field._
Note, these are not strictly lawful since floating point arithmetic using IEEE-754 is only approximately associative and distributive.