Class MapAggregator<U extends Comparable<U> & Serializable,X>
- Type Parameters:
X
- the type that is returned by the currently set mapper function(s); the next added mapper function will be called with a parameter of this type as input
U
- the type of the index values returned by the `mapper function`, used to group results
- All Implemented Interfaces:
Mappable<X>
This class provides functionality similar to a MapReducer, with the difference that here the `reduce` step automatically aggregates results by the values returned by an arbitrary indexing function.
All results for which the set `indexer` returns the same value are aggregated into the same "bin", and each bin is reduced separately. This can be used to aggregate results by timestamp, geographic region, user id, OSM tag, etc.
Internally, this wraps around an existing MapReducer object, which still continues to be responsible for all actual calculations.
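The binning idea described above can be illustrated without any OSHDB classes. The following is a minimal sketch in plain Java, assuming a hypothetical helper `countByIndex` that is not part of the library: values are grouped by whatever an indexing function returns, and each bin is reduced (here simply counted) on its own, giving a sorted map from index value to result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Function;

class BinningSketch {
  /** Hypothetical helper: counts the values that fall into each bin named by the indexer. */
  static <U extends Comparable<U>, X> SortedMap<U, Integer> countByIndex(
      List<X> values, Function<X, U> indexer) {
    SortedMap<U, Integer> bins = new TreeMap<>();
    for (X value : values) {
      // every value lands in the bin named by its index value
      bins.merge(indexer.apply(value), 1, Integer::sum);
    }
    return bins;
  }

  public static void main(String[] args) {
    // made-up sample data; the index is simply the tag value itself
    List<String> tags = Arrays.asList("residential", "primary", "residential", "track");
    SortedMap<String, Integer> bins = countByIndex(tags, tag -> tag);
    System.out.println(bins);
  }
}
```

The result type mirrors the `SortedMap<U, ...>` shape returned by the reducing methods documented below; the real class delegates the actual computation to a MapReducer.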
-
Method Summary
- `<V extends Comparable<V> & Serializable> MapAggregator<OSHDBCombinedIndex<U,V>, X> aggregateBy(SerializableFunction<X, V> indexer)`: Sets up aggregation by another custom index.
- `<V extends Comparable<V> & Serializable> MapAggregator<OSHDBCombinedIndex<U,V>, X> aggregateBy(SerializableFunction<X, V> indexer, Collection<V> zerofill)`: Sets up aggregation by another custom index.
- `<V extends Comparable<V> & Serializable, P extends org.locationtech.jts.geom.Geometry & org.locationtech.jts.geom.Polygonal> MapAggregator<OSHDBCombinedIndex<U,V>, X> aggregateByGeometry(Map<V, P> geometries)`: Aggregates the results by sub-regions as well, in addition to the timestamps.
- `aggregateByTimestamp()`: Sets up automatic aggregation by timestamp.
- `aggregateByTimestamp(SerializableFunction<X, OSHDBTimestamp> indexer)`: Sets up aggregation by a custom time index.
- `areaOfInterest(OSHDBBoundingBox bboxFilter)`: Set the area of interest to the given bounding box.
- `<P extends org.locationtech.jts.geom.Geometry & org.locationtech.jts.geom.Polygonal> MapAggregator<U,X> areaOfInterest(P polygonFilter)`: Set the area of interest to the given polygon.
- `average()`: Calculates the averages of the results.
- `average(SerializableFunction<X, R> mapper)`: Calculates the average of the results provided by a given `mapper` function.
- `collect()`: Collects the results of this data aggregation into Lists.
- `count()`: Counts the number of results.
- `countUniq()`: Counts all unique values of the results.
- `estimatedMedian()`: Returns an estimate of the median of the results.
- `estimatedMedian(SerializableFunction<X, R> mapper)`: Returns an estimate of the median of the results after applying the given map function.
- `estimatedQuantile(double q)`: Returns an estimate of a requested quantile of the results.
- `estimatedQuantile(SerializableFunction<X, R> mapper, double q)`: Returns an estimate of a requested quantile of the results after applying the given map function.
- `estimatedQuantiles()`: Returns a function that computes estimates of arbitrary quantiles of the results.
- `estimatedQuantiles(Iterable<Double> q)`: Returns an estimate of the quantiles of the results.
- `<R extends Number> SortedMap<U,DoubleUnaryOperator> estimatedQuantiles(SerializableFunction<X, R> mapper)`: Returns a function that computes estimates of arbitrary quantiles of the results after applying the given map function.
- `estimatedQuantiles(SerializableFunction<X, R> mapper, Iterable<Double> q)`: Returns an estimate of the quantiles of the results after applying the given map function.
- `filter(f)`: Apply a textual filter to this query.
- `filter(f)`: Apply a custom filter expression to this query.
- `filter(f)`: Adds a custom arbitrary filter that gets executed in the current transformation chain.
- `<R> MapAggregator<U,R> flatMap(SerializableFunction<X, Iterable<R>> flatMapper)`: Set an arbitrary `flatMap` transformation function, which returns a list with an arbitrary number of results per input data entry.
- `void forEach(SerializableBiConsumer<U, List<X>> action)`: Deprecated. Only for testing purposes.
- `<R> MapAggregator<U,R> map(SerializableFunction<X, R> mapper)`: Set an arbitrary `map` transformation function.
- `reduce(SerializableSupplier<S> identitySupplier, SerializableBiFunction<S, X, S> accumulator, SerializableBinaryOperator<S> combiner)`: Map-reduce routine with built-in aggregation.
- `reduce(SerializableSupplier<X> identitySupplier, SerializableBinaryOperator<X> accumulator)`: Map-reduce routine with built-in aggregation (shorthand syntax).
- `stream()`: Returns all results as a Stream.
- `sum()`: Sums up the results.
- `sum(SerializableFunction<X, R> mapper)`: Sums up the results provided by a given `mapper` function.
- `uniq()`: Gets all unique values of the results.
- `uniq(SerializableFunction<X, R> mapper)`: Gets all unique values of the results provided by a given mapper function.
- `weightedAverage(SerializableFunction<X, WeightedValue> mapper)`: Calculates the weighted average of the results provided by the `mapper` function.
-
Method Details
-
aggregateBy
@Contract(pure=true) public <V extends Comparable<V> & Serializable> MapAggregator<OSHDBCombinedIndex<U,V>,X> aggregateBy(SerializableFunction<X, V> indexer, Collection<V> zerofill)
Sets up aggregation by another custom index.
- Parameters:
indexer
- a callback function that returns an index object for each given data entry
zerofill
- a collection of values that are expected to be present in the result
- Returns:
- a MapAggregator object with the new index applied as well
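The role of the `zerofill` collection can be pictured with a small plain-Java sketch. It rests on an assumption drawn only from the parameter description above: index values listed in `zerofill` show up in the result even when no data falls into them. The helper `countWithZerofill` is hypothetical and not part of the library.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class ZerofillSketch {
  /** Hypothetical helper: counts index values, pre-creating a bin for every expected value. */
  static <U extends Comparable<U>> SortedMap<U, Integer> countWithZerofill(
      List<U> indexValues, Collection<U> zerofill) {
    SortedMap<U, Integer> bins = new TreeMap<>();
    for (U expected : zerofill) {
      // assumption: expected bins start at the identity value of the reduction (0 for counting)
      bins.put(expected, 0);
    }
    for (U index : indexValues) {
      bins.merge(index, 1, Integer::sum);
    }
    return bins;
  }

  public static void main(String[] args) {
    // made-up sample data: no "tram" entries exist, but the bin is still reported
    SortedMap<String, Integer> bins = countWithZerofill(
        Arrays.asList("bus", "bus"), Arrays.asList("bus", "tram"));
    System.out.println(bins);
  }
}
```

Without the zerofill step, the "tram" bin would simply be absent from the result map.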
-
aggregateBy
@Contract(pure=true) public <V extends Comparable<V> & Serializable> MapAggregator<OSHDBCombinedIndex<U,V>,X> aggregateBy(SerializableFunction<X, V> indexer)
Sets up aggregation by another custom index.
- Type Parameters:
V
- the type of the values used to aggregate
- Parameters:
indexer
- a callback function that returns an index object for each given data entry
- Returns:
- a MapAggregator object with the new index applied as well
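Adding a second index means that results are keyed by a combination of both index values. As a rough illustration only, the sketch below uses a hypothetical `Pair` class as a stand-in for a combined index (it is not the `OSHDBCombinedIndex` class, whose accessors are not shown on this page) and counts values per pair of keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Function;

class CombinedIndexSketch {
  /** Hypothetical two-part key, ordered by the first part and then by the second. */
  static final class Pair<U extends Comparable<U>, V extends Comparable<V>>
      implements Comparable<Pair<U, V>> {
    final U first;
    final V second;

    Pair(U first, V second) {
      this.first = first;
      this.second = second;
    }

    @Override
    public int compareTo(Pair<U, V> other) {
      int byFirst = first.compareTo(other.first);
      return byFirst != 0 ? byFirst : second.compareTo(other.second);
    }

    @Override
    public String toString() {
      return "(" + first + ", " + second + ")";
    }
  }

  /** Hypothetical helper: counts values per combination of two index values. */
  static <X, U extends Comparable<U>, V extends Comparable<V>>
      SortedMap<Pair<U, V>, Integer> countByBoth(
          List<X> values, Function<X, U> firstIndexer, Function<X, V> secondIndexer) {
    // TreeMap compares keys with compareTo, so equal pairs share one bin
    SortedMap<Pair<U, V>, Integer> bins = new TreeMap<>();
    for (X value : values) {
      Pair<U, V> key = new Pair<>(firstIndexer.apply(value), secondIndexer.apply(value));
      bins.merge(key, 1, Integer::sum);
    }
    return bins;
  }

  public static void main(String[] args) {
    // made-up rows of the form "year:category"
    List<String> rows = Arrays.asList("2020:road", "2020:road", "2021:rail");
    SortedMap<Pair<String, String>, Integer> bins =
        countByBoth(rows, row -> row.substring(0, 4), row -> row.substring(5));
    System.out.println(bins);
  }
}
```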
-
aggregateByTimestamp
@Contract(pure=true) public MapAggregator<OSHDBCombinedIndex<U,OSHDBTimestamp>,X> aggregateByTimestamp()
Sets up automatic aggregation by timestamp.
In the OSMEntitySnapshotView, the snapshots' timestamps are used directly to aggregate the results. In the OSMContributionView, the timestamps of the respective data modifications are matched to the corresponding time intervals (which are defined by the `timestamps` setting).
- Returns:
- a MapAggregator object with the equivalent state (settings, filters, map function, etc.) of the current MapReducer object
-
aggregateByTimestamp
@Contract(pure=true) public MapAggregator<OSHDBCombinedIndex<U,OSHDBTimestamp>,X> aggregateByTimestamp(SerializableFunction<X, OSHDBTimestamp> indexer)
Sets up aggregation by a custom time index.
The timestamps returned by the supplied indexing function are matched to the corresponding time intervals.
- Parameters:
indexer
- a callback function that returns a timestamp object for each given data entry. Note that if this function returns timestamps outside of the supplied timestamps() interval, results may be undefined
- Returns:
- a MapAggregator object with the equivalent state (settings, filters, map function, etc.) of the current MapReducer object
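Matching a timestamp to "its" time interval can be sketched as a lookup in a sorted set of interval boundaries. The convention used here (half-open intervals labelled by their start, `null` for timestamps outside the covered range) is an assumption made for illustration; this page only states that out-of-range timestamps may give undefined results. `intervalStartFor` is a hypothetical helper, and plain `long` values stand in for `OSHDBTimestamp`.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

class TimeIntervalSketch {
  /**
   * Hypothetical helper: returns the start of the interval [start, next) containing
   * the timestamp, or null if the timestamp lies outside all intervals.
   */
  static Long intervalStartFor(NavigableSet<Long> boundaries, long timestamp) {
    if (boundaries.isEmpty()
        || timestamp < boundaries.first()
        || timestamp >= boundaries.last()) {
      return null; // assumption: out-of-range timestamps are not assigned to any bin
    }
    // floor() yields the greatest boundary that is less than or equal to the timestamp
    return boundaries.floor(timestamp);
  }

  public static void main(String[] args) {
    // made-up boundaries defining the intervals [0, 10) and [10, 20)
    NavigableSet<Long> boundaries = new TreeSet<>(Arrays.asList(0L, 10L, 20L));
    System.out.println(intervalStartFor(boundaries, 15L));
    System.out.println(intervalStartFor(boundaries, 25L));
  }
}
```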
-
aggregateByGeometry
@Contract(pure=true) public <V extends Comparable<V> & Serializable,P extends org.locationtech.jts.geom.Geometry & org.locationtech.jts.geom.Polygonal> MapAggregator<OSHDBCombinedIndex<U,V>,X> aggregateByGeometry(Map<V, P> geometries) throws UnsupportedOperationException
Aggregates the results by sub-regions as well, in addition to the timestamps.
Cannot be used together with the `groupByEntity()` setting enabled.
- Type Parameters:
V
- the type of the identifiers used to aggregate
P
- a polygonal geometry type
- Parameters:
geometries
- an associated list of polygons and identifiers
- Returns:
- a MapAggregator object with the equivalent state (settings, filters, map function, etc.) of the current MapReducer object
- Throws:
UnsupportedOperationException
- if this is called when the `groupByEntity()` mode has been activated
UnsupportedOperationException
- when called after any map or flatMap functions are set
-
areaOfInterest
Set the area of interest to the given bounding box. Only objects inside or clipped by this bbox will be passed on to the analysis' `mapper` function.
- Parameters:
bboxFilter
- the bounding box to query the data in
- Returns:
- a modified copy of this object (can be used to chain multiple commands together)
-
areaOfInterest
@Contract(pure=true) public <P extends org.locationtech.jts.geom.Geometry & org.locationtech.jts.geom.Polygonal> MapAggregator<U,X> areaOfInterest(P polygonFilter)
Set the area of interest to the given polygon. Only objects inside or clipped by this polygon will be passed on to the analysis' `mapper` function.
- Parameters:
polygonFilter
- the polygon to query the data in
- Returns:
- a modified copy of this object (can be used to chain multiple commands together)
-
sum
Sums up the results. The current data values need to be numeric (castable to the "Number" type), otherwise a runtime exception will be thrown.
- Returns:
- the sum of the current data
- Throws:
UnsupportedOperationException
- if the data cannot be cast to numbers
Exception
-
sum
@Contract(pure=true) public <R extends Number> SortedMap<U,R> sum(SerializableFunction<X, R> mapper) throws Exception
Sums up the results provided by a given `mapper` function.
This is a shorthand for `.map(mapper).sum()`, with the difference that here the numerical return type of the `mapper` is ensured.
- Type Parameters:
R
- the numeric type that is returned by the `mapper` function
- Parameters:
mapper
- function that returns the numbers to sum up
- Returns:
- the summed up results of the `mapper` function
- Throws:
Exception
-
count
Counts the number of results.
- Returns:
- the total count of features or modifications, summed up over all timestamps
- Throws:
Exception
-
uniq
Gets all unique values of the results. For example, this can be used together with the OSMContributionView to get all unique users editing specific feature types.
- Returns:
- the set of distinct values
- Throws:
Exception
-
uniq
@Contract(pure=true) public <R> SortedMap<U,Set<R>> uniq(SerializableFunction<X, R> mapper) throws Exception
Gets all unique values of the results provided by a given mapper function.
This is a shorthand for `.map(mapper).uniq()`.
- Type Parameters:
R
- the type that is returned by the `mapper` function
- Parameters:
mapper
- function that returns some values
- Returns:
- a set of distinct values returned by the `mapper` function
- Throws:
Exception
-
countUniq
Counts all unique values of the results. For example, this can be used together with the OSMContributionView to get the number of unique users editing specific feature types.
- Returns:
- the number of distinct values
- Throws:
Exception
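The relationship between `uniq` and `countUniq` can be sketched in plain Java: first collect the distinct mapped values per bin, then report the size of each set. Both helpers below are hypothetical stand-ins operating on in-memory lists and are not the library's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Function;

class UniqSketch {
  /** Hypothetical helper: collects the distinct mapped values of each bin. */
  static <U extends Comparable<U>, X, R> SortedMap<U, Set<R>> uniqByIndex(
      List<X> values, Function<X, U> indexer, Function<X, R> mapper) {
    SortedMap<U, Set<R>> bins = new TreeMap<>();
    for (X value : values) {
      bins.computeIfAbsent(indexer.apply(value), key -> new HashSet<>())
          .add(mapper.apply(value));
    }
    return bins;
  }

  /** Hypothetical helper: replaces each set of distinct values by its size. */
  static <U extends Comparable<U>, R> SortedMap<U, Integer> countUniq(
      SortedMap<U, Set<R>> uniq) {
    SortedMap<U, Integer> counts = new TreeMap<>();
    for (Map.Entry<U, Set<R>> entry : uniq.entrySet()) {
      counts.put(entry.getKey(), entry.getValue().size());
    }
    return counts;
  }

  public static void main(String[] args) {
    // each row is {year, userId}; both are made-up sample values
    List<int[]> edits = Arrays.asList(
        new int[] {2020, 1}, new int[] {2020, 1}, new int[] {2020, 2}, new int[] {2021, 1});
    SortedMap<Integer, Set<Integer>> users = uniqByIndex(edits, e -> e[0], e -> e[1]);
    SortedMap<Integer, Integer> userCounts = countUniq(users);
    System.out.println(userCounts);
  }
}
```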
-
average
Calculates the averages of the results. The current data values need to be numeric (castable to the "Number" type), otherwise a runtime exception will be thrown.
- Returns:
- the average of the current data
- Throws:
UnsupportedOperationException
- if the data cannot be cast to numbers
Exception
-
average
@Contract(pure=true) public <R extends Number> SortedMap<U,Double> average(SerializableFunction<X, R> mapper) throws Exception
Calculates the average of the results provided by a given `mapper` function.
- Type Parameters:
R
- the numeric type that is returned by the `mapper` function
- Parameters:
mapper
- function that returns the numbers to average
- Returns:
- the average of the numbers returned by the `mapper` function
- Throws:
Exception
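A per-bin average can be computed in a map-reduce friendly way by carrying a running sum and a running count for every bin and dividing only at the end. The sketch below shows this in plain Java; `averageByIndex` is a hypothetical helper, and whether the library uses exactly this bookkeeping is not stated on this page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

class AverageSketch {
  /** Hypothetical helper: averages the mapped values of each bin. */
  static <U extends Comparable<U>, X> SortedMap<U, Double> averageByIndex(
      List<X> values, Function<X, U> indexer, ToDoubleFunction<X> mapper) {
    // per bin: element 0 holds the running sum, element 1 the running count
    SortedMap<U, double[]> sumAndCount = new TreeMap<>();
    for (X value : values) {
      double[] acc = sumAndCount.computeIfAbsent(indexer.apply(value), key -> new double[2]);
      acc[0] += mapper.applyAsDouble(value);
      acc[1] += 1;
    }
    SortedMap<U, Double> averages = new TreeMap<>();
    for (Map.Entry<U, double[]> entry : sumAndCount.entrySet()) {
      averages.put(entry.getKey(), entry.getValue()[0] / entry.getValue()[1]);
    }
    return averages;
  }

  public static void main(String[] args) {
    // made-up rows of the form "bin:value"
    List<String> rows = Arrays.asList("a:2", "a:4", "b:10");
    SortedMap<String, Double> averages = averageByIndex(
        rows, row -> row.substring(0, 1), row -> Double.parseDouble(row.substring(2)));
    System.out.println(averages);
  }
}
```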
-
weightedAverage
@Contract(pure=true) public SortedMap<U,Double> weightedAverage(SerializableFunction<X, WeightedValue> mapper) throws Exception
Calculates the weighted average of the results provided by the `mapper` function.
The mapper must return an object of the type `WeightedValue` which contains a numeric value associated with a (floating point) weight.
- Parameters:
mapper
- function that gets called for each entity snapshot or modification; it needs to return the value and weight combination of the numbers to average
- Returns:
- the weighted average of the numbers returned by the `mapper` function
- Throws:
Exception
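For reference, the usual definition of a weighted average is the sum of value times weight divided by the sum of the weights. The sketch below applies that definition to plain `{value, weight}` pairs; it does not use the `WeightedValue` class, and how the library treats a total weight of zero is not specified here (this sketch returns NaN in that case).

```java
class WeightedAverageSketch {
  /** Hypothetical helper: each row of the input is a {value, weight} pair. */
  static double weightedAverage(double[][] valueWeightPairs) {
    double weightedSum = 0.0;
    double totalWeight = 0.0;
    for (double[] pair : valueWeightPairs) {
      weightedSum += pair[0] * pair[1];
      totalWeight += pair[1];
    }
    // 0.0 / 0.0 evaluates to NaN when there is no weight at all
    return weightedSum / totalWeight;
  }

  public static void main(String[] args) {
    // made-up sample: value 10 with weight 1, value 20 with weight 3
    double[][] pairs = {{10.0, 1.0}, {20.0, 3.0}};
    System.out.println(weightedAverage(pairs)); // (10*1 + 20*3) / 4 = 17.5
  }
}
```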
-
estimatedMedian
Returns an estimate of the median of the results. Uses the t-digest algorithm to calculate estimates for the quantiles in a map-reduce system: https://raw.githubusercontent.com/tdunning/t-digest/master/docs/t-digest-paper/histo.pdf
- Returns:
- estimated median
- Throws:
Exception
-
estimatedMedian
@Contract(pure=true) public <R extends Number> SortedMap<U,Double> estimatedMedian(SerializableFunction<X, R> mapper) throws Exception
Returns an estimate of the median of the results after applying the given map function.
Uses the t-digest algorithm to calculate estimates for the quantiles in a map-reduce system: https://raw.githubusercontent.com/tdunning/t-digest/master/docs/t-digest-paper/histo.pdf
- Parameters:
mapper
- function that returns the numbers to generate the median for
- Returns:
- estimated median
- Throws:
Exception
-
estimatedQuantile
Returns an estimate of a requested quantile of the results. Uses the t-digest algorithm to calculate estimates for the quantiles in a map-reduce system: https://raw.githubusercontent.com/tdunning/t-digest/master/docs/t-digest-paper/histo.pdf
- Parameters:
q
- the desired quantile to calculate (as a number between 0 and 1)
- Returns:
- estimated quantile boundary
- Throws:
Exception
-
estimatedQuantile
@Contract(pure=true) public <R extends Number> SortedMap<U,Double> estimatedQuantile(SerializableFunction<X, R> mapper, double q) throws Exception
Returns an estimate of a requested quantile of the results after applying the given map function.
Uses the t-digest algorithm to calculate estimates for the quantiles in a map-reduce system: https://raw.githubusercontent.com/tdunning/t-digest/master/docs/t-digest-paper/histo.pdf
- Parameters:
mapper
- function that returns the numbers to generate the quantile for
q
- the desired quantile to calculate (as a number between 0 and 1)
- Returns:
- estimated quantile boundary
- Throws:
Exception
-
estimatedQuantiles
@Contract(pure=true) public SortedMap<U,List<Double>> estimatedQuantiles(Iterable<Double> q) throws Exception
Returns an estimate of the quantiles of the results.
Uses the t-digest algorithm to calculate estimates for the quantiles in a map-reduce system: https://raw.githubusercontent.com/tdunning/t-digest/master/docs/t-digest-paper/histo.pdf
- Parameters:
q
- the desired quantiles to calculate (as a collection of numbers between 0 and 1)
- Returns:
- estimated quantile boundaries
- Throws:
Exception
-
estimatedQuantiles
@Contract(pure=true) public <R extends Number> SortedMap<U,List<Double>> estimatedQuantiles(SerializableFunction<X, R> mapper, Iterable<Double> q) throws Exception
Returns an estimate of the quantiles of the results after applying the given map function.
Uses the t-digest algorithm to calculate estimates for the quantiles in a map-reduce system: https://raw.githubusercontent.com/tdunning/t-digest/master/docs/t-digest-paper/histo.pdf
- Parameters:
mapper
- function that returns the numbers to generate the quantiles for
q
- the desired quantiles to calculate (as a collection of numbers between 0 and 1)
- Returns:
- estimated quantile boundaries
- Throws:
Exception
-
estimatedQuantiles
Returns a function that computes estimates of arbitrary quantiles of the results. Uses the t-digest algorithm to calculate estimates for the quantiles in a map-reduce system: https://raw.githubusercontent.com/tdunning/t-digest/master/docs/t-digest-paper/histo.pdf
- Returns:
- a function that computes estimated quantile boundaries
- Throws:
Exception
-
estimatedQuantiles
@Contract(pure=true) public <R extends Number> SortedMap<U,DoubleUnaryOperator> estimatedQuantiles(SerializableFunction<X, R> mapper) throws Exception
Returns a function that computes estimates of arbitrary quantiles of the results after applying the given map function.
Uses the t-digest algorithm to calculate estimates for the quantiles in a map-reduce system: https://raw.githubusercontent.com/tdunning/t-digest/master/docs/t-digest-paper/histo.pdf
- Parameters:
mapper
- function that returns the numbers to generate the quantiles for
- Returns:
- a function that computes estimated quantile boundaries
- Throws:
Exception
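To show what a returned quantile function looks like from the caller's side, the sketch below builds a `DoubleUnaryOperator` that maps a quantile `q` to a value. Note the swap: it computes exact quantiles by sorting and linear interpolation, which is not the t-digest estimation named above and only serves to illustrate the shape of the result. `quantileFunction` is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

class QuantileFunctionSketch {
  /** Hypothetical helper: exact quantiles via sorting (a stand-in, not t-digest). */
  static DoubleUnaryOperator quantileFunction(double[] values) {
    final double[] sorted = values.clone();
    Arrays.sort(sorted);
    return q -> {
      if (q < 0.0 || q > 1.0) {
        throw new IllegalArgumentException("q must be between 0 and 1");
      }
      if (sorted.length == 0) {
        return Double.NaN;
      }
      // fractional position of the quantile within the sorted values
      double position = q * (sorted.length - 1);
      int lower = (int) Math.floor(position);
      int upper = (int) Math.ceil(position);
      // interpolate linearly between the two neighbouring values
      return sorted[lower] + (position - lower) * (sorted[upper] - sorted[lower]);
    };
  }

  public static void main(String[] args) {
    // made-up sample values
    DoubleUnaryOperator quantile = quantileFunction(new double[] {4, 1, 3, 2, 5});
    System.out.println(quantile.applyAsDouble(0.5)); // median of 1..5
  }
}
```

With the methods above, one such function would be returned per bin, so a caller would look up the bin first and then call `applyAsDouble(q)` on the function found there.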
-
forEach
Deprecated. Only for testing purposes. Use `.collect().forEach()` or `.stream().forEach()` instead.
Iterates over the results of this data aggregation.
This method can be handy for testing purposes. But note that since the `action` doesn't produce a return value, it must provide its own way of producing output.
If you'd like to use such a "forEach" in a non-test use case, use `.collect().forEach()` or `.stream().forEach()` instead.
- Parameters:
action
- function that gets called for each transformed data entry
- Throws:
Exception
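Because the action returns nothing, it has to write its output somewhere itself. The sketch below shows this with the standard `Map.forEach` on an already collected map of lists, which is the shape that `collect()` is described as returning; the `describe` helper and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class ForEachSketch {
  /** Hypothetical helper: the action appends to a StringBuilder as its way of producing output. */
  static String describe(SortedMap<String, List<Integer>> collected) {
    StringBuilder out = new StringBuilder();
    collected.forEach((index, results) ->
        out.append(index).append(": ").append(results.size()).append(" result(s)\n"));
    return out.toString();
  }

  public static void main(String[] args) {
    // stand-in for the map of lists that a collect() call would provide
    SortedMap<String, List<Integer>> collected = new TreeMap<>();
    collected.put("a", Arrays.asList(1, 2));
    collected.put("b", Arrays.asList(3));
    System.out.print(describe(collected));
  }
}
```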
-
collect
Collects the results of this data aggregation into Lists.
- Returns:
- an aggregated map of lists with all results
- Throws:
Exception
-
stream
Returns all results as a Stream.
- Returns:
- a stream with all results returned by the `mapper` function
- Throws:
Exception
-
map
Set an arbitrary `map` transformation function.
- Specified by:
map
in interface Mappable<U extends Comparable<U> & Serializable>
- Type Parameters:
R
- an arbitrary data type which is the return type of the transformation `map` function
- Parameters:
mapper
- function that will be applied to each data entry (OSM entity snapshot or contribution)
- Returns:
- a modified copy of this MapAggregator object operating on the transformed type R
-
flatMap
@Contract(pure=true) public <R> MapAggregator<U,R> flatMap(SerializableFunction<X, Iterable<R>> flatMapper)
Set an arbitrary `flatMap` transformation function, which returns a list with an arbitrary number of results per input data entry.
The results of this function will be "flattened", meaning that they can, for example, be transformed again by setting additional `map` functions.
- Specified by:
flatMap
in interface Mappable<U extends Comparable<U> & Serializable>
- Type Parameters:
R
- an arbitrary data type which is the return type of the transformation `map` function
- Parameters:
flatMapper
- function that will be applied to each data entry (OSM entity snapshot or contribution) and returns a list of results
- Returns:
- a modified copy of this MapAggregator object operating on the transformed type R
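The "flattening" can be pictured as follows: each input entry yields zero, one, or many results, and all of them end up in one flat sequence that later steps see element by element. This is a plain-Java sketch with a hypothetical `flatMap` helper on lists, not the library's lazy transformation chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class FlatMapSketch {
  /** Hypothetical helper: applies flatMapper to every entry and concatenates the results. */
  static <X, R> List<R> flatMap(List<X> entries, Function<X, Iterable<R>> flatMapper) {
    List<R> flattened = new ArrayList<>();
    for (X entry : entries) {
      // an entry may contribute any number of results, including none
      for (R result : flatMapper.apply(entry)) {
        flattened.add(result);
      }
    }
    return flattened;
  }

  public static void main(String[] args) {
    // made-up entries, each holding a ";"-separated list of values
    List<String> entries = Arrays.asList("a;b", "c");
    List<String> parts = flatMap(entries, entry -> Arrays.asList(entry.split(";")));
    System.out.println(parts);
  }
}
```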
-
filter
Adds a custom arbitrary filter that gets executed in the current transformation chain.
- Specified by:
filter
in interface Mappable<U extends Comparable<U> & Serializable>
- Parameters:
f
- the filter function that determines if the respective data should be passed on (when f returns true) or discarded (when f returns false)
- Returns:
- a modified copy of this object (can be used to chain multiple commands together)
-
filter
Apply a custom filter expression to this query.
- Parameters:
f
- the FilterExpression to apply
- Returns:
- a modified copy of this object (can be used to chain multiple commands together)
- See Also:
- oshdb-filter readme and org.heigit.ohsome.oshdb.filter for further information about how to create such a filter expression object.
-
filter
Apply a textual filter to this query.
- Parameters:
f
- the filter string to apply
- Returns:
- a modified copy of this object (can be used to chain multiple commands together)
- See Also:
- oshdb-filter readme for a description of the filter syntax.
-
reduce
@Contract(pure=true) public <S> SortedMap<U,S> reduce(SerializableSupplier<S> identitySupplier, SerializableBiFunction<S, X, S> accumulator, SerializableBinaryOperator<S> combiner) throws Exception
Map-reduce routine with built-in aggregation.
This can be used to perform an arbitrary reduce routine whose results are aggregated separately according to some custom index value.
The combination of the used types and identity/reducer functions must make "mathematical" sense:
- the accumulator and combiner functions need to be associative,
- values generated by the identitySupplier factory must be an identity for the combiner function: `combiner(identitySupplier(),x)` must be equal to `x`,
- the combiner function must be compatible with the accumulator function: `combiner(u, accumulator(identitySupplier(), t)) == accumulator(u, t)`
Functionally, this interface is similar to Java 11 Stream's reduce(identity,accumulator,combiner) interface.
- Type Parameters:
S
- the data type used to contain the "reduced" (intermediate and final) results
- Parameters:
identitySupplier
- a factory function that returns a new starting value to reduce results into (e.g. when summing values, one needs to start at zero)
accumulator
- a function that takes a result from the `mapper` function (type <X>) and an accumulation value (type <S>, e.g. the result of `identitySupplier()`) and returns the "sum" of the two; contrary to `combiner`, this function is allowed to alter (mutate) the state of the accumulation value (e.g. by directly adding new values to an existing Set object)
combiner
- a function that calculates the "sum" of two <S> values; this function must be pure (have no side effects), and is not allowed to alter the state of the two input objects it gets
- Returns:
- the result of the map-reduce operation, the final result of the last call to the `combiner` function, after all `mapper` results have been aggregated (in the `accumulator` and `combiner` steps)
- Throws:
Exception
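The interplay of the three functions can be sketched on a list of partitions, each standing in for a chunk of data that is reduced independently before the partial results are merged. This is a simplified, sequential illustration with hypothetical helper names and no per-index binning, not the library's execution model. The accumulator mutates its accumulation value, while the combiner builds a fresh object and leaves its inputs untouched, matching the rules stated above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

class ReduceSketch {
  /** Hypothetical helper: reduces each partition on its own, then combines the partial results. */
  static <X, S> S reduce(
      List<List<X>> partitions,
      Supplier<S> identitySupplier,
      BiFunction<S, X, S> accumulator,
      BinaryOperator<S> combiner) {
    S total = identitySupplier.get();
    for (List<X> partition : partitions) {
      S partial = identitySupplier.get(); // a fresh starting value per partition
      for (X value : partition) {
        partial = accumulator.apply(partial, value);
      }
      total = combiner.apply(total, partial);
    }
    return total;
  }

  /** Example reduction: the set of distinct strings over all partitions. */
  static Set<String> distinct(List<List<String>> partitions) {
    Supplier<Set<String>> identity = HashSet::new;
    // the accumulator is allowed to mutate the accumulation value
    BiFunction<Set<String>, String, Set<String>> accumulator = (set, value) -> {
      set.add(value);
      return set;
    };
    // the combiner is pure: it copies instead of altering its inputs
    BinaryOperator<Set<String>> combiner = (a, b) -> {
      Set<String> merged = new HashSet<>(a);
      merged.addAll(b);
      return merged;
    };
    return reduce(partitions, identity, accumulator, combiner);
  }

  public static void main(String[] args) {
    // made-up sample partitions
    List<List<String>> partitions =
        Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("b", "c"));
    System.out.println(distinct(partitions).size()); // three distinct values: a, b, c
  }
}
```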
-
reduce
@Contract(pure=true) public SortedMap<U,X> reduce(SerializableSupplier<X> identitySupplier, SerializableBinaryOperator<X> accumulator) throws Exception
Map-reduce routine with built-in aggregation (shorthand syntax).
This can be used to perform an arbitrary reduce routine whose results are aggregated separately according to some custom index value.
This variant is shorter to program than `reduce(identitySupplier, accumulator, combiner)`, but can only be used if the result type is the same as the current `map`ped type <X>. Also, this variant can be less efficient since it cannot benefit from the mutability freedoms the accumulator+combiner approach has.
The combination of the used types and identity/reducer functions must make "mathematical" sense:
- the accumulator and combiner functions need to be associative,
- values generated by the identitySupplier factory must be an identity for the combiner function: `combiner(identitySupplier(),x)` must be equal to `x`,
- the combiner function must be compatible with the accumulator function: `combiner(u, accumulator(identitySupplier(), t)) == accumulator(u, t)`
Functionally, this interface is similar to Java 11 Stream's reduce(identity,accumulator,combiner) interface.
- Parameters:
identitySupplier
- a factory function that returns a new starting value to reduce results into (e.g. when summing values, one needs to start at zero)
accumulator
- a function that takes a result from the `mapper` function (type <X>) and an accumulation value (also of type <X>, e.g. the result of `identitySupplier()`) and returns the "sum" of the two; unlike the accumulator of the three-argument variant, this function is not allowed to alter (mutate) the state of the accumulation value (e.g. by directly adding new values to an existing Set object)
- Returns:
- the result of the map-reduce operation, the final result of the last call to the `combiner` function, after all `mapper` results have been aggregated (in the `accumulator` and `combiner` steps)
- Throws:
Exception
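One way to read the shorthand is that a single binary operator plays both roles: it folds values into a partial result and also merges partial results. The sketch below makes that reading explicit on a list of partitions. It is an interpretation for illustration, with hypothetical helper names, and not the library's code. Summation is used because it satisfies the listed requirements (associative, with zero as identity).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

class ReduceShorthandSketch {
  /** Hypothetical helper: the same operator accumulates values and combines partial results. */
  static <X> X reduce(
      List<List<X>> partitions, Supplier<X> identitySupplier, BinaryOperator<X> accumulator) {
    X total = identitySupplier.get();
    for (List<X> partition : partitions) {
      X partial = identitySupplier.get();
      for (X value : partition) {
        partial = accumulator.apply(partial, value);
      }
      // assumption of this sketch: the accumulator doubles as the combiner
      total = accumulator.apply(total, partial);
    }
    return total;
  }

  /** Example reduction: summing integers, starting at zero. */
  static int sum(List<List<Integer>> partitions) {
    Supplier<Integer> identity = () -> 0;
    BinaryOperator<Integer> accumulator = Integer::sum;
    return reduce(partitions, identity, accumulator);
  }

  public static void main(String[] args) {
    // made-up sample partitions
    List<List<Integer>> partitions = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
    System.out.println(sum(partitions)); // 1 + 2 + 3 = 6
  }
}
```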
-