Class CountingSummarizer<K>
- java.lang.Object
  - org.apache.accumulo.core.client.summary.CountingSummarizer<K>
- Type Parameters:
K - The counter key type. This type must have good implementations of Object.hashCode() and Object.equals(Object).
- All Implemented Interfaces:
Summarizer
- Direct Known Subclasses:
AuthorizationSummarizer, FamilySummarizer, VisibilitySummarizer
public abstract class CountingSummarizer<K> extends Object implements Summarizer
This class counts arbitrary keys while defending against too many keys and keys that are too long.

During collection and summarization this class will use the functions from converter() and encoder(). For each key/value the function from converter() will be called to create zero or more counter objects. A counter associated with each counter object will be incremented, as long as there are not too many counters and the counter object is not too long.

When Summarizer.Collector.summarize(Summarizer.StatisticConsumer) is called, the function from encoder() will be used to convert counter objects to strings. These strings will be used to emit statistics. Overriding encoder() is optional. One reason to override is if the counter object contains binary or special data. For example, a function that base64 encodes counter objects could be created.

If the counter key type is mutable, then consider overriding copier().

The function returned by converter() will be called frequently and should be very efficient. The function returned by encoder() will be called less frequently and can be more expensive. These two functions exist to avoid converting to a string for each key/value when that conversion is unnecessary.

Below is an example implementation that counts column visibilities. This example avoids converting the column visibility to a string for each key/value. It shows the source code for VisibilitySummarizer.

    import java.util.function.UnaryOperator;

    import org.apache.accumulo.core.client.summary.CountingSummarizer;
    import org.apache.accumulo.core.data.ArrayByteSequence;
    import org.apache.accumulo.core.data.ByteSequence;

    public class VisibilitySummarizer extends CountingSummarizer<ByteSequence> {

      @Override
      protected UnaryOperator<ByteSequence> copier() {
        // ByteSequences are mutable, so override and provide a copy function
        return ArrayByteSequence::new;
      }

      @Override
      protected Converter<ByteSequence> converter() {
        // Emit the raw visibility bytes; no string conversion happens per key/value
        return (key, val, consumer) -> consumer.accept(key.getColumnVisibilityData());
      }
    }
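The counting and limiting behaviour described above can be modelled with JDK types only. The following is a simplified sketch under stated assumptions, not Accumulo's implementation: the class `BoundedCounter`, its method names, the statistic names used as map keys ("c:" prefix, "emitted", "tooMany", "tooLong"), and the choice to measure length on the encoded string are all illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model of the counting logic; not the Accumulo class.
final class BoundedCounter<K> {
  private final int maxCounters;   // limit on the number of unique counters
  private final int maxCounterLen; // limit on the length of one counter key
  private final Function<K, String> encoder;
  private final Map<K, Long> counters = new HashMap<>();
  private long emitted;
  private long tooMany;
  private long tooLong;

  BoundedCounter(int maxCounters, int maxCounterLen, Function<K, String> encoder) {
    this.maxCounters = maxCounters;
    this.maxCounterLen = maxCounterLen;
    this.encoder = encoder;
  }

  // Called once for every counter object produced by a converter.
  void accept(K counterObject) {
    emitted++; // counts every emitted object, including ones ignored below
    Long current = counters.get(counterObject);
    if (current != null) {
      counters.put(counterObject, current + 1);
    } else if (encoder.apply(counterObject).length() > maxCounterLen) {
      // Assumption: length is checked on the encoded form, only for unseen objects.
      tooLong++;
    } else if (counters.size() >= maxCounters) {
      tooMany++;
    } else {
      counters.put(counterObject, 1L);
    }
  }

  // Encodes counter objects to strings only when statistics are emitted.
  Map<String, Long> summarize() {
    Map<String, Long> stats = new HashMap<>();
    for (Map.Entry<K, Long> e : counters.entrySet()) {
      stats.put("c:" + encoder.apply(e.getKey()), e.getValue());
    }
    stats.put("emitted", emitted);
    stats.put("tooMany", tooMany);
    stats.put("tooLong", tooLong);
    return stats;
  }
}
```

With limits of 2 counters and 5 characters, feeding "a", "a", "b", "c", "toolongkey" would leave counters for "a" and "b" only, with one object recorded as too many and one as too long.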
- Since:
- 2.0.0
- See Also:
CounterSummary
Nested Class Summary
Nested Classes (columns: Modifier and Type, Class, Description)
static interface CountingSummarizer.Converter<K> - A function that converts key values to zero or more counter objects.

Nested classes/interfaces inherited from interface org.apache.accumulo.core.client.summary.Summarizer
Summarizer.Collector, Summarizer.Combiner, Summarizer.StatisticConsumer
Field Summary
Fields (columns: Modifier and Type, Field, Description)
static String COUNTER_STAT_PREFIX - This prefixes all counters when emitting statistics in Summarizer.Collector.summarize(Summarizer.StatisticConsumer).
static String DELETES_IGNORED_STAT - This is the name of the statistic that tracks the total number of deleted keys seen.
static String EMITTED_STAT - This is the name of the statistic that tracks the total number of counter objects emitted by the CountingSummarizer.Converter.
static String INGNORE_DELETES_DEFAULT
static String INGNORE_DELETES_OPT - A configuration option to determine if delete keys should be counted.
static String MAX_CKL_DEFAULT
static String MAX_COUNTER_DEFAULT
static String MAX_COUNTER_LEN_OPT - A configuration option for specifying the maximum length of an individual counter key.
static String MAX_COUNTERS_OPT - A configuration option for specifying the maximum number of unique counters an instance of this summarizer should track.
static String SEEN_STAT - This tracks the total number of key/values seen by the Summarizer.Collector.
static String TOO_LONG_STAT - This is the name of the statistic that tracks how many counter objects were ignored because they were too long.
static String TOO_MANY_STAT - This is the name of the statistic that tracks how many counter objects were ignored because the number of unique counters was exceeded.
-
Constructor Summary
Constructors
CountingSummarizer()
-
Method Summary
Methods (columns: Modifier and Type, Method, Description)
Summarizer.Collector collector(SummarizerConfiguration sc) - Factory method that creates a Summarizer.Collector based on configuration.
Summarizer.Combiner combiner(SummarizerConfiguration sc) - Factory method that creates a Summarizer.Combiner.
protected abstract CountingSummarizer.Converter<K> converter()
protected UnaryOperator<K> copier() - Override this if your key type is mutable and subject to change.
protected Function<K,String> encoder()
Field Detail
-
MAX_COUNTERS_OPT
public static final String MAX_COUNTERS_OPT
A configuration option for specifying the maximum number of unique counters an instance of this summarizer should track. If not specified, a default of "1024" will be used.
- See Also:
- Constant Field Values
-
MAX_COUNTER_LEN_OPT
public static final String MAX_COUNTER_LEN_OPT
A configuration option for specifying the maximum length of an individual counter key. If not specified, a default of "128" will be used.
- See Also:
- Constant Field Values
-
INGNORE_DELETES_OPT
public static final String INGNORE_DELETES_OPT
A configuration option to determine if delete keys should be counted. If set to true then delete keys will not be passed to the CountingSummarizer.Converter and the statistic "deletesIgnored" will track the number of deletes ignored. This option defaults to "true".
- See Also:
- Constant Field Values
-
COUNTER_STAT_PREFIX
public static final String COUNTER_STAT_PREFIX
This prefixes all counters when emitting statistics in Summarizer.Collector.summarize(Summarizer.StatisticConsumer).
- See Also:
- Constant Field Values
-
TOO_MANY_STAT
public static final String TOO_MANY_STAT
This is the name of the statistic that tracks how many counter objects were ignored because the number of unique counters was exceeded. The maximum number of unique counters is specified by MAX_COUNTERS_OPT.
- See Also:
- Constant Field Values
-
TOO_LONG_STAT
public static final String TOO_LONG_STAT
This is the name of the statistic that tracks how many counter objects were ignored because they were too long. The maximum length is specified by MAX_COUNTER_LEN_OPT.
- See Also:
- Constant Field Values
-
EMITTED_STAT
public static final String EMITTED_STAT
This is the name of the statistic that tracks the total number of counter objects emitted by the CountingSummarizer.Converter. This includes emitted counter objects that were ignored.
- See Also:
- Constant Field Values
-
DELETES_IGNORED_STAT
public static final String DELETES_IGNORED_STAT
This is the name of the statistic that tracks the total number of deleted keys seen. This statistic is only incremented when the "ignoreDeletes" option is set to true.
- See Also:
- Constant Field Values
-
SEEN_STAT
public static final String SEEN_STAT
This tracks the total number of key/values seen by the Summarizer.Collector.
- See Also:
- Constant Field Values
-
MAX_COUNTER_DEFAULT
public static final String MAX_COUNTER_DEFAULT
- See Also:
- Constant Field Values
-
MAX_CKL_DEFAULT
public static final String MAX_CKL_DEFAULT
- See Also:
- Constant Field Values
-
INGNORE_DELETES_DEFAULT
public static final String INGNORE_DELETES_DEFAULT
- See Also:
- Constant Field Values
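The defaulting behaviour of the three options documented above can be sketched as follows. This is an illustrative model using JDK types only, not the Accumulo implementation. The class `CountingOptions` is hypothetical. The option name "ignoreDeletes" and the default values "1024", "128", and "true" come from the field descriptions; the option names "maxCounters" and "maxCounterLen", and the pairing of each `*_DEFAULT` constant with its value, are assumptions inferred from the constant names.

```java
import java.util.Map;

// Hypothetical holder for parsed summarizer options; not part of Accumulo.
final class CountingOptions {
  // Default values as documented; pairing with these constant names is inferred.
  static final String MAX_COUNTER_DEFAULT = "1024";
  static final String MAX_CKL_DEFAULT = "128";
  static final String INGNORE_DELETES_DEFAULT = "true";

  // "ignoreDeletes" is quoted in the documentation; the other two keys are assumed.
  static final String MAX_COUNTERS_OPT = "maxCounters";
  static final String MAX_COUNTER_LEN_OPT = "maxCounterLen";
  static final String INGNORE_DELETES_OPT = "ignoreDeletes";

  final long maxCounters;
  final long maxCounterLen;
  final boolean ignoreDeletes;

  // Reads each option from the map, falling back to its default when absent.
  CountingOptions(Map<String, String> options) {
    maxCounters = Long.parseLong(options.getOrDefault(MAX_COUNTERS_OPT, MAX_COUNTER_DEFAULT));
    maxCounterLen = Long.parseLong(options.getOrDefault(MAX_COUNTER_LEN_OPT, MAX_CKL_DEFAULT));
    ignoreDeletes =
        Boolean.parseBoolean(options.getOrDefault(INGNORE_DELETES_OPT, INGNORE_DELETES_DEFAULT));
  }
}
```

Constructing `CountingOptions` from an empty map yields the documented defaults; supplying only one option overrides that option and leaves the others at their defaults.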
Method Detail
-
converter
protected abstract CountingSummarizer.Converter<K> converter()
- Returns:
- A function that is used to convert each key value to zero or more counter objects. Each function returned should be independent.
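To illustrate what "zero or more counter objects" means, here is a minimal sketch using JDK types only. The `Converter` interface below is a stand-in that accepts plain strings, whereas the real CountingSummarizer.Converter receives Accumulo key and value objects; `ConverterSketch`, `labelConverter`, and `collect` are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ConverterSketch {
  // Stand-in for CountingSummarizer.Converter<K>, using strings for key and value.
  interface Converter<K> {
    void convert(String key, String value, Consumer<K> consumer);
  }

  // Emits one counter object per non-empty, comma-separated label in the value.
  // An empty value therefore produces zero counter objects. The returned
  // function keeps no state, so separately obtained functions are independent.
  static Converter<String> labelConverter() {
    return (key, value, consumer) -> {
      for (String label : value.split(",")) {
        if (!label.isEmpty()) {
          consumer.accept(label);
        }
      }
    };
  }

  // Helper that gathers everything a converter emits for one key/value pair.
  static List<String> collect(Converter<String> converter, String key, String value) {
    List<String> out = new ArrayList<>();
    converter.convert(key, value, out::add);
    return out;
  }
}
```

Because the converter runs for every key/value, the sketch does no more than one split and a loop per call; any expensive formatting would belong in the encoder instead.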
-
encoder
protected Function<K,String> encoder()
- Returns:
- A function that is used to convert counter objects to String. The default function calls Object.toString() on the counter object.
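As a sketch of the base64 case mentioned in the class description, the function below could serve as the body of an encoder() override when counter objects hold binary data. It assumes, purely for illustration, that the counter key type is java.nio.ByteBuffer; `Base64EncoderSketch` and `base64Encoder` are hypothetical names, and only JDK classes are used.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.function.Function;

final class Base64EncoderSketch {
  // Returns a function that base64 encodes the remaining bytes of a buffer.
  static Function<ByteBuffer, String> base64Encoder() {
    return buf -> {
      ByteBuffer view = buf.duplicate(); // leave the caller's position untouched
      byte[] bytes = new byte[view.remaining()];
      view.get(bytes);
      return Base64.getEncoder().encodeToString(bytes);
    };
  }
}
```

Encoding keeps the emitted statistic names printable even when the raw counter bytes are not valid text.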
-
copier
protected UnaryOperator<K> copier()
Override this if your key type is mutable and subject to change.
- Returns:
- A function that is used to copy the counter object. This function is only used when the collector has never seen the counter object before. In this case the collector may need to copy the counter object before using it as a map key. The default implementation is the UnaryOperator.identity() function.
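The reason a copy matters for mutable keys can be demonstrated with a plain HashMap. In this sketch, which is an illustration and not the collector's actual code, a single mutable list is reused for every label, much as a reader might reuse one buffer. `CopierSketch` and `countWithReusedKey` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class CopierSketch {
  // Counts labels while reusing one mutable key object for every label.
  // The copier is applied only when a key has not been seen before.
  static Map<List<String>, Long> countWithReusedKey(
      UnaryOperator<List<String>> copier, String... labels) {
    Map<List<String>, Long> counters = new HashMap<>();
    List<String> reused = new ArrayList<>();
    for (String label : labels) {
      reused.clear();
      reused.add(label); // mutate the shared key object in place
      if (counters.containsKey(reused)) {
        counters.merge(reused, 1L, Long::sum);
      } else {
        counters.put(copier.apply(reused), 1L);
      }
    }
    return counters;
  }
}
```

Calling `countWithReusedKey(ArrayList::new, "a", "b", "a")` stores an independent copy of each new key, so both labels remain retrievable. Passing `UnaryOperator.identity()` instead stores the shared list itself, and later mutations silently corrupt the keys already in the map, so a lookup for "b" finds nothing.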
-
collector
public Summarizer.Collector collector(SummarizerConfiguration sc)
Description copied from interface: Summarizer
Factory method that creates a Summarizer.Collector based on configuration. Each Summarizer.Collector created by this method should be independent and have its own internal state. Accumulo uses a Collector to generate summary statistics about a sequence of key values written to a file.
- Specified by:
collector in interface Summarizer
-
combiner
public Summarizer.Combiner combiner(SummarizerConfiguration sc)
Description copied from interface: Summarizer
Factory method that creates a Summarizer.Combiner. Accumulo will only use the created Combiner to merge data from Summarizer.Collectors created using the same SummarizerConfiguration.
- Specified by:
combiner in interface Summarizer
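Merging the statistics produced by two collectors can be pictured as summing values that share a name. The sketch below shows only that core idea with JDK maps; `MergeSketch` is a hypothetical name, the statistic names in the usage note are illustrative, and the combiner returned by this class may apply further rules (for example, honouring the counter limit) that are not modelled here.

```java
import java.util.Map;

final class MergeSketch {
  // Adds every statistic in 'from' into 'into', summing values with the same name.
  static void merge(Map<String, Long> into, Map<String, Long> from) {
    from.forEach((name, count) -> into.merge(name, count, Long::sum));
  }
}
```

Merging {"c:a"=1, "c:b"=4, "seen"=5} into {"c:a"=2, "seen"=3} would leave the first map holding "c:a"=3, "c:b"=4, and "seen"=8.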