org.apache.hadoop.mapred.lib
Class InputSampler.IntervalSampler<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.lib.partition.InputSampler.IntervalSampler<K,V>
      extended by org.apache.hadoop.mapred.lib.InputSampler.IntervalSampler<K,V>
All Implemented Interfaces:
InputSampler.Sampler<K,V>
Enclosing class:
InputSampler<K,V>

public static class InputSampler.IntervalSampler<K,V>
extends InputSampler.IntervalSampler<K,V>
implements InputSampler.Sampler<K,V>

Samples records at regular intervals from up to maxSplitsSampled splits. Useful for sorted data.


Field Summary
 
Fields inherited from class org.apache.hadoop.mapreduce.lib.partition.InputSampler.IntervalSampler
freq, maxSplitsSampled
 
Constructor Summary
InputSampler.IntervalSampler(double freq)
          Create a new IntervalSampler sampling all splits.
InputSampler.IntervalSampler(double freq, int maxSplitsSampled)
          Create a new IntervalSampler that examines at most maxSplitsSampled splits.
 
Method Summary
 K[] getSample(InputFormat<K,V> inf, JobConf job)
          For each split sampled, emit a record's key whenever the ratio of records retained so far to records read so far is less than the specified frequency.
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.partition.InputSampler.IntervalSampler
getSample
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.mapreduce.lib.partition.InputSampler.Sampler
getSample
 

Constructor Detail

InputSampler.IntervalSampler

public InputSampler.IntervalSampler(double freq)
Create a new IntervalSampler sampling all splits.

Parameters:
freq - The target fraction of records to emit (for example, 0.1 emits roughly one record in ten).

InputSampler.IntervalSampler

public InputSampler.IntervalSampler(double freq,
                                    int maxSplitsSampled)
Create a new IntervalSampler that examines at most maxSplitsSampled splits.

Parameters:
freq - The target fraction of records to emit (for example, 0.1 emits roughly one record in ten).
maxSplitsSampled - The maximum number of splits to examine.
See Also:
getSample(org.apache.hadoop.mapred.InputFormat, org.apache.hadoop.mapred.JobConf)
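
As a rough illustration of how maxSplitsSampled caps the work, the sketch below picks which splits would be examined. It is an assumption, not stated on this page, that the examined splits are spaced evenly across the input. SplitSelectionSketch and splitsToExamine are hypothetical names and do not belong to the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSelectionSketch {

    // Hypothetical helper: returns the indices of the splits that would be
    // examined, ASSUMING evenly spaced selection across all splits.
    static List<Integer> splitsToExamine(int totalSplits, int maxSplitsSampled) {
        List<Integer> indices = new ArrayList<>();
        int splitsToSample = Math.min(maxSplitsSampled, totalSplits);
        if (splitsToSample <= 0) {
            return indices; // nothing to examine
        }
        int step = totalSplits / splitsToSample; // integer division, at least 1
        for (int i = 0; i < splitsToSample; ++i) {
            indices.add(i * step);
        }
        return indices;
    }

    public static void main(String[] args) {
        // Cap of 3 over 10 splits.
        System.out.println(splitsToExamine(10, 3));                // prints [0, 3, 6]
        // A very large cap models "sampling all splits".
        System.out.println(splitsToExamine(4, Integer.MAX_VALUE)); // prints [0, 1, 2, 3]
    }
}
```

Under this assumption, a small maxSplitsSampled still touches splits from the start, middle, and later parts of the input instead of only the first few.
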
Method Detail

getSample

public K[] getSample(InputFormat<K,V> inf,
                     JobConf job)
              throws IOException
For each split sampled, emit a record's key whenever the ratio of records retained so far to records read so far is less than the specified frequency.

Throws:
IOException
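
The retention rule can be tried out without a Hadoop installation. What follows is a minimal sketch and not Hadoop's implementation: IntervalSamplingSketch and its sample method are hypothetical names, plain integer lists stand in for input splits, and it is assumed that the retained and read counters carry over from one split to the next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IntervalSamplingSketch {

    // Hypothetical helper: applies the "retained / read < freq" rule to
    // in-memory lists that stand in for input splits.
    static List<Integer> sample(List<List<Integer>> splits, double freq) {
        List<Integer> samples = new ArrayList<>();
        long read = 0; // records read so far (ASSUMED to span all splits)
        long kept = 0; // records retained so far
        for (List<Integer> split : splits) {
            for (Integer key : split) {
                ++read;
                // Keep the key only while the retained ratio is below freq.
                if ((double) kept / read < freq) {
                    samples.add(key);
                    ++kept;
                }
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        List<List<Integer>> splits = Arrays.asList(
            Arrays.asList(1, 2, 3, 4, 5),
            Arrays.asList(6, 7, 8, 9, 10));
        System.out.println(sample(splits, 0.25)); // prints [1, 5, 9]
    }
}
```

With sorted input, the retained keys land at roughly even positions in the key range, which is why this style of sampling suits sorted data.
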


Copyright © 2013 Apache Software Foundation. All Rights Reserved.