org.apache.hadoop.mapred.lib
Class InputSampler.RandomSampler<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.lib.partition.InputSampler.RandomSampler<K,V>
      extended by org.apache.hadoop.mapred.lib.InputSampler.RandomSampler<K,V>
All Implemented Interfaces:
InputSampler.Sampler<K,V>
Enclosing class:
InputSampler<K,V>

public static class InputSampler.RandomSampler<K,V>
extends org.apache.hadoop.mapreduce.lib.partition.InputSampler.RandomSampler<K,V>
implements InputSampler.Sampler<K,V>

Samples keys from random points in the input. This is a general-purpose sampler that takes numSamples / maxSplitsSampled keys from each split.
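As a worked example of the per-split share in the description above: with numSamples = 10,000 and maxSplitsSampled = 10, each sampled split contributes about 1,000 keys. The helper below is hypothetical (it is not part of Hadoop) and assumes Java integer division, so any remainder is simply dropped.

```java
// Hypothetical helper, not part of Hadoop: computes the per-split share
// described in the class documentation, numSamples / maxSplitsSampled.
final class PerSplitShare {
    private PerSplitShare() {}

    static int perSplit(int numSamples, int maxSplitsSampled) {
        if (maxSplitsSampled <= 0) {
            throw new IllegalArgumentException("maxSplitsSampled must be positive");
        }
        // Java integer division: any remainder is dropped.
        return numSamples / maxSplitsSampled;
    }

    public static void main(String[] args) {
        System.out.println(perSplit(10_000, 10)); // 1000 keys per split
        System.out.println(perSplit(1_000, 3));   // 333 keys per split (1 key unassigned)
    }
}
```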


Field Summary
 
Fields inherited from class org.apache.hadoop.mapreduce.lib.partition.InputSampler.RandomSampler
freq, maxSplitsSampled, numSamples
 
Constructor Summary
InputSampler.RandomSampler(double freq, int numSamples)
          Create a new RandomSampler sampling all splits.
InputSampler.RandomSampler(double freq, int numSamples, int maxSplitsSampled)
          Create a new RandomSampler.
 
Method Summary
 K[] getSample(InputFormat<K,V> inf, JobConf job)
          Randomizes the split order, then takes the specified number of keys from each sampled split, selecting each key with probability freq and possibly replacing a chosen key with a later one once that split's quota is filled.
 
Methods inherited from class org.apache.hadoop.mapreduce.lib.partition.InputSampler.RandomSampler
getSample
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.mapreduce.lib.partition.InputSampler.Sampler
getSample
 

Constructor Detail

InputSampler.RandomSampler

public InputSampler.RandomSampler(double freq,
                                  int numSamples)
Create a new RandomSampler that samples all splits. This reads every split on the client, which can be very expensive for large inputs.

Parameters:
freq - Probability with which a key will be chosen.
numSamples - Total number of samples to obtain from all selected splits.
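To make the difference between the two constructors concrete, the sketch below uses a hypothetical RandomSamplerParams class that only mirrors the constructor parameters; it is not the Hadoop type and performs no I/O. It assumes that "sampling all splits" is equivalent to an unbounded maxSplitsSampled, modeled here as Integer.MAX_VALUE, so the number of splits read is limited only by the size of the input.

```java
// Hypothetical stand-in that mirrors the two constructor signatures; it is
// not the Hadoop class and reads no data.
final class RandomSamplerParams {
    final double freq;
    final int numSamples;
    final int maxSplitsSampled;

    // Assumption: "sampling all splits" is modeled as an unbounded split limit.
    RandomSamplerParams(double freq, int numSamples) {
        this(freq, numSamples, Integer.MAX_VALUE);
    }

    RandomSamplerParams(double freq, int numSamples, int maxSplitsSampled) {
        this.freq = freq;
        this.numSamples = numSamples;
        this.maxSplitsSampled = maxSplitsSampled;
    }

    /** Maximum number of splits examined for an input with totalSplits splits. */
    int maxSplitsRead(int totalSplits) {
        return Math.min(maxSplitsSampled, totalSplits);
    }

    public static void main(String[] args) {
        int totalSplits = 2_000;
        // Two-argument form: every split may be read on the client.
        System.out.println(new RandomSamplerParams(0.1, 10_000).maxSplitsRead(totalSplits));     // 2000
        // Three-argument form: at most 10 splits are examined.
        System.out.println(new RandomSamplerParams(0.1, 10_000, 10).maxSplitsRead(totalSplits)); // 10
    }
}
```

Bounding maxSplitsSampled is what keeps the client-side cost proportional to a handful of splits rather than to the whole input.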

InputSampler.RandomSampler

public InputSampler.RandomSampler(double freq,
                                  int numSamples,
                                  int maxSplitsSampled)
Create a new RandomSampler.

Parameters:
freq - Probability with which a key will be chosen.
numSamples - Total number of samples to obtain from all selected splits.
maxSplitsSampled - The maximum number of splits to examine.
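Because each key is chosen independently with probability freq, reading R records yields about freq * R selected keys on average. The plain-Java sketch below (hypothetical helper names, no Hadoop classes) uses this to estimate the smallest freq that can be expected to produce numSamples keys from a given number of records read. It ignores the replacement step and is only a back-of-the-envelope aid for choosing freq.

```java
// Hypothetical back-of-the-envelope helpers for choosing freq; not Hadoop API.
final class FreqEstimate {
    private FreqEstimate() {}

    /** Expected number of keys selected when each of `records` keys is kept with probability freq. */
    static double expectedSelected(double freq, long records) {
        return freq * records;
    }

    /** Smallest freq whose expected selection count reaches numSamples, capped at 1.0. */
    static double minFreqFor(int numSamples, long recordsRead) {
        if (recordsRead <= 0) {
            throw new IllegalArgumentException("recordsRead must be positive");
        }
        return Math.min(1.0, (double) numSamples / recordsRead);
    }

    public static void main(String[] args) {
        // 1,000,000 records read at freq = 0.01 gives about 10,000 candidate keys.
        System.out.println(expectedSelected(0.01, 1_000_000L)); // 10000.0
        // To expect 10,000 keys from 1,000,000 records, freq must be at least 0.01.
        System.out.println(minFreqFor(10_000, 1_000_000L));     // 0.01
    }
}
```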
Method Detail

getSample

public K[] getSample(InputFormat<K,V> inf,
                     JobConf job)
              throws IOException
Randomizes the split order, then takes the specified number of keys from each sampled split. Each key is selected with probability freq, and once the quota of keys for a split is filled, a subsequently selected key may replace one chosen earlier.

Throws:
IOException - if the input splits cannot be computed or read.
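The getSample procedure described above can be sketched in plain Java without Hadoop. This is a hypothetical illustration, not the Hadoop implementation: each split is an in-memory List of keys, the per-split quota is taken as numSamples divided by the number of splits examined (following the class description), and the replacement step uses standard reservoir sampling (Algorithm R), which is an assumed reading of "possibly replaced by a subsequently selected key". The real sampler may handle quotas and replacement differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical, Hadoop-free sketch of the documented getSample procedure. */
final class RandomSamplerSketch {
    private RandomSamplerSketch() {}

    static <K> List<K> sample(List<List<K>> splits, double freq, int numSamples,
                              int maxSplitsSampled, Random rnd) {
        // Randomize the split order (on a copy, so the caller's list is untouched).
        List<List<K>> order = new ArrayList<>(splits);
        Collections.shuffle(order, rnd);

        int splitsToSample = Math.min(maxSplitsSampled, order.size());
        List<K> samples = new ArrayList<>();
        if (splitsToSample <= 0) {
            return samples;
        }
        // Assumed per-split quota, following "numSamples / maxSplitsSampled".
        int quota = numSamples / splitsToSample;

        for (int i = 0; i < splitsToSample; i++) {
            List<K> fromSplit = new ArrayList<>(quota);
            int selected = 0; // keys in this split that passed the freq test
            for (K key : order.get(i)) {
                if (rnd.nextDouble() >= freq) {
                    continue; // key not chosen (kept with probability freq)
                }
                selected++;
                if (fromSplit.size() < quota) {
                    fromSplit.add(key);
                } else {
                    // Quota filled: Algorithm R keeps the new key with
                    // probability quota / selected, evicting a random earlier key.
                    int j = rnd.nextInt(selected);
                    if (j < quota) {
                        fromSplit.set(j, key);
                    }
                }
            }
            samples.addAll(fromSplit);
        }
        return samples;
    }

    public static void main(String[] args) {
        // Five hypothetical splits of 100 integer keys each.
        List<List<Integer>> splits = new ArrayList<>();
        for (int s = 0; s < 5; s++) {
            List<Integer> split = new ArrayList<>();
            for (int k = 0; k < 100; k++) {
                split.add(s * 100 + k);
            }
            splits.add(split);
        }
        List<Integer> keys = sample(splits, 1.0, 50, 5, new Random(42));
        System.out.println(keys.size()); // 50: five splits, quota of 10 each
    }
}
```

Passing a seeded Random makes a run reproducible, which helps when comparing sample sets across runs.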


Copyright © 2013 Apache Software Foundation. All Rights Reserved.