org.apache.hadoop.mapreduce.lib.partition
Class InputSampler.RandomSampler<K,V>
java.lang.Object
org.apache.hadoop.mapreduce.lib.partition.InputSampler.RandomSampler<K,V>
- All Implemented Interfaces:
- InputSampler.Sampler<K,V>
- Direct Known Subclasses:
- org.apache.hadoop.mapred.lib.InputSampler.RandomSampler
- Enclosing class:
- InputSampler<K,V>
public static class InputSampler.RandomSampler<K,V>
- extends Object
- implements InputSampler.Sampler<K,V>
Sample from random points in the input.
General-purpose sampler. Takes numSamples / maxSplitsSampled inputs from
each split.
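The per-split quota above can be illustrated with a small standalone sketch. The class and method names (SplitQuota, samplesPerSplit) are hypothetical and not part of Hadoop, and the use of integer division is an assumption, since the description does not say how a fractional quota is rounded.

```java
public class SplitQuota {

    // Hypothetical helper, not a Hadoop API: number of keys requested from each
    // sampled split, assuming the quotient is truncated (integer division).
    static int samplesPerSplit(int numSamples, int maxSplitsSampled) {
        if (maxSplitsSampled <= 0) {
            throw new IllegalArgumentException("maxSplitsSampled must be positive");
        }
        return numSamples / maxSplitsSampled;
    }

    public static void main(String[] args) {
        // 10000 samples spread over at most 10 splits
        System.out.println(samplesPerSplit(10000, 10)); // prints 1000
    }
}
```

Under this assumption a remainder is simply dropped, so asking for 7 samples over 2 splits would request 3 keys per split.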
Method Summary
K[] getSample(InputFormat<K,V> inf, Job job)
Randomize the split order, then take the specified number of keys from
each split sampled, where each key is selected with the specified
probability and possibly replaced by a subsequently selected key when
the quota of keys from that split is satisfied.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
freq
protected double freq
numSamples
protected final int numSamples
maxSplitsSampled
protected final int maxSplitsSampled
InputSampler.RandomSampler
public InputSampler.RandomSampler(double freq,
int numSamples)
- Create a new RandomSampler sampling all splits.
This will read every split at the client, which is very expensive.
- Parameters:
freq - Probability with which a key will be chosen.
numSamples - Total number of samples to obtain from all selected splits.
InputSampler.RandomSampler
public InputSampler.RandomSampler(double freq,
int numSamples,
int maxSplitsSampled)
- Create a new RandomSampler.
- Parameters:
freq - Probability with which a key will be chosen.
numSamples - Total number of samples to obtain from all selected splits.
maxSplitsSampled - The maximum number of splits to examine.
getSample
public K[] getSample(InputFormat<K,V> inf,
Job job)
throws IOException,
InterruptedException
- Randomize the split order, then take the specified number of keys from
each split sampled, where each key is selected with the specified
probability and possibly replaced by a subsequently selected key when
the quota of keys from that split is satisfied.
- Specified by:
getSample
in interface InputSampler.Sampler<K,V>
- Throws:
IOException
InterruptedException
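The sampling behaviour described for getSample can be approximated by the following in-memory simulation. It is a simplified sketch, not Hadoop's implementation: splits are modelled as plain lists of String keys, the class and method names (RandomSamplerSketch, sample) are hypothetical, and the per-split quota of numSamples divided by the number of splits examined is one reading of the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomSamplerSketch {

    // Hypothetical helper: samples keys from in-memory "splits" (lists of keys).
    static List<String> sample(List<List<String>> splits, double freq,
                               int numSamples, int maxSplitsSampled, Random rng) {
        List<List<String>> shuffled = new ArrayList<>(splits);
        Collections.shuffle(shuffled, rng);              // randomize the split order
        int splitsToSample = Math.min(maxSplitsSampled, shuffled.size());
        List<String> samples = new ArrayList<>();
        if (splitsToSample <= 0) {
            return samples;                              // nothing to examine
        }
        int quota = numSamples / splitsToSample;         // assumed per-split quota
        for (int i = 0; i < splitsToSample; i++) {
            List<String> kept = new ArrayList<>();
            for (String key : shuffled.get(i)) {
                if (rng.nextDouble() >= freq) {
                    continue;                            // key not selected
                }
                if (kept.size() < quota) {
                    kept.add(key);                       // quota not yet met
                } else if (quota > 0) {
                    kept.set(rng.nextInt(quota), key);   // replace an earlier pick
                }
            }
            samples.addAll(kept);
        }
        return samples;
    }

    public static void main(String[] args) {
        List<List<String>> splits = new ArrayList<>();
        for (int s = 0; s < 4; s++) {
            List<String> keys = new ArrayList<>();
            for (int k = 0; k < 10; k++) {
                keys.add("s" + s + "-k" + k);
            }
            splits.add(keys);
        }
        List<String> out = sample(splits, 0.5, 6, 2, new Random(42));
        System.out.println(out.size() + " keys sampled: " + out);
    }
}
```

With freq set to 1.0 every key is selected, so each examined split fills its quota and later keys overwrite earlier ones at random positions. With a small freq, a split may contribute fewer keys than its quota.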
Copyright © 2013 Apache Software Foundation. All Rights Reserved.