org.apache.hadoop.mapred.lib
Class InputSampler.SplitSampler<K,V>
java.lang.Object
org.apache.hadoop.mapreduce.lib.partition.InputSampler.SplitSampler<K,V>
org.apache.hadoop.mapred.lib.InputSampler.SplitSampler<K,V>
- All Implemented Interfaces:
- InputSampler.Sampler<K,V>
- Enclosing class:
- InputSampler<K,V>
public static class InputSampler.SplitSampler<K,V>
- extends org.apache.hadoop.mapreduce.lib.partition.InputSampler.SplitSampler<K,V>
- implements InputSampler.Sampler<K,V>
Samples the first n records from s splits.
This is an inexpensive way to sample data whose records are already in random order.
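SplitSampler reads only the leading records of each split, so its sample is representative only when the input is already randomly ordered. The plain-Java sketch below has no Hadoop dependency and is illustrative only: the class `FirstNBiasDemo` and its helpers are hypothetical, and in-memory lists stand in for splits. It shows how first-n sampling behaves on sorted input.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstNBiasDemo {
    // Hypothetical helper: takes the first n records from each in-memory "split".
    static <K> List<K> firstNPerSplit(List<List<K>> splits, int n) {
        List<K> sample = new ArrayList<>();
        for (List<K> split : splits) {
            // Only the leading records are read; the rest of the split is never touched.
            sample.addAll(split.subList(0, Math.min(n, split.size())));
        }
        return sample;
    }

    // Builds a list of consecutive integers in [from, to).
    static List<Integer> range(int from, int to) {
        List<Integer> out = new ArrayList<>();
        for (int i = from; i < to; i++) {
            out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Sorted input: split 0 holds keys 0..49, split 1 holds keys 50..99.
        List<List<Integer>> sortedSplits = List.of(range(0, 50), range(50, 100));
        List<Integer> sample = firstNPerSplit(sortedSplits, 3);
        // Keys 3..49 and 53..99 are never sampled, so any key ranges
        // estimated from this sample would be heavily skewed.
        System.out.println(sample); // [0, 1, 2, 50, 51, 52]
    }
}
```

On shuffled input, the same three leading records of each split would be a fair draw, which is why the sampler is described as inexpensive for random data.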
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
InputSampler.SplitSampler
public InputSampler.SplitSampler(int numSamples)
- Create a SplitSampler sampling all splits.
Takes the first numSamples / numSplits records from each split.
- Parameters:
numSamples
- Total number of samples to obtain from all selected splits.
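A worked example of the per-split quota numSamples / numSplits. This sketch assumes plain Java integer division (an assumption; the Javadoc does not say how a remainder is handled), in which case the total number of keys collected can fall slightly short of numSamples. The class and method names are hypothetical.

```java
public class PerSplitQuota {
    // Hypothetical helper: per-split quota when every split is sampled,
    // assuming integer division as in "numSamples / numSplits".
    static int perSplitQuota(int numSamples, int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        return numSamples / numSplits;
    }

    public static void main(String[] args) {
        int numSamples = 1000;
        int numSplits = 3;
        int quota = perSplitQuota(numSamples, numSplits);
        // 1000 / 3 truncates to 333, so at most 3 * 333 = 999 keys are collected.
        System.out.println(quota + " per split, " + (quota * numSplits) + " total");
    }
}
```

Running `main` prints `333 per split, 999 total`.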
InputSampler.SplitSampler
public InputSampler.SplitSampler(int numSamples,
int maxSplitsSampled)
- Create a new SplitSampler.
- Parameters:
numSamples
- Total number of samples to obtain from all selected splits.
maxSplitsSampled
- The maximum number of splits to examine.
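The Javadoc states only that at most maxSplitsSampled splits are examined; it does not say which ones. The sketch below is a hypothetical model that assumes the examined splits are spread across the input at a fixed stride rather than taken from the front. `SplitSelection` and `selectSplits` are illustrative names, not Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSelection {
    // Hypothetical helper: indices of the splits a sampler would examine,
    // assuming at most maxSplitsSampled splits are chosen at an even stride.
    static List<Integer> selectSplits(int totalSplits, int maxSplitsSampled) {
        List<Integer> chosen = new ArrayList<>();
        int toSample = Math.min(maxSplitsSampled, totalSplits);
        if (toSample <= 0) {
            return chosen; // nothing to examine
        }
        int step = totalSplits / toSample; // at least 1 because toSample <= totalSplits
        for (int i = 0; i < toSample; i++) {
            chosen.add(i * step);
        }
        return chosen;
    }

    public static void main(String[] args) {
        // 10 splits, examine at most 4: stride 10 / 4 = 2.
        System.out.println(selectSplits(10, 4));                 // [0, 2, 4, 6]
        // No effective limit, as with the single-argument constructor: every split.
        System.out.println(selectSplits(5, Integer.MAX_VALUE)); // [0, 1, 2, 3, 4]
    }
}
```

Spreading the examined splits across the input, instead of reading only the first few, reduces the chance that the sample reflects just one region of the data.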
getSample
public K[] getSample(InputFormat<K,V> inf,
JobConf job)
throws IOException
- From each split sampled, take the first numSamples / numSplits records.
- Throws:
IOException
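A simplified, in-memory model of getSample. The real method receives an InputFormat and a JobConf. The hypothetical method below receives the splits already chosen for sampling, as lists, and takes the first numSamples / numSplits records from each. The `IntFunction` parameter is an assumption of this sketch, used only to build a typed `K[]` result. The sketch does not redistribute the unused quota of a short split; the actual implementation may handle that case differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class GetSampleSketch {
    // Hypothetical stand-in for getSample: each inner list plays the role of
    // one split chosen for sampling, and its records are read from the start.
    static <K> K[] getSample(List<List<K>> sampledSplits, int numSamples,
                             IntFunction<K[]> arrayFactory) {
        List<K> samples = new ArrayList<>();
        if (!sampledSplits.isEmpty()) {
            // Take the first numSamples / numSplits records from each split.
            int perSplit = numSamples / sampledSplits.size();
            for (List<K> split : sampledSplits) {
                int taken = 0;
                for (K record : split) {      // sequential read, like a record reader
                    if (taken >= perSplit) {
                        break;                // the rest of the split is never read
                    }
                    samples.add(record);
                    taken++;
                }
            }
        }
        return samples.toArray(arrayFactory.apply(samples.size()));
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("apple", "banana", "cherry"),
                List.of("kiwi", "lemon"),
                List.of("pear", "plum", "quince"));
        // numSamples = 6 over 3 splits: 2 records from each split.
        String[] sample = getSample(splits, 6, String[]::new);
        System.out.println(String.join(",", sample)); // apple,banana,kiwi,lemon,pear,plum
    }
}
```

Stopping each read as soon as the quota is reached is what keeps this sampler cheap. Only a small prefix of each examined split is ever read.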
Copyright © 2013 Apache Software Foundation. All Rights Reserved.