org.apache.hadoop.mapreduce.lib.partition
Class InputSampler.SplitSampler<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.lib.partition.InputSampler.SplitSampler<K,V>
All Implemented Interfaces:
InputSampler.Sampler<K,V>
Direct Known Subclasses:
InputSampler.SplitSampler (in org.apache.hadoop.mapred.lib)
Enclosing class:
InputSampler<K,V>

public static class InputSampler.SplitSampler<K,V>
extends Object
implements InputSampler.Sampler<K,V>

Samples the first n records from s splits. This is an inexpensive way to sample input whose records are already in random order.


Field Summary
protected  int maxSplitsSampled
           
protected  int numSamples
           
 
Constructor Summary
InputSampler.SplitSampler(int numSamples)
          Create a SplitSampler sampling all splits.
InputSampler.SplitSampler(int numSamples, int maxSplitsSampled)
          Create a new SplitSampler that examines at most maxSplitsSampled splits.
 
Method Summary
 K[] getSample(InputFormat<K,V> inf, Job job)
          From each split sampled, take the first numSamples / numSplits records.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

numSamples

protected final int numSamples

maxSplitsSampled

protected final int maxSplitsSampled

Constructor Detail

InputSampler.SplitSampler

public InputSampler.SplitSampler(int numSamples)
Create a SplitSampler sampling all splits. Takes the first numSamples / numSplits records from each split.

Parameters:
numSamples - Total number of samples to obtain from all selected splits.

InputSampler.SplitSampler

public InputSampler.SplitSampler(int numSamples,
                                 int maxSplitsSampled)
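Create a new SplitSampler that examines at most maxSplitsSampled splits. Takes the first numSamples / numSplits records from each split.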

Parameters:
numSamples - Total number of samples to obtain from all selected splits.
maxSplitsSampled - The maximum number of splits to examine.

Method Detail

getSample

public K[] getSample(InputFormat<K,V> inf,
                     Job job)
              throws IOException,
                     InterruptedException
From each split sampled, take the first numSamples / numSplits records.

Specified by:
getSample in interface InputSampler.Sampler<K,V>
Throws:
IOException
InterruptedException
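
The per-split quota described above can be illustrated without a cluster. The following is a minimal plain-Java sketch, not Hadoop's implementation: the class SplitSamplerSketch and its sample method are hypothetical names, each inner list stands in for the keys one split would yield in order, and passing Integer.MAX_VALUE for maxSplitsSampled stands in for "sampling all splits". It also assumes that the sampled splits are simply the first ones in list order, and that a split holding fewer keys than its quota just contributes fewer keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration only; not part of Hadoop. */
final class SplitSamplerSketch {

    /**
     * Takes the first numSamples / splitsToSample keys from each sampled split.
     * Each inner list models the keys a single split would produce, in order.
     */
    static <K> List<K> sample(List<List<K>> splits, int numSamples, int maxSplitsSampled) {
        List<K> samples = new ArrayList<>();
        // Never examine more splits than exist or than the cap allows.
        int splitsToSample = Math.min(maxSplitsSampled, splits.size());
        if (splitsToSample <= 0) {
            return samples; // nothing to sample; also avoids dividing by zero
        }
        // Integer division: any remainder is dropped.
        int samplesPerSplit = numSamples / splitsToSample;
        // Assumption: the sampled splits are the first ones in list order.
        for (int i = 0; i < splitsToSample; i++) {
            List<K> keys = splits.get(i);
            // Assumption: a short split simply contributes fewer keys.
            int take = Math.min(samplesPerSplit, keys.size());
            samples.addAll(keys.subList(0, take));
        }
        return samples;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("a1", "a2", "a3", "a4"),
            Arrays.asList("b1", "b2", "b3", "b4"),
            Arrays.asList("c1", "c2", "c3", "c4"));
        // 10 samples over 3 splits gives 3 keys per split, 9 keys in total.
        System.out.println(sample(splits, 10, Integer.MAX_VALUE));
        // Capping at 2 splits gives a quota of 5, but each split holds only 4 keys.
        System.out.println(sample(splits, 10, 2));
    }
}
```

Because the quota is computed with integer division, the total number of keys returned in this sketch can be smaller than numSamples whenever numSamples is not a multiple of the number of splits sampled.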


Copyright © 2013 Apache Software Foundation. All Rights Reserved.