T - The type of the sampler.
@Internal public class ReservoirSamplerWithoutReplacement<T> extends DistributedRandomSampler<T>
The algorithm is implemented using the DistributedRandomSampler
interface. In the first phase, a random number is generated as the weight of each element, and
the top K elements by weight are selected as the output of each partition. In the second phase,
the top K elements are selected from all the outputs of the first phase.
This implementation refers to the algorithm described in
"Optimal Random Sampling from Distributed Streams Revisited".
Inherited fields:
EMPTY_INTERMEDIATE_ITERABLE, numSamples
EMPTY_ITERABLE, EPSILON
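The two-phase scheme described above can be sketched in plain Java without any Flink dependency. This is a minimal illustration, not the class's actual source: the names `TwoPhaseSamplingSketch`, `Weighted`, `topK`, `samplePartition`, and `sampleCoordinator` are hypothetical, and sorting is used only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class TwoPhaseSamplingSketch {

    /** Hypothetical stand-in for an element paired with its random weight. */
    static final class Weighted<T> {
        final double weight;
        final T element;

        Weighted(double weight, T element) {
            this.weight = weight;
            this.element = element;
        }
    }

    /** Keeps the k entries with the largest weights (sorted here only for brevity). */
    static <T> List<Weighted<T>> topK(List<Weighted<T>> items, int k) {
        List<Weighted<T>> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingDouble((Weighted<T> w) -> w.weight).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    /** Phase 1: give each element of one partition a random weight and keep the top k. */
    static <T> List<Weighted<T>> samplePartition(List<T> partition, int k, Random random) {
        List<Weighted<T>> weighted = new ArrayList<>();
        for (T element : partition) {
            weighted.add(new Weighted<>(random.nextDouble(), element));
        }
        return topK(weighted, k);
    }

    /** Phase 2: merge all per-partition outputs and keep the overall top k. */
    static <T> List<T> sampleCoordinator(List<List<Weighted<T>>> partitionOutputs, int k) {
        List<Weighted<T>> all = new ArrayList<>();
        for (List<Weighted<T>> output : partitionOutputs) {
            all.addAll(output);
        }
        List<T> result = new ArrayList<>();
        for (Weighted<T> w : topK(all, k)) {
            result.add(w.element);
        }
        return result;
    }

    public static void main(String[] args) {
        Random random = new Random(42L);
        List<List<Weighted<Integer>>> outputs = new ArrayList<>();
        outputs.add(samplePartition(Arrays.asList(1, 2, 3, 4, 5), 3, random));
        outputs.add(samplePartition(Arrays.asList(6, 7, 8, 9, 10), 3, random));
        // Prints three distinct values drawn from 1..10.
        System.out.println(sampleCoordinator(outputs, 3));
    }
}
```

Because every element receives an independent random weight, the K largest weights overall always survive the first phase in their own partition, so the second phase only needs the per-partition outputs.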
Constructor and Description |
---|
ReservoirSamplerWithoutReplacement(int numSamples): Creates a new sampler with the given reservoir size and a default random number generator. |
ReservoirSamplerWithoutReplacement(int numSamples, long seed): Creates a new sampler with the given reservoir size and a seed for the random number generator. |
ReservoirSamplerWithoutReplacement(int numSamples, Random random): Creates a new sampler with the given reservoir size and a supplied random number generator. |
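The three constructors differ only in where the random number generator comes from. The sketch below assumes the seed is handed to a `java.util.Random` (an assumption about the internals, not something this page states) and shows why a fixed seed is useful: two generators created with the same seed produce the same weights, so the resulting sample is reproducible. `SeededWeightsSketch` and `drawWeights` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SeededWeightsSketch {

    /** Draws one random weight per element, as a weight-based sampler would. */
    static List<Double> drawWeights(int count, Random random) {
        List<Double> weights = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            weights.add(random.nextDouble());
        }
        return weights;
    }

    public static void main(String[] args) {
        // Same seed, same weight sequence, hence the same top-K selection.
        List<Double> first = drawWeights(5, new Random(12345L));
        List<Double> second = drawWeights(5, new Random(12345L));
        System.out.println(first.equals(second)); // prints "true"
    }
}
```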
Modifier and Type | Method and Description |
---|---|
Iterator<IntermediateSampleData<T>> | sampleInPartition(Iterator<T> input): Sample algorithm for the first phase. |
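One common way to realize a single-pass first phase is a bounded min-heap keyed by weight: the heap root is always the weakest retained element, so each new element needs only one comparison. The following is a sketch under that assumption and does not reproduce Flink's code; `PartitionReservoirSketch` and `Weighted` are hypothetical, and `Weighted` is not Flink's `IntermediateSampleData`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.PriorityQueue;
import java.util.Random;

public class PartitionReservoirSketch {

    /** Hypothetical (weight, element) pair ordered by ascending weight. */
    static final class Weighted<T> implements Comparable<Weighted<T>> {
        final double weight;
        final T element;

        Weighted(double weight, T element) {
            this.weight = weight;
            this.element = element;
        }

        @Override
        public int compareTo(Weighted<T> other) {
            return Double.compare(weight, other.weight);
        }
    }

    /** Single pass over the input, retaining the numSamples elements with the largest weights. */
    static <T> Iterator<Weighted<T>> sampleInPartition(
            Iterator<T> input, int numSamples, Random random) {
        if (numSamples < 0) {
            throw new IllegalArgumentException("numSamples must be non-negative");
        }
        if (numSamples == 0) {
            return Collections.emptyIterator();
        }
        // Min-heap: peek() returns the retained element with the smallest weight.
        PriorityQueue<Weighted<T>> reservoir = new PriorityQueue<>(numSamples);
        while (input.hasNext()) {
            T element = input.next();
            double weight = random.nextDouble();
            if (reservoir.size() < numSamples) {
                reservoir.add(new Weighted<>(weight, element));
            } else if (weight > reservoir.peek().weight) {
                reservoir.poll(); // evict the current minimum
                reservoir.add(new Weighted<>(weight, element));
            }
        }
        return reservoir.iterator();
    }

    public static void main(String[] args) {
        Iterator<Weighted<Integer>> sample = sampleInPartition(
                Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).iterator(), 3, new Random());
        int count = 0;
        while (sample.hasNext()) {
            sample.next();
            count++;
        }
        System.out.println(count); // prints 3
    }
}
```

The weights are returned alongside the elements so that a later merge step can compare candidates from different partitions.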
Inherited methods: sample, sampleInCoordinator
public ReservoirSamplerWithoutReplacement(int numSamples, Random random)
numSamples - Maximum number of samples to retain in reservoir, must be non-negative.
random - Instance of random number generator for sampling.

public ReservoirSamplerWithoutReplacement(int numSamples)
numSamples - Maximum number of samples to retain in reservoir, must be non-negative.

public ReservoirSamplerWithoutReplacement(int numSamples, long seed)
numSamples - Maximum number of samples to retain in reservoir, must be non-negative.
seed - Random number generator seed.

public Iterator<IntermediateSampleData<T>> sampleInPartition(Iterator<T> input)
sampleInPartition in class DistributedRandomSampler<T>
input - The DataSet input of each partition.

Copyright © 2014–2016 The Apache Software Foundation. All rights reserved.