Type Parameters:
T - The type of the sampler.

@Internal
public class ReservoirSamplerWithoutReplacement<T> extends DistributedRandomSampler<T>

The algorithm is implemented using the DistributedRandomSampler interface. In the first phase, a random number is generated as the weight of each element, and the K elements with the largest weights are selected as the output of each partition. In the second phase, the K elements with the largest weights are selected from all the outputs of the first phase.

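Two facts explain why this two-phase scheme yields a uniform sample without replacement. This is a short justification under an explicit assumption: the weights w_1, ..., w_n are drawn i.i.d. from a continuous distribution, so ties have probability zero (a double-precision generator only approximates this). Every ordering of the weights is then equally likely, so each K-subset S is selected with the same probability. And if the input is split into disjoint partitions P_1, ..., P_m, an element in the global top K has at most K-1 heavier elements overall, hence at most K-1 in its own partition, so the second phase never loses it:

```latex
\Pr\bigl[\operatorname{top}_K(w_1,\dots,w_n) = S\bigr]
  = \frac{K!\,(n-K)!}{n!} = \binom{n}{K}^{-1}
  \qquad \text{for every } S \subseteq \{1,\dots,n\},\ |S| = K,

\operatorname{top}_K\Bigl(\bigcup_{j=1}^{m} P_j\Bigr)
  = \operatorname{top}_K\Bigl(\bigcup_{j=1}^{m} \operatorname{top}_K(P_j)\Bigr).
```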
This implementation follows the algorithm described in "Optimal Random Sampling from Distributed Streams Revisited".
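The two phases can be sketched in plain Java as below. This is a simplified illustration, not Flink's implementation: the record Weighted, the list-based method signatures, and the use of java.util.Random are hypothetical stand-ins (the real class works on Iterator<T> and emits IntermediateSampleData<T>), and each phase simply sorts by weight rather than streaming.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class TwoPhaseReservoirSketch {

    /** Hypothetical stand-in for IntermediateSampleData: an element with its random weight. */
    record Weighted<T>(double weight, T element) {}

    /** Phase 1 (per partition): give every element a random weight, keep the k heaviest. */
    static <T> List<Weighted<T>> sampleInPartition(List<T> partition, int k, Random random) {
        List<Weighted<T>> weighted = new ArrayList<>();
        for (T element : partition) {
            weighted.add(new Weighted<>(random.nextDouble(), element));
        }
        return topK(weighted, k);
    }

    /** Phase 2 (coordinator): merge all partition outputs, keep the k heaviest overall. */
    static <T> List<T> sampleInCoordinator(List<List<Weighted<T>>> partitionOutputs, int k) {
        List<Weighted<T>> merged = new ArrayList<>();
        for (List<Weighted<T>> output : partitionOutputs) {
            merged.addAll(output);
        }
        List<T> sample = new ArrayList<>();
        for (Weighted<T> record : topK(merged, k)) {
            sample.add(record.element());
        }
        return sample;
    }

    /** Sorts by descending weight and keeps at most k records (k assumed non-negative). */
    static <T> List<Weighted<T>> topK(List<Weighted<T>> records, int k) {
        List<Weighted<T>> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparingDouble((Weighted<T> w) -> w.weight()).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        Random random = new Random(42L);
        List<List<Integer>> partitions =
                List.of(List.of(1, 2, 3, 4), List.of(5, 6), List.of(7, 8, 9));
        List<List<Weighted<Integer>>> phaseOne = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            phaseOne.add(sampleInPartition(partition, 3, random));
        }
        // Prints 3 distinct elements drawn from 1..9.
        System.out.println(sampleInCoordinator(phaseOne, 3));
    }
}
```

Because each partition forwards at most K records, the coordinator's input is bounded by K times the number of partitions, independent of the total data size.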
Fields inherited from class DistributedRandomSampler: emptyIntermediateIterable, numSamples

Fields inherited from class RandomSampler: emptyIterable, EPSILON

| Constructor | Description |
|---|---|
| ReservoirSamplerWithoutReplacement(int numSamples) | Creates a new sampler with the given reservoir size and a default random number generator. |
| ReservoirSamplerWithoutReplacement(int numSamples, long seed) | Creates a new sampler with the given reservoir size and a random number generator seeded with the given seed. |
| ReservoirSamplerWithoutReplacement(int numSamples, Random random) | Creates a new sampler with the given reservoir size and a supplied random number generator. |

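The three constructors differ only in where the random number generator comes from. The sketch below shows that pattern under two assumptions that are not taken from the source: the overloads delegate to the (numSamples, Random) form, and a negative size is rejected with IllegalArgumentException. The class name SamplerConstructorsSketch and its weights helper are hypothetical, and it uses java.util.Random where Flink's default generator may differ.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sampler shell illustrating the three constructor variants; not Flink's code. */
public class SamplerConstructorsSketch {
    private final int numSamples;
    private final Random random;

    /** Primary constructor: every other overload ends up here. */
    public SamplerConstructorsSketch(int numSamples, Random random) {
        if (numSamples < 0) {
            // Assumed validation mirroring "must be non-negative" in the parameter docs.
            throw new IllegalArgumentException("numSamples must be non-negative");
        }
        this.numSamples = numSamples;
        this.random = random;
    }

    /** A fixed seed gives a reproducible sequence of weights. */
    public SamplerConstructorsSketch(int numSamples, long seed) {
        this(numSamples, new Random(seed));
    }

    /** Default generator: a fresh, unseeded java.util.Random (an assumption). */
    public SamplerConstructorsSketch(int numSamples) {
        this(numSamples, new Random());
    }

    public int numSamples() {
        return numSamples;
    }

    /** Draws one weight per element, as the first phase would. */
    public double[] weights(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = random.nextDouble();
        }
        return w;
    }

    public static void main(String[] args) {
        double[] a = new SamplerConstructorsSketch(5, 42L).weights(4);
        double[] b = new SamplerConstructorsSketch(5, 42L).weights(4);
        System.out.println(Arrays.equals(a, b)); // true: same seed, same weights
    }
}
```

Fixing the seed makes the weights, and therefore the selected sample, reproducible across runs, provided each partition sees its input in the same order; this is mainly useful in tests.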
| Modifier and Type | Method | Description |
|---|---|---|
| Iterator<IntermediateSampleData<T>> | sampleInPartition(Iterator<T> input) | Sampling algorithm for the first phase. |

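The first phase must handle an input iterator of unknown length in a single pass. One common way to do that with bounded memory, sketched here as an assumption rather than as Flink's exact code, is to keep a min-heap of at most K weighted records and replace the lightest one whenever a heavier weight arrives. PartitionHeapSketch and Weighted are hypothetical names; Weighted plays the role of IntermediateSampleData<T>.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.PriorityQueue;
import java.util.Random;
import java.util.stream.IntStream;

public class PartitionHeapSketch {

    /** Hypothetical stand-in for IntermediateSampleData: an element with its random weight. */
    record Weighted<T>(double weight, T element) {}

    /**
     * Single pass over an input of unknown length. A min-heap ordered by weight holds at
     * most k records, so its head is always the lightest record currently retained.
     */
    static <T> Iterator<Weighted<T>> sampleInPartition(Iterator<T> input, int k, Random random) {
        if (k == 0) {
            return Collections.emptyIterator();
        }
        PriorityQueue<Weighted<T>> heap =
                new PriorityQueue<>(k, Comparator.comparingDouble((Weighted<T> w) -> w.weight()));
        while (input.hasNext()) {
            T element = input.next();
            double weight = random.nextDouble();
            if (heap.size() < k) {
                heap.add(new Weighted<>(weight, element));
            } else if (weight > heap.peek().weight()) {
                // Heavier than the lightest retained record: evict that one and keep this one.
                heap.poll();
                heap.add(new Weighted<>(weight, element));
            }
        }
        return new ArrayList<>(heap).iterator();
    }

    public static void main(String[] args) {
        Iterator<Integer> input = IntStream.range(0, 1_000).boxed().iterator();
        Iterator<Weighted<Integer>> kept = sampleInPartition(input, 5, new Random(7L));
        int count = 0;
        while (kept.hasNext()) {
            System.out.println(kept.next());
            count++;
        }
        System.out.println(count + " records kept"); // 5 records kept
    }
}
```

Each element costs O(log K) heap work and the heap never holds more than K records, so memory stays O(K) however large the partition is.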
Methods inherited from class DistributedRandomSampler: sample, sampleInCoordinator

public ReservoirSamplerWithoutReplacement(int numSamples, Random random)
Parameters:
numSamples - Maximum number of samples to retain in the reservoir; must be non-negative.
random - Instance of random number generator for sampling.

public ReservoirSamplerWithoutReplacement(int numSamples)
Parameters:
numSamples - Maximum number of samples to retain in the reservoir; must be non-negative.

public ReservoirSamplerWithoutReplacement(int numSamples, long seed)
Parameters:
numSamples - Maximum number of samples to retain in the reservoir; must be non-negative.
seed - Random number generator seed.

public Iterator<IntermediateSampleData<T>> sampleInPartition(Iterator<T> input)
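Description copied from class: DistributedRandomSampler
Sampling algorithm for the first phase.

Specified by:
sampleInPartition in class DistributedRandomSampler<T>

Parameters:
input - The DataSet input of each partition.

Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.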