Type Parameters:
T - The type of sample.

@Internal
public class ReservoirSamplerWithReplacement<T>
extends DistributedRandomSampler<T>

ReservoirSamplerWithoutReplacement. The main
 difference is that, in the first phase, we generate weights for each element K times, so that
 each element can get selected multiple times.
 This implementation refers to the algorithm described in "Optimal Random Sampling from Distributed Streams Revisited".
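
The first phase described above can be illustrated with a minimal, self-contained sketch. It is not the Flink source: the class `ReservoirWithReplacementSketch`, the helper type `Weighted`, and the static method signature are hypothetical names chosen for this example, and the use of a min-heap to track the K largest weights is an assumption about one reasonable way to realise the idea. For every input element the sketch draws K random weights and keeps the K (weight, element) pairs with the largest weights, so a single element can occupy several slots.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

public class ReservoirWithReplacementSketch {

    /** Hypothetical (weight, element) pair; a stand-in, not Flink's IntermediateSampleData. */
    static final class Weighted<T> implements Comparable<Weighted<T>> {
        final double weight;
        final T element;

        Weighted(double weight, T element) {
            this.weight = weight;
            this.element = element;
        }

        @Override
        public int compareTo(Weighted<T> other) {
            return Double.compare(weight, other.weight);
        }
    }

    /** Draws k weights per element and keeps the k pairs with the largest weights. */
    static <T> List<Weighted<T>> sampleInPartition(Iterator<T> input, int k, Random random) {
        if (k == 0 || !input.hasNext()) {
            return new ArrayList<>();
        }
        // Min-heap: the head is always the smallest weight currently kept.
        PriorityQueue<Weighted<T>> queue = new PriorityQueue<>(k);

        // Fill all k slots from the first element, each with its own weight.
        T first = input.next();
        for (int i = 0; i < k; i++) {
            queue.add(new Weighted<>(random.nextDouble(), first));
        }

        // Every later element gets k chances to displace the current minimum.
        while (input.hasNext()) {
            T element = input.next();
            for (int i = 0; i < k; i++) {
                double weight = random.nextDouble();
                if (weight > queue.peek().weight) {
                    queue.poll();
                    queue.add(new Weighted<>(weight, element));
                }
            }
        }
        return new ArrayList<>(queue);
    }

    public static void main(String[] args) {
        List<Weighted<String>> picked =
                sampleInPartition(Arrays.asList("a", "b", "c", "d").iterator(), 3, new Random(42L));
        for (Weighted<String> w : picked) {
            System.out.println(w.element + " " + w.weight);
        }
    }
}
```

Keeping the weights alongside the elements is what would let a later merging phase compare candidates from different partitions; that phase is not shown here.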

Fields inherited from class DistributedRandomSampler: emptyIntermediateIterable, numSamples
Fields inherited from class RandomSampler: emptyIterable, EPSILON

| Constructor and Description |
|---|
| ReservoirSamplerWithReplacement(int numSamples): Create a sampler with fixed sample size and default random number generator. |
| ReservoirSamplerWithReplacement(int numSamples, long seed): Create a sampler with fixed sample size and random number generator seed. |
| ReservoirSamplerWithReplacement(int numSamples, Random random): Create a sampler with fixed sample size and random number generator. |

| Modifier and Type | Method and Description |
|---|---|
| Iterator<IntermediateSampleData<T>> | sampleInPartition(Iterator<T> input): Sample algorithm for the first phase. |

Methods inherited from class DistributedRandomSampler: sample, sampleInCoordinator

public ReservoirSamplerWithReplacement(int numSamples)
Parameters:
numSamples - Number of selected elements, must be non-negative.

public ReservoirSamplerWithReplacement(int numSamples, long seed)
Parameters:
numSamples - Number of selected elements, must be non-negative.
seed - Random number generator seed.

public ReservoirSamplerWithReplacement(int numSamples, Random random)
Parameters:
numSamples - Number of selected elements, must be non-negative.
random - Random number generator.

public Iterator<IntermediateSampleData<T>> sampleInPartition(Iterator<T> input)
Description copied from class: DistributedRandomSampler
Specified by:
sampleInPartition in class DistributedRandomSampler<T>
Parameters:
input - The DataSet input of each partition.

Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.