@Internal
public class ReservoirSamplerWithReplacement<T>
extends DistributedRandomSampler<T>

Type Parameters:
T - The type of sample.

The approach is similar to ReservoirSamplerWithoutReplacement. The main
difference is that, in the first phase, we generate weights for each element K times, where K is
the sample size (numSamples), so that each element can get selected multiple times.
This implementation is based on the algorithm described in "Optimal Random Sampling from Distributed Streams Revisited".
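
The first phase described above can be sketched in plain Java as follows. This is a minimal illustration and not Flink's implementation: the class `WeightedReservoirSketch`, the `Weighted` holder, and the use of a min-heap that retains the K largest weights are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/** Illustrative sketch only; every name in this class is hypothetical. */
class WeightedReservoirSketch {

    /** An element paired with the random weight drawn for it. */
    static final class Weighted<T> implements Comparable<Weighted<T>> {
        final double weight;
        final T element;

        Weighted(double weight, T element) {
            this.weight = weight;
            this.element = element;
        }

        @Override
        public int compareTo(Weighted<T> other) {
            return Double.compare(weight, other.weight);
        }
    }

    /**
     * Draws k weights for every input element and keeps the k entries with the
     * largest weights, so one element may occupy several slots of the result.
     */
    static <T> List<Weighted<T>> sampleInPartition(Iterator<T> input, int k, Random random) {
        // Min-heap: the head is always the smallest retained weight.
        PriorityQueue<Weighted<T>> queue = new PriorityQueue<>(Math.max(k, 1));
        while (input.hasNext()) {
            T element = input.next();
            for (int i = 0; i < k; i++) {
                double weight = random.nextDouble();
                if (queue.size() < k) {
                    // Reservoir not full yet: accept unconditionally.
                    queue.add(new Weighted<>(weight, element));
                } else if (weight > queue.peek().weight) {
                    // Evict the smallest retained weight in favour of the new one.
                    queue.remove();
                    queue.add(new Weighted<>(weight, element));
                }
            }
        }
        return new ArrayList<>(queue);
    }
}
```

With a single-element input, all k slots hold that element, which is the simplest case of an element being selected multiple times.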
Inherited fields: emptyIntermediateIterable, numSamples, emptyIterable, EPSILON

| Constructor | Description |
|---|---|
| ReservoirSamplerWithReplacement(int numSamples) | Create a sampler with fixed sample size and default random number generator. |
| ReservoirSamplerWithReplacement(int numSamples, long seed) | Create a sampler with fixed sample size and random number generator seed. |
| ReservoirSamplerWithReplacement(int numSamples, Random random) | Create a sampler with fixed sample size and random number generator. |
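
The seed-based constructor is mainly useful when a sample needs to be reproducible. The sketch below shows the underlying property using only `java.util.Random`: two generators created with the same seed yield the same sequence of values. The class `SeedReproducibilityDemo` and its method are hypothetical, and the example assumes (without confirming it from this page) that the sampler draws all of its randomness from the generator it is given.

```java
import java.util.Random;

/** Illustrative only; this class is hypothetical and not part of Flink. */
class SeedReproducibilityDemo {

    /** Returns the first {@code count} doubles produced by a generator with the given seed. */
    static double[] firstWeights(long seed, int count) {
        Random random = new Random(seed);
        double[] weights = new double[count];
        for (int i = 0; i < count; i++) {
            weights[i] = random.nextDouble();
        }
        return weights;
    }
}
```

Calling `firstWeights(42L, 3)` twice returns arrays with identical contents, whereas a generator created without a seed generally does not repeat across runs.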
| Modifier and Type | Method and Description |
|---|---|
| Iterator<IntermediateSampleData<T>> | sampleInPartition(Iterator<T> input): Sample algorithm for the first phase. |
Inherited methods: sample, sampleInCoordinator

public ReservoirSamplerWithReplacement(int numSamples)
Parameters:
numSamples - Number of selected elements, must be non-negative.

public ReservoirSamplerWithReplacement(int numSamples,
long seed)
Parameters:
numSamples - Number of selected elements, must be non-negative.
seed - Random number generator seed.

public ReservoirSamplerWithReplacement(int numSamples,
Random random)
Parameters:
numSamples - Number of selected elements, must be non-negative.
random - Random number generator.

public Iterator<IntermediateSampleData<T>> sampleInPartition(Iterator<T> input)
Specified by:
sampleInPartition in class DistributedRandomSampler<T>
Parameters:
input - The DataSet input of each partition.

Copyright © 2014–2023 The Apache Software Foundation. All rights reserved.