Type Parameters: T - The type of the elements to sample.

@Internal
public class ReservoirSamplerWithoutReplacement<T> extends DistributedRandomSampler<T>
This sampler is implemented using the DistributedRandomSampler interface. In the first phase, we generate a random number as the weight of each element and select the top K elements as the output of each partition. In the second phase, we select the top K elements from all the outputs of the first phase.

This implementation is based on the algorithm described in "Optimal Random Sampling from Distributed Streams Revisited".
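
The two phases described above can be sketched as follows. This is a minimal stand-alone illustration, not the class's actual source: the class `TwoPhaseReservoirSketch`, the helper type `Weighted`, and the `sample` driver are hypothetical names, the method signatures differ from the real API, and a single shared `java.util.Random` is assumed for all partitions to keep the example small.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Illustrative sketch only; names are hypothetical and this is not Flink's code.
class TwoPhaseReservoirSketch {

    // An element paired with the random weight drawn for it.
    static final class Weighted<T> {
        final double weight;
        final T element;

        Weighted(double weight, T element) {
            this.weight = weight;
            this.element = element;
        }
    }

    // Min-heap on weight, so the lightest retained element sits at the head.
    private static <T> PriorityQueue<Weighted<T>> newHeap() {
        return new PriorityQueue<>(Comparator.comparingDouble((Weighted<T> w) -> w.weight));
    }

    // Keeps only the k heaviest candidates seen so far.
    private static <T> void offer(PriorityQueue<Weighted<T>> heap, Weighted<T> candidate, int k) {
        if (heap.size() < k) {
            heap.add(candidate);
        } else if (k > 0 && candidate.weight > heap.peek().weight) {
            heap.poll(); // evict the lightest retained element
            heap.add(candidate);
        }
    }

    // Phase 1: weight every element of one partition, keep the top k by weight.
    static <T> List<Weighted<T>> sampleInPartition(Iterable<T> partition, int k, Random random) {
        PriorityQueue<Weighted<T>> heap = newHeap();
        for (T element : partition) {
            offer(heap, new Weighted<>(random.nextDouble(), element), k);
        }
        return new ArrayList<>(heap);
    }

    // Phase 2: merge all phase-1 outputs and keep the top k by weight overall.
    static <T> List<T> sampleInCoordinator(List<List<Weighted<T>>> partitionOutputs, int k) {
        PriorityQueue<Weighted<T>> heap = newHeap();
        for (List<Weighted<T>> output : partitionOutputs) {
            for (Weighted<T> candidate : output) {
                offer(heap, candidate, k);
            }
        }
        List<T> result = new ArrayList<>();
        for (Weighted<T> w : heap) {
            result.add(w.element);
        }
        return result;
    }

    // Convenience driver: runs both phases over in-memory partitions.
    static <T> List<T> sample(List<List<T>> partitions, int k, long seed) {
        Random random = new Random(seed);
        List<List<Weighted<T>>> outputs = new ArrayList<>();
        for (List<T> partition : partitions) {
            outputs.add(sampleInPartition(partition, k, random));
        }
        return sampleInCoordinator(outputs, k);
    }
}
```

The second phase only needs the per-partition outputs because any element among the K heaviest overall is also among the K heaviest of its own partition. When the input holds fewer than K elements in total, the sketch returns all of them.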
Inherited fields: emptyIntermediateIterable, numSamples, emptyIterable, EPSILON

| Constructor and Description | 
|---|
| `ReservoirSamplerWithoutReplacement(int numSamples)` Create a new sampler with reservoir size and a default random number generator. | 
| `ReservoirSamplerWithoutReplacement(int numSamples, long seed)` Create a new sampler with reservoir size and the seed for random number generator. | 
| `ReservoirSamplerWithoutReplacement(int numSamples, Random random)` Create a new sampler with reservoir size and a supplied random number generator. | 
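
The constructors differ only in where the random numbers come from, which matters when a sampling run has to be reproducible. The sketch below illustrates the effect of a fixed seed with plain `java.util.Random`. It does not call this class, `SeededWeightsSketch` is a hypothetical name, and which generator the class builds from a seed is not stated here, so `java.util.Random` is an assumption.

```java
import java.util.Random;

// Stand-alone illustration with java.util.Random; it does not use the Flink class.
class SeededWeightsSketch {

    // Draws n weights in [0, 1) from the given generator, one per element.
    static double[] weights(Random random, int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = random.nextDouble();
        }
        return w;
    }
}
```

Two generators created with the same seed, for example `new Random(42L)`, produce the same weight sequence, so a weight-based top-K selection over the same input picks the same elements. A generator created without a seed generally gives a different sample on each run.
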
| Modifier and Type | Method and Description | 
|---|---|
| `Iterator<IntermediateSampleData<T>>` | `sampleInPartition(Iterator<T> input)` Sample algorithm for the first phase. | 

Inherited methods: sample, sampleInCoordinator

public ReservoirSamplerWithoutReplacement(int numSamples,
                                          Random random)
Parameters:
numSamples - Maximum number of samples to retain in reservoir, must be non-negative.
random - Instance of random number generator for sampling.

public ReservoirSamplerWithoutReplacement(int numSamples)
Parameters:
numSamples - Maximum number of samples to retain in reservoir, must be non-negative.

public ReservoirSamplerWithoutReplacement(int numSamples,
                                          long seed)
Parameters:
numSamples - Maximum number of samples to retain in reservoir, must be non-negative.
seed - Random number generator seed.

public Iterator<IntermediateSampleData<T>> sampleInPartition(Iterator<T> input)
Description copied from class: DistributedRandomSampler
Sample algorithm for the first phase.
Specified by: sampleInPartition in class DistributedRandomSampler<T>
Parameters:
input - The DataSet input of each partition.

Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.