A producer that can produce a VwDownsampledMultilabelRowCreator.
Perform the initial scramble. This should be called once on the initial seed prior to the first call to sampleCombination.
an initial seed
a more scrambled seed.
Sample a k-combination from a population of n.
This algorithm uses a linear congruential pseudorandom number generator (see Knuth) to perform reservoir sampling via "Algorithm R".
Its running time is approximately O(n).
If n ≤ k, then return 0, ..., n - 1; otherwise (k < n), the returned array has length k with values between 0 and n - 1 (inclusive), but it is NOT guaranteed to be sorted.
NOTE: This is a pure function. It produces the same results as if java.util.Random was used to perform reservoir sampling but since it doesn't carry state, this can be trivially operated in parallel with no locking or CAS loop overhead. The consequence is that the seed must be provided on every call and a new seed will be returned as part of the output.
To get this function to act like java.util.Random, the first time it is called, the seed should be produced by running the desired seed through initSeedScramble. For instance:
  val (kComb1, newSeed1) = sampleCombination(4, 2, initSeedScramble(0))
  val (kComb2, newSeed2) = sampleCombination(4, 2, newSeed1)
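The description above can be illustrated with a minimal sketch. It is not the library's actual implementation: the enclosing object name StatelessSampling and the private helpers next31 and nextInt are hypothetical, and returning the seed unchanged when n ≤ k is an assumption. The constants and the bounded-integer logic follow the algorithm documented for java.util.Random, and the sampling loop is the textbook form of Algorithm R.

```scala
object StatelessSampling {
  // Constants of the 48-bit linear congruential generator documented for java.util.Random.
  private val Multiplier = 0x5DEECE66DL
  private val Addend = 0xBL
  private val Mask = (1L << 48) - 1

  /** Scramble a raw seed the same way the java.util.Random constructor does. */
  def initSeedScramble(seed: Long): Long = (seed ^ Multiplier) & Mask

  /** Advance the generator one step; returns (31 pseudorandom bits, new seed). */
  private def next31(seed: Long): (Int, Long) = {
    val s = (seed * Multiplier + Addend) & Mask
    ((s >>> 17).toInt, s)
  }

  /** Stateless counterpart of java.util.Random.nextInt(bound) (hypothetical helper). */
  private def nextInt(bound: Int, seed: Long): (Int, Long) = {
    require(bound > 0, "bound must be positive")
    val first = next31(seed)
    var bits = first._1
    var s = first._2
    if ((bound & (bound - 1)) == 0) {
      // Power-of-two bound: use the high-order bits directly.
      (((bound.toLong * bits) >> 31).toInt, s)
    } else {
      // Rejection loop that avoids modulo bias.
      var v = bits % bound
      while (bits - v + (bound - 1) < 0) {
        val nxt = next31(s)
        bits = nxt._1
        s = nxt._2
        v = bits % bound
      }
      (v, s)
    }
  }

  /** Reservoir sampling (Algorithm R) with the seed threaded through explicitly. */
  def sampleCombination(n: Int, k: Int, seed: Long): (Array[Int], Long) = {
    if (n <= k) {
      // Assumption: no random draws are needed, so the seed is returned unchanged.
      (Array.range(0, n), seed)
    } else {
      val reservoir = Array.range(0, k) // start with indices 0, ..., k - 1
      var s = seed
      var i = k
      while (i < n) {
        val (j, s2) = nextInt(i + 1, s) // j is uniform on 0, ..., i
        if (j < k) reservoir(j) = i     // index i replaces a random slot
        s = s2
        i += 1
      }
      (reservoir, s)
    }
  }
}
```

Because all state lives in the seed argument, independent callers can each start from their own scrambled seed and run concurrently without sharing a generator; successive samples within one caller are chained by passing the returned seed into the next call.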
For more information, see:
population size
combination size
the seed to use for random selection
a Tuple2 containing the array of 0-based indices representing the k-combination and a new random seed.