Transform a collection of categorical features to binary columns, with at most N one-values.
Similar to NHotEncoder but uses MurmursHash3 to hash features into buckets to reduce CPU
and memory overhead.
Missing values are transformed to [0.0, 0.0, ...].
If hashBucketSize is inferred with HLL, the estimate is scaled by sizeScalingFactor to reduce
the number of collisions.
Rough table of relationship of scaling factor to % collisions, measured from a corpus of 466544
English words:
Transform a collection of categorical features to binary columns, with at most N one-values. Similar to NHotEncoder but uses MurmursHash3 to hash features into buckets to reduce CPU and memory overhead.
Missing values are transformed to [0.0, 0.0, ...].
If hashBucketSize is inferred with HLL, the estimate is scaled by sizeScalingFactor to reduce the number of collisions.
Rough table of relationship of scaling factor to % collisions, measured from a corpus of 466544 English words: