Transform a collection of categorical features to binary columns, with at most a single
one-value. Similar to OneHotEncoder but uses MurmursHash3 to hash features into buckets to
reduce CPU and memory overhead.
Missing values are transformed to zero vectors.
If hashBucketSize is inferred with HLL, the estimate is scaled by sizeScalingFactor to reduce
the number of collisions.
Rough table of relationship of scaling factor to % collisions, measured from a corpus of 466544
English words:
Transform a collection of categorical features to binary columns, with at most a single one-value. Similar to OneHotEncoder but uses MurmursHash3 to hash features into buckets to reduce CPU and memory overhead.
Missing values are transformed to zero vectors.
If hashBucketSize is inferred with HLL, the estimate is scaled by sizeScalingFactor to reduce the number of collisions.
Rough table of relationship of scaling factor to % collisions, measured from a corpus of 466544 English words: