Create a new TopNOneHotEncoder instance.
number of items to keep track of
one-sided error bound on the error of each point query, i.e. frequency estimate
a bound on the probability that a query estimate does not lie within some small interval (an interval that depends on eps) around the truth
a seed to initialize the random number generator used to create the pairwise independent hash functions
whether to encode items outside of the top n set as unknown
Transform a collection of categorical features to binary columns, with at most a single one-value. Only the top N items are tracked.
The list of top N is estimated with Algebird's SketchMap data structure. With probability at least 1 - delta, this estimate is within eps * N of the true frequency (i.e., true frequency <= estimate <= true frequency + eps * N), where N is the total size of the input collection.
Missing values are either transformed to zero vectors or encoded as unknown.
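The encoding described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: it uses exact counts instead of Algebird's SketchMap (so there is no eps or delta error), and the object name `TopNOneHotSketch`, the helper names `topN` and `encode`, the alphabetical column order, and the placement of the unknown column last are all assumptions made for illustration.

```scala
// Hypothetical illustration only: exact counting stands in for the
// approximate SketchMap-based frequency estimates.
object TopNOneHotSketch {

  /** The n most frequent items, returned in alphabetical order
    * so that column positions are deterministic (an assumption). */
  def topN(items: Seq[String], n: Int): Vector[String] =
    items
      .groupBy(identity)
      .map { case (item, occurrences) => (item, occurrences.size) }
      .toVector
      .sortBy { case (item, count) => (-count, item) } // descending count, ties by name
      .take(n)
      .map(_._1)
      .sorted

  /** One-hot encode a single value against the top-N vocabulary.
    * Missing values (None) and items outside the vocabulary become a zero
    * vector, or set a trailing "unknown" column when encodeMissingValue is true. */
  def encode(
      item: Option[String],
      vocab: Vector[String],
      encodeMissingValue: Boolean
  ): Vector[Double] = {
    val width = vocab.size + (if (encodeMissingValue) 1 else 0)
    val out = Array.fill(width)(0.0)
    val hit = item.map(x => vocab.indexOf(x)).filter(_ >= 0)
    hit match {
      case Some(i)                    => out(i) = 1.0
      case None if encodeMissingValue => out(width - 1) = 1.0
      case None                       => ()
    }
    out.toVector
  }
}
```

For example, calling `TopNOneHotSketch.topN(Seq("a", "b", "a", "c", "b", "a"), 2)` keeps only `a` and `b`; encoding `Some("c")` against that vocabulary then yields either an all-zero vector or a vector with only the unknown column set, depending on the flag.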