com.databricks.labs.automl.utils.data
Method for filtering out any fields whose cardinality exceeds a given threshold, to protect against creating unmanageably large feature vectors or computationally expensive StringIndexed values
Fields to validate cardinality for
The mode of cardinality checking [either "approx" for approximate distinct or "exact"]
The limit above which a field's cardinality will cause the field to be culled from the collection of fields to perform an operation on
The precision set point for approx_distinct calculations for expected high cardinality fields or large data sets.
Array[String] of column names whose cardinality is below the threshold specified by cardinalityLimit
0.5.2
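The filtering behaviour described above can be sketched with standard Spark SQL aggregate functions. This is a minimal illustration, not the library's implementation: the object name, the helper name `filterByCardinality`, and its parameter names are hypothetical. It assumes that the precision setting corresponds to the `rsd` argument of Spark's `approx_count_distinct`, and that a field whose cardinality equals the limit is kept.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{approx_count_distinct, col, countDistinct}

object CardinalityFilterSketch {

  // Hypothetical helper; the name and signature are assumptions for illustration.
  def filterByCardinality(
      df: DataFrame,
      fields: Array[String],
      mode: String,            // "approx" or "exact"
      cardinalityLimit: Long,
      precision: Double        // assumed to map to rsd in approx_count_distinct
  ): Array[String] = {
    if (fields.isEmpty) {
      Array.empty[String]
    } else {
      // Build one distinct-count aggregate per field.
      val aggs = fields.map { f =>
        mode match {
          case "approx" => approx_count_distinct(col(f), precision)
          case "exact"  => countDistinct(col(f))
          case other =>
            throw new IllegalArgumentException(s"Unsupported mode: $other")
        }
      }
      // Compute all counts in a single pass; results are in field order.
      val counts = df.agg(aggs.head, aggs.tail: _*).head()
      // Keep only fields at or below the limit (assumption: the limit is inclusive).
      fields.zipWithIndex.collect {
        case (f, i) if counts.getLong(i) <= cardinalityLimit => f
      }
    }
  }
}
```

Computing every count in one `agg` call avoids a separate scan of the data per field, which matters when many columns are checked.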
Validation method for ensuring that the fields specified have a cardinality below a set threshold
Fields to test as an Array of Column Names
The type of distinct check to perform to calculate the cardinality [either 'exact' or 'approx']
The limit above which the check will fail.
0.5.2
AssertionError
if the cardinality of a field exceeds the threshold
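The failure behaviour can be illustrated in plain Scala. This is a sketch under stated assumptions, not the library's code: the object name, the helper name `validateCardinality`, and its signature are hypothetical, and the distinct counts are assumed to have been computed beforehand (by either the exact or the approximate method) and passed in as a map.

```scala
object CardinalityValidationSketch {

  // Hypothetical helper: `cardinalities` maps column name -> distinct count,
  // assumed to be precomputed by an exact or approximate distinct check.
  def validateCardinality(
      cardinalities: Map[String, Long],
      cardinalityLimit: Long
  ): Unit = {
    // Collect every field whose cardinality is above the limit.
    val offenders = cardinalities.filter { case (_, n) => n > cardinalityLimit }
    // Predef.assert throws java.lang.AssertionError when the condition is false.
    assert(
      offenders.isEmpty,
      s"Fields exceed cardinality limit $cardinalityLimit: " +
        offenders.map { case (f, n) => s"$f ($n)" }.mkString(", ")
    )
  }
}
```

Note that Scala's `assert` can be elided at compile time with the `-Xdisable-assertions` compiler flag, so a check that must always run may be better served by an explicit `throw`.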