Param for how to handle invalid entries.
Param for how to handle invalid entries. Options are 'skip' (filter out rows with invalid values), 'error' (throw an error), or 'keep' (keep invalid values in a special additional bucket). Note that in the multiple-columns case, invalid handling is applied to all columns: with 'error', an error is thrown if an invalid value is found in any column; with 'skip', a row is skipped if it contains an invalid value in any column; and so on. Default: "error"
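To make the three policies concrete, here is a minimal pure-Python sketch of how they behave over multi-column rows. The `handle_invalid` helper is hypothetical (it is not the Spark API); it treats NaN as the invalid marker, as QuantileDiscretizer does.

```python
import math

def handle_invalid(rows, mode):
    """Hypothetical sketch of the three handleInvalid policies.

    'skip' drops any row with a NaN in any column, 'error' raises on the
    first NaN found, and 'keep' passes rows through unchanged so that a
    later bucketing step can route NaNs to an extra bucket.
    """
    def has_nan(row):
        return any(isinstance(v, float) and math.isnan(v) for v in row)

    if mode == "skip":
        return [r for r in rows if not has_nan(r)]
    if mode == "error":
        for r in rows:
            if has_nan(r):
                raise ValueError(f"invalid value in row {r}")
        return list(rows)
    if mode == "keep":
        return list(rows)
    raise ValueError(f"unknown mode: {mode}")

rows = [(1.0, 2.0), (float("nan"), 3.0), (4.0, 5.0)]
print(handle_invalid(rows, "skip"))  # drops the middle row: NaN in the first column
```

Note that 'skip' and 'error' act on whole rows, so a NaN in one column removes (or rejects) values from every column of that row.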
Number of buckets (quantiles, or categories) into which data points are grouped.
Number of buckets (quantiles, or categories) into which data points are grouped. Must be greater than or equal to 2.
See also handleInvalid, which can optionally create an additional bucket for NaN values.
default: 2
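The bucketing itself can be sketched in pure Python. This is an illustrative helper using exact quantiles (Spark computes approximate quantiles via approxQuantile instead); the function names `quantile_splits` and `bucketize` are assumptions for this sketch, not Spark APIs.

```python
def quantile_splits(values, num_buckets):
    """Sketch: compute the interior split points that divide `values`
    into `num_buckets` quantile buckets, using exact quantiles.
    num_buckets must be >= 2."""
    if num_buckets < 2:
        raise ValueError("numBuckets must be >= 2")
    s = sorted(values)
    n = len(s)
    # one interior split at each i/num_buckets quantile, i = 1..num_buckets-1
    return [s[min(n - 1, (i * n) // num_buckets)] for i in range(1, num_buckets)]

def bucketize(v, splits):
    """Assign v to a bucket index given sorted interior split points."""
    return sum(v >= t for t in splits)

splits = quantile_splits(list(range(10)), 2)  # median split for 2 buckets
print(bucketize(3, splits), bucketize(7, splits))
```

With `num_buckets = 2` the single interior split is the median, so values below it land in bucket 0 and the rest in bucket 1.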
Array of number of buckets (quantiles, or categories) into which data points are grouped.
Array of number of buckets (quantiles, or categories) into which data points are grouped. Each value must be greater than or equal to 2
See also handleInvalid, which can optionally create an additional bucket for NaN values.
Relative error (see the documentation of
org.apache.spark.sql.DataFrameStatFunctions.approxQuantile
for a description).
Must be in the range [0, 1].
Relative error (see the documentation of
org.apache.spark.sql.DataFrameStatFunctions.approxQuantile
for a description).
Must be in the range [0, 1].
Note that in the multiple-columns case, the relative error is applied to all columns.
default: 0.001
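The relative error bounds how far an approximate quantile's rank may deviate from the exact target rank. A minimal sketch of that guarantee, mirroring the approxQuantile contract (the checker function itself is hypothetical, not a Spark API):

```python
def within_relative_error(sorted_vals, q, candidate, rel_err):
    """Sketch of the relativeError guarantee: a candidate answer for the
    q-th quantile is acceptable if its rank in the sorted data lies
    within rel_err * n of the target rank q * n."""
    n = len(sorted_vals)
    target_rank = q * n
    rank = sum(v <= candidate for v in sorted_vals)
    return abs(rank - target_rank) <= rel_err * n

vals = list(range(1, 101))  # 1..100, already sorted
print(within_relative_error(vals, 0.5, 50, 0.001))  # exact median passes
```

A smaller relativeError (such as the default 0.001) therefore forces split points closer to the exact quantiles at the cost of more computation.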
Params for QuantileDiscretizer.