org.apache.spark.ml.automl.feature
Parameter that configures how invalid entries in a previously modeled feature column are handled.
Setter that controls whether unseen indexed nominal values are allowed when a dataset is transformed with the generated BinaryEncoderModel.
The setting to be used: either 'keep' or 'error'
0.5.3
Default: 'error'. Allowed settings: 'keep' or 'error'.
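The difference between the two settings can be illustrated with a minimal, library-independent sketch. The object and helper names below (`HandleInvalidSketch`, `validateSetting`, `resolve`) are hypothetical and not part of the package, and the exception type raised for 'error' is an assumption rather than the documented behavior.

```scala
object HandleInvalidSketch {
  // The two settings named in the documentation.
  val allowed: Set[String] = Set("keep", "error")

  // Hypothetical helper: reject any setting other than 'keep' or 'error'.
  def validateSetting(value: String): String = {
    require(
      allowed.contains(value),
      s"handleInvalid must be one of ${allowed.mkString(", ")}; got '$value'"
    )
    value
  }

  // Hypothetical helper: indices 0 until knownCount were seen during training.
  // Returns Some(index) for a seen value and None for an unseen value that is
  // kept; throws for an unseen value when the setting is 'error'.
  def resolve(index: Int, knownCount: Int, handleInvalid: String): Option[Int] = {
    if (index >= 0 && index < knownCount) Some(index)
    else validateSetting(handleInvalid) match {
      case "keep" => None
      case _ =>
        // Assumption: the real model may raise a different exception type.
        throw new IllegalArgumentException(
          s"Unseen index $index encountered and handleInvalid is 'error'"
        )
    }
  }
}
```

With 'keep', the caller receives `None` and can substitute a reserved encoding for the unseen value; with 'error', the transformation stops at the first unseen value.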
Setter for the names, supplied as an Array, of the columns to be binary indexed.
Array of column names
0.5.3
Setter for the names, supplied as an Array, of the output columns that are generated as Breeze DenseVectors when the model is used to transform a dataset.
Array of output column names
0.5.3
The index positions of setInputCols and setOutputCols have a one-to-one relationship: both arrays must have the same length, and the output name at each position corresponds to the input name at the same position.
Main transformation method that will apply the model's configured encoding through a udf to the input dataset and add encoded columns.
input dataset for the model to mutate
a DataFrame with added BinaryEncoded columns
0.5.3
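As a rough illustration of what a binary encoding of an ordinal index looks like, the sketch below pads the binary representation of an index to a fixed width and returns it as an array of 0.0 / 1.0 values. This is not the model's actual implementation: the names `BinaryEncodingSketch`, `bitsFor`, `encode`, and `encodeUnseen` are hypothetical, a plain `Array[Double]` stands in for the vector output, and the all-ones treatment of unseen values is one reading of the 'keep' behavior.

```scala
object BinaryEncodingSketch {
  // Number of bits needed to represent the largest index seen in training.
  def bitsFor(maxIndex: Int): Int = {
    require(maxIndex >= 0, "maxIndex must be non-negative")
    Integer.toBinaryString(maxIndex).length
  }

  // Encode a non-negative index as a fixed-width array of 0.0 / 1.0 values,
  // most significant bit first.
  def encode(index: Int, width: Int): Array[Double] = {
    require(index >= 0, "index must be non-negative")
    val bits = Integer.toBinaryString(index)
    require(bits.length <= width, s"index $index does not fit in $width bits")
    val padded = ("0" * (width - bits.length)) + bits
    padded.toCharArray.map(c => if (c == '1') 1.0 else 0.0)
  }

  // Assumed treatment of unseen values: every position set to 1.0.
  def encodeUnseen(width: Int): Array[Double] = Array.fill(width)(1.0)
}

object BinaryEncodingDemo {
  // Largest known index is 5 ("101"), so 3 bits suffice for known values.
  // One extra bit is added when unseen values are allowed.
  val width: Int = BinaryEncodingSketch.bitsFor(5) + 1
  val known: Array[Double] = BinaryEncodingSketch.encode(5, width)
  val unseen: Array[Double] = BinaryEncodingSketch.encodeUnseen(width)
}
```

Under these assumptions, the extra bit means every known index has a leading 0.0, so the all-ones array reserved for unseen values cannot collide with the encoding of a known index.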
Method for mutating the dataset schema to support the addition of BinaryEncoded columns
the schema of the dataset
0.5.3
Method for validating the schema that results from building and transforming with this encoder package. Validation ensures that the supplied input columns are of the correct binary or nominal (ordinal numeric) type and that the number of output columns matches the configuration set.
The schema of the dataset supplied for training of the model or used in transforming using the model
Boolean flag for whether to reserve an additional binary encoding value for any values that were unknown at the time of model training. All such values are converted to a 'max binary value' of the encoding length + 1, made up of n * "1" values.
StructType that represents the transformed schema with additional output columns appended to the dataset structure.
0.5.3
UnsupportedOperationException
if the configured input cols and output cols do not match one another in length.
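The length check and the positional pairing of input and output columns can be sketched without any Spark dependency. The object and method names here (`ColumnPairingSketch`, `pairColumns`) are hypothetical; only the exception type is taken from the documentation above, and the real validation also checks column types, which this sketch omits.

```scala
object ColumnPairingSketch {
  // Pair each input column with the output column at the same position.
  // Throws UnsupportedOperationException when the two arrays differ in length.
  def pairColumns(
      inputCols: Array[String],
      outputCols: Array[String]
  ): Seq[(String, String)] = {
    if (inputCols.length != outputCols.length) {
      throw new UnsupportedOperationException(
        s"Input column count (${inputCols.length}) does not match " +
          s"output column count (${outputCols.length})."
      )
    }
    inputCols.zip(outputCols).toSeq
  }
}
```

For example, `pairColumns(Array("state", "zip"), Array("state_be", "zip_be"))` pairs `state` with `state_be` and `zip` with `zip_be`, while passing arrays of different lengths fails before any pairing takes place.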