Package smile.feature.imputation
Class SimpleImputer
java.lang.Object
smile.feature.imputation.SimpleImputer
- All Implemented Interfaces:
Serializable, Function<smile.data.Tuple, smile.data.Tuple>, smile.data.transform.Transform
A simple imputer that replaces the missing values in each column with a constant value for that column.
Constructor Summary
- SimpleImputer(values): Constructor.
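Method Summary
- smile.data.DataFrame apply(smile.data.DataFrame data)
- smile.data.Tuple apply(smile.data.Tuple x)
- static SimpleImputer fit(smile.data.DataFrame data, String... columns): Fits the missing value imputation values (median for numeric columns, mode for boolean/nominal columns, empty string for text columns).
- static SimpleImputer fit(smile.data.DataFrame data, double lower, double upper, String... columns): Fits the missing value imputation values (mean of the numeric values within the [lower, upper] percentile range, mode for boolean/nominal columns, empty string for text columns).
- static boolean hasMissing(smile.data.Tuple x): Returns true if the tuple x has missing values.
- static double[][] impute(double[][] data): Imputes the missing values with column averages.
- String toString()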
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface smile.data.transform.Transform
andThen, compose
Constructor Details

SimpleImputer
Constructor.
- Parameters:
values - the map of column name to the constant value.
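The following is a minimal plain-Java sketch of what the values map does. Each row is modeled as a `Map` from column name to value, and `null` stands for a missing value. The class `ConstantImputer` and this row representation are hypothetical illustrations, not Smile's API; the real class operates on `smile.data.Tuple` and `smile.data.DataFrame`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of constant-value imputation; not Smile's SimpleImputer. */
final class ConstantImputer {
    // Column name -> constant used to replace missing (null) values in that column.
    private final Map<String, Object> values;

    ConstantImputer(Map<String, Object> values) {
        this.values = new HashMap<>(values);
    }

    /** Returns a copy of the row in which null entries are replaced by the column's constant, if one is configured. */
    Map<String, Object> apply(Map<String, Object> row) {
        Map<String, Object> out = new HashMap<>(row);
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (e.getValue() == null && values.containsKey(e.getKey())) {
                out.put(e.getKey(), values.get(e.getKey()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> fill = new HashMap<>();
        fill.put("age", 30.0);
        fill.put("city", "");
        ConstantImputer imputer = new ConstantImputer(fill);

        Map<String, Object> row = new HashMap<>();   // HashMap permits null values
        row.put("age", null);
        row.put("city", "Paris");
        row.put("income", null);                      // no constant configured: stays missing
        System.out.println(imputer.apply(row));
    }
}
```

Columns without an entry in the map are left untouched in this sketch. The input row is copied rather than modified in place.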
Method Details

hasMissing
public static boolean hasMissing(smile.data.Tuple x)
Returns true if the tuple x has missing values.
- Parameters:
x - a tuple.
- Returns:
true if the tuple x has missing values.
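The following is a minimal sketch of this check on a plain `Object[]` row. It assumes that a value counts as missing when it is `null` or a floating-point NaN, which this page does not state. `MissingCheck` is a hypothetical name, not part of Smile.

```java
/** Hypothetical helper mirroring the idea of hasMissing on a plain Object[] row. */
final class MissingCheck {
    /** Assumption: null, or a Double/Float holding NaN, counts as missing. */
    static boolean isMissing(Object v) {
        if (v == null) return true;
        if (v instanceof Double) return ((Double) v).isNaN();
        if (v instanceof Float) return ((Float) v).isNaN();
        return false;
    }

    /** Returns true if any field of the row is missing. */
    static boolean hasMissing(Object[] row) {
        for (Object v : row) {
            if (isMissing(v)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasMissing(new Object[] {1.0, "a", true}));        // prints false
        System.out.println(hasMissing(new Object[] {Double.NaN, "a", true})); // prints true
        System.out.println(hasMissing(new Object[] {1.0, null, true}));       // prints true
    }
}
```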
apply
public smile.data.Tuple apply(smile.data.Tuple x)

apply
public smile.data.DataFrame apply(smile.data.DataFrame data)
- Specified by:
apply in interface smile.data.transform.Transform

toString
public String toString()
fit
public static SimpleImputer fit(smile.data.DataFrame data, String... columns)
Fits the missing value imputation values. Numeric columns are imputed with the median, boolean/nominal columns with the mode, and text columns with the empty string.
- Parameters:
data - the training data.
columns - the columns to impute. If empty, all applicable columns are imputed.
- Returns:
the imputer.
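The following sketch shows how the two fill statistics named above could be computed in plain Java. Numeric columns are `double[]` with NaN marking a missing value, and nominal columns are `String[]` with `null` marking a missing value. Several details are assumptions, not Smile's documented behavior: the class name `FillValues`, the even-length median convention (average of the two middle values), and the tie-break for the mode. Text columns need no statistic under the documented rule, because they are filled with `""`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the statistics used as fill values; not Smile's implementation. */
final class FillValues {
    /** Median of the non-NaN entries; averages the two middle values when the count is even (assumed convention). */
    static double median(double[] column) {
        double[] x = Arrays.stream(column).filter(v -> !Double.isNaN(v)).sorted().toArray();
        if (x.length == 0) throw new IllegalArgumentException("column has no observed values");
        int mid = x.length / 2;
        return x.length % 2 == 1 ? x[mid] : (x[mid - 1] + x[mid]) / 2.0;
    }

    /** Most frequent non-null level; on a tie, the level that reaches the top count first wins (assumed tie-break). */
    static String mode(String[] column) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String v : column) {
            if (v == null) continue;                    // skip missing entries
            int c = counts.merge(v, 1, Integer::sum);   // running count for this level
            if (c > bestCount) {
                best = v;
                bestCount = c;
            }
        }
        if (best == null) throw new IllegalArgumentException("column has no observed values");
        return best;
    }

    public static void main(String[] args) {
        double[] income = {40.0, Double.NaN, 55.0, 70.0};
        String[] color = {"red", null, "blue", "red"};
        System.out.println(median(income)); // prints 55.0
        System.out.println(mode(color));    // prints red
    }
}
```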
fit
public static SimpleImputer fit(smile.data.DataFrame data, double lower, double upper, String... columns)
Fits the missing value imputation values. Numeric columns are imputed with the mean of the values in the range [lower, upper], boolean/nominal columns with the mode, and text columns with the empty string.
- Parameters:
data - the training data.
lower - the lower limit in terms of percentiles of the original distribution (e.g. 5th percentile).
upper - the upper limit in terms of percentiles of the original distribution (e.g. 95th percentile).
columns - the columns to impute. If empty, all applicable columns are imputed.
- Returns:
the imputer.
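A percentile-trimmed mean like the one described above can be sketched as follows. Two details are assumptions, not taken from this page: `lower` and `upper` are taken to be fractions in [0, 1] (e.g. 0.05 and 0.95), and each percentile cutoff is the order statistic at the rounded position p * (n - 1), which is only one of several common percentile definitions. `TrimmedMean` is a hypothetical name. With lower = upper = 0.5, this sketch averages only the values equal to the middle order statistic, which gives the median when the number of observations is odd.

```java
import java.util.Arrays;

/** Hypothetical sketch of a percentile-trimmed mean; the percentile definition is an assumption. */
final class TrimmedMean {
    /**
     * Mean of the non-NaN values lying between the lower and upper percentiles (inclusive).
     * lower and upper are assumed to be fractions with 0 <= lower <= upper <= 1.
     */
    static double mean(double[] column, double lower, double upper) {
        if (lower < 0 || upper > 1 || lower > upper) {
            throw new IllegalArgumentException("expected 0 <= lower <= upper <= 1");
        }
        double[] x = Arrays.stream(column).filter(v -> !Double.isNaN(v)).sorted().toArray();
        if (x.length == 0) throw new IllegalArgumentException("column has no observed values");
        int n = x.length;
        // Cutoffs: order statistics at the rounded positions p * (n - 1).
        double lo = x[(int) Math.round(lower * (n - 1))];
        double hi = x[(int) Math.round(upper * (n - 1))];
        // Non-empty because lo itself lies in [lo, hi].
        return Arrays.stream(x).filter(v -> v >= lo && v <= hi).average().getAsDouble();
    }

    public static void main(String[] args) {
        double[] column = {1, 2, 3, 4, 5, 6, 7, 8, 9, 1000, Double.NaN};
        System.out.println(mean(column, 0.0, 1.0)); // prints 104.5 (plain mean, pulled up by the outlier)
        System.out.println(mean(column, 0.1, 0.9)); // prints 5.5 (extremes trimmed)
    }
}
```

This example shows why trimming helps. A single outlier (1000) drags the plain mean far from the bulk of the data, but the trimmed mean stays representative.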
impute
public static double[][] impute(double[][] data)
Imputes the missing values with column averages.
- Parameters:
data - data with missing values.
- Returns:
the imputed data.
- Throws:
IllegalArgumentException - when a whole row or column is missing.
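The following is a minimal sketch of column-average imputation on a `double[][]`. It assumes that NaN marks a missing value, and it follows the documented contract of rejecting a fully missing row or column. `ColumnMeanImputer` is a hypothetical name. The sketch returns a copy, but this page does not say whether Smile's `impute` copies the data or modifies it in place.

```java
import java.util.Arrays;

/** Hypothetical sketch of column-average imputation; NaN is assumed to mark a missing value. */
final class ColumnMeanImputer {
    static double[][] impute(double[][] data) {
        int n = data.length;
        if (n == 0) return new double[0][];
        int d = data[0].length;
        double[] sum = new double[d];
        int[] count = new int[d];

        // First pass: accumulate per-column sums and counts of observed values.
        for (double[] row : data) {
            if (row.length != d) throw new IllegalArgumentException("ragged array");
            boolean rowHasValue = false;
            for (int j = 0; j < d; j++) {
                if (!Double.isNaN(row[j])) {
                    sum[j] += row[j];
                    count[j]++;
                    rowHasValue = true;
                }
            }
            if (!rowHasValue) throw new IllegalArgumentException("a whole row is missing");
        }

        double[] mean = new double[d];
        for (int j = 0; j < d; j++) {
            if (count[j] == 0) throw new IllegalArgumentException("column " + j + " is entirely missing");
            mean[j] = sum[j] / count[j];
        }

        // Second pass: fill NaNs in a copy so the caller's array is left untouched.
        double[][] out = new double[n][];
        for (int i = 0; i < n; i++) {
            out[i] = data[i].clone();
            for (int j = 0; j < d; j++) {
                if (Double.isNaN(out[i][j])) out[i][j] = mean[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {
            {1.0, Double.NaN},
            {3.0, 4.0},
            {Double.NaN, 8.0}
        };
        System.out.println(Arrays.deepToString(impute(data))); // prints [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
    }
}
```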