Interface | Description |
---|---|
FeatureGenerator<T> | Feature generator. |
FeatureRanking | Univariate feature ranking metric. |
SequenceFeature<T> | Sequence feature generator. |
Class | Description |
---|---|
Bag<T> | The bag-of-words feature of text used in natural language processing and information retrieval. |
DateFeature | Date/time feature generator. |
FeatureTransform | Feature transformation. |
GAFeatureSelection | Genetic algorithm based feature selection. |
MaxAbsScaler | Scales each feature by its maximum absolute value. |
Normalizer | Normalizes samples individually to unit norm. |
OneHotEncoder | Encodes categorical integer features using a one-hot (aka one-of-K) scheme. |
RobustStandardizer | Robustly standardizes numeric features by subtracting the median and dividing by the IQR. |
Scaler | Scales all numeric variables into the range [0, 1]. |
SignalNoiseRatio | The signal-to-noise ratio (S2N) is a univariate feature ranking metric that can be used as a feature selection criterion for binary classification problems. |
SparseOneHotEncoder | Encodes categorical integer features using a sparse one-hot scheme. |
Standardizer | Standardizes numeric features to zero mean and unit variance. |
SumSquaresRatio | The ratio of between-groups to within-groups sum of squares is a univariate feature ranking metric that can be used as a feature selection criterion for multi-class classification problems. |
WinsorScaler | Scales all numeric variables into the range [0, 1] based on winsorized values, i.e. values clipped at lower and upper percentiles, so that extreme outliers do not dominate the scale. |
Enum | Description |
---|---|
DateFeature.Type | The types of date/time features. |
Normalizer.Norm | The types of norms used in data scaling. |
Feature generation (or constructive induction) studies methods that modify or enhance the representation of data objects. Feature generation techniques search for new features that describe the objects better than the attributes supplied with the training instances.
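As a concrete illustration of feature generation, here is a minimal bag-of-words sketch in plain Java, assuming a fixed vocabulary; it is a standalone example with hypothetical names, not this package's Bag<T> class.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal bag-of-words sketch: map a document to a vector of word
// counts over a fixed vocabulary. Illustrative only; not the package's
// Bag<T> implementation.
public class BagOfWordsDemo {
    public static void main(String[] args) {
        String[] vocabulary = {"machine", "learning", "feature", "selection"};

        // Index each vocabulary word for O(1) lookup.
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < vocabulary.length; i++) index.put(vocabulary[i], i);

        String document = "feature selection helps machine learning by removing noisy feature columns";

        // Count occurrences of vocabulary words in the document.
        double[] counts = new double[vocabulary.length];
        for (String token : document.toLowerCase().split("\\s+")) {
            Integer i = index.get(token);
            if (i != null) counts[i]++;
        }

        for (int i = 0; i < vocabulary.length; i++) {
            System.out.printf("%s: %.0f%n", vocabulary[i], counts[i]);
        }
    }
}
```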
Many machine learning methods, such as neural networks and SVMs with a Gaussian kernel, also require that the features be properly scaled or standardized; for example, each variable may be scaled into the interval [0, 1] or standardized to mean 0 and standard deviation 1. Although some methods, such as decision trees, can handle nominal variables directly, most other methods require nominal variables to be converted into multiple binary dummy variables that indicate the presence or absence of a characteristic.
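The sketch below illustrates both preprocessing steps under simple assumptions: standardizing a numeric column to zero mean and unit variance, and one-hot encoding a categorical value into K binary dummies. The class and method names are hypothetical, not part of this package's API.

```java
import java.util.Arrays;

// Illustrative sketches of standardization and one-hot encoding;
// hypothetical names, independent of the package's Standardizer/OneHotEncoder.
public class PreprocessingDemo {
    // Standardizes a column in place: x' = (x - mean) / sd.
    static void standardize(double[] x) {
        double mean = 0;
        for (double v : x) mean += v;
        mean /= x.length;

        double var = 0;
        for (double v : x) var += (v - mean) * (v - mean);
        double sd = Math.sqrt(var / x.length);

        for (int i = 0; i < x.length; i++) x[i] = (x[i] - mean) / sd;
    }

    // One-hot encodes a categorical value in [0, k) as one-of-K binary dummies.
    static double[] oneHot(int category, int k) {
        double[] dummy = new double[k];
        dummy[category] = 1.0;
        return dummy;
    }

    public static void main(String[] args) {
        double[] x = {2.0, 4.0, 6.0, 8.0};
        standardize(x);
        System.out.println(Arrays.toString(x));            // zero mean, unit variance
        System.out.println(Arrays.toString(oneHot(2, 4))); // [0.0, 0.0, 1.0, 0.0]
    }
}
```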
Feature selection is the technique of selecting a subset of relevant features for building robust learning models. By removing irrelevant and redundant features from the data, feature selection helps improve the performance of learning models by alleviating the curse of dimensionality, enhancing generalization, and speeding up the learning process. More importantly, feature selection also helps researchers acquire a better understanding of the data.
Feature selection algorithms typically fall into two categories: feature ranking and subset selection. Feature ranking scores each feature with a metric and eliminates all features that do not achieve an adequate score. Subset selection searches the space of possible feature subsets for an optimal one. Clearly, an exhaustive search for the optimal subset is impractical when a large number of features is available, so heuristic methods such as genetic algorithms are commonly employed for subset selection.
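As an illustration of feature ranking, the sketch below computes the signal-to-noise ratio |μ₁ − μ₀| / (σ₁ + σ₀) of one feature for a binary classification problem, the statistic behind S2N-style ranking; it is a standalone sketch with hypothetical names, not the SignalNoiseRatio class itself.

```java
// Ranks a single feature by the signal-to-noise ratio for binary labels
// y in {0, 1}: S2N = |mean1 - mean0| / (sd1 + sd0). Higher scores indicate
// features that separate the two classes better.
public class S2NRankingDemo {
    static double s2n(double[] x, int[] y) {
        double mean0 = 0, mean1 = 0;
        int n0 = 0, n1 = 0;
        for (int i = 0; i < x.length; i++) {
            if (y[i] == 0) { mean0 += x[i]; n0++; } else { mean1 += x[i]; n1++; }
        }
        mean0 /= n0;
        mean1 /= n1;

        // Sample standard deviations per class (assumes n0, n1 >= 2).
        double var0 = 0, var1 = 0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - (y[i] == 0 ? mean0 : mean1);
            if (y[i] == 0) var0 += d * d; else var1 += d * d;
        }
        double sd0 = Math.sqrt(var0 / (n0 - 1));
        double sd1 = Math.sqrt(var1 / (n1 - 1));

        return Math.abs(mean1 - mean0) / (sd1 + sd0);
    }

    public static void main(String[] args) {
        double[] feature = {1.0, 1.2, 0.9, 5.0, 5.1, 4.8};
        int[] labels = {0, 0, 0, 1, 1, 1};
        System.out.printf("S2N = %.3f%n", s2n(feature, labels));
    }
}
```

A feature ranking procedure would compute this score for every column and keep the top-scoring features.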