The ratio of between-groups to within-groups sum of squares is a univariate
feature ranking metric, which can be used as a feature selection criterion
for multi-class classification problems. For each variable j, this ratio is
$$
\frac{\mathrm{BSS}(j)}{\mathrm{WSS}(j)} = \frac{\sum_i \sum_k I(y_i = k)\,(\bar{x}_{kj} - \bar{x}_{\cdot j})^2}{\sum_i \sum_k I(y_i = k)\,(x_{ij} - \bar{x}_{kj})^2},
$$
where $\bar{x}_{\cdot j}$ denotes the average of variable j across all
samples, $\bar{x}_{kj}$ denotes the average of variable j across samples
belonging to class k, and $x_{ij}$ is the value of variable j of sample i.
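A larger ratio means the class means of a feature are spread far apart relative
to the scatter within each class, so features with larger ratios are ranked as
more useful for classification.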
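The ratio can be computed directly from its definition. The following is a minimal sketch in plain Python, not a reference implementation: the function name `sum_squares_ratio`, the list-of-rows input layout, and the handling of a zero within-groups sum of squares are all assumptions made for illustration.

```python
def sum_squares_ratio(X, y):
    """Return BSS(j) / WSS(j) for every feature j.

    X: list of samples, each a list of feature values (hypothetical layout).
    y: list of class labels, one per sample.
    """
    n, p = len(X), len(X[0])

    # Group sample indices by class label.
    groups = {}
    for i, label in enumerate(y):
        groups.setdefault(label, []).append(i)

    ratios = []
    for j in range(p):
        overall_mean = sum(row[j] for row in X) / n
        bss = 0.0
        wss = 0.0
        for rows in groups.values():
            class_mean = sum(X[i][j] for i in rows) / len(rows)
            # Every sample of class k contributes (class mean - overall mean)^2.
            bss += len(rows) * (class_mean - overall_mean) ** 2
            # Every sample contributes its squared distance to its class mean.
            wss += sum((X[i][j] - class_mean) ** 2 for i in rows)
        if wss > 0:
            ratios.append(bss / wss)
        else:
            # Assumption: with no within-class scatter, score the feature as
            # infinite if the class means differ and as 0 if it is constant.
            ratios.append(float("inf") if bss > 0 else 0.0)
    return ratios


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [9.0, 4.0], [10.0, 2.0]]
y = [0, 0, 1, 1]
ratios = sum_squares_ratio(X, y)
# Rank feature indices from highest to lowest ratio.
ranking = sorted(range(len(ratios)), key=lambda j: ratios[j], reverse=True)
print(ratios, ranking)
```

In this toy data the class means of feature 0 are 1.5 and 9.5, while both class means of feature 1 equal the overall mean of 3, so feature 0 is ranked first.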
References
- S. Dudoit, J. Fridlyand and T. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc, 97:77-87, 2002.