public class AdaBoost extends java.lang.Object implements SoftClassifier<smile.data.Tuple>, DataFrameClassifier, TreeSHAP
AdaBoost calls a weak classifier repeatedly over a series of rounds t = 1, ..., T. On each call it updates a distribution of weights that indicates how important each example in the data set is for the classification. On each round, the weights of incorrectly classified examples are increased (or, alternatively, the weights of correctly classified examples are decreased), so that the next classifier focuses more on the examples that were misclassified.
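The reweighting described above can be sketched in a few lines of plain Java. This is a minimal illustration of the textbook update, not the code used by this class: the class name `BoostingRoundSketch`, the method `reweight`, and the `correct` array are hypothetical, and the weights are assumed to sum to 1 on input.

```java
import java.util.Arrays;

public class BoostingRoundSketch {
    /**
     * One reweighting step of basic (binary) AdaBoost.
     * Hypothetical helper for illustration; not part of the Smile API.
     *
     * @param weights current example weights, assumed to sum to 1
     * @param correct correct[i] is true if the weak classifier got example i right
     * @return the updated weights, renormalized to sum to 1
     */
    public static double[] reweight(double[] weights, boolean[] correct) {
        // Weighted error of the weak classifier on this round.
        double err = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (!correct[i]) err += weights[i];
        }
        if (err <= 0.0 || err >= 1.0) {
            throw new IllegalArgumentException("weighted error must be in (0, 1): " + err);
        }

        // Classifier weight: larger when the weighted error is smaller.
        double alpha = Math.log((1.0 - err) / err);

        // Scale the weights of misclassified examples by exp(alpha), then renormalize.
        double[] updated = new double[weights.length];
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            updated[i] = correct[i] ? weights[i] : weights[i] * Math.exp(alpha);
            sum += updated[i];
        }
        for (int i = 0; i < updated.length; i++) {
            updated[i] /= sum;
        }
        return updated;
    }

    public static void main(String[] args) {
        double[] w = {0.25, 0.25, 0.25, 0.25};
        boolean[] correct = {true, true, true, false};
        // The single misclassified example ends up with half of the total weight.
        System.out.println(Arrays.toString(reweight(w, correct)));
    }
}
```

After renormalization, increasing the weights of the misclassified examples is equivalent to decreasing the weights of the correctly classified ones, which is why the description mentions both formulations.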
The basic AdaBoost algorithm applies only to binary classification problems. For multi-class classification, a common approach is to reduce the problem to multiple two-class problems. This implementation is a multi-class AdaBoost that needs no such reduction.
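One well-known way to boost multi-class classifiers directly is the SAMME formulation, which adds a log(k - 1) term to each classifier weight and predicts by weighted vote. The sketch below assumes that formulation purely for illustration; the text above does not say which variant this class uses, and `MultiClassVoteSketch`, `alpha`, and `vote` are hypothetical names rather than Smile API.

```java
import java.util.Arrays;

public class MultiClassVoteSketch {
    /**
     * SAMME-style classifier weight for k classes (an assumption, see lead-in).
     * It is positive whenever err is below (k - 1) / k, i.e. better than random guessing,
     * and reduces to the binary AdaBoost weight when k = 2.
     */
    public static double alpha(double err, int k) {
        return Math.log((1.0 - err) / err) + Math.log(k - 1.0);
    }

    /**
     * Weighted vote over the per-tree predictions for a single instance.
     *
     * @param votes      votes[t] is the class index (0..k-1) predicted by tree t
     * @param alpha      alpha[t] is the weight of tree t
     * @param posteriori output array of length k, filled with normalized vote shares
     * @return the class index with the largest weighted vote
     */
    public static int vote(int[] votes, double[] alpha, double[] posteriori) {
        Arrays.fill(posteriori, 0.0);
        double total = 0.0;
        for (int t = 0; t < votes.length; t++) {
            posteriori[votes[t]] += alpha[t];
            total += alpha[t];
        }

        int best = 0;
        for (int c = 0; c < posteriori.length; c++) {
            posteriori[c] /= total;
            if (posteriori[c] > posteriori[best]) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        int[] votes = {0, 2, 2, 1};                // predictions of four trees
        double[] weights = {1.0, 0.5, 0.75, 0.25}; // their (made-up) weights
        double[] posteriori = new double[3];
        int label = vote(votes, weights, posteriori);
        System.out.println(label + " " + Arrays.toString(posteriori));
    }
}
```

The normalized vote shares are a convenient score per class; they are not calibrated probabilities.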
Constructor and Description |
---|
AdaBoost(smile.data.formula.Formula formula,
int k,
DecisionTree[] trees,
double[] alpha,
double[] error,
double[] importance)
Constructor.
|
AdaBoost(smile.data.formula.Formula formula,
int k,
DecisionTree[] trees,
double[] alpha,
double[] error,
double[] importance,
smile.util.IntSet labels)
Constructor.
|
Modifier and Type | Method and Description |
---|---|
static AdaBoost |
fit(smile.data.formula.Formula formula,
smile.data.DataFrame data)
Fits an AdaBoost model.
|
static AdaBoost |
fit(smile.data.formula.Formula formula,
smile.data.DataFrame data,
int ntrees,
int maxDepth,
int maxNodes,
int nodeSize)
Fits an AdaBoost model.
|
static AdaBoost |
fit(smile.data.formula.Formula formula,
smile.data.DataFrame data,
java.util.Properties prop)
Fits an AdaBoost model.
|
smile.data.formula.Formula |
formula()
Returns the formula associated with the model.
|
double[] |
importance()
Returns the variable importance.
|
int |
predict(smile.data.Tuple x)
Predicts the class label of an instance.
|
int |
predict(smile.data.Tuple x,
double[] posteriori)
Predicts the class label of an instance and also calculates the a posteriori
probabilities.
|
smile.data.type.StructType |
schema()
Returns the design matrix schema.
|
int |
size()
Returns the number of trees in the model.
|
int[][] |
test(smile.data.DataFrame data)
Tests the model on a validation dataset.
|
DecisionTree[] |
trees()
Returns the decision trees.
|
void |
trim(int ntrees)
Trims the tree model set to a smaller size in case of over-fitting.
|
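Because boosting adds trees one after another, a natural reading of `trim(int ntrees)` is that it keeps a prefix of the ensemble. The following plain-Java sketch shows how a smaller size might be chosen from per-size validation errors and how a weight array would be truncated. It assumes that trimming keeps the first `ntrees` entries; `TrimSketch`, `bestSize`, `trim`, and the error values are hypothetical and are not part of the Smile API.

```java
import java.util.Arrays;

public class TrimSketch {
    /**
     * Picks the ensemble size with the lowest validation error.
     *
     * @param validationError validationError[t] is the error when using the first t + 1 trees
     * @return the best size in 1..T; ties go to the smaller ensemble
     */
    public static int bestSize(double[] validationError) {
        int best = 0;
        for (int t = 1; t < validationError.length; t++) {
            if (validationError[t] < validationError[best]) best = t;
        }
        return best + 1;
    }

    /** Keeps the first ntrees weights (assumed trimming behavior, see lead-in). */
    public static double[] trim(double[] alpha, int ntrees) {
        if (ntrees <= 0 || ntrees > alpha.length) {
            throw new IllegalArgumentException("invalid new model size: " + ntrees);
        }
        return Arrays.copyOf(alpha, ntrees);
    }

    public static void main(String[] args) {
        // Made-up validation errors: they improve at first, then degrade as the model over-fits.
        double[] validationError = {0.30, 0.22, 0.18, 0.18, 0.21};
        double[] alpha = {0.9, 0.7, 0.5, 0.4, 0.3};

        int ntrees = bestSize(validationError);
        System.out.println(ntrees + " " + Arrays.toString(trim(alpha, ntrees)));
    }
}
```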
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Classifier: applyAsDouble, applyAsInt, f, predict
Methods inherited from interface DataFrameClassifier: predict
public AdaBoost(smile.data.formula.Formula formula, int k, DecisionTree[] trees, double[] alpha, double[] error, double[] importance)
formula - a symbolic description of the model to be fitted.
k - the number of classes.
trees - forest of decision trees.
alpha - the weight of each decision tree.
error - the weighted error of each decision tree during training.
importance - variable importance.
public AdaBoost(smile.data.formula.Formula formula, int k, DecisionTree[] trees, double[] alpha, double[] error, double[] importance, smile.util.IntSet labels)
formula - a symbolic description of the model to be fitted.
k - the number of classes.
trees - forest of decision trees.
alpha - the weight of each decision tree.
error - the weighted error of each decision tree during training.
importance - variable importance.
labels - class labels.
public static AdaBoost fit(smile.data.formula.Formula formula, smile.data.DataFrame data)
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
public static AdaBoost fit(smile.data.formula.Formula formula, smile.data.DataFrame data, java.util.Properties prop)
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
public static AdaBoost fit(smile.data.formula.Formula formula, smile.data.DataFrame data, int ntrees, int maxDepth, int maxNodes, int nodeSize)
formula - a symbolic description of the model to be fitted.
data - the data frame of the explanatory and response variables.
ntrees - the number of trees.
maxDepth - the maximum depth of the tree.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the number of instances in a node below which the tree will not split. Setting nodeSize = 5 generally gives good results.
public smile.data.formula.Formula formula()
Description copied from interface: DataFrameClassifier
Returns the formula associated with the model.
Specified by: formula in interface DataFrameClassifier, formula in interface TreeSHAP
public smile.data.type.StructType schema()
Description copied from interface: DataFrameClassifier
Returns the design matrix schema.
Specified by: schema in interface DataFrameClassifier
public double[] importance()
public int size()
public DecisionTree[] trees()
public void trim(int ntrees)
ntrees - the new (smaller) size of the tree model set.
public int predict(smile.data.Tuple x)
Description copied from interface: Classifier
Predicts the class label of an instance.
Specified by: predict in interface Classifier<smile.data.Tuple>, predict in interface DataFrameClassifier
x - the instance to be classified.
public int predict(smile.data.Tuple x, double[] posteriori)
Specified by: predict in interface SoftClassifier<smile.data.Tuple>
x - an instance to be classified.
posteriori - the array to store a posteriori probabilities on output.
public int[][] test(smile.data.DataFrame data)