AdaBoost

java.lang.Object
- smile.classification.AdaBoost

All Implemented Interfaces:

java.io.Serializable, java.util.function.ToDoubleFunction<smile.data.Tuple>, java.util.function.ToIntFunction<smile.data.Tuple>, Classifier<smile.data.Tuple>, DataFrameClassifier, SoftClassifier<smile.data.Tuple>, SHAP<smile.data.Tuple>, TreeSHAP
```
public class AdaBoost
extends java.lang.Object
implements SoftClassifier<smile.data.Tuple>, DataFrameClassifier, TreeSHAP
```
AdaBoost (Adaptive Boosting) classifier with decision trees. In principle, AdaBoost is a meta-algorithm that can be used in conjunction with many other learning algorithms to improve their performance. In practice, AdaBoost with decision trees is probably the most popular combination. AdaBoost is adaptive in the sense that subsequent classifiers are tweaked in favor of the instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. However, in some problems it can be less susceptible to over-fitting than most learning algorithms.
AdaBoost calls a weak classifier repeatedly in a series of rounds, producing T classifiers in total. On each call, a distribution of weights is updated that indicates the importance of each example in the data set for the classification. On each round, the weights of incorrectly classified examples are increased (or, alternatively, the weights of correctly classified examples are decreased), so that the new classifier focuses more on the examples that were misclassified.
The basic AdaBoost algorithm handles only binary classification problems. For multi-class classification, a common approach is to reduce the problem to multiple two-class problems. This implementation is a multi-class AdaBoost without such reductions.
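The reweighting step described above can be sketched in plain Java. This is a minimal illustration that assumes the SAMME update from reference 2 (Zhu et al.). The class and method names (`SammeRound`, `weightedError`, `alpha`, `reweight`) are hypothetical and not part of Smile, and the library's internal implementation may differ.

```java
import java.util.Arrays;

/** Hypothetical sketch of one multi-class AdaBoost (SAMME) round; not Smile's code. */
public class SammeRound {
    /** Weighted error: the share of the total weight that sits on misclassified examples. */
    public static double weightedError(double[] w, boolean[] miss) {
        double total = 0.0, wrong = 0.0;
        for (int i = 0; i < w.length; i++) {
            total += w[i];
            if (miss[i]) wrong += w[i];
        }
        return wrong / total;
    }

    /** Classifier weight for error in (0, 1); the log(k - 1) term vanishes when k = 2. */
    public static double alpha(double err, int k) {
        return Math.log((1.0 - err) / err) + Math.log(k - 1.0);
    }

    /** Scales up the weights of misclassified examples, then renormalizes to sum to 1. */
    public static double[] reweight(double[] w, boolean[] miss, double alpha) {
        double[] next = new double[w.length];
        double sum = 0.0;
        for (int i = 0; i < w.length; i++) {
            next[i] = miss[i] ? w[i] * Math.exp(alpha) : w[i];
            sum += next[i];
        }
        for (int i = 0; i < next.length; i++) next[i] /= sum;
        return next;
    }

    public static void main(String[] args) {
        double[] w = {0.25, 0.25, 0.25, 0.25};          // uniform starting weights
        boolean[] miss = {true, false, false, false};   // only the first example was misclassified
        double err = weightedError(w, miss);
        double a = alpha(err, 2);
        System.out.println(Arrays.toString(reweight(w, miss, a)));
    }
}
```

After the update, the single misclassified example carries a larger share of the total weight, which is what steers the next weak classifier toward it.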
References
1. Yoav Freund, Robert E. Schapire. A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting, 1995.
2. Ji Zhu, Hui Zhou, Saharon Rosset and Trevor Hastie. Multi-class AdaBoost, 2009.
See Also:

Serialized Form

Constructor Summary

Constructors
Constructor and Description
`AdaBoost(smile.data.formula.Formula formula, int k, DecisionTree[] trees, double[] alpha, double[] error, double[] importance)` Constructor.
`AdaBoost(smile.data.formula.Formula formula, int k, DecisionTree[] trees, double[] alpha, double[] error, double[] importance, smile.util.IntSet labels)` Constructor.

Method Summary

Modifier and Type	Method and Description
`static AdaBoost`	`fit(smile.data.formula.Formula formula, smile.data.DataFrame data)` Fits an AdaBoost model.
`static AdaBoost`	`fit(smile.data.formula.Formula formula, smile.data.DataFrame data, int ntrees, int maxDepth, int maxNodes, int nodeSize)` Fits an AdaBoost model.
`static AdaBoost`	`fit(smile.data.formula.Formula formula, smile.data.DataFrame data, java.util.Properties prop)` Fits an AdaBoost model.
`smile.data.formula.Formula`	`formula()` Returns the formula associated with the model.
`double[]`	`importance()` Returns the variable importance.
`int`	`predict(smile.data.Tuple x)` Predicts the class label of an instance.
`int`	`predict(smile.data.Tuple x, double[] posteriori)` Predicts the class label of an instance and also calculates the a posteriori probabilities.
`smile.data.type.StructType`	`schema()` Returns the design matrix schema.
`int`	`size()` Returns the number of trees in the model.
`int[][]`	`test(smile.data.DataFrame data)` Tests the model on a validation dataset.
`DecisionTree[]`	`trees()` Returns the decision trees.
`void`	`trim(int ntrees)` Trims the tree model set to a smaller size in case of over-fitting.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface smile.classification.Classifier
applyAsDouble, applyAsInt, f, predict

Methods inherited from interface smile.classification.DataFrameClassifier
predict

Methods inherited from interface smile.feature.TreeSHAP
shap, shap

Methods inherited from interface smile.feature.SHAP
shap

- Constructor Detail
  - AdaBoost
```
public AdaBoost(smile.data.formula.Formula formula,
                int k,
                DecisionTree[] trees,
                double[] alpha,
                double[] error,
                double[] importance)
```
    Constructor.
    
    Parameters:
    
    formula - a symbolic description of the model to be fitted.
    
    k - the number of classes.
    
    trees - forest of decision trees.
    
    alpha - the weight of each decision tree.
    
    error - the weighted error of each decision tree during training.
    
    importance - variable importance
  - AdaBoost
```
public AdaBoost(smile.data.formula.Formula formula,
                int k,
                DecisionTree[] trees,
                double[] alpha,
                double[] error,
                double[] importance,
                smile.util.IntSet labels)
```
    Constructor.
    
    Parameters:
    
    formula - a symbolic description of the model to be fitted.
    
    k - the number of classes.
    
    trees - forest of decision trees.
    
    alpha - the weight of each decision tree.
    
    error - the weighted error of each decision tree during training.
    
    importance - variable importance
    
    labels - class labels
- Method Detail
  - fit
```
public static AdaBoost fit(smile.data.formula.Formula formula,
                           smile.data.DataFrame data)
```
    Fits an AdaBoost model.
    
    Parameters:
    
    formula - a symbolic description of the model to be fitted.
    
    data - the data frame of the explanatory and response variables.
  - fit
```
public static AdaBoost fit(smile.data.formula.Formula formula,
                           smile.data.DataFrame data,
                           java.util.Properties prop)
```
    Fits an AdaBoost model.
    
    Parameters:
    
    formula - a symbolic description of the model to be fitted.
    
    data - the data frame of the explanatory and response variables.
    
    prop - the hyper-parameters of the training algorithm.
  - fit
```
public static AdaBoost fit(smile.data.formula.Formula formula,
                           smile.data.DataFrame data,
                           int ntrees,
                           int maxDepth,
                           int maxNodes,
                           int nodeSize)
```
    Fits an AdaBoost model.
    
    Parameters:
    
    formula - a symbolic description of the model to be fitted.
    
    data - the data frame of the explanatory and response variables.
    
    ntrees - the number of trees.
    
    maxDepth - the maximum depth of the tree.
    
    maxNodes - the maximum number of leaf nodes in the tree.
    
    nodeSize - the number of instances in a node below which the tree will not split. Setting nodeSize = 5 generally gives good results.
  - formula
```
public smile.data.formula.Formula formula()
```
    Description copied from interface: DataFrameClassifier
    
    Returns the formula associated with the model.
    
    Specified by:
    
    formula in interface DataFrameClassifier
    
    Specified by:
    
    formula in interface TreeSHAP
  - schema
```
public smile.data.type.StructType schema()
```
    Description copied from interface: DataFrameClassifier
    
    Returns the design matrix schema.
    
    Specified by:
    
    schema in interface DataFrameClassifier
  - importance
```
public double[] importance()
```
    Returns the variable importance. Every time a node is split on a variable, the impurity criterion (Gini, information gain, etc.) for the two descendant nodes is less than that of the parent node. Adding up the decreases for each individual variable over all trees in the forest gives a simple measure of variable importance.
    
    Returns:
    
    the variable importance
  - size
```
public int size()
```
    Returns the number of trees in the model.
    
    Returns:
    
    the number of trees in the model
  - trees
```
public DecisionTree[] trees()
```
    Returns the decision trees.
    
    Specified by:
    
    trees in interface TreeSHAP
  - trim
```
public void trim(int ntrees)
```
    Trims the tree model set to a smaller size in case of over-fitting. Alternatively, if extra decision trees in the model don't improve the performance, we may remove them to reduce the model size and speed up prediction.
    
    Parameters:
    
    ntrees - the new (smaller) size of tree model set.
  - predict
```
public int predict(smile.data.Tuple x)
```
    Description copied from interface: Classifier
    
    Predicts the class label of an instance.
    
    Specified by:
    
    predict in interface Classifier<smile.data.Tuple>
    
    Specified by:
    
    predict in interface DataFrameClassifier
    
    Parameters:
    
    x - the instance to be classified.
    
    Returns:
    
    the predicted class label.
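    
    As an illustration of how per-tree weights (the `alpha` constructor argument) can combine individual tree outputs into one label, here is a minimal sketch of weighted majority voting in plain Java. It assumes each tree emits a class index in 0..k-1. The class `WeightedVote` is hypothetical and not part of Smile, and the library's actual prediction code may differ.

```java
/** Hypothetical sketch of weighted majority voting; not Smile's code. */
public class WeightedVote {
    /**
     * @param votes votes[t] is the class index (0..k-1) predicted by tree t
     * @param alpha alpha[t] is the weight of tree t
     * @param k     the number of classes
     * @return the class index with the largest total weight (lowest index wins ties)
     */
    public static int predict(int[] votes, double[] alpha, int k) {
        double[] score = new double[k];
        for (int t = 0; t < votes.length; t++) {
            score[votes[t]] += alpha[t];   // each tree adds its weight to the class it chose
        }
        int best = 0;
        for (int c = 1; c < k; c++) {
            if (score[c] > score[best]) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        int[] votes = {0, 1, 1};             // two of three trees vote for class 1
        double[] alpha = {2.0, 0.5, 0.5};    // but the first tree carries more weight
        System.out.println(predict(votes, alpha, 2));
    }
}
```

    In this toy input the single heavily weighted tree outvotes the two lightly weighted ones, so the weighted result differs from a plain majority vote.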
  - predict
```
public int predict(smile.data.Tuple x,
                   double[] posteriori)
```
    Predicts the class label of an instance and also calculates the a posteriori probabilities. Not supported.
    
    Specified by:
    
    predict in interface SoftClassifier<smile.data.Tuple>
    
    Parameters:
    
    x - an instance to be classified.
    
    posteriori - the array to store a posteriori probabilities on output.
    
    Returns:
    
    the predicted class label
  - test
```
public int[][] test(smile.data.DataFrame data)
```
    Tests the model on a validation dataset.
    
    Parameters:
    
    data - the validation data set.
    
    Returns:
    
    the predictions obtained with the first 1, 2, ..., size() decision trees.
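    
    One way to use such staged predictions is to compute the accuracy at each stage and find the smallest tree count that reaches the best score. The sketch below is plain Java and assumes that row i of the returned array holds the predictions for every validation row using the first i + 1 trees. That layout is an assumption, and the helper class `StagedAccuracy` is hypothetical and not part of Smile.

```java
/** Hypothetical helper for inspecting staged predictions; not part of Smile. */
public class StagedAccuracy {
    /** Accuracy per stage; staged[i][j] is assumed to be the prediction for row j with the first i + 1 trees. */
    public static double[] accuracy(int[][] staged, int[] truth) {
        double[] acc = new double[staged.length];
        for (int i = 0; i < staged.length; i++) {
            int correct = 0;
            for (int j = 0; j < truth.length; j++) {
                if (staged[i][j] == truth[j]) correct++;
            }
            acc[i] = (double) correct / truth.length;
        }
        return acc;
    }

    /** Smallest number of trees that reaches the best accuracy. */
    public static int bestSize(double[] acc) {
        int best = 0;
        for (int i = 1; i < acc.length; i++) {
            if (acc[i] > acc[best]) best = i;   // strict comparison keeps the earliest best stage
        }
        return best + 1;
    }

    public static void main(String[] args) {
        int[][] staged = {{0, 0, 1}, {0, 1, 1}, {0, 1, 1}};   // toy staged predictions
        int[] truth = {0, 1, 1};                              // toy true labels
        System.out.println(bestSize(accuracy(staged, truth)));
    }
}
```

    The value returned by `bestSize` could then be passed to `trim(int ntrees)` to drop trees that add nothing on the validation set.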
