public class GradientTreeBoost extends Object implements Regression<double[]>
Generic gradient boosting at the t-th step fits a regression tree to the pseudo-residuals. Let J be the number of its leaves. The tree partitions the input space into J disjoint regions and predicts a constant value in each region. The parameter J controls the maximum allowed level of interaction between variables in the model. With J = 2 (decision stumps), no interaction between variables is allowed; with J = 3 the model may include effects of the interaction between up to two variables, and so on. Hastie et al. comment that typically 4 ≤ J ≤ 8 works well for boosting, that results are fairly insensitive to the choice of J in this range, that J = 2 is insufficient for many applications, and that J > 10 is unlikely to be required.
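For illustration only (this is not the internal code of this class), one boosting step under squared-error loss might look as follows; the RegressionTree import path and its (x, residuals, max leaves) constructor are assumptions:

```java
import smile.regression.RegressionTree; // package path and constructor are assumed

public class BoostingStepSketch {
    /**
     * One boosting step under squared-error loss (illustration only):
     * compute pseudo-residuals and fit a regression tree with at most J leaves.
     */
    static RegressionTree boostingStep(double[][] x, double[] y, double[] F, int J) {
        // Pseudo-residuals: the negative gradient of the loss at the current fit.
        // For squared error this is simply the ordinary residual y_i - F(x_i).
        double[] r = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            r[i] = y[i] - F[i];
        }
        // The tree partitions the input space into at most J regions and
        // predicts a constant in each, so J bounds the variable interactions.
        return new RegressionTree(x, r, J);
    }
}
```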
Fitting the training set too closely can degrade the model's generalization ability. Several so-called regularization techniques reduce this over-fitting effect by constraining the fitting procedure. One natural regularization parameter is the number of gradient boosting iterations T (i.e. the number of trees in the model when the base learner is a decision tree). Increasing T reduces the error on the training set, but setting it too high may lead to over-fitting. An optimal value of T is often selected by monitoring prediction error on a separate validation data set.
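For example, using the test and trim methods documented below, one might train with a generous T and then cut the ensemble back to the iteration count with the lowest validation error. This is a sketch; it assumes test(x, y) returns one error value per boosting iteration, and the package path is assumed:

```java
import smile.regression.GradientTreeBoost; // package path assumed

public class SelectIterations {
    /** Pick T by monitoring prediction error on a held-out set, then trim. */
    static GradientTreeBoost fitWithEarlyStop(double[][] trainX, double[] trainY,
                                              double[][] validX, double[] validY) {
        // Train with a generous number of iterations.
        GradientTreeBoost model = new GradientTreeBoost(trainX, trainY, 1000);

        // Assumption: test(x, y) reports the validation error obtained with the
        // first 1, 2, ..., T trees, one entry per boosting iteration.
        double[] error = model.test(validX, validY);
        int bestT = 1;
        for (int t = 2; t <= error.length; t++) {
            if (error[t - 1] < error[bestT - 1]) bestT = t;
        }

        model.trim(bestT); // keep only the first bestT trees
        return model;
    }
}
```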
Another regularization approach is shrinkage, which scales each update term by a parameter η (called the "learning rate"). Empirically, it has been found that using small learning rates (such as η < 0.1) yields dramatic improvements in the model's generalization ability over gradient boosting without shrinkage (η = 1). However, this comes at the price of increased computational time during both training and prediction: a lower learning rate requires more iterations.
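In terms of the boosting-step sketch above, shrinkage simply scales each new tree's contribution before it is added to the running predictions (again a sketch, not the class internals):

```java
// Continuing the sketch above: add the new tree's contribution scaled by the
// learning rate eta, i.e. F_t(x_i) = F_{t-1}(x_i) + eta * h_t(x_i).
static void shrunkenUpdate(double[] F, double[][] x,
                           smile.regression.RegressionTree h, double eta) {
    for (int i = 0; i < F.length; i++) {
        F[i] += eta * h.predict(x[i]);
    }
}
```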
Soon after the introduction of gradient boosting, Friedman proposed a minor modification to the algorithm, motivated by Breiman's bagging method. Specifically, he proposed that at each iteration of the algorithm, a base learner should be fit on a subsample of the training set drawn at random without replacement. Friedman observed a substantial improvement in gradient boosting's accuracy with this modification.
The subsample size is some constant fraction f of the size of the training set. When f = 1, the algorithm is deterministic and identical to the one described above. Smaller values of f introduce randomness into the algorithm and help prevent over-fitting, acting as a kind of regularization. The algorithm also becomes faster, because the regression trees are fit to smaller datasets at each iteration. Typically, f is set to 0.5, meaning that one half of the training set is used to build each base learner.
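Both η and f are exposed through the full constructor documented below. A hedged example, where the hyperparameter values are purely illustrative and the Loss constant name is an assumption (check GradientTreeBoost.Loss for the exact values available):

```java
import smile.regression.GradientTreeBoost; // package path assumed

public class StochasticBoostExample {
    static GradientTreeBoost fit(double[][] x, double[] y) {
        // Illustrative hyperparameters; the Loss constant name is assumed.
        return new GradientTreeBoost(x, y,
                GradientTreeBoost.Loss.LeastAbsoluteDeviation,
                500,   // T: number of iterations (trees)
                6,     // J: leaves per tree
                0.05,  // shrinkage: learning rate eta
                0.5);  // f: each tree is fit on a random half of the training set
    }
}
```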
Also, as in bagging, sub-sampling allows one to define an out-of-bag estimate of the prediction performance improvement by evaluating predictions on those observations that were not used in building the next base learner. Out-of-bag estimates help avoid the need for an independent validation dataset, but they often underestimate the actual performance improvement and the optimal number of iterations.
Gradient tree boosting implementations often also regularize by limiting the minimum number of observations in the trees' terminal nodes: during tree building, any split that would lead to a node containing fewer than this number of training instances is ignored. Imposing this limit helps to reduce the variance of the predictions at the leaves.
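A minimal sketch of that check inside a hypothetical tree learner; the constructors listed on this page do not expose such a threshold directly, so nodeSize is a hypothetical parameter:

```java
// Reject a candidate split if either child would hold too few training
// instances (sketch of a generic tree learner, not this class's internals).
static boolean splitAllowed(int leftCount, int rightCount, int nodeSize) {
    return leftCount >= nodeSize && rightCount >= nodeSize;
}
```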
Modifier and Type | Class and Description
---|---
static class | GradientTreeBoost.Loss: Regression loss function.
static class | GradientTreeBoost.Trainer: Trainer for GradientTreeBoost regression.
Constructor and Description
---
GradientTreeBoost(Attribute[] attributes, double[][] x, double[] y, GradientTreeBoost.Loss loss, int T, int J, double shrinkage, double f): Constructor.
GradientTreeBoost(Attribute[] attributes, double[][] x, double[] y, int T): Constructor.
GradientTreeBoost(double[][] x, double[] y, GradientTreeBoost.Loss loss, int T, int J, double shrinkage, double f): Constructor.
GradientTreeBoost(double[][] x, double[] y, int T): Constructor.
Modifier and Type | Method and Description
---|---
GradientTreeBoost.Loss | getLossFunction(): Returns the loss function.
int | getNumLeaves(): Returns the (maximum) number of leaves in each decision tree.
double | getSamplingRate(): Returns the sampling rate for stochastic gradient tree boosting.
double[] | importance(): Returns the variable importance.
double | predict(double[] x): Predicts the dependent variable of an instance.
int | size(): Returns the number of trees in the model.
double[] | test(double[][] x, double[] y): Test the model on a validation dataset.
double[][] | test(double[][] x, double[] y, RegressionMeasure[] measures): Test the model on a validation dataset with the given performance measures.
void | trim(int T): Trims the tree model set to a smaller size in case of over-fitting.
public GradientTreeBoost(double[][] x, double[] y, int T)

Parameters:
x - the training instances.
y - the response variable.
T - the number of iterations (trees).

public GradientTreeBoost(double[][] x, double[] y, GradientTreeBoost.Loss loss, int T, int J, double shrinkage, double f)

Parameters:
x - the training instances.
y - the response variable.
loss - the loss function for regression. By default, least absolute deviation is employed for robust regression.
T - the number of iterations (trees).
J - the number of leaves in each tree.
shrinkage - the shrinkage parameter in (0, 1] that controls the learning rate of the procedure.
f - the sampling rate for stochastic tree boosting.

public GradientTreeBoost(Attribute[] attributes, double[][] x, double[] y, int T)

Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
T - the number of iterations (trees).

public GradientTreeBoost(Attribute[] attributes, double[][] x, double[] y, GradientTreeBoost.Loss loss, int T, int J, double shrinkage, double f)

Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
loss - the loss function for regression. By default, least absolute deviation is employed for robust regression.
T - the number of iterations (trees).
J - the number of leaves in each tree.
shrinkage - the shrinkage parameter in (0, 1] that controls the learning rate of the procedure.
f - the sampling fraction for stochastic tree boosting.

public double[] importance()
Returns the variable importance.

public double getSamplingRate()
Returns the sampling rate for stochastic gradient tree boosting.

public int getNumLeaves()
Returns the (maximum) number of leaves in each decision tree.

public GradientTreeBoost.Loss getLossFunction()
Returns the loss function.

public int size()
Returns the number of trees in the model.

public void trim(int T)
Trims the tree model set to a smaller size in case of over-fitting.
Parameters:
T - the new (smaller) size of the tree model set.

public double predict(double[] x)
Predicts the dependent variable of an instance.
Specified by:
predict in interface Regression<double[]>
Parameters:
x - the instance.

public double[] test(double[][] x, double[] y)
Test the model on a validation dataset.
Parameters:
x - the test data set.
y - the test data response values.

public double[][] test(double[][] x, double[] y, RegressionMeasure[] measures)
Test the model on a validation dataset with the given performance measures.
Parameters:
x - the test data set.
y - the test data labels.
measures - the performance measures of regression.
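Putting the API together, a minimal usage sketch; the package path is assumed, and 300 trees is an arbitrary illustrative value:

```java
import smile.regression.GradientTreeBoost; // package path assumed

public class GradientTreeBoostUsage {
    static void demo(double[][] trainX, double[] trainY, double[] newInstance) {
        // Simple constructor: default loss function, 300 trees.
        GradientTreeBoost model = new GradientTreeBoost(trainX, trainY, 300);

        double yhat = model.predict(newInstance);   // predict a single instance
        double[] importance = model.importance();   // relative influence of each input
        int trees = model.size();                    // number of trees in the ensemble

        System.out.printf("prediction = %.4f (from %d trees)%n", yhat, trees);
        for (int j = 0; j < importance.length; j++) {
            System.out.printf("importance of variable %d: %.4f%n", j, importance[j]);
        }
    }
}
```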