public class RandomForest extends Object implements Regression<double[]>
Each tree is constructed using the following algorithm:
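
This is a minimal sketch of the first step random forests conventionally apply when growing each tree. It assumes this class follows Breiman's standard scheme, in which every tree is trained on a bootstrap sample of the n training instances. The class Bootstrap and its methods are hypothetical illustrations, not part of this API.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper: the bootstrap step that gives each tree its own training sample. */
final class Bootstrap {
    private Bootstrap() {}

    /**
     * Draws n indices uniformly from [0, n) with replacement and returns how many
     * times each instance was drawn. A count of 0 marks the instance as
     * out-of-bag for this tree.
     */
    static int[] counts(int n, Random rng) {
        int[] counts = new int[n];
        for (int i = 0; i < n; i++) {
            counts[rng.nextInt(n)]++;
        }
        return counts;
    }

    /** True for every instance that this tree never saw during training. */
    static boolean[] outOfBag(int[] counts) {
        boolean[] oob = new boolean[counts.length];
        for (int i = 0; i < counts.length; i++) {
            oob[i] = counts[i] == 0;
        }
        return oob;
    }

    public static void main(String[] args) {
        int[] c = counts(10, new Random(42));
        System.out.println(Arrays.toString(c));
        System.out.println(Arrays.toString(outOfBag(c)));
    }
}
```

On average about 1 - 1/e ≈ 36.8% of the instances are left out of each bootstrap sample. These out-of-bag instances are the ones an out-of-bag error estimate is computed from.
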

| Modifier and Type | Class | Description |
|---|---|---|
| static class | RandomForest.Trainer | Trainer for random forest. |

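| Constructor | Description |
|---|---|
| RandomForest(Attribute[] attributes, double[][] x, double[] y, int T) | Learns a random forest of T trees for regression, using the given attribute properties. |
| RandomForest(Attribute[] attributes, double[][] x, double[] y, int T, int M, int S) | Learns a random forest of T trees for regression, using the given attribute properties, M candidate variables per split, and minimum node size S. |
| RandomForest(double[][] x, double[] y, int T) | Learns a random forest of T trees for regression. |
| RandomForest(double[][] x, double[] y, int T, int M, int S) | Learns a random forest of T trees for regression, with M candidate variables per split and minimum node size S. |
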
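
The constructors that take M and S control how each tree node is split. The sketch below shows one node's split search under three assumptions: the usual regression-tree criterion (largest reduction in the sum of squared errors), a random choice of M candidate variables per node, and reading S as "nodes with fewer than S instances stay leaves". NodeSplit and best are hypothetical names, and the actual implementation may differ, for example in how nominal attributes are handled.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical sketch of one node's split search in a regression tree (not this class's API). */
final class NodeSplit {
    final int variable;     // split variable, or -1 if the node stays a leaf
    final double threshold; // instances with x[variable] <= threshold go left
    final double gain;      // reduction in the sum of squared errors

    NodeSplit(int variable, double threshold, double gain) {
        this.variable = variable;
        this.threshold = threshold;
        this.gain = gain;
    }

    static NodeSplit best(double[][] x, double[] y, int[] rows, int M, int S, Random rng) {
        int n = rows.length;
        NodeSplit leaf = new NodeSplit(-1, Double.NaN, 0.0);
        if (n < S || n < 2) return leaf; // node too small: do not split

        // Choose M distinct candidate variables with a partial Fisher-Yates shuffle.
        int dim = x[0].length;
        int m = Math.max(1, Math.min(M, dim));
        int[] vars = new int[dim];
        for (int j = 0; j < dim; j++) vars[j] = j;
        for (int j = 0; j < m; j++) {
            int k = j + rng.nextInt(dim - j);
            int tmp = vars[j];
            vars[j] = vars[k];
            vars[k] = tmp;
        }

        double total = 0.0;
        for (int r : rows) total += y[r];

        NodeSplit best = leaf;
        for (int c = 0; c < m; c++) {
            final int j = vars[c];
            // Sort this node's instances by the candidate variable.
            Integer[] order = new Integer[n];
            for (int i = 0; i < n; i++) order[i] = rows[i];
            Arrays.sort(order, Comparator.comparingDouble((Integer r) -> x[r][j]));

            double leftSum = 0.0;
            for (int i = 0; i < n - 1; i++) {
                leftSum += y[order[i]];
                double a = x[order[i]][j];
                double b = x[order[i + 1]][j];
                if (a == b) continue; // cannot separate equal values
                int nl = i + 1;
                int nr = n - nl;
                double rightSum = total - leftSum;
                // SSE(parent) - SSE(left) - SSE(right), simplified
                double gain = leftSum * leftSum / nl + rightSum * rightSum / nr - total * total / n;
                if (gain > best.gain) best = new NodeSplit(j, (a + b) / 2, gain);
            }
        }
        return best;
    }
}
```

For example, with x values 1 to 6 in variable 0 and responses {0, 0, 0, 10, 10, 10}, the best split is variable 0 at threshold 3.5, which removes the whole parent SSE of 150.
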
| Modifier and Type | Method | Description |
|---|---|---|
| double | error() | Returns the out-of-bag estimate of the RMSE. |
| double[] | importance() | Returns the importance of each input variable. |
| double | predict(double[] x) | Predicts the dependent variable of an instance. |
| int | size() | Returns the number of trees in the model. |
| double[] | test(double[][] x, double[] y) | Tests the model on a validation dataset. |
| double[][] | test(double[][] x, double[] y, RegressionMeasure[] measures) | Tests the model on a validation dataset with the given regression measures. |
| void | trim(int T) | Trims the tree model set to a smaller size in case of over-fitting. |

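
Because the model is an ensemble of trees, predict presumably returns the mean of the individual tree predictions, the usual rule for regression forests. Under that assumption, the following sketch computes the validation RMSE obtained with the first k trees for every k, which is one plausible reading of the double[] returned by test(x, y), and picks the smallest ensemble size with the lowest RMSE as a candidate argument for trim(int T). EnsembleCurve and its methods are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: validation RMSE as a function of the number of trees. */
final class EnsembleCurve {
    private EnsembleCurve() {}

    /**
     * preds[t][i] is the prediction of tree t on validation instance i.
     * Returns rmse[k - 1] = RMSE when only the first k trees are averaged.
     */
    static double[] rmseByEnsembleSize(double[][] preds, double[] y) {
        int trees = preds.length;
        int n = y.length;
        double[] running = new double[n]; // running sum of tree predictions per instance
        double[] rmse = new double[trees];
        for (int t = 0; t < trees; t++) {
            double sse = 0.0;
            for (int i = 0; i < n; i++) {
                running[i] += preds[t][i];
                double diff = running[i] / (t + 1) - y[i]; // mean of the first t + 1 trees
                sse += diff * diff;
            }
            rmse[t] = Math.sqrt(sse / n);
        }
        return rmse;
    }

    /** Smallest ensemble size that attains the minimum validation RMSE. */
    static int bestSize(double[] rmse) {
        int best = 0;
        for (int k = 1; k < rmse.length; k++) {
            if (rmse[k] < rmse[best]) best = k;
        }
        return best + 1;
    }

    public static void main(String[] args) {
        double[][] preds = {{1, 1}, {3, 3}, {10, 10}};
        double[] y = {2, 2};
        double[] curve = rmseByEnsembleSize(preds, y);
        System.out.println(Arrays.toString(curve));
        System.out.println("trim to " + bestSize(curve) + " trees"); // prints "trim to 2 trees"
    }
}
```

Updating a running sum keeps the whole curve at O(T·n) work instead of re-averaging every prefix of trees.
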

public RandomForest(double[][] x, double[] y, int T)

Parameters:

- x: the training instances.
- y: the response variable.
- T: the number of trees.

public RandomForest(double[][] x, double[] y, int T, int M, int S)

Parameters:

- x: the training instances.
- y: the response variable.
- T: the number of trees.
- M: the number of input variables randomly selected as split candidates at each node of a tree. M = dim/3, where dim is the number of input variables, generally gives good performance.
- S: the number of instances in a node below which the tree will not split that node. S = 5 generally gives good results.

public RandomForest(Attribute[] attributes, double[][] x, double[] y, int T)

Parameters:

- attributes: the attribute properties.
- x: the training instances.
- y: the response variable.
- T: the number of trees.

public RandomForest(Attribute[] attributes, double[][] x, double[] y, int T, int M, int S)

Parameters:

- attributes: the attribute properties.
- x: the training instances.
- y: the response variable.
- T: the number of trees.
- M: the number of input variables randomly selected as split candidates at each node of a tree. M = dim/3, where dim is the number of input variables, generally gives good performance.
- S: the number of instances in a node below which the tree will not split that node. S = 5 generally gives good results.

public double error()
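
error() reports an out-of-bag (OOB) estimate of the RMSE. The conventional computation is sketched here, assuming per-tree predictions and bootstrap membership are available (the names are hypothetical): each training instance is predicted by averaging only the trees whose bootstrap sample excluded it, and the RMSE is taken over the instances that were out-of-bag for at least one tree.

```java
/** Hypothetical sketch of an out-of-bag RMSE estimate. */
final class OutOfBag {
    private OutOfBag() {}

    /**
     * preds[t][i]: prediction of tree t for training instance i.
     * inBag[t][i]: true if instance i was in tree t's bootstrap sample.
     * Instances that were in-bag for every tree are skipped.
     */
    static double rmse(double[][] preds, boolean[][] inBag, double[] y) {
        double sse = 0.0;
        int counted = 0;
        for (int i = 0; i < y.length; i++) {
            double sum = 0.0;
            int trees = 0;
            for (int t = 0; t < preds.length; t++) {
                if (!inBag[t][i]) { // only trees that never saw instance i
                    sum += preds[t][i];
                    trees++;
                }
            }
            if (trees > 0) {
                double diff = sum / trees - y[i];
                sse += diff * diff;
                counted++;
            }
        }
        return counted == 0 ? Double.NaN : Math.sqrt(sse / counted);
    }

    public static void main(String[] args) {
        double[][] preds = {{1, 0, 5}, {9, 4, 3}};
        boolean[][] inBag = {{false, true, true}, {true, false, true}};
        double[] y = {1, 2, 3};
        System.out.println(rmse(preds, inBag, y)); // sqrt(2), roughly 1.41421
    }
}
```

Because no tree is scored on data it was trained on, the OOB estimate behaves like a validation error without needing a separate hold-out set.
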
public double[] importance()
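
This page does not state how importance() scores the variables. Implementations commonly use either the total impurity reduction from splits on each variable or permutation importance. As an illustration of the second option only, the sketch below measures how much the mean squared error grows when a single input column is shuffled. PermutationImportance is a hypothetical class, and the model is passed as a plain ToDoubleFunction<double[]> rather than as this class.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of permutation variable importance for a regression model. */
final class PermutationImportance {
    private PermutationImportance() {}

    static double mse(ToDoubleFunction<double[]> model, double[][] x, double[] y) {
        double sse = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = model.applyAsDouble(x[i]) - y[i];
            sse += d * d;
        }
        return sse / x.length;
    }

    /** importance[j] = MSE after shuffling column j, minus the baseline MSE. */
    static double[] compute(ToDoubleFunction<double[]> model, double[][] x, double[] y, Random rng) {
        int n = x.length;
        int dim = x[0].length;
        double base = mse(model, x, y);
        double[] importance = new double[dim];
        for (int j = 0; j < dim; j++) {
            // Copy the rows so the caller's data is not modified.
            double[][] shuffled = new double[n][];
            for (int i = 0; i < n; i++) shuffled[i] = x[i].clone();
            // Fisher-Yates shuffle of column j only.
            for (int i = n - 1; i > 0; i--) {
                int k = rng.nextInt(i + 1);
                double tmp = shuffled[i][j];
                shuffled[i][j] = shuffled[k][j];
                shuffled[k][j] = tmp;
            }
            importance[j] = mse(model, shuffled, y) - base;
        }
        return importance;
    }
}
```

A variable the model ignores gets an importance of exactly zero here, while a variable the predictions depend on gets a positive score.
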
public int size()
public void trim(int T)

Parameters:

- T: the new (smaller) size of the tree model set.

public double predict(double[] x)

Specified by: predict in interface Regression<double[]>

Parameters:

- x: the instance.

public double[] test(double[][] x, double[] y)

Parameters:

- x: the test data set.
- y: the test data response values.

public double[][] test(double[][] x, double[] y, RegressionMeasure[] measures)

Parameters:

- x: the test data set.
- y: the test data response values.
- measures: the regression performance measures to evaluate.

Copyright © 2015. All rights reserved.