public class RandomForest extends Object implements Regression<double[]>
Each tree is constructed using the following algorithm:
Modifier and Type | Class and Description |
---|---|
static class | RandomForest.Trainer: Trainer for random forest. |
Constructor and Description |
---|
RandomForest(Attribute[] attributes, double[][] x, double[] y, int T): Constructor. |
RandomForest(Attribute[] attributes, double[][] x, double[] y, int T, int M, int S): Constructor. |
RandomForest(double[][] x, double[] y, int T): Constructor. |
RandomForest(double[][] x, double[] y, int T, int M, int S): Constructor. |
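The six-argument constructors take the tuning parameters M and S, for which this page suggests dim/3 and 5 respectively. The following is a minimal sketch of computing those suggested values from a training matrix. `ForestDefaults`, `suggestedM`, and `suggestedS` are hypothetical names that are not part of the RandomForest API, and the lower bound of 1 on M is an added assumption for data sets with fewer than three variables. Whether the shorter constructors use exactly these values is not stated here.

```java
// Hypothetical helper for illustration; not part of the RandomForest API.
class ForestDefaults {
    /** Suggested M: about one third of the input variables, with an assumed floor of 1. */
    static int suggestedM(int dim) {
        if (dim <= 0) {
            throw new IllegalArgumentException("dim must be positive: " + dim);
        }
        return Math.max(1, dim / 3);
    }

    /** Suggested S: the node size this page reports as generally giving good results. */
    static int suggestedS() {
        return 5;
    }

    public static void main(String[] args) {
        double[][] x = new double[200][13]; // placeholder data: 200 instances, 13 variables
        int dim = x[0].length;
        int m = suggestedM(dim);
        int s = suggestedS();
        System.out.println("M = " + m + ", S = " + s); // prints "M = 4, S = 5"
        // These values would then be passed as the M and S constructor arguments.
    }
}
```

Both values are starting points only; the out-of-bag error reported by error() is one way to compare alternative settings.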
Modifier and Type | Method and Description |
---|---|
double | error(): Returns the out-of-bag estimation of RMSE. |
double[] | importance(): Returns the variable importance. |
double | predict(double[] x): Predicts the dependent variable of an instance. |
int | size(): Returns the number of trees in the model. |
double[] | test(double[][] x, double[] y): Test the model on a validation dataset. |
double[][] | test(double[][] x, double[] y, RegressionMeasure[] measures): Test the model on a validation dataset. |
void | trim(int T): Trims the tree model set to a smaller size in case of over-fitting. |
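To make the relationship between predict, size, and trim more concrete, here is a minimal self-contained sketch. It assumes that a regression forest predicts by averaging the outputs of its trees and that trimming keeps the first T trees; both are common conventions but are assumptions about this class, not statements of its implementation. `ForestSketch` is a hypothetical stand-in, and each "tree" is simply a function from an instance to a value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Illustrative stand-in for a forest; not the library's RandomForest class.
class ForestSketch {
    private List<ToDoubleFunction<double[]>> trees;

    ForestSketch(List<ToDoubleFunction<double[]>> trees) {
        if (trees.isEmpty()) {
            throw new IllegalArgumentException("at least one tree is required");
        }
        this.trees = new ArrayList<>(trees);
    }

    /** Number of trees currently in the ensemble. */
    int size() {
        return trees.size();
    }

    /** Assumed aggregation rule: the mean of the individual tree predictions. */
    double predict(double[] x) {
        double sum = 0.0;
        for (ToDoubleFunction<double[]> tree : trees) {
            sum += tree.applyAsDouble(x);
        }
        return sum / trees.size();
    }

    /** Assumed trimming rule: keep only the first T trees. */
    void trim(int T) {
        if (T < 1 || T > trees.size()) {
            throw new IllegalArgumentException("invalid size: " + T);
        }
        trees = new ArrayList<>(trees.subList(0, T));
    }

    public static void main(String[] args) {
        List<ToDoubleFunction<double[]>> trees = new ArrayList<>();
        trees.add(x -> x[0]);       // toy "trees" with fixed offsets
        trees.add(x -> x[0] + 1.0);
        trees.add(x -> x[0] + 5.0);
        ForestSketch forest = new ForestSketch(trees);
        double[] instance = {2.0};
        System.out.println(forest.size() + " trees -> " + forest.predict(instance)); // 3 trees -> 4.0
        forest.trim(2);
        System.out.println(forest.size() + " trees -> " + forest.predict(instance)); // 2 trees -> 2.5
    }
}
```

The toy example shows why trimming changes predictions: the average is taken over fewer trees, so a single outlying tree no longer contributes.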
public RandomForest(double[][] x, double[] y, int T)
x - the training instances.
y - the response variable.
T - the number of trees.

public RandomForest(double[][] x, double[] y, int T, int M, int S)
x - the training instances.
y - the response variable.
T - the number of trees.
M - the number of input variables to be used to determine the decision at a node of the tree. dim/3 seems to give generally good performance, where dim is the number of variables.
S - the number of instances in a node below which the tree will not split. Setting S = 5 generally gives good results.

public RandomForest(Attribute[] attributes, double[][] x, double[] y, int T)
attributes - the attribute properties.
x - the training instances.
y - the response variable.
T - the number of trees.

public RandomForest(Attribute[] attributes, double[][] x, double[] y, int T, int M, int S)
attributes - the attribute properties.
x - the training instances.
y - the response variable.
T - the number of trees.
M - the number of input variables to be used to determine the decision at a node of the tree. dim/3 seems to give generally good performance, where dim is the number of variables.
S - the number of instances in a node below which the tree will not split. Setting S = 5 generally gives good results.

public double error()
public double[] importance()
public int size()
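As background for error(), an out-of-bag RMSE is conventionally computed by predicting each training instance using only the trees whose bootstrap sample did not contain it, then taking the root mean squared error of those predictions. The sketch below follows that textbook definition. The class `OobSketch`, the method `oobRmse`, and the array layout are assumptions made for illustration and do not describe the library's internals.

```java
// Illustrative only: names and data layout are assumptions, not the library's internals.
class OobSketch {
    /**
     * @param treePredictions treePredictions[t][i] is tree t's prediction for instance i
     * @param inBag           inBag[t][i] is true if instance i was in tree t's bootstrap sample
     * @param y               the observed response values
     * @return the out-of-bag RMSE, or NaN if no instance was ever out of bag
     */
    static double oobRmse(double[][] treePredictions, boolean[][] inBag, double[] y) {
        double squaredError = 0.0;
        int counted = 0;
        for (int i = 0; i < y.length; i++) {
            double sum = 0.0;
            int votes = 0;
            for (int t = 0; t < treePredictions.length; t++) {
                if (!inBag[t][i]) { // only trees that never saw instance i
                    sum += treePredictions[t][i];
                    votes++;
                }
            }
            if (votes == 0) {
                continue; // instance appeared in every bootstrap sample
            }
            double residual = y[i] - sum / votes;
            squaredError += residual * residual;
            counted++;
        }
        return counted == 0 ? Double.NaN : Math.sqrt(squaredError / counted);
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0};
        double[][] predictions = {{1.0, 4.0, 5.0}, {3.0, 2.0, 3.0}};
        boolean[][] inBag = {{true, false, false}, {false, true, true}};
        System.out.println("OOB RMSE = " + oobRmse(predictions, inBag, y)); // OOB RMSE = 2.0
    }
}
```

Because each instance is scored only by trees that did not train on it, the estimate behaves like a validation error without requiring a separate held-out set.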
public void trim(int T)
T - the new (smaller) size of tree model set.

public double predict(double[] x)
Specified by: predict in interface Regression<double[]>
x - the instance.

public double[] test(double[][] x, double[] y)
x - the test data set.
y - the test data response values.

public double[][] test(double[][] x, double[] y, RegressionMeasure[] measures)
x - the test data set.
y - the test data response values.
measures - the performance measures of regression.

Copyright © 2015. All rights reserved.