public class RandomForest extends java.lang.Object implements Regression<double[]>, java.io.Serializable
Each tree is constructed using the following algorithm:

1. If the number of cases in the training set is N, sample N cases at random with replacement from the original data. This bootstrap sample is the training set for growing the tree.
2. If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M and the best split on these m variables is used to split the node. The value of m is held constant while the forest is grown.
3. Each tree is grown to the largest extent possible. There is no pruning.
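For orientation, here is a minimal sketch of training and querying a forest with the simplest constructor documented below; the toy data, class name, and values are purely illustrative.

```java
import smile.regression.RandomForest;

public class RandomForestDemo {
    public static void main(String[] args) {
        // Toy training data: 6 instances with 2 features each.
        double[][] x = {
            {1.0, 2.0}, {2.0, 1.0}, {3.0, 4.0},
            {4.0, 3.0}, {5.0, 6.0}, {6.0, 5.0}
        };
        // Response variable: here simply the sum of the two features.
        double[] y = {3.0, 3.0, 7.0, 7.0, 11.0, 11.0};

        // Train a forest of 100 regression trees.
        RandomForest forest = new RandomForest(x, y, 100);

        // Out-of-bag RMSE estimate and a prediction for a new instance.
        System.out.println("OOB RMSE: " + forest.error());
        System.out.println("Prediction: " + forest.predict(new double[]{3.5, 3.5}));
    }
}
```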
Modifier and Type | Class and Description |
---|---|
static class | RandomForest.Trainer Trainer for random forest. |
Constructor and Description |
---|
RandomForest(smile.data.Attribute[] attributes, double[][] x, double[] y, int ntrees) Constructor. |
RandomForest(smile.data.Attribute[] attributes, double[][] x, double[] y, int ntrees, int maxNodes) Constructor. |
RandomForest(smile.data.Attribute[] attributes, double[][] x, double[] y, int ntrees, int maxNodes, int nodeSize) Constructor. |
RandomForest(smile.data.Attribute[] attributes, double[][] x, double[] y, int ntrees, int maxNodes, int nodeSize, int mtry) Constructor. |
RandomForest(smile.data.Attribute[] attributes, double[][] x, double[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample) Constructor. |
RandomForest(smile.data.Attribute[] attributes, double[][] x, double[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, double[] monotonicRegression) Constructor. |
RandomForest(smile.data.AttributeDataset data, int ntrees) Constructor. |
RandomForest(smile.data.AttributeDataset data, int ntrees, int maxNodes) Constructor. |
RandomForest(smile.data.AttributeDataset data, int ntrees, int maxNodes, int nodeSize) Constructor. |
RandomForest(smile.data.AttributeDataset data, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, double[] monotonicRegression) Constructor. |
RandomForest(double[][] x, double[] y, int ntrees) Constructor. |
RandomForest(double[][] x, double[] y, int ntrees, int maxNodes, int nodeSize, int mtry) Constructor. |
Modifier and Type | Method and Description |
---|---|
double | error() Returns the out-of-bag estimation of RMSE. |
RegressionTree[] | getTrees() Returns the regression trees. |
double[] | importance() Returns the variable importance. |
RandomForest | merge(RandomForest other) Merges together two random forests and returns a new forest consisting of trees from both input forests. |
double | predict(double[] x) Predicts the dependent variable of an instance. |
int | size() Returns the number of trees in the model. |
double[] | test(double[][] x, double[] y) Tests the model on a validation dataset. |
double[][] | test(double[][] x, double[] y, RegressionMeasure[] measures) Tests the model on a validation dataset. |
void | trim(int ntrees) Trims the tree model set to a smaller size in case of over-fitting. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public RandomForest(double[][] x, double[] y, int ntrees)
Parameters:
x - the training instances.
y - the response variable.
ntrees - the number of trees.
public RandomForest(double[][] x, double[] y, int ntrees, int maxNodes, int nodeSize, int mtry)
Parameters:
x - the training instances.
y - the response variable.
ntrees - the number of trees.
mtry - the number of input variables used to decide the split at each node of the tree; mtry = p/3 generally gives good performance, where p is the number of variables.
nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
maxNodes - the maximum number of leaf nodes in the tree.
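A hedged sketch of calling this constructor with the rules of thumb quoted above (p/3 for mtry, 5 for nodeSize); the ntrees and maxNodes values here are illustrative, not recommendations from this page.

```java
import smile.regression.RandomForest;

public class TunedForest {
    // Builds a forest with explicit hyperparameters; x and y are the training data.
    static RandomForest train(double[][] x, double[] y) {
        int p = x[0].length;               // number of input variables
        int mtry = Math.max(1, p / 3);     // rule of thumb from the parameter docs
        int nodeSize = 5;                  // rule of thumb from the parameter docs
        int maxNodes = 100;                // illustrative cap on leaf nodes per tree
        return new RandomForest(x, y, 500, maxNodes, nodeSize, mtry);
    }
}
```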
public RandomForest(smile.data.Attribute[] attributes, double[][] x, double[] y, int ntrees)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
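A sketch of supplying attribute metadata explicitly. It assumes smile.data.NumericAttribute with a single String-name constructor, as in the Smile 1.x smile.data package; the feature names are made up.

```java
import smile.data.Attribute;
import smile.data.NumericAttribute;
import smile.regression.RandomForest;

public class AttributeForest {
    // Attaches attribute metadata so the trees know how to treat each column.
    static RandomForest train(double[][] x, double[] y) {
        Attribute[] attributes = {
            new NumericAttribute("sqft"),  // column 0 of x (hypothetical)
            new NumericAttribute("age")    // column 1 of x (hypothetical)
        };
        return new RandomForest(attributes, x, y, 200);
    }
}
```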
public RandomForest(smile.data.AttributeDataset data, int ntrees)
Parameters:
data - the dataset.
ntrees - the number of trees.
public RandomForest(smile.data.Attribute[] attributes, double[][] x, double[] y, int ntrees, int maxNodes)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
maxNodes - the maximum number of leaf nodes in the tree.
public RandomForest(smile.data.AttributeDataset data, int ntrees, int maxNodes)
Parameters:
data - the dataset.
ntrees - the number of trees.
maxNodes - the maximum number of leaf nodes in the tree.
public RandomForest(smile.data.Attribute[] attributes, double[][] x, double[] y, int ntrees, int maxNodes, int nodeSize)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
maxNodes - the maximum number of leaf nodes in the tree.
public RandomForest(smile.data.AttributeDataset data, int ntrees, int maxNodes, int nodeSize)
Parameters:
data - the dataset.
ntrees - the number of trees.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
public RandomForest(smile.data.Attribute[] attributes, double[][] x, double[] y, int ntrees, int maxNodes, int nodeSize, int mtry)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
mtry - the number of input variables used to decide the split at each node of the tree; mtry = p/3 generally gives good performance, where p is the number of variables.
nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
maxNodes - the maximum number of leaf nodes in the tree.
public RandomForest(smile.data.Attribute[] attributes, double[][] x, double[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
mtry - the number of input variables used to decide the split at each node of the tree; mtry = p/3 generally gives good performance, where p is the number of variables.
nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
maxNodes - the maximum number of leaf nodes in the tree.
subsample - the sampling rate for training each tree; 1.0 means sampling with replacement, and a value less than 1.0 means sampling without replacement.
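A sketch of the subsampling variant, drawing 80% of the data without replacement for each tree; the hyperparameter values are illustrative, not recommendations from this page.

```java
import smile.data.Attribute;
import smile.regression.RandomForest;

public class SubsampleForest {
    // attributes, x and y are supplied by the caller, as in the earlier sketches.
    static RandomForest train(Attribute[] attributes, double[][] x, double[] y) {
        int p = x[0].length;
        return new RandomForest(attributes, x, y,
                500,                  // ntrees
                100,                  // maxNodes
                5,                    // nodeSize
                Math.max(1, p / 3),   // mtry
                0.8);                 // subsample: 80% of the data, without replacement
    }
}
```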
public RandomForest(smile.data.AttributeDataset data, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, double[] monotonicRegression)
Parameters:
data - the dataset.
ntrees - the number of trees.
mtry - the number of input variables used to decide the split at each node of the tree; mtry = p/3 generally gives good performance, where p is the number of variables.
nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
maxNodes - the maximum number of leaf nodes in the tree.
subsample - the sampling rate for training each tree; 1.0 means sampling with replacement, and a value less than 1.0 means sampling without replacement.
public RandomForest(smile.data.Attribute[] attributes, double[][] x, double[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, double[] monotonicRegression)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
mtry - the number of input variables used to decide the split at each node of the tree; mtry = p/3 generally gives good performance, where p is the number of variables.
nodeSize - the number of instances in a node below which the tree will not split; nodeSize = 5 generally gives good results.
maxNodes - the maximum number of leaf nodes in the tree.
subsample - the sampling rate for training each tree; 1.0 means sampling with replacement, and a value less than 1.0 means sampling without replacement.
public RandomForest merge(RandomForest other)
Merges together two random forests and returns a new forest consisting of trees from both input forests.
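A sketch of how merge might be used to combine forests trained independently (for example on separate workers). Whether the inputs must share the same attribute layout is not stated on this page, so the sketch trains both forests on the same data.

```java
import smile.regression.RandomForest;

public class MergeDemo {
    static RandomForest combine(double[][] x, double[] y) {
        RandomForest a = new RandomForest(x, y, 100); // e.g. trained on worker 1
        RandomForest b = new RandomForest(x, y, 100); // e.g. trained on worker 2
        RandomForest merged = a.merge(b);             // trees from both forests
        // merged.size() should equal a.size() + b.size()
        return merged;
    }
}
```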
public double error()
public double[] importance()
public int size()
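A small sketch combining size(), error() and importance() into a textual report; the names array is a hypothetical list of feature labels supplied by the caller.

```java
import smile.regression.RandomForest;

public class ForestReport {
    // forest is a trained model; names[i] labels column i of the training matrix.
    static void report(RandomForest forest, String[] names) {
        System.out.printf("%d trees, OOB RMSE = %.4f%n", forest.size(), forest.error());
        double[] importance = forest.importance();
        for (int i = 0; i < importance.length; i++) {
            System.out.printf("  %-12s %.4f%n", names[i], importance[i]);
        }
    }
}
```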
public void trim(int ntrees)
Parameters:
ntrees - the new (smaller) size of the tree model set.
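A hedged sketch of trim, cutting an intentionally large forest down to a smaller ensemble; the choice of 1000 and 300 trees is illustrative.

```java
import smile.regression.RandomForest;

public class TrimDemo {
    static RandomForest shrink(double[][] x, double[] y) {
        RandomForest forest = new RandomForest(x, y, 1000);
        forest.trim(300);                  // keep only 300 trees
        System.out.println(forest.size()); // should now report 300
        return forest;
    }
}
```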
public double predict(double[] x)
Predicts the dependent variable of an instance.
Specified by:
predict in interface Regression<double[]>
Parameters:
x - the instance.
public double[] test(double[][] x, double[] y)
Parameters:
x - the test data set.
y - the test data response values.
public double[][] test(double[][] x, double[] y, RegressionMeasure[] measures)
Parameters:
x - the test data set.
y - the test data response values.
measures - the performance measures of regression.
public RegressionTree[] getTrees()
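A sketch of evaluating a trained forest on held-out data. It assumes the RegressionMeasure interface and the RMSE implementation live in smile.validation and that RegressionTree is smile.regression.RegressionTree, as in Smile 1.x; the exact layout of the arrays returned by test is not described on this page, so the sketch only reports their sizes.

```java
import smile.regression.RandomForest;
import smile.regression.RegressionTree;
import smile.validation.RMSE;
import smile.validation.RegressionMeasure;

public class EvaluateDemo {
    static void evaluate(RandomForest forest, double[][] testx, double[] testy) {
        // Default evaluation on the validation set.
        double[] scores = forest.test(testx, testy);

        // Evaluation with explicit measures; RMSE from smile.validation is assumed here.
        RegressionMeasure[] measures = { new RMSE() };
        double[][] results = forest.test(testx, testy, measures);

        // The individual trees remain accessible for inspection.
        RegressionTree[] trees = forest.getTrees();

        System.out.println("trees=" + trees.length
                + " test(x,y) length=" + scores.length
                + " results rows=" + results.length);
    }
}
```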