public class RandomForest extends Object implements Classifier<double[]>
Each tree is constructed using the following algorithm:
Modifier and Type | Class and Description |
---|---|
static class | RandomForest.Trainer: Trainer for random forest classifiers. |
Constructor and Description |
---|
RandomForest(Attribute[] attributes, double[][] x, int[] y, int T): Constructor. |
RandomForest(Attribute[] attributes, double[][] x, int[] y, int T, int M): Constructor. |
RandomForest(double[][] x, int[] y, int T): Constructor. |
RandomForest(double[][] x, int[] y, int T, int M): Constructor. |
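The constructors that take `M` use it as the number of randomly selected features considered at each tree node, and the parameter notes suggest floor(sqrt(dim)) as a starting value. The sketch below illustrates that idea only; it is not this library's code. The class `FeatureSubsetSketch`, its method names, and the lower bound of 1 are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, not part of the RandomForest API.
class FeatureSubsetSketch {
    /** Suggested M: floor(sqrt(dim)), clamped to at least 1 (the clamp is an assumption). */
    static int defaultM(int dim) {
        return Math.max(1, (int) Math.floor(Math.sqrt(dim)));
    }

    /** Draws M distinct feature indices out of dim with a partial Fisher-Yates shuffle. */
    static int[] sampleFeatures(int dim, int M, Random rng) {
        int[] idx = new int[dim];
        for (int i = 0; i < dim; i++) {
            idx[i] = i;
        }
        for (int i = 0; i < M; i++) {
            // Swap position i with a random position in [i, dim).
            int j = i + rng.nextInt(dim - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return Arrays.copyOf(idx, M);
    }

    public static void main(String[] args) {
        int dim = 16;
        int M = defaultM(dim); // 4 for dim = 16
        System.out.println("M = " + M);
        System.out.println("candidate features: "
                + Arrays.toString(sampleFeatures(dim, M, new Random(42))));
    }
}
```

A value computed this way could be passed as the `M` argument of the four- or five-argument constructors; whether the shorter constructors default to the same value is not stated here.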
Modifier and Type | Method and Description |
---|---|
double | error(): Returns the out-of-bag estimate of the error rate. |
double[] | importance(): Returns the variable importance. |
int | predict(double[] x): Predicts the class label of an instance. |
int | predict(double[] x, double[] posteriori): Predicts the class label of an instance and also calculates a posteriori probabilities. |
int | size(): Returns the number of trees in the model. |
double[] | test(double[][] x, int[] y): Tests the model on a validation dataset. |
double[][] | test(double[][] x, int[] y, ClassificationMeasure[] measures): Tests the model on a validation dataset. |
void | trim(int T): Trims the tree model set to a smaller size in case of over-fitting. |
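To make the `predict(double[] x, double[] posteriori)` contract more concrete, here is a minimal sketch of how an ensemble might turn per-tree predictions into a label and a posteriori probabilities. It assumes a plain majority vote with vote fractions as probabilities; the actual implementation may weight or combine trees differently. `MajorityVoteSketch` and `vote` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration, not the library's prediction code.
class MajorityVoteSketch {
    /**
     * Fills posteriori with the fraction of trees voting for each class
     * and returns the class with the most votes (lowest index wins ties).
     */
    static int vote(int[] treePredictions, double[] posteriori) {
        Arrays.fill(posteriori, 0.0);
        for (int label : treePredictions) {
            posteriori[label] += 1.0;
        }
        int best = 0;
        for (int c = 0; c < posteriori.length; c++) {
            posteriori[c] /= treePredictions.length;
            if (posteriori[c] > posteriori[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] treePredictions = {0, 1, 1, 2, 1}; // one predicted label per tree
        double[] posteriori = new double[3];     // caller allocates one slot per class
        int label = vote(treePredictions, posteriori);
        System.out.println(label + " " + Arrays.toString(posteriori));
    }
}
```

As in the real method, the caller supplies the `posteriori` array and reads it after the call returns.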
public RandomForest(double[][] x, int[] y, int T)
Constructor.
x - the training instances.
y - the response variable.
T - the number of trees.
public RandomForest(double[][] x, int[] y, int T, int M)
Constructor.
x - the training instances.
y - the response variable.
T - the number of trees.
M - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) generally seems to give good performance, where dim is the number of variables.
public RandomForest(Attribute[] attributes, double[][] x, int[] y, int T)
Constructor.
attributes - the attribute properties.
x - the training instances.
y - the response variable.
T - the number of trees.
public RandomForest(Attribute[] attributes, double[][] x, int[] y, int T, int M)
Constructor.
attributes - the attribute properties.
x - the training instances.
y - the response variable.
T - the number of trees.
M - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) generally seems to give good performance, where dim is the number of variables.
public double error()
Returns the out-of-bag estimate of the error rate.
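As a rough picture of what an out-of-bag error estimate measures, the sketch below scores each training instance using only the votes of trees that did not see it during training. The input layout (`oobVotes[i][c]` as the number of out-of-bag trees voting class `c` for instance `i`) and the class `OobErrorSketch` are assumptions for illustration; how `error()` is computed internally is not documented here.

```java
// Hypothetical illustration of an out-of-bag error estimate.
class OobErrorSketch {
    /**
     * oobVotes[i][c]: votes for class c on instance i, counted only over trees
     * whose bootstrap sample did not contain instance i.
     * y[i]: true label of instance i.
     * Instances with no out-of-bag votes are skipped.
     */
    static double oobError(int[][] oobVotes, int[] y) {
        int counted = 0;
        int wrong = 0;
        for (int i = 0; i < y.length; i++) {
            int best = 0;
            int total = 0;
            for (int c = 0; c < oobVotes[i].length; c++) {
                total += oobVotes[i][c];
                if (oobVotes[i][c] > oobVotes[i][best]) {
                    best = c;
                }
            }
            if (total == 0) {
                continue; // instance was in every bootstrap sample
            }
            counted++;
            if (best != y[i]) {
                wrong++;
            }
        }
        return counted == 0 ? 0.0 : (double) wrong / counted;
    }

    public static void main(String[] args) {
        int[][] oobVotes = {{3, 1}, {0, 2}, {0, 0}, {1, 4}};
        int[] y = {0, 0, 1, 1};
        // One of the three scored instances is misclassified.
        System.out.println(oobError(oobVotes, y));
    }
}
```

Because every instance is scored only by trees that never trained on it, an estimate of this kind needs no separate validation set.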
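public double[] importance()
Returns the variable importance.
public int size()
Returns the number of trees in the model.
public void trim(int T)
Trims the tree model set to a smaller size in case of over-fitting.
T - the new (smaller) size of the tree model set.
public int predict(double[] x)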
Predicts the class label of an instance.
Specified by: predict in interface Classifier<double[]>
x - the instance to be classified.
public int predict(double[] x, double[] posteriori)
Predicts the class label of an instance and also calculates a posteriori probabilities.
Specified by: predict in interface Classifier<double[]>
x - the instance to be classified.
posteriori - the array to store a posteriori probabilities on output.
public double[] test(double[][] x, int[] y)
Tests the model on a validation dataset.
x - the test data set.
y - the test data response values.
public double[][] test(double[][] x, int[] y, ClassificationMeasure[] measures)
Tests the model on a validation dataset.
x - the test data set.
y - the test data labels.
measures - the performance measures of classification.
Copyright © 2015. All rights reserved.