public class RandomForest extends java.lang.Object implements SoftClassifier<double[]>
Random forest is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. Each tree is constructed using the following algorithm:

1. If the number of cases in the training set is N, randomly sample N cases with replacement from the original data. This sample is the training set for growing the tree.
2. If there are M input variables, a number mtry << M is specified such that at each node, mtry variables are selected at random out of the M, and the best split on these mtry variables is used to split the node. The value of mtry is held constant while the forest is grown.
3. Each tree is grown to the largest extent possible. There is no pruning.
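A minimal end-to-end sketch of this API. The package name smile.classification and the toy data are assumptions for illustration, not part of this page:

```java
import smile.classification.RandomForest;

public class RandomForestExample {
    public static void main(String[] args) {
        // Toy training set: four 2-dimensional instances with binary labels.
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] y = {0, 0, 1, 1};

        // Grow a forest of 100 trees with the simplest constructor.
        RandomForest forest = new RandomForest(x, y, 100);

        // The out-of-bag samples give a free estimate of the error rate.
        System.out.println("OOB error: " + forest.error());

        // Classify a new instance.
        int label = forest.predict(new double[]{0.9, 0.1});
        System.out.println("predicted label: " + label);
    }
}
```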
Nested Class Summary

Modifier and Type | Class and Description
---|---
static class | RandomForest.Trainer: Trainer for random forest classifiers.
Constructor Summary

Constructor and Description
---
RandomForest(smile.data.Attribute[] attributes, double[][] x, int[] y, int ntrees)
RandomForest(smile.data.Attribute[] attributes, double[][] x, int[] y, int ntrees, int mtry)
RandomForest(smile.data.Attribute[] attributes, double[][] x, int[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample)
RandomForest(smile.data.Attribute[] attributes, double[][] x, int[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, DecisionTree.SplitRule rule)
RandomForest(smile.data.Attribute[] attributes, double[][] x, int[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, DecisionTree.SplitRule rule, int[] classWeight)
RandomForest(smile.data.AttributeDataset data, int ntrees)
RandomForest(smile.data.AttributeDataset data, int ntrees, int mtry)
RandomForest(smile.data.AttributeDataset data, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, DecisionTree.SplitRule rule)
RandomForest(smile.data.AttributeDataset data, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, DecisionTree.SplitRule rule, int[] classWeight)
RandomForest(double[][] x, int[] y, int ntrees)
RandomForest(double[][] x, int[] y, int ntrees, int mtry)

Parameters for each constructor are documented under Constructor Detail below.
Method Summary

Modifier and Type | Method and Description
---|---
double | error(): Returns the out-of-bag (OOB) estimate of the error rate.
DecisionTree[] | getTrees(): Returns the decision trees.
double[] | importance(): Returns the variable importance.
int | predict(double[] x): Predicts the class label of an instance.
int | predict(double[] x, double[] posteriori): Predicts the class label of an instance and also calculates a posteriori probabilities.
int | size(): Returns the number of trees in the model.
double[] | test(double[][] x, int[] y): Tests the model on a validation dataset.
double[][] | test(double[][] x, int[] y, ClassificationMeasure[] measures): Tests the model on a validation dataset.
void | trim(int ntrees): Trims the tree model set to a smaller size in case of over-fitting.
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface Classifier: predict
Constructor Detail

public RandomForest(double[][] x, int[] y, int ntrees)
Constructor.
Parameters:
x - the training instances.
y - the response variable.
ntrees - the number of trees.

public RandomForest(double[][] x, int[] y, int ntrees, int mtry)
Constructor.
Parameters:
x - the training instances.
y - the response variable.
ntrees - the number of trees.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.

public RandomForest(smile.data.Attribute[] attributes, double[][] x, int[] y, int ntrees)
Constructor.
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.

public RandomForest(smile.data.AttributeDataset data, int ntrees)
Constructor.
Parameters:
data - the dataset.
ntrees - the number of trees.
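A sketch of the mtry heuristic described in the parameter docs above, with x and y as in the example after the class description:

```java
int dim = x[0].length;                        // number of input variables
int mtry = (int) Math.floor(Math.sqrt(dim));  // floor(sqrt(dim)) heuristic
RandomForest forest = new RandomForest(x, y, 500, mtry);
```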
public RandomForest(smile.data.Attribute[] attributes, double[][] x, int[] y, int ntrees, int mtry)
Constructor.
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.

public RandomForest(smile.data.AttributeDataset data, int ntrees, int mtry)
Constructor.
Parameters:
data - the dataset.
ntrees - the number of trees.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.

public RandomForest(smile.data.Attribute[] attributes, double[][] x, int[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample)
Constructor.
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the minimum size of leaf nodes.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.
subsample - the sampling rate for training each tree. 1.0 means sampling with replacement; < 1.0 means sampling without replacement.

public RandomForest(smile.data.AttributeDataset data, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, DecisionTree.SplitRule rule)
Constructor.
Parameters:
data - the dataset.
ntrees - the number of trees.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the minimum size of leaf nodes.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.
subsample - the sampling rate for training each tree. 1.0 means sampling with replacement; < 1.0 means sampling without replacement.
rule - the decision tree split rule.

public RandomForest(smile.data.Attribute[] attributes, double[][] x, int[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, DecisionTree.SplitRule rule)
Constructor.
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the minimum size of leaf nodes.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.
subsample - the sampling rate for training each tree. 1.0 means sampling with replacement; < 1.0 means sampling without replacement.
rule - the decision tree split rule.

public RandomForest(smile.data.AttributeDataset data, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, DecisionTree.SplitRule rule, int[] classWeight)
Constructor.
Parameters:
data - the dataset.
ntrees - the number of trees.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the minimum size of leaf nodes.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.
subsample - the sampling rate for training each tree. 1.0 means sampling with replacement; < 1.0 means sampling without replacement.
rule - the decision tree split rule.
classWeight - priors of the classes. The weight of each class is roughly the ratio of samples in each class. For example, if there are 400 positive samples and 100 negative samples, classWeight should be [1, 4] (assuming label 0 is negative and label 1 is positive).
public RandomForest(smile.data.Attribute[] attributes, double[][] x, int[] y, int ntrees, int maxNodes, int nodeSize, int mtry, double subsample, DecisionTree.SplitRule rule, int[] classWeight)
Constructor.
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
ntrees - the number of trees.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the minimum size of leaf nodes.
mtry - the number of randomly selected features used to determine the decision at a node of the tree. floor(sqrt(dim)) seems to give generally good performance, where dim is the number of variables.
subsample - the sampling rate for training each tree. 1.0 means sampling with replacement; < 1.0 means sampling without replacement.
rule - the decision tree split rule.
classWeight - priors of the classes. The weight of each class is roughly the ratio of samples in each class. For example, if there are 400 positive samples and 100 negative samples, classWeight should be [1, 4] (assuming label 0 is negative and label 1 is positive).
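A sketch of the fully parameterized constructor. Here attributes, x, and y are assumed to be defined already, and DecisionTree.SplitRule.GINI is assumed to be one of the enum's constants (not confirmed on this page):

```java
// 400 positives (label 1) vs. 100 negatives (label 0), so classWeight = {1, 4}
// per the classWeight documentation above.
int[] classWeight = {1, 4};

RandomForest forest = new RandomForest(
        attributes,                     // attribute properties (assumed defined)
        x, y,                           // training instances and labels
        500,                            // ntrees
        100,                            // maxNodes: cap on leaf nodes per tree
        5,                              // nodeSize: minimum instances per leaf
        (int) Math.sqrt(x[0].length),   // mtry heuristic from the docs above
        1.0,                            // subsample = 1.0: bootstrap with replacement
        DecisionTree.SplitRule.GINI,    // split rule (assumed enum constant)
        classWeight);
```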
Method Detail

public double error()
Returns the out-of-bag (OOB) estimate of the error rate.

public double[] importance()
Returns the variable importance.

public int size()
Returns the number of trees in the model.

public void trim(int ntrees)
Trims the tree model set to a smaller size in case of over-fitting.
Parameters:
ntrees - the new (smaller) size of the tree model set.
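Continuing the sketch above, the bookkeeping methods can be used to inspect and shrink a trained forest:

```java
// Inspect the forest as a whole.
System.out.println("trees: " + forest.size());
System.out.println("OOB error: " + forest.error());

// Per-variable importance scores, one entry per input variable.
double[] imp = forest.importance();
for (int i = 0; i < imp.length; i++) {
    System.out.printf("variable %d importance: %.4f%n", i, imp[i]);
}

// Keep only the first 50 trees if the full ensemble over-fits.
forest.trim(50);
```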
public int predict(double[] x)
Predicts the class label of an instance.
Specified by:
predict in interface Classifier<double[]>
Parameters:
x - the instance to be classified.
public int predict(double[] x, double[] posteriori)
Predicts the class label of an instance and also calculates a posteriori probabilities.
Specified by:
predict in interface SoftClassifier<double[]>
Parameters:
x - the instance to be classified.
posteriori - the array to store a posteriori probabilities on output.
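A sketch of the soft-classification call; the class count of 2 is an assumption for illustration:

```java
// posteriori must have one slot per class (2 here, hypothetically); the
// method fills it with class probabilities and returns the winning label.
double[] posteriori = new double[2];
int label = forest.predict(new double[]{0.9, 0.1}, posteriori);
System.out.println("label " + label + ", P = " + posteriori[label]);
```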
public double[] test(double[][] x, int[] y)
Tests the model on a validation dataset.
Parameters:
x - the test data set.
y - the test data response values.
public double[][] test(double[][] x, int[] y, ClassificationMeasure[] measures)
Tests the model on a validation dataset.
Parameters:
x - the test data set.
y - the test data labels.
measures - the performance measures of classification.

public DecisionTree[] getTrees()
Returns the decision trees.