public class RegressionTree extends java.lang.Object implements Regression<double[]>, java.io.Serializable
Classification and Regression Tree (CART) techniques have a number of advantages over many alternative regression techniques. Ensemble techniques such as bagging, boosting, and random forests combine more than one decision tree in their analysis.
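For orientation, a minimal usage sketch (not part of the generated documentation): it assumes the class lives in the smile.regression package and uses only members documented on this page, namely the RegressionTree(double[][] x, double[] y, int maxNodes) constructor and predict(double[] x). The RegressionTreeExample class name and the toy data are made up for illustration.

```java
import smile.regression.RegressionTree;

public class RegressionTreeExample {
    public static void main(String[] args) {
        // Toy data: predict y from two numeric inputs (values are illustrative only).
        double[][] x = {
            {1.0, 2.0}, {2.0, 1.0}, {3.0, 4.0}, {4.0, 3.0}, {5.0, 6.0}, {6.0, 5.0}
        };
        double[] y = {1.5, 1.8, 3.9, 4.1, 6.2, 6.0};

        // Grow a tree with at most 4 leaf nodes, using the
        // RegressionTree(double[][] x, double[] y, int maxNodes) constructor.
        RegressionTree tree = new RegressionTree(x, y, 4);

        // Predict the response for a new instance.
        double yhat = tree.predict(new double[] {3.5, 3.5});
        System.out.println("prediction = " + yhat);
    }
}
```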
See Also: GradientTreeBoost, RandomForest, Serialized Form

Nested Class Summary
Modifier and Type | Class and Description
---|---
static interface | RegressionTree.NodeOutput - An interface to calculate node output.
static class | RegressionTree.Trainer - Trainer for regression tree.
Constructor Summary
Constructor and Description
---
RegressionTree(smile.data.Attribute[] attributes, double[][] x, double[] y, int maxNodes) - Constructor.
RegressionTree(smile.data.Attribute[] attributes, double[][] x, double[] y, int maxNodes, int nodeSize) - Constructor.
RegressionTree(smile.data.Attribute[] attributes, double[][] x, double[] y, int maxNodes, int nodeSize, int mtry, int[][] order, int[] samples, RegressionTree.NodeOutput output) - Constructor.
RegressionTree(double[][] x, double[] y, int maxNodes) - Constructor.
RegressionTree(double[][] x, double[] y, int maxNodes, int nodeSize) - Constructor.
RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes) - Constructor.
RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes, int nodeSize) - Constructor.
RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes, int nodeSize, int[] samples, RegressionTree.NodeOutput output) - Constructor.
Method Summary
Modifier and Type | Method and Description
---|---
java.lang.String | dot() - Returns the graphic representation in Graphviz dot format.
double[] | importance() - Returns the variable importance.
int | maxDepth() - Returns the maximum depth of the tree, i.e. the number of nodes along the longest path from the root node down to the farthest leaf node.
double | predict(double[] x) - Predicts the dependent variable of an instance.
double | predict(int[] x) - Predicts the dependent variable of an instance with sparse binary features.

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

public RegressionTree(double[][] x, double[] y, int maxNodes)
Parameters:
x - the training instances.
y - the response variable.
maxNodes - the maximum number of leaf nodes in the tree.

public RegressionTree(double[][] x, double[] y, int maxNodes, int nodeSize)
Parameters:
x - the training instances.
y - the response variable.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the number of instances in a node below which the tree will not split; setting nodeSize = 5 generally gives good results.

public RegressionTree(smile.data.Attribute[] attributes, double[][] x, double[] y, int maxNodes)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
maxNodes - the maximum number of leaf nodes in the tree.

public RegressionTree(smile.data.Attribute[] attributes, double[][] x, double[] y, int maxNodes, int nodeSize)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the number of instances in a node below which the tree will not split; setting nodeSize = 5 generally gives good results.

public RegressionTree(smile.data.Attribute[] attributes, double[][] x, double[] y, int maxNodes, int nodeSize, int mtry, int[][] order, int[] samples, RegressionTree.NodeOutput output)
Parameters:
attributes - the attribute properties.
x - the training instances.
y - the response variable.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the number of instances in a node below which the tree will not split; setting nodeSize = 5 generally gives good results.
mtry - the number of input variables to pick to split on at each node. It seems that p/3 generally gives good performance, where p is the number of variables.
order - the index of training values in ascending order. Note that only numeric attributes need to be sorted.
samples - the sample set of instances for stochastic learning. samples[i] should be 0 or 1 to indicate whether the instance is used for training.
output - an interface to calculate the node output (see RegressionTree.NodeOutput).
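The order and samples arrays for the constructor above are normally prepared by the ensemble classes that call it. The sketch below shows one plausible way to build arrays with the documented shapes: per-attribute row indices in ascending value order, and 0/1 sampling indicators. The StochasticTreeSetup class name, the toy data, and the 70% sampling rate are illustrative assumptions; the arrays are only printed here rather than passed to the constructor.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.stream.IntStream;

public class StochasticTreeSetup {
    public static void main(String[] args) {
        // Toy numeric training data (values are made up for illustration).
        double[][] x = {
            {2.0, 7.5}, {1.0, 3.0}, {4.0, 9.0}, {3.0, 1.0}, {5.0, 6.0}, {0.5, 4.5}
        };

        // order[j] lists the row indices of x sorted by attribute j in ascending order,
        // matching the description of the "order" parameter (only numeric attributes
        // need to be sorted).
        int p = x[0].length;
        int[][] order = new int[p][];
        for (int j = 0; j < p; j++) {
            final int col = j;
            order[j] = IntStream.range(0, x.length)
                    .boxed()
                    .sorted(Comparator.comparingDouble(i -> x[i][col]))
                    .mapToInt(Integer::intValue)
                    .toArray();
        }

        // samples[i] is 0 or 1, as the "samples" parameter requires; here a random
        // ~70% subsample (the rate is an arbitrary choice for the sketch).
        Random rng = new Random(42);
        int[] samples = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            samples[i] = rng.nextDouble() < 0.7 ? 1 : 0;
        }

        System.out.println("order[0] = " + Arrays.toString(order[0]));
        System.out.println("samples  = " + Arrays.toString(samples));
    }
}
```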
public RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes)
Parameters:
numFeatures - the number of sparse binary features.
x - the training instances of sparse binary features.
y - the response variable.
maxNodes - the maximum number of leaf nodes in the tree.

public RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes, int nodeSize)
Parameters:
numFeatures - the number of sparse binary features.
x - the training instances of sparse binary features.
y - the response variable.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the number of instances in a node below which the tree will not split; setting nodeSize = 5 generally gives good results.

public RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes, int nodeSize, int[] samples, RegressionTree.NodeOutput output)
Parameters:
numFeatures - the number of sparse binary features.
x - the training instances.
y - the response variable.
maxNodes - the maximum number of leaf nodes in the tree.
nodeSize - the number of instances in a node below which the tree will not split; setting nodeSize = 5 generally gives good results.
samples - the sample set of instances for stochastic learning. samples[i] should be 0 or 1 to indicate whether the instance is used for training.
output - an interface to calculate the node output (see RegressionTree.NodeOutput).

Method Detail

public double[] importance()
Returns the variable importance.
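A small sketch of inspecting the importance scores: it assumes importance() returns one score per input column, in column order, and that larger values mean a variable contributed more to the splits, which is the usual CART convention rather than something stated on this page. The ImportanceExample class name, the data, and the smile.regression package path are assumptions.

```java
import smile.regression.RegressionTree;

public class ImportanceExample {
    public static void main(String[] args) {
        // Toy data where the first column tracks y and the second is mostly noise.
        double[][] x = {{1, 10}, {2, 9}, {3, 8}, {4, 7}, {5, 6}, {6, 5}};
        double[] y = {1.0, 2.1, 2.9, 4.2, 5.0, 6.1};
        RegressionTree tree = new RegressionTree(x, y, 4);

        // Print one importance score per attribute (assumed column order).
        double[] imp = tree.importance();
        for (int j = 0; j < imp.length; j++) {
            System.out.printf("attribute %d importance: %.4f%n", j, imp[j]);
        }
    }
}
```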
public double predict(double[] x)
Predicts the dependent variable of an instance.
Specified by: predict in interface Regression<double[]>
Parameters:
x - the instance.

public double predict(int[] x)
Predicts the dependent variable of an instance with sparse binary features.
Parameters:
x - the instance.
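A hedged sketch of the sparse-binary path: it assumes each int[] instance lists the indices of the features that are present, which is an interpretation of "sparse binary features" rather than something this page states explicitly. The SparseBinaryExample class name, the data, and the smile.regression package path are assumptions; the constructor and predict(int[] x) are documented above.

```java
import smile.regression.RegressionTree;

public class SparseBinaryExample {
    public static void main(String[] args) {
        // Assumed encoding: each row of x lists the indices of the binary features
        // that are "on" for that instance.
        int numFeatures = 5;
        int[][] x = {
            {0, 2},       // instance 0 has features 0 and 2 set
            {1, 3},
            {0, 1, 4},
            {2, 4},
            {3},
            {0, 3, 4}
        };
        double[] y = {1.0, 2.5, 1.8, 3.2, 2.9, 2.2};

        // RegressionTree(int numFeatures, int[][] x, double[] y, int maxNodes)
        RegressionTree tree = new RegressionTree(numFeatures, x, y, 4);

        // predict(int[] x) scores a new instance given in the same sparse encoding.
        double yhat = tree.predict(new int[] {0, 4});
        System.out.println("prediction = " + yhat);
    }
}
```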
public int maxDepth()
Returns the maximum depth of the tree, i.e. the number of nodes along the longest path from the root node down to the farthest leaf node.
public java.lang.String dot()
Returns the graphic representation in Graphviz dot format.
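Since dot() returns a plain String, one way to use it is to write the output to a .dot file and render it with the Graphviz command line, e.g. dot -Tpng tree.dot -o tree.png. The DotExportExample class name, the tree.dot file name, the toy data, and the smile.regression package path are illustrative; only dot() and the three-argument constructor documented above are used.

```java
import smile.regression.RegressionTree;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DotExportExample {
    public static void main(String[] args) throws Exception {
        double[][] x = {{1, 2}, {2, 1}, {3, 4}, {4, 3}, {5, 6}, {6, 5}};
        double[] y = {1.5, 1.8, 3.9, 4.1, 6.2, 6.0};
        RegressionTree tree = new RegressionTree(x, y, 4);

        // Write the Graphviz description to a file; render it afterwards with Graphviz.
        Files.write(Paths.get("tree.dot"), tree.dot().getBytes());
    }
}
```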