CART

java.lang.Object
- smile.base.cart.CART

All Implemented Interfaces:

java.io.Serializable, SHAP<smile.data.Tuple>

Direct Known Subclasses:

DecisionTree, RegressionTree
```
public abstract class CART
extends java.lang.Object
implements SHAP<smile.data.Tuple>, java.io.Serializable
```
Classification and regression tree.

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`protected smile.data.formula.Formula`	`formula` The model formula.
`protected double[]`	`importance` Variable importance.
`protected int[]`	`index` An index of samples to their original locations in training dataset.
`protected int`	`maxDepth` The maximum depth of the tree.
`protected int`	`maxNodes` The maximum number of leaf nodes in the tree.
`protected int`	`mtry` The number of input variables to be used to determine the decision at a node of the tree.
`protected int`	`nodeSize` The number of instances in a node below which the tree will not split. Setting nodeSize = 5 generally gives good results.
`protected int[][]`	`order` An index of training values.
`protected smile.data.type.StructField`	`response` The schema of response variable.
`protected Node`	`root` The root of decision tree.
`protected int[]`	`samples` The samples for training this node.
`protected smile.data.type.StructType`	`schema` The schema of predictors.
`protected smile.data.DataFrame`	`x` The training data.

Constructor Summary

Constructors
Constructor and Description
`CART(smile.data.DataFrame x, smile.data.type.StructField y, int maxDepth, int maxNodes, int nodeSize, int mtry, int[] samples, int[][] order)` Constructor.
`CART(smile.data.formula.Formula formula, smile.data.type.StructType schema, smile.data.type.StructField response, Node root, double[] importance)` Constructor.

Method Summary

Modifier and Type	Method and Description
`protected void`	`clear()` Clears the workspace used for building the tree.
`java.lang.String`	`dot()` Returns the graphic representation in Graphviz dot format.
`protected abstract java.util.Optional<Split>`	`findBestSplit(LeafNode node, int column, double impurity, int lo, int hi)` Finds the best split for given column.
`protected java.util.Optional<Split>`	`findBestSplit(LeafNode node, int lo, int hi, boolean[] unsplittable)` Finds the best attribute to split on a set of samples.
`double[]`	`importance()` Returns the variable importance.
`protected abstract double`	`impurity(LeafNode node)` Returns the impurity of node.
`protected abstract LeafNode`	`newNode(int[] nodeSamples)` Creates a new leaf node.
`static int[][]`	`order(smile.data.DataFrame x)` Returns the index of ordered samples for each ordinal column.
`protected smile.data.Tuple`	`predictors(smile.data.Tuple x)` Returns the predictors by the model formula if it is not null.
`Node`	`root()` Returns the root node.
`double[]`	`shap(smile.data.DataFrame data)` Returns the average of absolute SHAP values over a data frame.
`double[]`	`shap(smile.data.Tuple x)` Returns the SHAP values.
`int`	`size()` Returns the number of nodes in the tree.
`protected boolean`	`split(Split split, java.util.PriorityQueue<Split> queue)` Splits a node into two child nodes.
`java.lang.String`	`toString()` Returns a text representation of the tree in R's rpart format.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface smile.feature.SHAP
shap

- Field Detail
  - formula
```
protected smile.data.formula.Formula formula
```
    The model formula.
  - schema
```
protected smile.data.type.StructType schema
```
    The schema of predictors.
  - response
```
protected smile.data.type.StructField response
```
    The schema of response variable.
  - root
```
protected Node root
```
    The root of decision tree.
  - maxDepth
```
protected int maxDepth
```
    The maximum depth of the tree.
  - maxNodes
```
protected int maxNodes
```
    The maximum number of leaf nodes in the tree.
  - nodeSize
```
protected int nodeSize
```
    The number of instances in a node below which the tree will not split. Setting nodeSize = 5 generally gives good results.
  - mtry
```
protected int mtry
```
    The number of input variables to be used to determine the decision at a node of the tree.
  - importance
```
protected double[] importance
```
    Variable importance. Every time a node is split on a variable, the impurity criterion (Gini, information gain, etc.) for the two descendant nodes is less than that of the parent node. Adding up the decreases for each individual variable over the tree gives a simple measure of variable importance.
  - x
```
protected transient smile.data.DataFrame x
```
    The training data.
  - samples
```
protected transient int[] samples
```
    The samples for training this node. Note that samples[i] is the number of times dataset[i] was sampled. A value of 0 means that the datum is not included, and values greater than 1 are possible because of sampling with replacement.
  - index
```
protected transient int[] index
```
    An index of samples to their original locations in training dataset.
  - order
```
protected transient int[][] order
```
    An index of training values. Initially, order[i] is a set of indices that iterate through the training values for attribute i in ascending order. During training, the array is rearranged so that all values for each leaf node occupy a contiguous range, but within that range they maintain the original ordering. Note that only numeric attributes will be sorted; non-numeric attributes will have a null in the corresponding place in the array.
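
The `samples` and `order` fields described above can be illustrated with plain arrays. The following is a minimal sketch, not Smile's implementation: the class `CartIndexSketch` and its two methods are hypothetical names, and the code only shows what a per-column ascending index and a with-replacement count vector look like before training rearranges them.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

// Hypothetical helper class; it does not depend on or reproduce Smile's code.
public class CartIndexSketch {
    /** Returns row indices that visit one numeric column in ascending order. */
    public static int[] ascendingOrder(double[] column) {
        Integer[] boxed = new Integer[column.length];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = i;
        }
        // Sort the row indices by the value they point to.
        Arrays.sort(boxed, Comparator.comparingDouble(i -> column[i]));
        int[] order = new int[boxed.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = boxed[i];
        }
        return order;
    }

    /** Draws n rows with replacement and counts how often each row was drawn. */
    public static int[] bootstrapCounts(int n, long seed) {
        Random random = new Random(seed);
        int[] samples = new int[n];
        for (int draw = 0; draw < n; draw++) {
            samples[random.nextInt(n)]++; // 0 = row left out, >1 = row drawn repeatedly
        }
        return samples;
    }

    public static void main(String[] args) {
        double[] column = {3.5, 1.0, 2.25};
        System.out.println(Arrays.toString(ascendingOrder(column))); // [1, 2, 0]

        int[] samples = bootstrapCounts(5, 42L);
        // The counts always add up to the number of draws.
        System.out.println(Arrays.stream(samples).sum()); // 5
    }
}
```

In this sketch a non-numeric column would simply have no index array, which matches the note that such attributes hold a null in `order`.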
- Constructor Detail
  - CART
```
public CART(smile.data.formula.Formula formula,
            smile.data.type.StructType schema,
            smile.data.type.StructField response,
            Node root,
            double[] importance)
```
    Constructor.
  - CART
```
public CART(smile.data.DataFrame x,
            smile.data.type.StructField y,
            int maxDepth,
            int maxNodes,
            int nodeSize,
            int mtry,
            int[] samples,
            int[][] order)
```
    Constructor.
    
    Parameters:
    
    x - the data frame of the explanatory variables.
    
    y - the response variable.
    
    maxDepth - the maximum depth of the tree.
    
    maxNodes - the maximum number of leaf nodes in the tree.
    
    nodeSize - the minimum size of leaf nodes.
    
    mtry - the number of input variables to pick to split on at each node. It seems that sqrt(p) generally gives good performance, where p is the number of variables.
    
    samples - the sample set of instances for stochastic learning. samples[i] is the number of times instance i is sampled.
    
    order - the index of training values in ascending order. Note that only numeric attributes need be sorted.
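
The sqrt(p) guideline for `mtry` can be sketched as follows. This is an illustration under stated assumptions, not the way CART itself selects columns: `MtrySketch`, `defaultMtry`, and `pickColumns` are hypothetical names, and the random subset is drawn with a partial Fisher-Yates shuffle chosen here only for simplicity.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper class; it does not reproduce Smile's internal column selection.
public class MtrySketch {
    /** Returns floor(sqrt(p)), but never less than 1. */
    public static int defaultMtry(int p) {
        return Math.max(1, (int) Math.floor(Math.sqrt(p)));
    }

    /** Picks mtry distinct column indices out of p using a partial Fisher-Yates shuffle. */
    public static int[] pickColumns(int p, int mtry, Random random) {
        if (mtry < 0 || mtry > p) {
            throw new IllegalArgumentException("mtry must be between 0 and p");
        }
        int[] columns = new int[p];
        for (int i = 0; i < p; i++) {
            columns[i] = i;
        }
        for (int i = 0; i < mtry; i++) {
            int j = i + random.nextInt(p - i); // choose from the not-yet-fixed tail
            int tmp = columns[i];
            columns[i] = columns[j];
            columns[j] = tmp;
        }
        return Arrays.copyOf(columns, mtry);
    }

    public static void main(String[] args) {
        int p = 16;
        int mtry = defaultMtry(p);
        System.out.println(mtry); // 4
        System.out.println(pickColumns(p, mtry, new Random(7L)).length); // 4
    }
}
```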
- Method Detail
  - size
```
public int size()
```
    Returns the number of nodes in the tree.
  - order
```
public static int[][] order(smile.data.DataFrame x)
```
    Returns the index of ordered samples for each ordinal column.
  - predictors
```
protected smile.data.Tuple predictors(smile.data.Tuple x)
```
    Returns the predictors by the model formula if it is not null. Otherwise returns the input tuple.
  - clear
```
protected void clear()
```
    Clears the workspace used for building the tree.
  - split
```
protected boolean split(Split split,
                        java.util.PriorityQueue<Split> queue)
```
    Splits a node into two child nodes. Returns true if the split succeeds, false otherwise.
  - findBestSplit
```
protected java.util.Optional<Split> findBestSplit(LeafNode node,
                                                  int lo,
                                                  int hi,
                                                  boolean[] unsplittable)
```
    Finds the best attribute to split on for the set of samples at the current node. Returns an empty Optional if no split reduces the impurity.
    
    Parameters:
    
    node - the leaf node to split.
    
    lo - the inclusive lower bound of the data partition in the reordered sample index array.
    
    hi - the exclusive upper bound of the data partition in the reordered sample index array.
    
    unsplittable - unsplittable[j] is true if the column j cannot be split further in the node.
  - impurity
```
protected abstract double impurity(LeafNode node)
```
    Returns the impurity of node.
  - newNode
```
protected abstract LeafNode newNode(int[] nodeSamples)
```
    Creates a new leaf node.
  - findBestSplit
```
protected abstract java.util.Optional<Split> findBestSplit(LeafNode node,
                                                           int column,
                                                           double impurity,
                                                           int lo,
                                                           int hi)
```
    Finds the best split for given column.
  - importance
```
public double[] importance()
```
    Returns the variable importance. Every time a node is split on a variable, the impurity criterion (Gini, information gain, etc.) for the two descendant nodes is less than that of the parent node. Adding up the decreases for each individual variable over the tree gives a simple measure of variable importance.
    
    Returns:
    
    the variable importance
  - root
```
public Node root()
```
    Returns the root node.
    
    Returns:
    
    root node.
  - dot
```
public java.lang.String dot()
```
    Returns the graphic representation in Graphviz dot format. Try http://viz-js.com/ to visualize the returned string.
  - toString
```
public java.lang.String toString()
```
    Returns a text representation of the tree in R's rpart format. A semi-graphical layout of the tree. Indentation is used to convey the tree topology. Information for each node includes the node number, split, size, deviance, and fitted value. For the decision tree, the class probabilities are also printed.
    
    Overrides:
    
    toString in class java.lang.Object
  - shap
```
public double[] shap(smile.data.DataFrame data)
```
    Returns the average of absolute SHAP values over a data frame.
  - shap
```
public double[] shap(smile.data.Tuple x)
```
    Description copied from interface: SHAP
    
    Returns the SHAP values. For regression, the number of SHAP values equals the number of features. For classification, the SHAP values form a p x k layout, where p is the number of features and k is the number of classes. The first k elements are the SHAP values of the first feature over the k classes, and the remaining features follow in the same order.
    
    Specified by:
    
    shap in interface SHAP<smile.data.Tuple>
    
    Parameters:
    
    x - an instance.
    
    Returns:
    
    the SHAP values.
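
The flat p x k layout described for `shap(Tuple)` can be unpacked with simple index arithmetic. The sketch below assumes only what the description states (feature-major order, k values per feature); `ShapLayoutSketch` and `toMatrix` are hypothetical names, and the numbers in `main` are made up for illustration.

```java
// Hypothetical helper class; the input array stands in for a value returned by shap(Tuple).
public class ShapLayoutSketch {
    /** Reshapes a flat SHAP vector of length p * k into a p-by-k matrix. */
    public static double[][] toMatrix(double[] shap, int k) {
        if (k <= 0 || shap.length % k != 0) {
            throw new IllegalArgumentException("length must be a multiple of k");
        }
        int p = shap.length / k;
        double[][] matrix = new double[p][k];
        for (int feature = 0; feature < p; feature++) {
            for (int cls = 0; cls < k; cls++) {
                // The first k elements belong to feature 0, the next k to feature 1, and so on.
                matrix[feature][cls] = shap[feature * k + cls];
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Made-up values for p = 3 features and k = 2 classes.
        double[] flat = {0.1, -0.1, 0.3, -0.3, 0.0, 0.0};
        double[][] matrix = toMatrix(flat, 2);
        System.out.println(matrix[1][0]); // 0.3
    }
}
```

For a regression tree the same helper applies with k = 1, giving one value per feature.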
