Immutable decision tree for integer-valued features and outcomes.
Each data structure is an indexed sequence of properties. The ith element of each sequence is the property of node i of the decision tree.
all possible outcomes for the decision tree
stores the children of each node (as a map from feature values to node ids)
stores the feature that each node splits on; can be None for leaf nodes
for each node, stores a map of outcomes to their frequency of appearance at that node (i.e. how many times a training vector with that outcome makes it to this node during classification)
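The indexed-sequence representation described above can be sketched as follows. This is an illustrative model, not the actual nlpstack API; all names (`DecisionTreeSketch`, `findDecisionPoint`, etc.) are hypothetical.

```scala
// Hypothetical sketch of a decision tree stored as parallel indexed sequences,
// where the i-th element of each sequence is a property of node i.
case class DecisionTreeSketch(
  outcomes: Seq[Int],                           // all possible outcomes
  child: IndexedSeq[Map[Int, Int]],             // node -> (feature value -> child node id)
  splittingFeature: IndexedSeq[Option[Int]],    // feature each node splits on; None for leaves
  outcomeHistograms: IndexedSeq[Map[Int, Int]]  // node -> (outcome -> frequency at that node)
) {
  // Follow splits from the root (node 0) until a leaf (or an unseen
  // feature value) is reached; return that node's id.
  def findDecisionPoint(featureValue: Int => Int): Int = {
    var node = 0
    var done = false
    while (!done) {
      splittingFeature(node) match {
        case Some(f) =>
          child(node).get(featureValue(f)) match {
            case Some(next) => node = next
            case None => done = true // unseen feature value: stop here
          }
        case None => done = true // leaf node
      }
    }
    node
  }
}

// A two-leaf tree: the root splits on feature 0; value 0 goes to node 1,
// value 1 goes to node 2.
val tree = DecisionTreeSketch(
  outcomes = Seq(0, 1),
  child = IndexedSeq(Map(0 -> 1, 1 -> 2), Map.empty, Map.empty),
  splittingFeature = IndexedSeq(Some(0), None, None),
  outcomeHistograms = IndexedSeq(Map(0 -> 5, 1 -> 5), Map(0 -> 5), Map(1 -> 5))
)
```

A vector whose feature 0 has value 1 is routed to node 2, whose histogram contains only outcome 1.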
Functions for training decision trees.
A DenseVector is a feature vector with arbitrary integral features.
the outcome of the feature vector
the value of each feature
A feature vector with integral features and outcome.
FeatureVectors is a convenience container for feature vectors.
The number of features must be the same for all feature vectors in the container.
collection of FeatureVector objects
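The dense vector and its container can be sketched as below. These are illustrative stand-ins for the nlpstack types, with hypothetical names; the actual classes may differ in detail.

```scala
// Sketch of a dense feature vector: every feature value is stored explicitly.
case class DenseVectorSketch(outcome: Option[Int], values: IndexedSeq[Int]) {
  def numFeatures: Int = values.size
  def getFeature(i: Int): Int = values(i)
}

// Sketch of the container: all vectors must have the same number of features.
case class FeatureVectorsSketch(vectors: IndexedSeq[DenseVectorSketch]) {
  require(
    vectors.map(_.numFeatures).distinct.size <= 1,
    "all feature vectors must have the same number of features")
  def numVectors: Int = vectors.size
}

val fvs = FeatureVectorsSketch(IndexedSeq(
  DenseVectorSketch(Some(0), IndexedSeq(1, 0, 2)),
  DenseVectorSketch(Some(1), IndexedSeq(0, 3, 1))
))
```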
The OneVersusAll implements multi-outcome classification as a set of binary classifiers.
A ProbabilisticClassifier is associated with each outcome. Suppose there are three outcomes: 0, 1, 2. Then the constructor would take a sequence of three classifiers as its argument: [(0,A), (1,B), (2,C)]. To compute the outcome distribution for a new feature vector v, the OneVersusAll would normalize:
[ A.outcomeDistribution(v)(1), B.outcomeDistribution(v)(1), C.outcomeDistribution(v)(1) ]
i.e. the probability of 1 (true) according to binary classifiers A, B, and C.
QUESTION(MH): is this the best way to normalize these, or would it be better to normalize by summing the logs and then re-applying the exponential operation?
the binary classifier associated with each outcome
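The normalization step described above can be sketched numerically. The probabilities below are made up for illustration; `normalizeVotes` is a hypothetical helper, not part of the nlpstack API.

```scala
// Each binary classifier reports P(true) for its own outcome; dividing each
// by the sum of all three yields a distribution over outcomes.
def normalizeVotes(pTrue: Map[Int, Double]): Map[Int, Double] = {
  val total = pTrue.values.sum
  pTrue.map { case (outcome, p) => outcome -> p / total }
}

// Suppose A, B, C report these probabilities of true for a vector v.
val binaryProbs = Map(0 -> 0.9, 1 -> 0.3, 2 -> 0.6)
val dist = normalizeVotes(binaryProbs)
// dist(0) == 0.9 / 1.8 == 0.5, and the distribution sums to 1.
```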
A OneVersusAllTrainer trains a OneVersusAll using a base ProbabilisticClassifierTrainer to train one binary classifier per outcome.
A RandomForest is a collection of decision trees. Each decision tree gets a single vote about the outcome. The outcome distribution is the normalized histogram of the votes.
the collection of possible outcomes
the collection of decision trees
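The voting scheme can be sketched as follows; `voteDistribution` is an illustrative helper, not the actual nlpstack method.

```scala
// Each tree casts one vote (an outcome); the outcome distribution is the
// normalized histogram of those votes.
def voteDistribution(votes: Seq[Int]): Map[Int, Double] =
  votes.groupBy(identity).map { case (outcome, vs) =>
    outcome -> vs.size.toDouble / votes.size
  }

// Four trees: three vote for outcome 1, one votes for outcome 0.
val dist = voteDistribution(Seq(1, 1, 0, 1))
// dist == Map(1 -> 0.75, 0 -> 0.25)
```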
A RandomForestTrainer trains a RandomForest from a set of feature vectors.
A SparseVector is a feature vector with sparse binary features.
the outcome of the feature vector
the number of features
the set of features with value 1
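A sparse binary vector can be sketched as below. This is an illustrative model with hypothetical names, not the actual nlpstack class.

```scala
// Sketch of a sparse binary feature vector: only the indices of features with
// value 1 are stored; all other features are implicitly 0.
case class SparseVectorSketch(
  outcome: Option[Int],
  numFeatures: Int,
  trueFeatures: Set[Int]
) {
  def getFeature(i: Int): Int = if (trueFeatures.contains(i)) 1 else 0
}

val v = SparseVectorSketch(outcome = Some(1), numFeatures = 5, trueFeatures = Set(0, 3))
// v.getFeature(3) == 1 and v.getFeature(2) == 0
```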
Implements C4.5 decision trees for integral labels and attributes.
The main class is org.allenai.nlpstack.parse.poly.decisiontree.DecisionTree. Use the companion object to build the tree, then use its classification methods (e.g. `outcomeDistribution`) to do prediction.
The tree takes data in the form of org.allenai.nlpstack.parse.poly.decisiontree.FeatureVectors. This is a container for a collection of org.allenai.nlpstack.parse.poly.decisiontree.FeatureVector objects.
Implementations of these are org.allenai.nlpstack.parse.poly.decisiontree.SparseVector or org.allenai.nlpstack.parse.poly.decisiontree.DenseVector.