

package decisiontree

Implements C4.5 decision trees for integral labels and attributes.

The main class to use is org.allenai.nlpstack.parse.poly.decisiontree.DecisionTree. Use the companion object to build the tree, then use the tree's prediction methods (such as outcomeDistribution) to do prediction.

The tree takes data in the form of org.allenai.nlpstack.parse.poly.decisiontree.FeatureVectors. This is a container for a collection of org.allenai.nlpstack.parse.poly.decisiontree.FeatureVector objects.

FeatureVector has two implementations: org.allenai.nlpstack.parse.poly.decisiontree.SparseVector and org.allenai.nlpstack.parse.poly.decisiontree.DenseVector.

Linear Supertypes
AnyRef, Any

Type Members

  1. case class DecisionTree(outcomes: Iterable[Int], child: IndexedSeq[Map[Int, Int]], splittingFeature: IndexedSeq[Option[Int]], outcomeHistograms: IndexedSeq[Map[Int, Int]]) extends ProbabilisticClassifier with Product with Serializable

    Immutable decision tree for integer-valued features and outcomes.

    Each data structure is an indexed sequence of properties. The ith element of each sequence is the property of node i of the decision tree.


    outcomes: all possible outcomes for the decision tree

    child: stores the children of each node (as a map from feature values to node ids)

    splittingFeature: stores the feature that each node splits on; can be None for leaf nodes

    outcomeHistograms: for each node, stores a map of outcomes to their frequency of appearance at that node (i.e. how many times a training vector with that outcome makes it to this node during classification)
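
    The parallel-sequence layout described above can be illustrated with a minimal sketch. MiniTree, findNode, and distribution are hypothetical names, not part of this package, and the sketch assumes that the root is node 0 and that lookup stops at a node when the observed feature value has no child. The library's actual behavior may differ.

    ```scala
    object TreeWalkSketch {
      // Hypothetical stand-in that mirrors the documented per-node sequences.
      final case class MiniTree(
        child: IndexedSeq[Map[Int, Int]],
        splittingFeature: IndexedSeq[Option[Int]],
        outcomeHistograms: IndexedSeq[Map[Int, Int]]
      )

      // Walk from the root (assumed to be node 0) until reaching a node that
      // has no splitting feature or no child for the observed feature value.
      @annotation.tailrec
      def findNode(tree: MiniTree, features: IndexedSeq[Int], node: Int = 0): Int =
        tree.splittingFeature(node) match {
          case Some(f) =>
            tree.child(node).get(features(f)) match {
              case Some(next) => findNode(tree, features, next)
              case None => node
            }
          case None => node
        }

      // Normalize the outcome histogram stored at the node that was reached.
      // Assumes that histogram is non-empty.
      def distribution(tree: MiniTree, features: IndexedSeq[Int]): Map[Int, Double] = {
        val hist = tree.outcomeHistograms(findNode(tree, features))
        val total = hist.values.sum.toDouble
        hist.map { case (outcome, count) => outcome -> (count / total) }
      }

      // A three-node tree: node 0 splits on feature 0; nodes 1 and 2 are leaves.
      val example = MiniTree(
        child = IndexedSeq(Map(0 -> 1, 1 -> 2), Map.empty, Map.empty),
        splittingFeature = IndexedSeq(Some(0), None, None),
        outcomeHistograms = IndexedSeq(Map(0 -> 3, 1 -> 3), Map(0 -> 3), Map(1 -> 3))
      )
    }
    ```

    With this example tree, a vector whose feature 0 equals 1 reaches node 2, and a vector with an unseen value for feature 0 stays at the root and receives the root's histogram.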

  2. class DecisionTreeTrainer extends ProbabilisticClassifierTrainer

    Functions for training decision trees.

  3. case class DenseVector(outcome: Option[Int], features: IndexedSeq[Int]) extends FeatureVector with Product with Serializable

    A DenseVector is a feature vector with arbitrary integral features.


    outcome: the outcome of the feature vector

    features: the value of each feature

  4. sealed trait FeatureVector extends AnyRef

    A feature vector with integral features and outcome.

  5. trait FeatureVectorSource extends AnyRef

  6. case class InMemoryFeatureVectorSource(featureVecs: IndexedSeq[FeatureVector], classificationTask: ClassificationTask) extends FeatureVectorSource with Product with Serializable

    FeatureVectors is a convenience container for feature vectors.

    The number of features must be the same for all feature vectors in the container.


    featureVecs: collection of FeatureVector objects

  7. class OmnibusTrainer extends ProbabilisticClassifierTrainer

  8. case class OneVersusAll(binaryClassifiers: Seq[(Int, ProbabilisticClassifier)]) extends ProbabilisticClassifier with Product with Serializable

    The OneVersusAll implements multi-outcome classification as a set of binary classifiers.

    A ProbabilisticClassifier is associated with each outcome. Suppose there are three outcomes: 0, 1, 2. Then the constructor would take a sequence of three classifiers as its argument: [(0,A), (1,B), (2,C)]. To compute the outcome distribution for a new feature vector v, the OneVersusAll would normalize:

    [ A.outcomeDistribution(v)(1), B.outcomeDistribution(v)(1), C.outcomeDistribution(v)(1) ]

    i.e. the probability of 1 (true) according to binary classifiers A, B, and C.

    QUESTION(MH): is this the best way to normalize these, or would it be better to normalize by summing the logs and then re-applying the exponential operation?


    binaryClassifiers: the binary classifier associated with each outcome
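
    The normalization described for OneVersusAll can be sketched as follows. The normalize function and its input shape are hypothetical: the sketch takes the positive-class probability already computed by each binary classifier, and it assumes those probabilities sum to a positive number. It uses plain sum normalization, not the log-based alternative raised in the question above.

    ```scala
    object OneVersusAllSketch {
      // positiveProbs pairs each outcome with the probability of 1 (true)
      // reported by that outcome's binary classifier.
      // Assumption: the probabilities sum to a value greater than zero.
      def normalize(positiveProbs: Seq[(Int, Double)]): Map[Int, Double] = {
        val total = positiveProbs.map(_._2).sum
        positiveProbs.map { case (outcome, p) => outcome -> (p / total) }.toMap
      }
    }
    ```

    For instance, if classifiers A, B, and C report 0.5, 1.0, and 0.5 for outcomes 0, 1, and 2, the normalized distribution assigns 0.25, 0.5, and 0.25 respectively.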

  9. class OneVersusAllTrainer extends ProbabilisticClassifierTrainer

    A OneVersusAllTrainer trains a OneVersusAll using a base ProbabilisticClassifierTrainer to train one binary classifier per outcome.

  10. trait ProbabilisticClassifier extends AnyRef

  11. trait ProbabilisticClassifierTrainer extends (FeatureVectorSource) ⇒ ProbabilisticClassifier

  12. case class RandomForest(allOutcomes: Seq[Int], decisionTrees: Seq[DecisionTree]) extends ProbabilisticClassifier with Product with Serializable

    A RandomForest is a collection of decision trees. Each decision tree gets a single vote about the outcome. The outcome distribution is the normalized histogram of the votes.


    allOutcomes: the collection of possible outcomes

    decisionTrees: the collection of decision trees
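
    The voting scheme described for RandomForest can be sketched as a normalized vote histogram. The voteDistribution function is hypothetical: it takes the single outcome chosen by each tree as input, and it assumes that outcomes receiving no votes appear in the result with probability 0.0, which the library may or may not do.

    ```scala
    object ForestVoteSketch {
      // votes holds the single outcome chosen by each decision tree.
      // Assumption: votes is non-empty.
      def voteDistribution(allOutcomes: Seq[Int], votes: Seq[Int]): Map[Int, Double] = {
        // Count how many trees voted for each outcome.
        val counts = votes.groupBy(identity).map { case (o, vs) => o -> vs.size }
        // Divide each count by the number of votes; unvoted outcomes get 0.0.
        allOutcomes.map(o => o -> (counts.getOrElse(o, 0).toDouble / votes.size)).toMap
      }
    }
    ```

    With four trees voting 0, 0, 1, 0 over outcomes 0, 1, and 2, the distribution is 0.75 for outcome 0, 0.25 for outcome 1, and 0.0 for outcome 2.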

  13. class RandomForestTrainer extends ProbabilisticClassifierTrainer

    A RandomForestTrainer trains a RandomForest from a set of feature vectors.

  14. case class RemappedFeatureVectorSource(fvSource: FeatureVectorSource, outcomeRemapping: (Int) ⇒ Int) extends FeatureVectorSource with Product with Serializable

  15. case class SparseVector(outcome: Option[Int], numFeatures: Int, trueFeatures: Set[Int]) extends FeatureVector with Product with Serializable

    A SparseVector is a feature vector with sparse binary features.


    outcome: the outcome of the feature vector

    numFeatures: the number of features

    trueFeatures: the set of features with value 1
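
    The difference between the dense and sparse representations can be illustrated with a minimal sketch. MiniVector, MiniDense, MiniSparse, feature, and toDense are hypothetical names that only mirror the documented constructor parameters; outcomes are omitted, and the accessors of the real FeatureVector trait are not shown on this page.

    ```scala
    object VectorSketch {
      // Hypothetical common interface: look up a feature value by index.
      sealed trait MiniVector {
        def numFeatures: Int
        def feature(i: Int): Int
      }

      // Dense form: every feature value is stored explicitly.
      final case class MiniDense(features: IndexedSeq[Int]) extends MiniVector {
        def numFeatures: Int = features.size
        def feature(i: Int): Int = features(i)
      }

      // Sparse binary form: only the indices of features with value 1 are stored.
      final case class MiniSparse(numFeatures: Int, trueFeatures: Set[Int]) extends MiniVector {
        def feature(i: Int): Int = if (trueFeatures.contains(i)) 1 else 0
      }

      // Expand a sparse vector into the equivalent dense vector.
      def toDense(v: MiniSparse): MiniDense =
        MiniDense((0 until v.numFeatures).map(v.feature))
    }
    ```

    Under these assumptions, a sparse vector with numFeatures = 4 and trueFeatures = Set(1, 3) is equivalent to the dense vector with features 0, 1, 0, 1.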

Value Members

  1. object DecisionTree extends Serializable

  2. object ProbabilisticClassifier

  3. object RandomForest extends Serializable

