package decisiontree

Implements C4.5 decision trees for integral labels and attributes.

The main class is org.allenai.nlpstack.parse.poly.decisiontree.DecisionTree. Use its companion object to build the tree, then use the tree's prediction methods to classify a feature vector or to obtain its outcome distribution.

The tree takes data in the form of org.allenai.nlpstack.parse.poly.decisiontree.FeatureVectors. This is a container for a collection of org.allenai.nlpstack.parse.poly.decisiontree.FeatureVector objects.

FeatureVector has two implementations: org.allenai.nlpstack.parse.poly.decisiontree.SparseVector and org.allenai.nlpstack.parse.poly.decisiontree.DenseVector.


Type Members

  1. case class DecisionTree(outcomes: Iterable[Int], child: IndexedSeq[Map[Int, Int]], splittingFeature: IndexedSeq[Option[Int]], outcomeHistograms: IndexedSeq[Map[Int, Int]]) extends ProbabilisticClassifier with Product with Serializable

    Immutable decision tree for integer-valued features and outcomes.

    Each data structure is an indexed sequence of properties. The ith element of each sequence is the property of node i of the decision tree.

    outcomes

    all possible outcomes for the decision tree

    child

    stores the children of each node (as a map from feature values to node ids)

    splittingFeature

    stores the feature that each node splits on; can be None for leaf nodes

    outcomeHistograms

    for each node, stores a map of outcomes to their frequency of appearance at that node (i.e. how many times a training vector with that outcome makes it to this node during classification)
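
The parallel-sequence layout described above can be illustrated with a minimal sketch. It is not the library's implementation: `Tree` is a local stand-in that only mirrors the constructor fields listed here, and `findNode`, the choice of node 0 as the root, and the decision to stop at the current node when a feature value has no child are all assumptions made for illustration.

```scala
import scala.annotation.tailrec

object DecisionTreeSketch {
  // Local stand-in mirroring the documented constructor fields; not the library class.
  final case class Tree(
    outcomes: Iterable[Int],
    child: IndexedSeq[Map[Int, Int]],
    splittingFeature: IndexedSeq[Option[Int]],
    outcomeHistograms: IndexedSeq[Map[Int, Int]]
  )

  // Walk down from the assumed root (node 0). Stop at a leaf (no splitting
  // feature) or when the observed feature value has no child (an assumption).
  @tailrec
  def findNode(tree: Tree, features: IndexedSeq[Int], node: Int = 0): Int =
    tree.splittingFeature(node) match {
      case Some(f) =>
        tree.child(node).get(features(f)) match {
          case Some(next) => findNode(tree, features, next)
          case None => node
        }
      case None => node
    }

  // Normalize the histogram of the node that was reached.
  // Assumes that node's histogram is non-empty.
  def outcomeDistribution(tree: Tree, features: IndexedSeq[Int]): Map[Int, Double] = {
    val hist = tree.outcomeHistograms(findNode(tree, features))
    val total = hist.values.sum.toDouble
    tree.outcomes.map(o => o -> (hist.getOrElse(o, 0) / total)).toMap
  }

  // Three-node example: node 0 splits on feature 0; nodes 1 and 2 are leaves.
  val example: Tree = Tree(
    outcomes = Seq(0, 1),
    child = IndexedSeq(Map(0 -> 1, 1 -> 2), Map.empty, Map.empty),
    splittingFeature = IndexedSeq(Some(0), None, None),
    outcomeHistograms = IndexedSeq(Map(0 -> 3, 1 -> 5), Map(0 -> 3, 1 -> 1), Map(1 -> 4))
  )
}
```

With the example tree, a vector whose feature 0 has value 0 reaches node 1, whose histogram normalizes to 0.75 for outcome 0 and 0.25 for outcome 1.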

  2. class DecisionTreeTrainer extends ProbabilisticClassifierTrainer

    Functions for training decision trees.

  3. case class DenseVector(outcome: Option[Int], features: IndexedSeq[Int]) extends FeatureVector with Product with Serializable

    A DenseVector is a feature vector with arbitrary integral features.

    outcome

    the outcome of the feature vector

    features

    the value of each feature

  4. sealed trait FeatureVector extends AnyRef

    A feature vector with integral features and outcome.

  5. trait FeatureVectorSource extends AnyRef

  6. case class InMemoryFeatureVectorSource(featureVecs: IndexedSeq[FeatureVector], classificationTask: ClassificationTask) extends FeatureVectorSource with Product with Serializable

    FeatureVectors is a convenience container for feature vectors.

    The number of features must be the same for all feature vectors in the container.

    featureVecs

    collection of FeatureVector objects

  7. class OmnibusTrainer extends ProbabilisticClassifierTrainer

  8. case class OneVersusAll(binaryClassifiers: Seq[(Int, ProbabilisticClassifier)]) extends ProbabilisticClassifier with Product with Serializable

    The OneVersusAll implements multi-outcome classification as a set of binary classifiers.

    A ProbabilisticClassifier is associated with each outcome. Suppose there are three outcomes: 0, 1, 2. Then the constructor would take a sequence of three classifiers as its argument: [(0,A), (1,B), (2,C)]. To compute the outcome distribution for a new feature vector v, the OneVersusAll would normalize:

    [ A.outcomeDistribution(v)(1), B.outcomeDistribution(v)(1), C.outcomeDistribution(v)(1) ]

    i.e. the probability of 1 (true) according to binary classifiers A, B, and C.

    QUESTION(MH): is this the best way to normalize these, or would it be better to normalize by summing the logs and then re-applying the exponential operation?

    binaryClassifiers

    the binary classifier associated with each outcome
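
The normalization described for OneVersusAll can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `BinaryScorer` is a hypothetical stand-in for a binary ProbabilisticClassifier, reduced to a function that returns the probability of outcome 1 for a feature vector.

```scala
object OneVersusAllSketch {
  // Hypothetical stand-in: returns P(outcome = 1) for the given features.
  type BinaryScorer = IndexedSeq[Int] => Double

  // Collect each binary classifier's probability of "true" and rescale the
  // scores so that they sum to 1. Assumes at least one score is positive.
  def outcomeDistribution(
    binaryClassifiers: Seq[(Int, BinaryScorer)],
    v: IndexedSeq[Int]
  ): Map[Int, Double] = {
    val scores = binaryClassifiers.map { case (outcome, clf) => outcome -> clf(v) }
    val total = scores.map(_._2).sum
    scores.map { case (outcome, p) => outcome -> (p / total) }.toMap
  }
}
```

For example, if classifiers A, B, and C return 0.75, 0.5, and 0.75, the scores sum to 2.0 and the resulting distribution is 0.375, 0.25, and 0.375 for outcomes 0, 1, and 2.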

  9. class OneVersusAllTrainer extends ProbabilisticClassifierTrainer

    A OneVersusAllTrainer trains a OneVersusAll using a base ProbabilisticClassifierTrainer to train one binary classifier per outcome.
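
One plausible way to prepare the per-outcome binary problems is to relabel the training outcomes once per target outcome. The sketch below assumes that relabeling scheme (1 for the target, 0 for everything else), which this page does not confirm. `binaryRemapping` and `binaryLabelSets` are hypothetical helper names. The remapping function has the `(Int) ⇒ Int` shape that RemappedFeatureVectorSource accepts as `outcomeRemapping`.

```scala
object OneVersusAllTrainingSketch {
  // Relabel outcomes for one binary problem: 1 for the target, 0 otherwise.
  def binaryRemapping(target: Int): Int => Int =
    outcome => if (outcome == target) 1 else 0

  // For each possible outcome, produce the relabeled copy of the training labels.
  def binaryLabelSets(outcomes: Seq[Int], labels: Seq[Int]): Seq[(Int, Seq[Int])] =
    outcomes.map(k => k -> labels.map(binaryRemapping(k)))
}
```

Under this assumption, training labels 0, 2, 1, 2 become 1, 0, 0, 0 for the classifier of outcome 0 and 0, 1, 0, 1 for the classifier of outcome 2.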

  10. trait ProbabilisticClassifier extends AnyRef

  11. trait ProbabilisticClassifierTrainer extends (FeatureVectorSource) ⇒ ProbabilisticClassifier

  12. case class RandomForest(allOutcomes: Seq[Int], decisionTrees: Seq[DecisionTree]) extends ProbabilisticClassifier with Product with Serializable

    A RandomForest is a collection of decision trees. Each decision tree gets a single vote about the outcome. The outcome distribution is the normalized histogram of the votes.

    allOutcomes

    the collection of possible outcomes

    decisionTrees

    the collection of decision trees
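
The voting scheme described for RandomForest can be sketched as below. It is an illustration only: `TreeVote` is a hypothetical stand-in for a decision tree, reduced to a function that returns the single outcome the tree votes for.

```scala
object RandomForestSketch {
  // Hypothetical stand-in for a decision tree: returns its single vote.
  type TreeVote = IndexedSeq[Int] => Int

  // Count one vote per tree, then divide by the number of trees so the
  // histogram sums to 1. Outcomes that receive no votes get probability 0.
  def outcomeDistribution(
    allOutcomes: Seq[Int],
    trees: Seq[TreeVote],
    v: IndexedSeq[Int]
  ): Map[Int, Double] = {
    val votes = trees.map(tree => tree(v))
    val counts = votes.groupBy(identity).map { case (o, vs) => o -> vs.size }
    allOutcomes.map(o => o -> (counts.getOrElse(o, 0).toDouble / trees.size)).toMap
  }
}
```

With four trees voting 1, 1, 0, 1 over the outcomes 0, 1, 2, the distribution is 0.25 for outcome 0, 0.75 for outcome 1, and 0.0 for outcome 2.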

  13. class RandomForestTrainer extends ProbabilisticClassifierTrainer

    A RandomForestTrainer trains a RandomForest from a set of feature vectors.

  14. case class RemappedFeatureVectorSource(fvSource: FeatureVectorSource, outcomeRemapping: (Int) ⇒ Int) extends FeatureVectorSource with Product with Serializable

  15. case class SparseVector(outcome: Option[Int], numFeatures: Int, trueFeatures: Set[Int]) extends FeatureVector with Product with Serializable

    A SparseVector is a feature vector with sparse binary features.

    outcome

    the outcome of the feature vector

    numFeatures

    the number of features

    trueFeatures

    the set of features with value 1
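
The relationship between the two vector representations can be sketched as a conversion. The case classes below are local stand-ins that mirror the documented constructor fields, not the library classes. `toDense` is a hypothetical helper, and it assumes that every feature index missing from `trueFeatures` has value 0.

```scala
object FeatureVectorSketch {
  // Local stand-ins mirroring the documented constructor fields.
  final case class DenseVector(outcome: Option[Int], features: IndexedSeq[Int])
  final case class SparseVector(outcome: Option[Int], numFeatures: Int, trueFeatures: Set[Int])

  // Expand a sparse binary vector into one explicit value per feature index.
  def toDense(sv: SparseVector): DenseVector =
    DenseVector(
      sv.outcome,
      (0 until sv.numFeatures).map(i => if (sv.trueFeatures.contains(i)) 1 else 0)
    )
}
```

For instance, a SparseVector with four features and trueFeatures {1, 3} expands to the dense features 0, 1, 0, 1, with the outcome carried over unchanged.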

Value Members

  1. object DecisionTree extends Serializable

  2. object ProbabilisticClassifier

  3. object RandomForest extends Serializable
