An Estimator that chooses all sparse features observed when training, and produces a transformer which builds a sparse vector out of them.
Caches an RDD at a given point within a Pipeline. Follows Spark's lazy evaluation conventions.
Type of the input to cache.
An optional name to set on the cached output. Useful for debugging.
Given a set of class labels, returns a binary vector that indicates when each class is present.
Expects labels in the range [0, numClasses) and numClasses > 1.
Given a class label, returns a binary vector that indicates when that class is present.
Expects labels in the range [0, numClasses) and numClasses > 1.
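The two label-indicator transformers above can be sketched with plain Scala arrays. This is only an illustration, not the library's implementation: the object and method names are hypothetical, and encoding "present" as 1.0 and "absent" as 0.0 is an assumption (some implementations use -1.0 and +1.0 instead).

```scala
object LabelIndicators {
  /** Single label: indicator vector with 1.0 at the label's index (assumed 0/1 encoding). */
  def fromIntLabel(label: Int, numClasses: Int): Array[Double] = {
    require(numClasses > 1, "numClasses must be > 1")
    require(label >= 0 && label < numClasses, s"label $label not in [0, $numClasses)")
    val out = Array.fill(numClasses)(0.0)
    out(label) = 1.0
    out
  }

  /** Set of labels: indicator vector with 1.0 at every index that appears in labels. */
  def fromIntArrayLabels(labels: Array[Int], numClasses: Int): Array[Double] = {
    require(numClasses > 1, "numClasses must be > 1")
    val out = Array.fill(numClasses)(0.0)
    labels.foreach { label =>
      require(label >= 0 && label < numClasses, s"label $label not in [0, $numClasses)")
      out(label) = 1.0
    }
    out
  }

  def main(args: Array[String]): Unit = {
    println(fromIntLabel(2, 4).mkString(", "))
    println(fromIntArrayLabels(Array(0, 3), 4).mkString(", "))
  }
}
```

Both functions reject labels outside [0, numClasses), which mirrors the stated precondition.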
An Estimator that chooses the most frequently observed sparse features when training, and produces a transformer which builds a sparse vector out of them.
Deterministically orders the feature mappings first by decreasing number of appearances, then by earliest appearance in the RDD.
The number of features to keep.
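The ordering rule above (decreasing number of appearances, ties broken by earliest appearance) can be sketched on an in-memory Seq standing in for the RDD. The names below are hypothetical, and counting every occurrence of a feature rather than once per row is an assumption.

```scala
object CommonFeatures {
  /**
   * Picks the numFeatures most frequent feature ids and assigns each an index.
   * Ties in count are broken by the earliest first appearance in the data.
   */
  def fit[T](data: Seq[Seq[(T, Double)]], numFeatures: Int): Map[T, Int] = {
    // Flatten to feature ids, remembering the global position of each occurrence.
    val occurrences = data.flatten.map(_._1).zipWithIndex
    // For each feature: (id, number of appearances, position of first appearance).
    val stats = occurrences.groupBy(_._1).toSeq.map { case (feature, occ) =>
      (feature, occ.size, occ.map(_._2).min)
    }
    stats
      .sortBy { case (_, count, first) => (-count, first) }
      .take(numFeatures)
      .map(_._1)
      .zipWithIndex
      .toMap
  }
}
```

Sorting on the pair (-count, first) gives a deterministic result even when several features share the same count.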
Transformer to densify vectors into DenseVectors.
This class performs a no-op on its input.
Type of the input and, by definition, output.
Randomly shuffle the rows of an RDD within a pipeline. Uses a shuffle operation in Spark.
Type of the input to shuffle.
A transformer that, given a feature space, maps features of the form (feature id, value) into a sparse vector.
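As a rough sketch of this mapping, the feature space can be modeled as a Map from feature id to vector index. The Sparse case class is a hypothetical stand-in for a real sparse vector type, and two behaviors here are assumptions rather than documented facts: features missing from the feature space are dropped, and each feature id appears at most once per input.

```scala
object SparseVectorize {
  /** Minimal stand-in for a sparse vector: sorted indices plus their values. */
  final case class Sparse(size: Int, indices: Vector[Int], values: Vector[Double])

  /** Maps (feature id, value) pairs to a sparse vector using the given feature space. */
  def vectorize[T](featureSpace: Map[T, Int], features: Seq[(T, Double)]): Sparse = {
    val entries = features
      // Keep only known features and translate each id to its vector index.
      .collect { case (f, v) if featureSpace.contains(f) => (featureSpace(f), v) }
      .sortBy(_._1)
    Sparse(featureSpace.size, entries.map(_._1).toVector, entries.map(_._2).toVector)
  }
}
```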
Transformer to convert vectors into SparseVectors.
Transformer that returns the indices of the largest k values of the vector, in order.
Concatenates a Seq of DenseVectors into a single DenseVector.
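A minimal sketch of top-k index selection on a plain array follows. The function name is hypothetical, and reading "in order" as "from largest value to smallest" is an assumption about the intended ordering.

```scala
object TopK {
  /** Returns the indices of the k largest values, largest first (assumed ordering). */
  def topK(values: Array[Double], k: Int): Array[Int] =
    values.zipWithIndex
      .sortBy { case (value, _) => -value } // stable sort: ties keep the lower index first
      .take(k)
      .map(_._2)
}
```

With k = 1 this reduces to the index of the single largest value.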
This transformer splits the input vector into a number of blocks.
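One way to picture the splitting is with a fixed block size, as sketched below. The source does not say how block boundaries are chosen, so the blockSize parameter and the shorter final block are assumptions, and the names are hypothetical.

```scala
object SplitBlocks {
  /** Splits a vector into consecutive blocks of blockSize; the last block may be shorter. */
  def split(vector: Vector[Double], blockSize: Int): Vector[Vector[Double]] = {
    require(blockSize > 0, "blockSize must be positive")
    vector.grouped(blockSize).toVector
  }
}
```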
Converts a float matrix to a double matrix.
Flattens a matrix into a vector.
Transformer that returns the index of the largest value in the vector.
Object that allows creating a top-k classifier without the new keyword.
An Estimator that chooses all sparse features observed when training, and produces a transformer which builds a sparse vector out of them.
Deterministically orders the feature mappings by earliest appearance in the RDD.
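The earliest-appearance ordering can be illustrated on an in-memory Seq standing in for the RDD. This is a sketch with hypothetical names, not the estimator's actual code.

```scala
object AllFeatures {
  /** Assigns every observed feature id an index, in order of first appearance. */
  def fit[T](data: Seq[Seq[(T, Double)]]): Map[T, Int] =
    data.flatten   // all (feature id, value) pairs in row order
      .map(_._1)   // keep only the feature ids
      .distinct    // distinct preserves first-occurrence order on a Seq
      .zipWithIndex
      .toMap
}
```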