Package

org.hammerlab.magic.rdd

keyed

package keyed

Type Members

  1. class CappedGroupByKeyRDD[K, V] extends AnyRef

    Wrap an RDD and expose a cappedGroupByKey method, which behaves like org.apache.spark.rdd.PairRDDFunctions.groupByKey but with a cap on the number of values that will be accumulated for each key.

    Keeps the first values encountered for each key, up to the cap, and discards the rest. To obtain a random sampling of the elements for each key instead, see SampleByKeyRDD.
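
    The capping behaviour described above can be modelled on a local collection. This is a minimal sketch using plain Scala collections rather than Spark; `CappedGroupByKeySketch` and its `cappedGroupByKey(pairs, cap)` helper are hypothetical names chosen for illustration, not this package's API or implementation.

    ```scala
    object CappedGroupByKeySketch {
      // Hypothetical local model: group values by key, keeping at most `cap`
      // values per key in the order they are encountered.
      def cappedGroupByKey[K, V](pairs: Seq[(K, V)], cap: Int): Map[K, Vector[V]] =
        pairs.foldLeft(Map.empty[K, Vector[V]]) { case (acc, (k, v)) =>
          val soFar = acc.getOrElse(k, Vector.empty[V])
          // Keep the value only while this key is still under the cap.
          if (soFar.size < cap) acc.updated(k, soFar :+ v) else acc
        }

      val pairs: Seq[(String, Int)] =
        Seq("a" -> 1, "b" -> 2, "a" -> 3, "a" -> 4, "b" -> 5)

      // With a cap of 2, the third value seen for "a" is discarded.
      val capped: Map[String, Vector[Int]] = cappedGroupByKey(pairs, cap = 2)
    }
    ```

    In this model the per-key state never grows beyond `cap` elements, which is the property that distinguishes a capped grouping from an unbounded groupByKey.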

  2. case class FilterKeysRDD[K, V](rdd: RDD[(K, V)])(implicit evidence$1: ClassTag[K], evidence$2: ClassTag[V]) extends Product with Serializable

  3. class KeySamples[V] extends Serializable

  4. case class ReduceByKeyRDD[K, V](rdd: RDD[(K, V)])(implicit evidence$1: ClassTag[K], evidence$2: ClassTag[V], ord: Ordering[V]) extends Product with Serializable

    Adds maxByKey and minByKey helpers to an RDD.
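
    As an illustration of what per-key maximum and minimum helpers compute, here is a small sketch on plain Scala collections. It assumes the helpers keep the largest or smallest value for each key under the implicit `Ordering[V]` that appears in the signature above; the object `MinMaxByKeySketch` and the local `reduceByKey` helper are hypothetical names, not this package's code.

    ```scala
    object MinMaxByKeySketch {
      // Hypothetical local stand-in for a per-key reduction: combine all
      // values that share a key using the binary function `f`.
      def reduceByKey[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
        pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
          acc.updated(k, acc.get(k).map(prev => f(prev, v)).getOrElse(v))
        }

      // Largest value per key, according to the implicit ordering.
      def maxByKey[K, V](pairs: Seq[(K, V)])(implicit ord: Ordering[V]): Map[K, V] =
        reduceByKey(pairs)((a, b) => ord.max(a, b))

      // Smallest value per key, according to the implicit ordering.
      def minByKey[K, V](pairs: Seq[(K, V)])(implicit ord: Ordering[V]): Map[K, V] =
        reduceByKey(pairs)((a, b) => ord.min(a, b))

      val pairs: Seq[(String, Int)] =
        Seq("a" -> 3, "b" -> 7, "a" -> 9, "b" -> 1)

      val maxima: Map[String, Int] = maxByKey(pairs)
      val minima: Map[String, Int] = minByKey(pairs)
    }
    ```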

  5. class SampleByKeyRDD[K, V] extends AnyRef

  6. case class SplitByKeyRDD[K, V](rdd: RDD[(K, V)])(implicit evidence$1: ClassTag[K], evidence$2: ClassTag[V]) extends Product with Serializable

    Adds a splitByKey method to any RDD of pairs: returns a Map from each key (K) to an RDD[V] containing all the values that had that key in the original RDD (with relative order preserved for each key).

    One shuffle stage on all keys and their values yields an RDD whose partitions are arranged in disjoint, contiguous regions, one region holding all the values for each key. This is much more efficient than the naive approach to separating an RDD by key, which performs an RDD.filter for each key in the RDD.

    However, breaking up an RDD into a collection of RDDs in this way is fairly unidiomatic. If you find yourself wanting to do so, it is worth pausing to consider whether the data could be handled differently upstream.

    rdd: Paired RDD to split up by key.
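
    The contrast between the single-rearrangement approach and the naive filter-per-key approach can be sketched on a local collection. This is an illustrative model only, using a plain `Vector` in place of an RDD and a stable sort in place of the shuffle; `SplitByKeySketch`, `splitByKey`, and `splitNaive` are hypothetical names and do not reflect this package's implementation.

    ```scala
    object SplitByKeySketch {
      // Naive approach: one full pass over the data (a filter) per distinct key.
      def splitNaive[K, V](pairs: Vector[(K, V)]): Map[K, Vector[V]] =
        pairs.map(_._1).distinct.map { k =>
          k -> pairs.filter(_._1 == k).map(_._2)
        }.toMap

      // Single-rearrangement approach: lay the data out so that each key's
      // values occupy one contiguous region, then slice out each region.
      def splitByKey[K, V](pairs: Vector[(K, V)]): Map[K, Vector[V]] = {
        // Distinct keys in order of first appearance, and each key's position.
        val keysInOrder: Vector[K] = pairs.map(_._1).distinct
        val keyIndex: Map[K, Int] = keysInOrder.zipWithIndex.toMap

        // A stable sort stands in for the shuffle: values for the same key
        // become contiguous and keep their original relative order.
        val arranged: Vector[(K, V)] = pairs.sortBy { case (k, _) => keyIndex(k) }

        // Number of values per key, and the start offset of each key's region.
        val counts: Map[K, Int] =
          pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.size }
        val offsets: Vector[Int] =
          keysInOrder.scanLeft(0)((offset, k) => offset + counts(k))

        // Slice each key's contiguous region out of the arranged data.
        keysInOrder.zip(offsets).map { case (k, start) =>
          k -> arranged.slice(start, start + counts(k)).map(_._2)
        }.toMap
      }

      val pairs: Vector[(String, Int)] =
        Vector("a" -> 1, "b" -> 2, "a" -> 3, "c" -> 4, "b" -> 5)

      val split: Map[String, Vector[Int]] = splitByKey(pairs)
    }
    ```

    Both functions produce the same result, but the naive version rescans the whole input once per key, whereas the second rearranges the data once and then only reads each key's own region.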

Value Members

  1. object CappedGroupByKeyRDD

  2. object FilterKeysRDD extends Serializable

  3. object KeySamples extends Serializable

  4. object ReduceByKeyRDD extends Serializable

  5. object SampleByKeyRDD

  6. object SplitByKeyRDD extends Serializable

