Wrap an RDD and expose a cappedGroupByKey method, which behaves like org.apache.spark.rdd.PairRDDFunctions.groupByKey but with a cap on the number of values that will be accumulated for each key.
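A minimal sketch of how such a cap could be implemented on top of Spark's existing combineByKey primitive; the helper name and call shape here are assumptions for illustration, not the library's actual API:

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical sketch: like groupByKey, but no per-key buffer grows
// beyond `cap` elements, bounding memory during the shuffle.
def cappedGroupByKey[K: ClassTag, V: ClassTag](
  rdd: RDD[(K, V)],
  cap: Int
): RDD[(K, Seq[V])] =
  rdd.combineByKey[Vector[V]](
    (v: V) => Vector(v),                        // create a buffer for a new key
    (buf: Vector[V], v: V) =>
      if (buf.length < cap) buf :+ v else buf,  // stop appending once at the cap
    (l: Vector[V], r: Vector[V]) =>
      (l ++ r).take(cap)                        // merge cross-partition buffers, re-cap
  ).mapValues(_.toSeq)
```

Because the map-side combiners also respect the cap, the amount of data shuffled per key is bounded as well, which is the main advantage over calling groupByKey and truncating afterwards.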
Add splitByKey method to any RDD of pairs: returns a Map from each key (K) to an RDD[V] with all the values that had that key in the original RDD (with relative order preserved for each key).
A single shuffle stage over all keys and their values yields an RDD whose partitions are arranged in disjoint, contiguous regions, one region holding all the values for each key; this is much more efficient than the naive approach of separating RDDs by key, which performs an RDD.filter for each key in the RDD.
However, it's worth noting that breaking up an RDD into a collection of RDDs in this way is fairly unidiomatic, and if you find yourself wanting this it's worth pausing and considering restructuring the computation upstream instead.
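A hedged usage sketch, assuming splitByKey is exposed on pair RDDs via an implicit enrichment (the exact call shape is an assumption); the naive filter-per-key alternative it improves on is shown for contrast:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Assumes an existing SparkContext `sc` and that splitByKey is in scope.
val pairs: RDD[(String, Int)] =
  sc.parallelize(Seq("a" -> 1, "b" -> 2, "a" -> 3))

// One shuffle; each resulting RDD[Int] views a contiguous region of the
// shuffled data, with the original relative order of values preserved.
val byKey: Map[String, RDD[Int]] = pairs.splitByKey()

// Naive alternative: one full pass over `pairs` per distinct key.
val naive: Map[String, RDD[Int]] =
  pairs.keys.distinct.collect().map { k =>
    k -> pairs.filter(_._1 == k).values
  }.toMap
```

With N distinct keys, the naive version scans the input N times (one filter job per key), while the single-shuffle approach touches each record once.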
Paired RDD to split up by key.
cappedGroupByKey takes the first values seen for each key, discarding the rest; to obtain a random sampling of the elements for each key, see SampleByKeyRDD.
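To make the "first values" vs. "random sample" distinction concrete, a sketch contrasting the capped call (hypothetical call shape) with Spark's built-in per-key sampling; note that sampleByKey keeps an approximate fraction per key, not an exact count:

```scala
import org.apache.spark.rdd.RDD

// Assumes `pairs: RDD[(String, Int)]` and the cappedGroupByKey enrichment
// are in scope; the argument shape here is an assumption.
// Deterministic: keeps at most the first 2 values encountered per key.
val capped: RDD[(String, Seq[Int])] = pairs.cappedGroupByKey(2)

// Probabilistic: Spark's PairRDDFunctions.sampleByKey keeps roughly the
// given fraction of values for each key, chosen at random.
val fractions = Map("a" -> 0.5, "b" -> 0.5)
val sampled: RDD[(String, Int)] =
  pairs.sampleByKey(withReplacement = false, fractions)
```

Use the capped form when you need a hard bound on per-key memory, and sampling when you need values that are representative of each key's full distribution.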