org.hammerlab.magic.rdd.keyed

SplitByKeyRDD

case class SplitByKeyRDD[K, V](rdd: RDD[(K, V)])(implicit evidence$1: ClassTag[K], evidence$2: ClassTag[V]) extends Product with Serializable

Adds a splitByKey method to any RDD of pairs. It returns a Map from each key (K) to an RDD[V] containing all the values that had that key in the original RDD, with relative order preserved for each key.
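A minimal usage sketch, assuming Spark and this library are on the classpath and a local master is acceptable. It wraps the RDD explicitly through the constructor documented on this page rather than assuming any implicit conversion; the object name, sample data, and app name are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.hammerlab.magic.rdd.keyed.SplitByKeyRDD

object SplitByKeyExample {
  // Builds a small pair RDD (illustrative data) and splits it by key,
  // using only the constructor and the splitByKey method listed on this page.
  def split(sc: SparkContext) = {
    val pairs: RDD[(String, Int)] =
      sc.parallelize(Seq("a" -> 1, "b" -> 2, "a" -> 3, "c" -> 4, "b" -> 5))
    SplitByKeyRDD(pairs).splitByKey
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("split-by-key-example")
    val sc = new SparkContext(conf)
    try {
      // Each map entry is a key paired with an RDD of that key's values.
      split(sc).foreach { case (key, values) =>
        println(s"$key -> ${values.collect().mkString(", ")}")
      }
    } finally sc.stop()
  }
}
```

Each value in the returned map is an ordinary RDD, so it can be transformed or collected independently of the others.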

A single shuffle stage over all keys and their values yields an RDD whose partitions are arranged in disjoint, contiguous regions, one per key, each holding all the values for that key. This is much more efficient than the naive approach to separating an RDD by key, which performs an RDD.filter for each key in the RDD.
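For contrast, the naive filter-per-key approach mentioned above might look like the following sketch. It is not part of this library; naiveSplitByKey and NaiveSplit are hypothetical names, and only standard Spark calls are used.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object NaiveSplit {
  // Hypothetical helper: builds one filtered RDD per distinct key.
  def naiveSplitByKey[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]): Map[K, RDD[V]] = {
    // One job just to discover the distinct keys.
    val keys = rdd.keys.distinct().collect()
    keys.map { k =>
      // Each filtered RDD scans every record of `rdd` when it is evaluated,
      // so evaluating all of them costs one full pass per key.
      k -> rdd.filter(_._1 == k).values
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("naive-split"))
    try {
      val pairs = sc.parallelize(Seq("a" -> 1, "b" -> 2, "a" -> 3))
      naiveSplitByKey(pairs).foreach { case (k, vs) =>
        println(s"$k -> ${vs.collect().mkString(", ")}")
      }
    } finally sc.stop()
  }
}
```

With many distinct keys, the repeated full scans are what make this approach expensive compared with a single shuffle.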

However, breaking up an RDD into a collection of RDDs in this way is fairly unidiomatic; if you find yourself wanting this, it is worth pausing to consider whether a different approach upstream would avoid the need.
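As one possible reading of "different actions upstream" (an assumption, not something this page specifies), per-key work can often stay inside a single keyed RDD. The sketch below computes a per-key sum and count with Spark's aggregateByKey; the object and method names are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object KeyedAggregation {
  // Per-key (sum, count) computed within one keyed RDD,
  // without materializing a separate RDD for each key.
  def sumAndCount(pairs: RDD[(String, Int)]): RDD[(String, (Long, Long))] =
    pairs.aggregateByKey((0L, 0L))(
      { case ((sum, n), v) => (sum + v, n + 1) },           // fold a value into a partial result
      { case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2) }   // merge two partial results
    )

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("keyed-aggregation"))
    try {
      val pairs = sc.parallelize(Seq("a" -> 1, "b" -> 2, "a" -> 3))
      sumAndCount(pairs).collect().foreach(println)
    } finally sc.stop()
  }
}
```

Whether this fits depends on what the per-key RDDs were needed for; splitByKey remains the option when downstream code genuinely requires separate RDDs.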

rdd

Paired RDD to split up by key.

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new SplitByKeyRDD(rdd: RDD[(K, V)])(implicit arg0: ClassTag[K], arg1: ClassTag[V])

    rdd

    Paired RDD to split up by key.

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  8. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  9. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  10. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. final def notify(): Unit

    Definition Classes
    AnyRef
  12. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  13. val rdd: RDD[(K, V)]

    Paired RDD to split up by key.

  14. def splitByKey: Map[K, RDD[V]]

  15. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  16. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  17. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  18. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
