Package

org.hammerlab.magic

rdd

package rdd

Type Members

  1. class CachedCountRegistry extends AnyRef

    CachedCountRegistry adds a .size method to RDDs that mimics RDD.count but caches its result.

    It also exposes .sizes and .total on Seq[RDD]s, which compute the constituent RDDs' sizes (per above) in one Spark job.

    Additionally, both sets of APIs optimize computations on UnionRDDs by computing their component RDDs' sizes and caching those as well as the UnionRDD's total.

    Cached size info is keyed by a SparkContext for robustness in apps that stop their SparkContext and then resume with a new one; this is especially useful for testing!

    Usage:

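    // `sc` is an existing SparkContext
    import org.hammerlab.magic.rdd.CachedCountRegistry._
    val rdd1 = sc.parallelize(0 until 4)
    val rdd2 = sc.parallelize("a" :: "b" :: Nil)
    // like rdd1.count, but the result is cached
    rdd1.size()
    // sizes of both RDDs, computed in one Spark job
    (rdd1 :: rdd2 :: Nil).sizes()
    // total over both RDDs
    (rdd1 :: rdd2 :: Nil).total()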
  2. class IfRDD[T] extends Serializable

    Hang an iff method off of RDDs, as a small bit of syntactic sugar.

  3. case class KeyPartitioner(numPartitions: Int) extends Partitioner with Product with Serializable

    Spark Partitioner that maps each element to the partition whose index is given by an Int: either the key itself or, when the key is a tuple, its first element.

  4. class OrderedRepartitionRDD[T] extends Serializable

    Some helpers for repartitioning an RDD while retaining the order of its elements.

  5. class RunLengthRDD[T] extends AnyRef

    Helper for run-length encoding an RDD.
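
The run-length encoding that RunLengthRDD helps with can be illustrated on a plain local collection. The sketch below is not the library's implementation or API (this page does not show RunLengthRDD's methods): `RunLengthSketch` and `runLengthEncode` are hypothetical names, and it assumes that runs of consecutive equal elements collapse into (element, count) pairs. It ignores partitioning, so a distributed version would presumably also need to merge runs that span partition boundaries.

```scala
// Hypothetical illustration only; not part of org.hammerlab.magic.
object RunLengthSketch {
  // Collapse each run of consecutive equal elements into an (element, run length) pair.
  def runLengthEncode[T](elems: Seq[T]): List[(T, Int)] =
    elems
      .foldLeft(List.empty[(T, Int)]) {
        // Same element as the most recent run: extend that run by one.
        case ((prev, n) :: rest, elem) if prev == elem => (prev, n + 1) :: rest
        // Different element, or no runs yet: start a new run of length 1.
        case (acc, elem) => (elem, 1) :: acc
      }
      .reverse // runs were prepended, so restore the input order
}
```

For example, `RunLengthSketch.runLengthEncode(Seq("a", "a", "b", "a"))` returns `List(("a", 2), ("b", 1), ("a", 1))`: only adjacent repeats are merged, so the trailing "a" forms its own run.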

Value Members

  1. object CachedCountRegistry
  2. object IfRDD extends Serializable
  3. object KeyPartitioner extends Serializable
  4. object OrderedRepartitionRDD extends Serializable
  5. object RunLengthRDD
  6. package cmp
  7. package grid
  8. package keyed
  9. package partitions
  10. package serde
  11. package sliding
  12. package sort
  13. package zip
