Package

org.hammerlab.magic.rdd

scan


package scan


Type Members

  1. case class ScanRDD[T](rdd: RDD[T], partitionPrefixes: Array[T], total: T) extends Product with Serializable


    Holds the result of a scan operation over an RDD

    rdd

    post-scan RDD; each element is replaced with the running "total" up to *and including* itself. This differs from Scala collections' "scan" behavior, which emits an initial "identity" element.

    partitionPrefixes

    the "sum" of all elements that precede each partition, one entry per partition. The first entry is the identity, consistent with Scala collections' behavior, but the final "total" entry is moved to the total field, so this array has exactly as many elements as the RDD has partitions.

    total

    the "sum" of all elements in the scanned RDD; Scala collections leave this as the last element of a scan's result, but it is pulled out separately here.

  2. case class ScanValuesRDD[K, V](rdd: RDD[(K, V)], partitionPrefixes: Array[V], total: V) extends Product with Serializable


    Analogue of ScanRDD for scans over the values of paired RDDs.

    See ScanRDD for an important discussion of field semantics.
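The field semantics described for ScanRDD and ScanValuesRDD can be modeled with plain Scala collections. The sketch below is not the library's implementation and does not use Spark: nested `Seq`s stand in for RDD partitions, and `LocalScan`, `LocalScanValues`, `scanLeftInclusive` and `scanValues` are hypothetical names introduced only for illustration.

```scala
// Plain-Scala model of the documented field semantics; no Spark required.
// Each inner Seq stands in for one RDD partition.
object ScanSemanticsSketch {
  // Hypothetical local stand-ins for ScanRDD and ScanValuesRDD.
  case class LocalScan[T](elems: Seq[Seq[T]], partitionPrefixes: Seq[T], total: T)
  case class LocalScanValues[K, V](elems: Seq[Seq[(K, V)]], partitionPrefixes: Seq[V], total: V)

  def scanLeftInclusive[T](partitions: Seq[Seq[T]], identity: T)(op: (T, T) => T): LocalScan[T] = {
    // "Sum" of each partition on its own.
    val sums = partitions.map(_.foldLeft(identity)(op))
    // Scala's scanLeft emits the identity first and the grand total last.
    val scanned = sums.scanLeft(identity)(op)
    val prefixes = scanned.init // one entry per partition, starting with the identity
    val total = scanned.last    // pulled out into its own field
    // Scan each partition starting from its prefix, then drop the leading
    // prefix so every element becomes the total up to and including itself.
    val elems = partitions.zip(prefixes).map { case (part, prefix) =>
      part.scanLeft(prefix)(op).tail
    }
    LocalScan(elems, prefixes, total)
  }

  // Keyed analogue: keys are left untouched, only the values are scanned.
  def scanValues[K, V](partitions: Seq[Seq[(K, V)]], identity: V)(op: (V, V) => V): LocalScanValues[K, V] = {
    val scanned = scanLeftInclusive(partitions.map(_.map(_._2)), identity)(op)
    val elems = partitions.zip(scanned.elems).map { case (part, values) =>
      part.map(_._1).zip(values)
    }
    LocalScanValues(elems, scanned.partitionPrefixes, scanned.total)
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(1, 2, 3), Seq(4, 5), Seq(6))
    val result = scanLeftInclusive(partitions, 0)(_ + _)
    assert(result.elems == Seq(Seq(1, 3, 6), Seq(10, 15), Seq(21)))
    assert(result.partitionPrefixes == Seq(0, 6, 15)) // same length as partitions
    assert(result.total == 21)
    // The flattened elements equal a standard scanLeft minus its initial identity.
    assert(result.elems.flatten == partitions.flatten.scanLeft(0)(_ + _).tail)
  }
}
```

In this model, `partitionPrefixes` and `total` together reproduce what `scanLeft` over the per-partition sums would return, split so that the prefix array lines up one-to-one with the partitions.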

Value Members

  1. object ScanLeftRDD

  2. object ScanLeftValuesRDD

  3. object ScanRDD extends Serializable

  4. object ScanRightRDD

    RDD wrapper supporting methods that compute partial-sums (from right to left) across the RDD.

    Callers should be aware of one implementation detail: by default, a scan-right reverses the RDD, performs a scan-left, then reverses the result, which takes 3 Spark jobs.

    An alternative implementation delegates to scala.collection.Iterator.scanRight, which is likely less expensive but materializes whole partitions in memory, generally a severe anti-pattern in Spark computations.

  5. object ScanRightValuesRDD

  6. object ScanValuesRDD extends Serializable

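The two scan-right strategies described for ScanRightRDD can be compared on a local collection. This is a minimal sketch under stated assumptions, not the library's code: it runs on a plain `Seq` and `Iterator` rather than an RDD, the method names are hypothetical, and it keeps the trailing identity element the way Scala collections do.

```scala
// Local comparison of the two scan-right strategies; no Spark required.
object ScanRightSketch {
  // Strategy 1: reverse, scan left with the operands flipped, reverse back.
  // On an RDD, each of these three steps corresponds to a separate job.
  def scanRightViaReverse[A, B](xs: Seq[A], z: B)(op: (A, B) => B): Seq[B] =
    xs.reverse.scanLeft(z)((acc, x) => op(x, acc)).reverse

  // Strategy 2: buffer the whole iterator, then scan right in memory.
  // The toVector call makes the materialization cost explicit: every
  // element of the "partition" is held in memory at once.
  def scanRightBuffered[A, B](it: Iterator[A], z: B)(op: (A, B) => B): Iterator[B] =
    it.toVector.scanRight(z)(op).iterator

  def main(args: Array[String]): Unit = {
    // String concatenation is not commutative, so it checks operand order.
    val xs = Seq("a", "b", "c")
    val expected = List("abc", "bc", "c", "")
    assert(scanRightViaReverse(xs, "")(_ + _).toList == expected)
    assert(scanRightBuffered(xs.iterator, "")(_ + _).toList == expected)
  }
}
```

Both strategies produce the same values; the trade-off is between extra passes over the data (strategy 1) and holding an entire partition in memory (strategy 2).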
