Package org.apache.spark.batch

package batch

Type Members

  1. class MapRDD[T] extends RDD[(Int, T)]

    MapRDD is a map-side RDD that serializes partition data and waits for all parent RDDs to finish. Note that type 'T' must be serializable.
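
    One way to check that a candidate element type meets the serializability requirement is a round trip through Java serialization. This is a minimal sketch only: `SerializableCheck`, `Record`, `toBytes` and `fromBytes` are hypothetical names, and this page does not say which serializer MapRDD actually uses.

    ```scala
    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

    object SerializableCheck {
      // Hypothetical element type; Scala case classes are Serializable by default.
      final case class Record(id: Int, name: String)

      // Write a value to a byte array using plain Java serialization (an assumption,
      // not necessarily the serializer used by MapRDD).
      def toBytes[T <: java.io.Serializable](value: T): Array[Byte] = {
        val buffer = new ByteArrayOutputStream()
        val out = new ObjectOutputStream(buffer)
        try out.writeObject(value) finally out.close()
        buffer.toByteArray
      }

      // Read a value back from the bytes produced by toBytes.
      def fromBytes[T](bytes: Array[Byte]): T = {
        val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
        try in.readObject().asInstanceOf[T] finally in.close()
      }

      def main(args: Array[String]): Unit = {
        val original = Record(1, "a")
        val copy = fromBytes[Record](toBytes(original))
        // The round trip must preserve the value.
        assert(copy == original)
      }
    }
    ```

    A type that fails this round trip (for example, one holding a non-serializable field) would not be a suitable 'T'.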

  2. class Partition extends spark.Partition with Serializable

    Partition mirrors a parent partition of the original RDD and keeps track of the partition's task, so that partitions can be reconstructed in the reduce stage.

  3. class ReduceRDD[T] extends RDD[T]

    ReduceRDD reduces each output from MapRDD and returns an RDD that has the original number of partitions and a similar data distribution, so it is safe to rely on the same order of data in each partition.
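
    The map and reduce stages described above can be sketched without Spark, using plain collections in place of RDDs. This is an illustration under stated assumptions, not the package's implementation: it assumes the `Int` in `RDD[(Int, T)]` is the index of the parent partition, and `BatchRoundTripSketch`, `mapSide` and `reduceSide` are hypothetical names.

    ```scala
    object BatchRoundTripSketch {
      // Map side: tag every element with the index of the partition it came from
      // (assumed meaning of the Int key in RDD[(Int, T)]).
      def mapSide[T](partitions: Seq[Seq[T]]): Seq[(Int, T)] =
        partitions.zipWithIndex.flatMap { case (elems, idx) => elems.map(e => (idx, e)) }

      // Reduce side: rebuild the original number of partitions from the tagged
      // elements. groupBy keeps the relative order of elements within each group,
      // so the order inside every partition is preserved.
      def reduceSide[T](tagged: Seq[(Int, T)], numPartitions: Int): Seq[Seq[T]] = {
        val byIndex = tagged.groupBy(_._1)
        (0 until numPartitions).map(i => byIndex.getOrElse(i, Seq.empty).map(_._2))
      }

      def main(args: Array[String]): Unit = {
        // Three partitions, one of them empty.
        val original = Seq(Seq("a", "b"), Seq.empty[String], Seq("c"))
        val restored = reduceSide(mapSide(original), original.length)
        // Same number of partitions, same elements, same order.
        assert(restored == original)
      }
    }
    ```

    Passing the partition count explicitly lets empty partitions survive the round trip, which matches the stated goal of returning the original number of partitions.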
