Package

org.apache.spark

rdd

package rdd


Type Members

  1. sealed class FilteredCartesianRDD[T, U, V] extends RDD[(T, U)] with Serializable

    Performs a cartesian join of two RDDs using a filter-and-refine pattern.

    During RDD declaration, n*m partitions are generated, one for each possible cartesian pairing of partitions. During RDD execution, the summary functions are applied to rdd1 and rdd2 in a map-side reduce. The results are collected and filtered using metapred to find the partition pairings with potential matches. Those pairings are then checked using pred in a refinement step.

    The filter step does not shuffle rdd1 or rdd2, but the records of the metardds, produced by the summary functions, are shuffled (as they must be). Each metardd contains one item per partition (for example, a "bounding box" of the records in the parent RDD), so this shuffle is assumed to be low cost.

    For efficient execution, it is assumed that potential matches exist for only a limited number of cartesian pairings. If no filtering is possible, the worst case is a full cartesian product.
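
    The filter-and-refine flow described above can be sketched with plain Scala collections standing in for RDD partitions. This is a minimal illustration, not the class's implementation: `Bounds`, `summarize`, and `filteredCartesian` are hypothetical names, and the signatures given to `metapred` and `pred` are assumptions chosen for a simple "values within a tolerance" join.

```scala
// Illustrative only: each inner Seq stands in for one RDD partition.
object FilterRefineSketch {
  // Hypothetical summary type: a closed interval bounding every value in a partition.
  final case class Bounds(min: Int, max: Int)

  // Summary function: reduce one partition to its bounding interval (None if empty).
  def summarize(partition: Seq[Int]): Option[Bounds] =
    if (partition.isEmpty) None else Some(Bounds(partition.min, partition.max))

  // metapred (assumed signature): could any pair drawn from these two partitions match?
  // For pred = |a - b| <= tol, the two intervals must lie within tol of each other.
  def metapred(tol: Int)(a: Bounds, b: Bounds): Boolean =
    a.min - tol <= b.max && b.min - tol <= a.max

  // pred (assumed signature): the exact record-level test used in the refine step.
  def pred(tol: Int)(a: Int, b: Int): Boolean = math.abs(a - b) <= tol

  def filteredCartesian(
      left: Seq[Seq[Int]],
      right: Seq[Seq[Int]],
      tol: Int): Seq[(Int, Int)] = {
    // One summary item per partition, mirroring the "metardds".
    val metaLeft = left.map(summarize)
    val metaRight = right.map(summarize)
    for {
      (lp, i) <- left.zipWithIndex
      (rp, j) <- right.zipWithIndex
      ml <- metaLeft(i).toSeq
      mr <- metaRight(j).toSeq
      if metapred(tol)(ml, mr) // filter: skip pairings that cannot match
      a <- lp
      b <- rp
      if pred(tol)(a, b) // refine: exact check on surviving pairings
    } yield (a, b)
  }
}
```

    With `left = Seq(Seq(1, 2, 3), Seq(100, 101))`, `right = Seq(Seq(2, 4), Seq(200, 205))`, and `tol = 1`, only one of the four partition pairings passes `metapred`, so `pred` is evaluated on 6 record pairs instead of 20.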
