VictimlessSortedIntervalPartitionJoin

Abstract Value Members

abstract def emptyFn(left: Iterator[(ReferenceRegion, T)], right: Iterator[(ReferenceRegion, U)]): Iterator[(RT, RU)]

Attributes
protected
Definition Classes
ShuffleRegionJoin
abstract val leftRdd: RDD[(ReferenceRegion, T)]

Attributes
protected
Definition Classes
ShuffleRegionJoin
abstract def postProcessHits(iter: Iterable[U], currentLeft: T): Iterable[(RT, RU)]

Attributes
protected
Definition Classes
ShuffleRegionJoin
abstract val rightRdd: RDD[(ReferenceRegion, U)]

Attributes
protected
Definition Classes
ShuffleRegionJoin

Concrete Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def advanceCache(cache: SetTheoryCache[U, RT, RU], right: BufferedIterator[(ReferenceRegion, U)], until: ReferenceRegion): Unit

Adds elements from right to cache based on the next region encountered.
cache
The cache for this partition.
right
The right iterator.
until
The next region to join with.

Attributes
protected
Definition Classes
VictimlessSortedIntervalPartitionJoin → ShuffleRegionJoin
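
To make the role of `until` concrete, here is a minimal sketch of one way such an advance step could work. It is not the library's implementation. `Region` is a hypothetical stand-in for `ReferenceRegion` (assumed to be a half-open interval on a named contig), a plain `ListBuffer` stands in for `SetTheoryCache`, and the stopping rule (pull records that start before `until` ends) is an assumption.

```scala
import scala.collection.mutable.ListBuffer

object AdvanceCacheSketch {
  // Hypothetical stand-in for ReferenceRegion: half-open [start, end) on a contig.
  final case class Region(name: String, start: Long, end: Long)

  // True when `r` starts before `until` ends in (name, start) sort order,
  // so `r` may still overlap `until` or an earlier left region.
  def startsBeforeEndOf(r: Region, until: Region): Boolean =
    r.name < until.name || (r.name == until.name && r.start < until.end)

  // Moves records from `right` into `cache` until the head of `right` starts
  // at or past the end of `until`. Assumes `right` is sorted by (name, start).
  def advanceCache[U](
      cache: ListBuffer[(Region, U)],
      right: BufferedIterator[(Region, U)],
      until: Region): Unit = {
    while (right.hasNext && startsBeforeEndOf(right.head._1, until)) {
      cache += right.next()
    }
  }
}
```

Because the iterator is buffered, the first record that fails the test stays at the head of `right` and is considered again on the next call.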
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def compute(): RDD[(RT, RU)]

Performs a region join between two RDDs (shuffle join). All data should be pre-shuffled and copartitioned.
returns
An RDD of pairs (x, y), where x is from leftRdd, y is from rightRdd, and the region corresponding to x overlaps the region corresponding to y.

Definition Classes
ShuffleRegionJoin
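
The returns clause can be read as a specification of the output. The sketch below states that specification as a naive nested loop over in-memory sequences. It only illustrates the overlap condition and is not how the shuffle join is computed. `Region` is a hypothetical stand-in for `ReferenceRegion`, and the half-open overlap test is an assumption.

```scala
object RegionJoinSpec {
  // Hypothetical stand-in for ReferenceRegion: half-open [start, end) on a contig.
  final case class Region(name: String, start: Long, end: Long)

  // Two regions overlap when they share a contig and their intervals intersect.
  def overlaps(a: Region, b: Region): Boolean =
    a.name == b.name && a.start < b.end && b.start < a.end

  // Reference semantics for an inner region join: every (x, y) whose regions overlap.
  def naiveJoin[T, U](left: Seq[(Region, T)], right: Seq[(Region, U)]): Seq[(T, U)] =
    for {
      (leftRegion, x) <- left
      (rightRegion, y) <- right
      if overlaps(leftRegion, rightRegion)
    } yield (x, y)
}
```

A sorted, copartitioned join should produce the same pairs as this quadratic version while visiting each right-hand record only a bounded number of times.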
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def finalizeHits(cache: SetTheoryCache[U, RT, RU], right: BufferedIterator[(ReferenceRegion, U)]): Iterable[(RT, RU)]

Computes all victims for the partition. NOTE: These are victimless joins so we have no victims.
cache
The cache for this partition.
right
The right iterator.
returns
An empty Iterable.

Attributes
protected
Definition Classes
VictimlessSortedIntervalPartitionJoin → ShuffleRegionJoin
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def makeIterator(leftIter: Iterator[(ReferenceRegion, T)], rightIter: Iterator[(ReferenceRegion, U)]): Iterator[(RT, RU)]

Attributes
protected
Definition Classes
ShuffleRegionJoin
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def partitionAndJoin(left: RDD[(ReferenceRegion, T)], right: RDD[(ReferenceRegion, U)]): RDD[(RT, RU)]

Performs a region join between two RDDs.
returns
An RDD of pairs (x, y), where x is from left, y is from right, and the region corresponding to x overlaps the region corresponding to y.

Definition Classes
ShuffleRegionJoin → RegionJoin
def processHits(cache: SetTheoryCache[U, RT, RU], currentLeft: T, currentLeftRegion: ReferenceRegion): Iterable[(RT, RU)]

Process hits for a given object in left.
cache
The cache containing potential hits.
currentLeft
The current object from the left iterator.
currentLeftRegion
The ReferenceRegion of currentLeft.
returns
An Iterable containing all hits, formatted by postProcessHits.

Attributes
protected
Definition Classes
ShuffleRegionJoin
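
The split between finding hits and formatting them can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `Region` is a hypothetical stand-in for `ReferenceRegion`, the cache is a plain `Iterable`, and `postProcessHits` is passed as a function argument rather than being an abstract member. `innerJoin` and `leftOuterJoin` are hypothetical examples of what a subclass might supply.

```scala
object ProcessHitsSketch {
  // Hypothetical stand-in for ReferenceRegion: half-open [start, end) on a contig.
  final case class Region(name: String, start: Long, end: Long)

  def overlaps(a: Region, b: Region): Boolean =
    a.name == b.name && a.start < b.end && b.start < a.end

  // Selects the cached right-hand values that overlap the current left region,
  // then lets `postProcessHits` decide the shape of the emitted pairs.
  def processHits[T, U, RT, RU](
      cache: Iterable[(Region, U)],
      currentLeft: T,
      currentLeftRegion: Region)(
      postProcessHits: (Iterable[U], T) => Iterable[(RT, RU)]): Iterable[(RT, RU)] = {
    val hits = cache.collect { case (r, u) if overlaps(r, currentLeftRegion) => u }
    postProcessHits(hits, currentLeft)
  }

  // Hypothetical formatter: one pair per hit, nothing when there are no hits.
  def innerJoin[T, U](hits: Iterable[U], left: T): Iterable[(T, U)] =
    hits.map(u => (left, u))

  // Hypothetical formatter: keeps the left value even when nothing overlaps.
  def leftOuterJoin[T, U](hits: Iterable[U], left: T): Iterable[(T, Option[U])] =
    if (hits.isEmpty) Seq((left, None)) else hits.map(u => (left, Option(u)))
}
```

Keeping the formatter separate means the same overlap search can serve several join flavors; only the shape of `(RT, RU)` changes.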
def pruneCache(cache: SetTheoryCache[U, RT, RU], to: ReferenceRegion): Unit

Removes elements from cache in place that do not meet the condition for the next region.
cache
The cache for this partition.
to
The next region in the left iterator.

Attributes
protected
Definition Classes
VictimlessSortedIntervalPartitionJoin → ShuffleRegionJoin
Note
An earlier version held these collections in variables, building new collections and reassigning the references on every call. The current version mutates the cache in place with trimStart() and ++=(), which improves runtime by roughly 25% overall.
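
A minimal sketch of in-place pruning with `trimStart`, the method the note mentions. It is an illustration only: `Region` is a hypothetical stand-in for `ReferenceRegion`, a `ListBuffer` stands in for `SetTheoryCache`, and the pruning condition (drop the leading entries that end at or before the start of `to`) is an assumption rather than the library's exact rule.

```scala
import scala.collection.mutable.ListBuffer

object PruneCacheSketch {
  // Hypothetical stand-in for ReferenceRegion: half-open [start, end) on a contig.
  final case class Region(name: String, start: Long, end: Long)

  // True when `r` lies entirely before `to`, so it cannot overlap `to`
  // or any later left region in sorted order.
  def endsBefore(r: Region, to: Region): Boolean =
    r.name < to.name || (r.name == to.name && r.end <= to.start)

  // Drops the leading run of dead entries without allocating a new buffer.
  // Entries behind the first live one are kept even if dead; the overlap
  // filter applied when hits are collected is expected to skip them.
  def pruneCache[U](cache: ListBuffer[(Region, U)], to: Region): Unit = {
    val firstLive = cache.indexWhere { case (r, _) => !endsBefore(r, to) }
    cache.trimStart(if (firstLive < 0) cache.length else firstLive)
  }
}
```

Trimming only a prefix keeps each call cheap, at the cost of leaving a few stale entries in the buffer until the entries ahead of them expire.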
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Related Doc: package rdd

sealed trait VictimlessSortedIntervalPartitionJoin[T, U, RT, RU] extends ShuffleRegionJoin[T, U, RT, RU]

Inherited from ShuffleRegionJoin[T, U, RT, RU]

Inherited from RegionJoin[T, U, RT, RU]

Inherited from Serializable

Inherited from AnyRef

Inherited from Any
