Adds elements from right to cache based on the next region encountered.
The cache for this partition.
The right iterator.
The next region to join with.
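As an illustration only, the cache-advance step described above might look like the following sketch. The `Region` class is a hypothetical stand-in for ReferenceRegion (one contig, half-open intervals), and the stopping condition (a right element is cached if it starts before the next region ends) is an assumption, not necessarily the condition the real implementation uses. The right iterator is assumed to be sorted by start position.

```scala
import scala.collection.mutable.ListBuffer

object AdvanceCacheSketch {
  // Hypothetical stand-in for ReferenceRegion: a half-open interval
  // [start, end) on a single contig.
  final case class Region(start: Long, end: Long)

  // Assumed condition: a right element may overlap `until` if it starts
  // before `until` ends. Requires `right` to be sorted by start.
  def advanceCache[U](cache: ListBuffer[(Region, U)],
                      right: BufferedIterator[(Region, U)],
                      until: Region): Unit = {
    while (right.hasNext && right.head._1.start < until.end) {
      cache += right.next() // append in place; no new collection is built
    }
  }
}
```

Elements that start at or after the end of the next region stay in the iterator, so a later call with a later region can pick them up.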
Performs a region join between two RDDs (shuffle join). All data should be pre-shuffled and copartitioned.
An RDD of joins (x, y), where x is from leftRDD, y is from rightRDD, and the region corresponding to x overlaps the region corresponding to y.
Computes all victims for the partition. NOTE: These are victimless joins so we have no victims.
The cache for this partition.
The right iterator.
An empty iterator.
Performs a region join between two RDDs.
An RDD of pairs (x, y), where x is from baseRDD, y is from joinedRDD, and the region corresponding to x overlaps the region corresponding to y.
Process hits for a given object in left.
The cache containing potential hits.
The current object from the left iterator.
The ReferenceRegion of currentLeft.
An iterator containing all hits, formatted by postProcessHits.
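A minimal sketch of hit processing, assuming an inner join: the cache is filtered down to entries that overlap the current left region, and the survivors are handed to postProcessHits. The `Region` class and the body of `postProcessHits` shown here are hypothetical; the real post-processing depends on the join type.

```scala
import scala.collection.mutable.ListBuffer

object ProcessHitsSketch {
  // Hypothetical stand-in for ReferenceRegion: half-open [start, end).
  final case class Region(start: Long, end: Long) {
    def overlaps(other: Region): Boolean =
      start < other.end && other.start < end
  }

  // Assumed inner-join formatting: pair the left object with each hit.
  def postProcessHits[T, U](hits: Iterable[U], currentLeft: T): Iterable[(T, U)] =
    hits.map(hit => (currentLeft, hit))

  def processHits[T, U](cache: ListBuffer[(Region, U)],
                        currentLeft: T,
                        currentLeftRegion: Region): Iterator[(T, U)] = {
    // Cached entries are only potential hits, so test each one for overlap.
    val hits = cache
      .filter { case (region, _) => region.overlaps(currentLeftRegion) }
      .map(_._2)
    postProcessHits(hits, currentLeft).iterator
  }
}
```

The cache is read but not modified here; trimming stale entries is left to the separate pruning step.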
Removes elements from cache in place that do not meet the condition for the next region.
The cache for this partition.
The next region in the left iterator.
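The pruning step could be sketched as follows. The `Region` class is a hypothetical stand-in for ReferenceRegion, and the pruning condition (a cached entry is stale once it ends at or before the start of the next left region) is an assumption that holds only if the left iterator is sorted by start position. Only the leading run of stale entries is dropped; stale entries further back stay cached until they reach the front, and overlap filtering during hit processing is expected to ignore them.

```scala
import scala.collection.mutable.ListBuffer

object PruneCacheSketch {
  // Hypothetical stand-in for ReferenceRegion: half-open [start, end).
  final case class Region(start: Long, end: Long)

  def pruneCache[U](cache: ListBuffer[(Region, U)], to: Region): Unit = {
    // Index of the first entry that could still overlap `to`.
    val firstLive = cache.indexWhere { case (region, _) => region.end > to.start }
    val stale = if (firstLive < 0) cache.length else firstLive
    cache.trimStart(stale) // drop the leading stale entries in place
  }
}
```

Note that recent Scala 2.13 releases deprecate `trimStart` in favor of `dropInPlace`; the sketch keeps `trimStart` to match the name used in this documentation.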
These were once all variables, and every update built a new collection and reassigned the reference. Mutating the collections in place with trimStart() and ++=() instead improves runtime by roughly 25% overall.
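To make the difference concrete, the sketch below contrasts the two styles on a plain buffer of integers. Both function names are hypothetical and the element type is simplified; the point is only that the in-place version reuses one buffer while the rebuilding version allocates new collections on every step.

```scala
import scala.collection.mutable.ListBuffer

object InPlaceCacheSketch {
  // Rebuild-and-reassign style: each step allocates new collections and the
  // caller must reassign its variable to the result.
  def stepRebuilding(cache: List[Int], stale: Int, incoming: List[Int]): List[Int] =
    cache.drop(stale) ++ incoming

  // In-place style: the same buffer is mutated on every step.
  def stepInPlace(cache: ListBuffer[Int], stale: Int, incoming: Iterable[Int]): Unit = {
    cache.trimStart(stale) // remove stale entries from the front
    cache ++= incoming     // append newly read entries
  }
}
```

Both styles leave the cache with the same contents, so the change affects allocation behavior rather than results.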