Adds elements from right to cache based on the next region encountered.
The cache for this partition.
The right iterator.
The next region to join with.
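The cache-advancing step can be sketched roughly as follows. This is a minimal illustration, not the library's code: `Region` is a hypothetical single-contig, half-open interval, and the stopping condition (pull every right element that starts before the next region ends) is an assumption about what "based on the next region" means.

```scala
import scala.collection.mutable.ListBuffer

object AdvanceCacheSketch {
  // Hypothetical stand-in for a genomic region: half-open [start, end) on one contig.
  final case class Region(start: Long, end: Long)

  // Moves elements from the sorted right iterator into the cache until the
  // head of the iterator starts at or after the end of `until`.
  // Elements that lie entirely before `until` are also pulled in here;
  // a separate pruning step is expected to discard them.
  def advanceCache[T](cache: ListBuffer[(Region, T)],
                      right: BufferedIterator[(Region, T)],
                      until: Region): Unit = {
    while (right.hasNext && right.head._1.start < until.end) {
      cache += right.next()
    }
  }
}
```

The iterator is buffered so that its head can be inspected without consuming it, which leaves the first non-qualifying element available for the next left region.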
Performs a region join between two RDDs (shuffle join). All data should be pre-shuffled and copartitioned.
An RDD of joins (x, y), where x is from leftRDD, y is from rightRDD, and the region corresponding to x overlaps the region corresponding to y.
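To make the output contract concrete, here is a brute-force reference written against plain Scala collections rather than RDDs. It only illustrates which pairs the join is expected to return; `Region`, `overlaps`, and `naiveJoin` are hypothetical names, and the half-open overlap test is an assumption.

```scala
object OverlapJoinSketch {
  // Hypothetical region type: half-open [start, end) on a named contig.
  final case class Region(contig: String, start: Long, end: Long) {
    // Two regions overlap when they share a contig and their intervals intersect.
    def overlaps(other: Region): Boolean =
      contig == other.contig && start < other.end && other.start < end
  }

  // Quadratic reference join: emits (x, y) for every left value x and right
  // value y whose keyed regions overlap.
  def naiveJoin[T, U](left: Seq[(Region, T)], right: Seq[(Region, U)]): Seq[(T, U)] =
    for {
      (leftRegion, x) <- left
      (rightRegion, y) <- right
      if leftRegion.overlaps(rightRegion)
    } yield (x, y)
}
```

A sorted, copartitioned join should produce the same pairs as this reference while avoiding the all-pairs comparison.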
Handles the case where the left or the right iterator were empty.
The left iterator.
The right iterator.
The iterator containing properly formatted tuples.
Computes all victims for the partition. NOTE: These are victimless joins so we have no victims.
The cache for this partition.
The right iterator.
An empty iterator.
Performs a region join between two RDDs.
An RDD of pairs (x, y), where x is from baseRDD, y is from joinedRDD, and the region corresponding to x overlaps the region corresponding to y.
Performs the post-processing required to complete the join and properly format hits.
The iterator of hits.
The current left value.
The post-processed iterator.
Processes hits for a given object from the left iterator.
The cache containing potential hits.
The current object from the left iterator.
The ReferenceRegion of currentLeft.
An iterator containing all hits, formatted by postProcessHits.
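One way to picture how hit processing and post-processing fit together is sketched below. The names mirror the descriptions above, but the signatures are assumptions: `postProcessHits` is passed in as a function here, and the two formatters (inner and left-outer) are hypothetical examples of how a join flavor might format its hits.

```scala
import scala.collection.mutable.ListBuffer

object ProcessHitsSketch {
  // Hypothetical region type: half-open [start, end) on one contig.
  final case class Region(start: Long, end: Long) {
    def overlaps(o: Region): Boolean = start < o.end && o.start < end
  }

  // Inner-join formatting: one output pair per hit.
  def innerPostProcess[T, U](hits: Iterator[U], currentLeft: T): Iterator[(T, U)] =
    hits.map(y => (currentLeft, y))

  // Left-outer formatting: the left value is kept even when there are no hits.
  def leftOuterPostProcess[T, U](hits: Iterator[U],
                                 currentLeft: T): Iterator[(T, Option[U])] =
    if (hits.hasNext) hits.map(y => (currentLeft, Some(y): Option[U]))
    else Iterator((currentLeft, Option.empty[U]))

  // Filters the cache down to the entries that overlap the current left
  // region, then delegates the formatting to postProcessHits.
  def processHits[T, U, R](cache: ListBuffer[(Region, U)],
                           currentLeft: T,
                           currentLeftRegion: Region,
                           postProcessHits: (Iterator[U], T) => Iterator[R]): Iterator[R] = {
    val hits = cache.iterator.filter(_._1.overlaps(currentLeftRegion)).map(_._2)
    postProcessHits(hits, currentLeft)
  }
}
```

Keeping the formatting separate from the overlap filter lets different join flavors share the same cache handling and differ only in how hits are emitted.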
Removes elements from cache in place that do not meet the condition for the next region.
The cache for this partition.
The next region in the left iterator.
Implementation note: the caches were originally held in variables, and every update built a new collection and reassigned the reference. They are now mutated in place with trimStart() and ++=(), which improves runtime by roughly 25% overall.
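The difference between the two approaches can be sketched as follows. This is an illustration under assumptions, not the actual implementation: `Region` is a hypothetical half-open interval, and the pruning condition (drop the leading entries that end at or before the start of the next left region) is a guess at "the condition for the next region".

```scala
import scala.collection.mutable.ListBuffer

object PruneCacheSketch {
  // Hypothetical region type: half-open [start, end) on one contig.
  final case class Region(start: Long, end: Long)

  // Rebuilding variant: allocates a new buffer on every call, and the
  // caller has to reassign its reference to the result.
  def pruneByRebuilding[T](cache: ListBuffer[(Region, T)],
                           to: Region): ListBuffer[(Region, T)] =
    cache.dropWhile(_._1.end <= to.start)

  // In-place variant: drops the same leading entries from the existing
  // buffer. Only the prefix is pruned, so a short entry sitting behind a
  // long one stays cached until the long one is dropped.
  def pruneInPlace[T](cache: ListBuffer[(Region, T)], to: Region): Unit = {
    val firstKept = cache.indexWhere(_._1.end > to.start)
    cache.trimStart(if (firstKept < 0) cache.length else firstKept)
  }
}
```

Additions follow the same pattern: `cache ++= newElements` appends to the existing buffer instead of building a concatenated copy. Note that newer Scala versions flag `trimStart` as deprecated in favor of `dropInPlace`, although it still compiles.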