Class

org.bdgenomics.adam.rdd

LeftOuterShuffleRegionJoinAndGroupByLeft

Related Doc: package rdd

case class LeftOuterShuffleRegionJoinAndGroupByLeft[T, U](leftRdd: RDD[(ReferenceRegion, T)], rightRdd: RDD[(ReferenceRegion, U)])(implicit evidence$9: ClassTag[T], evidence$10: ClassTag[U]) extends ShuffleRegionJoin[T, U, T, Iterable[U]] with VictimlessSortedIntervalPartitionJoin[T, U, T, Iterable[U]] with Product with Serializable
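
The class carries no description of its own. Judging only from its name and the result type RDD[(T, Iterable[U])], each left element appears to be emitted once, paired with every right element whose region overlaps it, and left elements without a match appear to be kept with an empty group. A minimal sketch of that reading on plain Scala collections follows. Region, overlaps, and leftOuterGroupByLeft are hypothetical stand-ins introduced for illustration; they are not ADAM's ReferenceRegion or its join code, and no Spark is involved.

```scala
object LeftOuterGroupByLeftSketch {
  // Hypothetical stand-in for ReferenceRegion: half-open interval [start, end) on a named contig.
  final case class Region(name: String, start: Long, end: Long) {
    def overlaps(other: Region): Boolean =
      name == other.name && start < other.end && other.start < end
  }

  // Naive O(n * m) reference semantics (an assumption, not the shuffle algorithm):
  // one output row per left element, carrying all overlapping right values (possibly none).
  def leftOuterGroupByLeft[T, U](
    left: Seq[(Region, T)],
    right: Seq[(Region, U)]): Seq[(T, Iterable[U])] =
    left.map { case (lRegion, lValue) =>
      val hits = right.collect {
        case (rRegion, rValue) if lRegion.overlaps(rRegion) => rValue
      }
      (lValue, hits)
    }
}
```

The quadratic scan is only meant to pin down the expected output shape; the members documented below describe a sorted, cache-based sweep instead.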

Linear Supertypes
Product, Equals, VictimlessSortedIntervalPartitionJoin[T, U, T, Iterable[U]], ShuffleRegionJoin[T, U, T, Iterable[U]], RegionJoin[T, U, T, Iterable[U]], Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new LeftOuterShuffleRegionJoinAndGroupByLeft(leftRdd: RDD[(ReferenceRegion, T)], rightRdd: RDD[(ReferenceRegion, U)])(implicit arg0: ClassTag[T], arg1: ClassTag[U])

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def advanceCache(cache: SetTheoryCache[U, T, Iterable[U]], right: BufferedIterator[(ReferenceRegion, U)], until: ReferenceRegion): Unit

    Adds elements from right to cache based on the next region encountered.

    cache

    The cache for this partition.

    right

    The right iterator.

    until

    The next region to join with.

    Attributes
    protected
    Definition Classes
    VictimlessSortedIntervalPartitionJoin → ShuffleRegionJoin
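
As a rough illustration of the advanceCache step above, one way to picture it, assuming both inputs are sorted by start position, is to keep pulling from the buffered right iterator while its head starts before the end of the until region. The sketch below uses a hypothetical Region type and a plain ListBuffer as the cache; it does not use SetTheoryCache, whose interface is not shown on this page.

```scala
import scala.collection.mutable.ListBuffer

object AdvanceCacheSketch {
  // Hypothetical region on a single contig, half-open [start, end).
  final case class Region(start: Long, end: Long)

  // Moves right elements into the cache while their start lies before the end of `until`.
  // BufferedIterator.head inspects the next element without consuming it,
  // so the first element that starts too late stays in the iterator.
  def advance[U](
    cache: ListBuffer[(Region, U)],
    right: BufferedIterator[(Region, U)],
    until: Region): Unit = {
    while (right.hasNext && right.head._1.start < until.end) {
      cache += right.next()
    }
  }
}
```

Elements that end before until begins may still be admitted by this test; under the same assumptions, removing them would be the job of a separate pruning step.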
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. def compute(): RDD[(T, Iterable[U])]

    Performs a region join between two RDDs (shuffle join). All data should be pre-shuffled and copartitioned.

    returns

    An RDD of joins (x, y), where x is from leftRdd, y is from rightRdd, and the region corresponding to x overlaps the region corresponding to y.

    Definition Classes
    ShuffleRegionJoin
  8. def emptyFn(left: Iterator[(ReferenceRegion, T)], right: Iterator[(ReferenceRegion, U)]): Iterator[(T, Iterable[U])]

    Handles the case where the left or the right iterator was empty.

    left

    The left iterator.

    right

    The right iterator.

    returns

    The iterator containing properly formatted tuples.

    Attributes
    protected
    Definition Classes
    LeftOuterShuffleRegionJoinAndGroupByLeft → ShuffleRegionJoin
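
The description of emptyFn above does not say what "properly formatted" means. For a left outer join grouped by the left side, a plausible reading is that every left value is kept with an empty group and unmatched right values are dropped. The sketch below encodes that assumption only; emptySide and the generic key type K are hypothetical names.

```scala
object EmptyFnSketch {
  // Assumed behavior for a left outer, group-by-left join when one side is empty:
  // every left value survives with an empty group; right values are never emitted.
  def emptySide[K, T, U](
    left: Iterator[(K, T)],
    right: Iterator[(K, U)]): Iterator[(T, Iterable[U])] =
    left.map { case (_, value) => (value, Iterable.empty[U]) }
}
```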
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. def finalizeHits(cache: SetTheoryCache[U, T, Iterable[U]], right: BufferedIterator[(ReferenceRegion, U)]): Iterable[(T, Iterable[U])]

    Computes all victims for the partition. NOTE: These are victimless joins so we have no victims.

    cache

    The cache for this partition.

    right

    The right iterator.

    returns

    An empty Iterable.

    Attributes
    protected
    Definition Classes
    VictimlessSortedIntervalPartitionJoin → ShuffleRegionJoin
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  14. val leftRdd: RDD[(ReferenceRegion, T)]

  15. def makeIterator(leftIter: Iterator[(ReferenceRegion, T)], rightIter: Iterator[(ReferenceRegion, U)]): Iterator[(T, Iterable[U])]

    Attributes
    protected
    Definition Classes
    ShuffleRegionJoin
  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. def partitionAndJoin(left: RDD[(ReferenceRegion, T)], right: RDD[(ReferenceRegion, U)]): RDD[(T, Iterable[U])]

    Performs a region join between two RDDs.

    returns

    An RDD of pairs (x, y), where x is from left, y is from right, and the region corresponding to x overlaps the region corresponding to y.

    Definition Classes
    ShuffleRegionJoin → RegionJoin
  20. def postProcessHits(iter: Iterable[U], currentLeft: T): Iterable[(T, Iterable[U])]

    Computes post processing required to complete the join and properly format hits.

    iter

    The iterator of hits.

    currentLeft

    The current left value.

    returns

    The post-processed iterator.

    Attributes
    protected
    Definition Classes
    LeftOuterShuffleRegionJoinAndGroupByLeft → ShuffleRegionJoin
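
Given the signature of postProcessHits above, the simplest formatting consistent with a group-by-left result is a single row that pairs the current left value with all of its hits. This is an assumption drawn from the types alone, not the documented implementation; format is a hypothetical name.

```scala
object PostProcessHitsSketch {
  // Assumed formatting step: one output row per left value,
  // holding all of its hits (an empty Iterable when nothing overlapped).
  def format[T, U](hits: Iterable[U], currentLeft: T): Iterable[(T, Iterable[U])] =
    Iterable((currentLeft, hits))
}
```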
  21. def processHits(cache: SetTheoryCache[U, T, Iterable[U]], currentLeft: T, currentLeftRegion: ReferenceRegion): Iterable[(T, Iterable[U])]

    Process hits for a given object in left.

    cache

    The cache containing potential hits.

    currentLeft

    The current object from the left.

    currentLeftRegion

    The ReferenceRegion of currentLeft.

    returns

    An iterator containing all hits, formatted by postProcessHits.

    Attributes
    protected
    Definition Classes
    ShuffleRegionJoin
  22. def pruneCache(cache: SetTheoryCache[U, T, Iterable[U]], to: ReferenceRegion): Unit

    Removes elements from cache in place that do not meet the condition for the next region.

    cache

    The cache for this partition.

    to

    The next region in the left iterator.

    Attributes
    protected
    Definition Classes
    VictimlessSortedIntervalPartitionJoin → ShuffleRegionJoin
    Note

    At one point these were all variables and we built new collections and reassigned the pointers every time. We fixed this by using trimStart() and ++=() to improve performance. Overall, we see roughly 25% improvement in runtime by doing things this way.
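
The note above contrasts rebuilding collections with mutating them in place through trimStart() and ++=(). A small sketch of the in-place style on a ListBuffer follows. The Region type, the prune helper, and the assumption that expired entries sit at the front of the cache are all hypothetical, and SetTheoryCache itself is not modeled.

```scala
import scala.collection.mutable.ListBuffer

object PruneCacheSketch {
  // Hypothetical region on a single contig, half-open [start, end).
  final case class Region(start: Long, end: Long)

  // Drops, in place, the leading cached entries that end at or before the start
  // of the next left region. Assumes expired entries are contiguous at the front.
  def prune[U](cache: ListBuffer[(Region, U)], to: Region): Unit = {
    val firstLive = cache.indexWhere { case (region, _) => region.end > to.start }
    val expired = if (firstLive < 0) cache.length else firstLive
    // Mutates the existing buffer; nothing is copied or reassigned.
    // (Newer Scala versions may report trimStart as deprecated.)
    cache.trimStart(expired)
  }
}
```

New elements can then be appended to the same buffer with cache ++= moreElements, so the cache can be held in a val rather than a reassigned var.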

  23. val rightRdd: RDD[(ReferenceRegion, U)]

  24. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  25. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
