org.apache.spark.rdd.mergejoin

MergeJoin

class MergeJoin[K, V, W, O] extends Iterator[O]

:: DeveloperApi ::

Merge-join implementation that creates a spillable collection for the right side, which is iterated over for each matching key on the left side. This means the values for any given key are not required to fit in memory, although the join *will* try to buffer as many values as possible using Spark's built-in 'ExternalSorter'. Because 'ExternalSorter' is a private class, this class is packaged here.

There are numerous optimizations in place to try to minimize the work being done in the join:

1) Since the Joiner returns a value Iterator for each key, we don't need to invoke join logic on each value iteration; instead, we perform join logic once per unique key. This also allows each 'Joiner' to optimize, via the methods below, based on the join being performed.

2) In cases where we have a key on one side but not the other, we skip creating the spillable collection and write the output tuples directly according to the Joiner's leftOuter/rightOuter method.

3) In cases where there are no values to emit for a particular key, the Joiner can return an empty Iterator, in which case we immediately move to the next key without emitting and then filtering tuples for those values.

4) In cases where we have a key on both sides, we invoke the Joiner's inner method. The default implementation creates a spillable collection for the right side that buffers as many values as possible in memory before spilling to disk, so we only pay the penalty for spilling to disk on keys where it is absolutely necessary.
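
The per-key control flow described in the list above can be sketched as follows. This is a minimal illustration, not the class's actual source: `SketchJoiner`, `takeValues`, `mergeJoin` and `innerJoiner` are hypothetical names, the real Joiner's signatures are not shown on this page and may differ, and a plain in-memory Vector stands in for the spillable collection.

```scala
object MergeJoinSketch {

  // Hypothetical stand-in for the Joiner referenced above (assumed shape).
  trait SketchJoiner[K, V, W, O] {
    def inner(key: K, left: Iterator[V], right: Iterator[W]): Iterator[O]
    def leftOuter(key: K, left: Iterator[V]): Iterator[O]
    def rightOuter(key: K, right: Iterator[W]): Iterator[O]
  }

  // Consume and return all leading values whose key equals `key`.
  // An in-memory Vector is used here instead of a spillable collection.
  private def takeValues[K, A](it: BufferedIterator[(K, A)], key: K)
                              (implicit ord: Ordering[K]): Vector[A] = {
    val buf = Vector.newBuilder[A]
    while (it.hasNext && ord.equiv(it.head._1, key)) buf += it.next()._2
    buf.result()
  }

  // Both inputs must already be sorted by key under `ord`.
  def mergeJoin[K, V, W, O](left: Iterator[(K, V)],
                            right: Iterator[(K, W)],
                            joiner: SketchJoiner[K, V, W, O])
                           (implicit ord: Ordering[K]): Iterator[O] = {
    val l = left.buffered
    val r = right.buffered

    new Iterator[O] {
      private var current: Iterator[O] = Iterator.empty

      // Join logic runs once per unique key; keys whose output is empty
      // are skipped without emitting anything.
      private def advance(): Unit =
        while (!current.hasNext && (l.hasNext || r.hasNext)) {
          current =
            if (!r.hasNext || (l.hasNext && ord.lt(l.head._1, r.head._1))) {
              val k = l.head._1 // key only on the left side
              joiner.leftOuter(k, takeValues(l, k).iterator)
            } else if (!l.hasNext || ord.gt(l.head._1, r.head._1)) {
              val k = r.head._1 // key only on the right side
              joiner.rightOuter(k, takeValues(r, k).iterator)
            } else {
              val k = l.head._1 // key on both sides
              joiner.inner(k, takeValues(l, k).iterator, takeValues(r, k).iterator)
            }
        }

      def hasNext: Boolean = { advance(); current.hasNext }
      def next(): O = { advance(); current.next() }
    }
  }

  // Example joiner: an inner join that emits nothing for one-sided keys.
  def innerJoiner[K, V, W]: SketchJoiner[K, V, W, (K, (V, W))] =
    new SketchJoiner[K, V, W, (K, (V, W))] {
      def inner(key: K, left: Iterator[V], right: Iterator[W]): Iterator[(K, (V, W))] = {
        val rs = right.toVector // right side is re-iterated for each left value
        for (v <- left; w <- rs.iterator) yield (key, (v, w))
      }
      def leftOuter(key: K, left: Iterator[V]): Iterator[(K, (V, W))] = Iterator.empty
      def rightOuter(key: K, right: Iterator[W]): Iterator[(K, (V, W))] = Iterator.empty
    }
}
```

In this sketch, only the branch for keys present on both sides materializes the right-side values for repeated iteration, which mirrors the point that the buffering (and any spilling) cost is paid only where a key actually matches.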

Annotations
@DeveloperApi()
Linear Supertypes
Iterator[O], TraversableOnce[O], GenTraversableOnce[O], AnyRef, Any

Instance Constructors

  1. new MergeJoin(context: TaskContext, left: Iterator[(K, V)], right: Iterator[(K, W)], joiner: Joiner[K, V, W, O])(implicit ord: Ordering[K])

    context

    The TaskContext we are executing within

    left

    The left side of the join, pre-ordered by Ordering[K]

    right

    The right side of the join, pre-ordered by Ordering[K]
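
Since both sides are expected to arrive pre-ordered by the same Ordering[K], it can help to check or establish that ordering on small local data before building a join. The helpers below are a hypothetical sketch (the names `isOrderedByKey` and `orderByKey` are not part of this API) and operate on in-memory sequences only; they do not show how Spark itself sorts partitions.

```scala
object PreOrderedSketch {

  // True if keys appear in non-decreasing order under the implicit Ordering.
  def isOrderedByKey[K, A](pairs: Seq[(K, A)])(implicit ord: Ordering[K]): Boolean =
    pairs.zip(pairs.drop(1)).forall { case ((k1, _), (k2, _)) => ord.lteq(k1, k2) }

  // Sort a local sequence by key using the same implicit Ordering the join
  // would receive; sortBy is stable, so values keep their relative order per key.
  def orderByKey[K, A](pairs: Seq[(K, A)])(implicit ord: Ordering[K]): Seq[(K, A)] =
    pairs.sortBy(_._1)
}
```

Passing the same implicit Ordering[K] to both the sorting step and the join is the important design point: a mismatch between the two would break the key-by-key comparison a merge join relies on.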

Type Members

  1. class GroupedIterator[B >: A] extends AbstractIterator[Seq[B]] with Iterator[Seq[B]]

    Definition Classes
    Iterator

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. def ++[B >: O](that: ⇒ GenTraversableOnce[B]): Iterator[B]

    Definition Classes
    Iterator
  5. def /:[B](z: B)(op: (B, O) ⇒ B): B

    Definition Classes
    TraversableOnce → GenTraversableOnce
  6. def :\[B](z: B)(op: (O, B) ⇒ B): B

    Definition Classes
    TraversableOnce → GenTraversableOnce
  7. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  9. def addString(b: StringBuilder): StringBuilder

    Definition Classes
    TraversableOnce
  10. def addString(b: StringBuilder, sep: String): StringBuilder

    Definition Classes
    TraversableOnce
  11. def addString(b: StringBuilder, start: String, sep: String, end: String): StringBuilder

    Definition Classes
    TraversableOnce
  12. def aggregate[B](z: B)(seqop: (B, O) ⇒ B, combop: (B, B) ⇒ B): B

    Definition Classes
    TraversableOnce → GenTraversableOnce
  13. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  14. def buffered: BufferedIterator[O]

    Definition Classes
    Iterator
  15. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  16. def collect[B](pf: PartialFunction[O, B]): Iterator[B]

    Definition Classes
    Iterator
    Annotations
    @migration
    Migration

    (Changed in version 2.8.0) collect has changed. The previous behavior can be reproduced with toSeq.

  17. def collectFirst[B](pf: PartialFunction[O, B]): Option[B]

    Definition Classes
    TraversableOnce
  18. def contains(elem: Any): Boolean

    Definition Classes
    Iterator
  19. def copyToArray[B >: O](xs: Array[B], start: Int, len: Int): Unit

    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  20. def copyToArray[B >: O](xs: Array[B]): Unit

    Definition Classes
    TraversableOnce → GenTraversableOnce
  21. def copyToArray[B >: O](xs: Array[B], start: Int): Unit

    Definition Classes
    TraversableOnce → GenTraversableOnce
  22. def copyToBuffer[B >: O](dest: Buffer[B]): Unit

    Definition Classes
    TraversableOnce
  23. def corresponds[B](that: GenTraversableOnce[B])(p: (O, B) ⇒ Boolean): Boolean

    Definition Classes
    Iterator
  24. def count(p: (O) ⇒ Boolean): Int

    Definition Classes
    TraversableOnce → GenTraversableOnce
  25. var currentIterator: Iterator[O]

    Attributes
    protected
  26. def drop(n: Int): Iterator[O]

    Definition Classes
    Iterator
  27. def dropWhile(p: (O) ⇒ Boolean): Iterator[O]

    Definition Classes
    Iterator
  28. def duplicate: (Iterator[O], Iterator[O])

    Definition Classes
    Iterator
  29. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  30. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  31. def exists(p: (O) ⇒ Boolean): Boolean

    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  32. def filter(p: (O) ⇒ Boolean): Iterator[O]

    Definition Classes
    Iterator
  33. def filterNot(p: (O) ⇒ Boolean): Iterator[O]

    Definition Classes
    Iterator
  34. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  35. def find(p: (O) ⇒ Boolean): Option[O]

    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  36. def finish: Iterator[O]

    Attributes
    protected
  37. var finished: Boolean

    Attributes
    protected
  38. def flatMap[B](f: (O) ⇒ GenTraversableOnce[B]): Iterator[B]

    Definition Classes
    Iterator
  39. def fold[A1 >: O](z: A1)(op: (A1, A1) ⇒ A1): A1

    Definition Classes
    TraversableOnce → GenTraversableOnce
  40. def foldLeft[B](z: B)(op: (B, O) ⇒ B): B

    Definition Classes
    TraversableOnce → GenTraversableOnce
  41. def foldRight[B](z: B)(op: (O, B) ⇒ B): B

    Definition Classes
    TraversableOnce → GenTraversableOnce
  42. def forall(p: (O) ⇒ Boolean): Boolean

    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  43. def foreach[U](f: (O) ⇒ U): Unit

    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  44. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  45. def grouped[B >: O](size: Int): GroupedIterator[B]

    Definition Classes
    Iterator
  46. def hasDefiniteSize: Boolean

    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  47. def hasNext: Boolean

    Definition Classes
    MergeJoin → Iterator
  48. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  49. def indexOf[B >: O](elem: B): Int

    Definition Classes
    Iterator
  50. def indexWhere(p: (O) ⇒ Boolean): Int

    Definition Classes
    Iterator
  51. def isEmpty: Boolean

    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  52. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  53. def isTraversableAgain: Boolean

    Definition Classes
    Iterator → GenTraversableOnce
  54. var leftRemaining: BufferedIterator[(K, V)]

    Attributes
    protected
  55. def length: Int

    Definition Classes
    Iterator
  56. def map[B](f: (O) ⇒ B): Iterator[B]

    Definition Classes
    Iterator
  57. def max[B >: O](implicit cmp: Ordering[B]): O

    Definition Classes
    TraversableOnce → GenTraversableOnce
  58. def maxBy[B](f: (O) ⇒ B)(implicit cmp: Ordering[B]): O

    Definition Classes
    TraversableOnce → GenTraversableOnce
  59. def min[B >: O](implicit cmp: Ordering[B]): O

    Definition Classes
    TraversableOnce → GenTraversableOnce
  60. def minBy[B](f: (O) ⇒ B)(implicit cmp: Ordering[B]): O

    Definition Classes
    TraversableOnce → GenTraversableOnce
  61. def mkString: String

    Definition Classes
    TraversableOnce → GenTraversableOnce
  62. def mkString(sep: String): String

    Definition Classes
    TraversableOnce → GenTraversableOnce
  63. def mkString(start: String, sep: String, end: String): String

    Definition Classes
    TraversableOnce → GenTraversableOnce
  64. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  65. def next(): O

    Definition Classes
    MergeJoin → Iterator
  66. def nextIterator(): Iterator[O]

    Attributes
    protected
  67. def nonEmpty: Boolean

    Definition Classes
    TraversableOnce → GenTraversableOnce
  68. final def notify(): Unit

    Definition Classes
    AnyRef
  69. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  70. implicit val ord: Ordering[K]

    Attributes
    protected
  71. def padTo[A1 >: O](len: Int, elem: A1): Iterator[A1]

    Definition Classes
    Iterator
  72. def partition(p: (O) ⇒ Boolean): (Iterator[O], Iterator[O])

    Definition Classes
    Iterator
  73. def patch[B >: O](from: Int, patchElems: Iterator[B], replaced: Int): Iterator[B]

    Definition Classes
    Iterator
  74. def product[B >: O](implicit num: Numeric[B]): B

    Definition Classes
    TraversableOnce → GenTraversableOnce
  75. def reduce[A1 >: O](op: (A1, A1) ⇒ A1): A1

    Definition Classes
    TraversableOnce → GenTraversableOnce
  76. def reduceLeft[B >: O](op: (B, O) ⇒ B): B

    Definition Classes
    TraversableOnce
  77. def reduceLeftOption[B >: O](op: (B, O) ⇒ B): Option[B]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  78. def reduceOption[A1 >: O](op: (A1, A1) ⇒ A1): Option[A1]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  79. def reduceRight[B >: O](op: (O, B) ⇒ B): B

    Definition Classes
    TraversableOnce → GenTraversableOnce
  80. def reduceRightOption[B >: O](op: (O, B) ⇒ B): Option[B]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  81. def reversed: List[O]

    Attributes
    protected[this]
    Definition Classes
    TraversableOnce
  82. var rightRemaining: BufferedIterator[(K, W)]

    Attributes
    protected
  83. def sameElements(that: Iterator[_]): Boolean

    Definition Classes
    Iterator
  84. def scanLeft[B](z: B)(op: (B, O) ⇒ B): Iterator[B]

    Definition Classes
    Iterator
  85. def scanRight[B](z: B)(op: (O, B) ⇒ B): Iterator[B]

    Definition Classes
    Iterator
  86. def seq: Iterator[O]

    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  87. def size: Int

    Definition Classes
    TraversableOnce → GenTraversableOnce
  88. def slice(from: Int, until: Int): Iterator[O]

    Definition Classes
    Iterator
  89. def sliding[B >: O](size: Int, step: Int): GroupedIterator[B]

    Definition Classes
    Iterator
  90. def span(p: (O) ⇒ Boolean): (Iterator[O], Iterator[O])

    Definition Classes
    Iterator
  91. def sum[B >: O](implicit num: Numeric[B]): B

    Definition Classes
    TraversableOnce → GenTraversableOnce
  92. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  93. def take(n: Int): Iterator[O]

    Definition Classes
    Iterator
  94. def takeLeftValuesForKey(key: K): Iterator[(K, V)]

    Attributes
    protected
  95. def takeRightValuesForKey(key: K): Iterator[(K, W)]

    Attributes
    protected
  96. def takeWhile(p: (O) ⇒ Boolean): Iterator[O]

    Definition Classes
    Iterator
  97. def to[Col[_]](implicit cbf: CanBuildFrom[Nothing, O, Col[O]]): Col[O]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  98. def toArray[B >: O](implicit arg0: ClassTag[B]): Array[B]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  99. def toBuffer[B >: O]: Buffer[B]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  100. def toIndexedSeq: IndexedSeq[O]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  101. def toIterable: Iterable[O]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  102. def toIterator: Iterator[O]

    Definition Classes
    Iterator → GenTraversableOnce
  103. def toList: List[O]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  104. def toMap[T, U](implicit ev: <:<[O, (T, U)]): Map[T, U]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  105. def toSeq: Seq[O]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  106. def toSet[B >: O]: Set[B]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  107. def toStream: Stream[O]

    Definition Classes
    Iterator → GenTraversableOnce
  108. def toString(): String

    Definition Classes
    Iterator → AnyRef → Any
  109. def toTraversable: Traversable[O]

    Definition Classes
    Iterator → TraversableOnce → GenTraversableOnce
  110. def toVector: Vector[O]

    Definition Classes
    TraversableOnce → GenTraversableOnce
  111. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  112. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  113. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  114. def withFilter(p: (O) ⇒ Boolean): Iterator[O]

    Definition Classes
    Iterator
  115. def zip[B](that: Iterator[B]): Iterator[(O, B)]

    Definition Classes
    Iterator
  116. def zipAll[B, A1 >: O, B1 >: B](that: Iterator[B], thisElem: A1, thatElem: B1): Iterator[(A1, B1)]

    Definition Classes
    Iterator
  117. def zipWithIndex: Iterator[(O, Int)]

    Definition Classes
    Iterator

Deprecated Value Members

  1. def /:\[A1 >: O](z: A1)(op: (A1, A1) ⇒ A1): A1

    Definition Classes
    GenTraversableOnce
    Annotations
    @deprecated
    Deprecated

    (Since version 2.10.0) use fold instead

Inherited from Iterator[O]

Inherited from TraversableOnce[O]

Inherited from GenTraversableOnce[O]

Inherited from AnyRef

Inherited from Any
