the RDD being shuffled. Elements of this RDD are (partitionId, Row) pairs. Partition ids should be in the range [0, numPartitions - 1].
the serializer used during the shuffle.
the number of post-shuffle partitions.
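Taken together, the parameters above describe the shuffle input: an RDD of (partitionId, row) pairs, a serializer, and a post-shuffle partition count. A minimal sketch of the partition-id contract, using plain Scala collections with no Spark dependency (`Row` here is a hypothetical stand-in for Spark's internal row type):

```scala
object PartitionIdContract {
  // Hypothetical stand-in for Spark's internal row type.
  type Row = Seq[Any]

  // Every partition id attached to a row must fall in [0, numPartitions - 1].
  def validPartitioning(pairs: Seq[(Int, Row)], numPartitions: Int): Boolean =
    pairs.forall { case (pid, _) => 0 <= pid && pid < numPartitions }

  def main(args: Array[String]): Unit = {
    val pairs = Seq((0, Seq("a", 1)), (2, Seq("b", 2)), (1, Seq("c", 3)))
    println(validPartitioning(pairs, numPartitions = 3)) // true
    println(validPartitioning(pairs, numPartitions = 2)) // false: id 2 is out of range
  }
}
```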
(Since version 1.0.0) use mapPartitionsWithIndex and filter
(Since version 1.0.0) use mapPartitionsWithIndex and flatMap
(Since version 1.0.0) use mapPartitionsWithIndex and foreach
(Since version 1.2.0) use TaskContext.get
(Since version 0.7.0) use mapPartitionsWithIndex
(Since version 1.0.0) use mapPartitionsWithIndex
(Since version 1.0.0) use collect
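Each of the deprecated operators above can be rebuilt from `mapPartitionsWithIndex` plus an ordinary iterator transform. A hedged sketch of that pattern, simulated over plain Scala collections (a `Seq` of partitions standing in for an RDD) rather than a live SparkContext:

```scala
object ReplacementPattern {
  // Plain-collections stand-in for RDD.mapPartitionsWithIndex:
  // apply f to each partition's iterator together with its partition index.
  def mapPartitionsWithIndex[A, B](parts: Seq[Seq[A]])(
      f: (Int, Iterator[A]) => Iterator[B]): Seq[Seq[B]] =
    parts.zipWithIndex.map { case (p, i) => f(i, p.iterator).toList }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(1, 2, 3), Seq(4, 5))

    // filterWith replacement: mapPartitionsWithIndex + filter on the iterator
    val evens = mapPartitionsWithIndex(partitions)((_, it) => it.filter(_ % 2 == 0))
    println(evens) // List(List(2), List(4))

    // mapWith replacement: mapPartitionsWithIndex + map, here tagging each
    // element with its partition index
    val tagged = mapPartitionsWithIndex(partitions)((i, it) => it.map(x => (i, x)))
    println(tagged) // List(List((0,1), (0,2), (0,3)), List((1,4), (1,5)))
  }
}
```

The same shape covers `foreachWith` (use `foreach` inside the partition function) and `flatMapWith` (use `flatMap`), which is why the deprecation notes all point at the one surviving operator.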
This is a specialized version of org.apache.spark.rdd.ShuffledRDD that is optimized for shuffling rows instead of Java key-value pairs. Note that something like this should eventually be implemented in Spark core, but that is blocked by some more general refactorings to shuffle interfaces / internals.
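The row-oriented design can be illustrated with a toy shuffle in plain Scala: each row already carries an explicit destination partition id, so routing rows needs no key extraction or key-value wrapping at shuffle time. This is only a sketch of the idea under that assumption, not Spark's implementation:

```scala
object ToyRowShuffle {
  // Hypothetical stand-in for Spark's internal row type.
  type Row = Seq[Any]

  // Route (partitionId, row) pairs into their post-shuffle partitions.
  def shuffle(pairs: Seq[(Int, Row)], numPartitions: Int): IndexedSeq[Seq[Row]] = {
    val buckets =
      IndexedSeq.fill(numPartitions)(scala.collection.mutable.Buffer.empty[Row])
    for ((pid, row) <- pairs) buckets(pid) += row
    buckets.map(_.toList)
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq((1, Seq("a")), (0, Seq("b")), (1, Seq("c")))
    println(shuffle(pairs, numPartitions = 2))
    // Vector(List(List(b)), List(List(a), List(c)))
  }
}
```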