A more general method to do cogroup.
A more general method to do cogroup. Allows you to specify the transformations on the RDDs that will produce the join keys
rdd to cogroup with
transform function to generate key for first rdd
transform function to generate key for second rdd
rdd1 cogroup rdd2
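For illustration, the semantics of a cogroup with per-RDD key transforms can be sketched on plain Scala collections (the name `cogroupBy` and the argument shape are assumptions for this sketch; the real method operates on RDDs, roughly like `rdd1.keyBy(f1).cogroup(rdd2.keyBy(f2))`):

```scala
// Sketch: cogroup two collections after deriving a join key from each side.
def cogroupBy[A, B, K](left: Seq[A], right: Seq[B])(
    keyLeft: A => K, keyRight: B => K): Map[K, (Seq[A], Seq[B])] = {
  val l = left.groupBy(keyLeft)
  val r = right.groupBy(keyRight)
  (l.keySet ++ r.keySet).map { k =>
    k -> (l.getOrElse(k, Seq.empty), r.getOrElse(k, Seq.empty))
  }.toMap
}

val users  = Seq((1, "alice"), (2, "bob"))
val orders = Seq((1, "book"), (1, "pen"), (3, "mug"))
val grouped = cogroupBy(users, orders)(_._1, _._1)
// grouped(1) == (Seq((1, "alice")), Seq((1, "book"), (1, "pen")))
```

Note that keys present on only one side still appear in the result, paired with an empty sequence for the other side.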
An efficient implementation of count by key
An efficient implementation of count by key
function to group by
rdd of keys and their counts
A more efficient implementation of groupBy and then reduceByKey
A more efficient implementation of groupBy and then reduceByKey
function to group by
function to generate the value
combiner function
rdd of keys and the reduced values
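The fused groupBy-plus-reduce can be sketched on a plain collection: derive a key and a value from each element, then fold values per key with the combiner, the way `rdd.map(x => (keyFn(x), valueFn(x))).reduceByKey(combine)` would. The name `reduceBy` is an assumption for this sketch:

```scala
// Sketch: group and reduce in a single pass, never materializing the groups.
def reduceBy[A, K, V](xs: Seq[A])(keyFn: A => K, valueFn: A => V)(
    combine: (V, V) => V): Map[K, V] =
  xs.foldLeft(Map.empty[K, V]) { (acc, x) =>
    val k = keyFn(x)
    val v = valueFn(x)
    acc.updated(k, acc.get(k).map(combine(v, _)).getOrElse(v))
  }

val words = Seq("spark", "scala", "shark")
val counts = reduceBy(words)(_.head, _ => 1)(_ + _)
// counts == Map('s' -> 3)
```

This is more efficient than groupBy followed by reduceByKey because no intermediate per-key collections are built.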
An efficient implementation of sum by key
An efficient implementation of sum by key
function to group by
function to generate the value
rdd of keys and their sums
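The sum-by-key semantics can be sketched on a plain collection (the name `sumBy` is an assumption; the real method works on an RDD and would reduce per key rather than group):

```scala
// Sketch: derive a key and a numeric value from each element, then sum per key.
def sumBy[A, K](xs: Seq[A])(keyFn: A => K, valueFn: A => Long): Map[K, Long] =
  xs.groupBy(keyFn).map { case (k, group) => k -> group.map(valueFn).sum }

val sales = Seq(("books", 10L), ("pens", 2L), ("books", 5L))
val totals = sumBy(sales)(_._1, _._2)
// totals == Map("books" -> 15L, "pens" -> 2L)
```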
A more general method to do joins.
A more general method to do joins. Allows you to specify the transformations on the RDDs that will produce the join keys
rdd to join with
transform function to generate key for first rdd
transform function to generate key for second rdd
rdd1 join rdd2
A more general method to do left outer joins.
A more general method to do left outer joins. Allows you to specify the transformations on the RDDs that will produce the join keys
rdd to join with
transform function to generate key for first rdd
transform function to generate key for second rdd
rdd1 leftOuterJoin rdd2
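The generalized left outer join can be sketched on plain Scala collections (the name `leftOuterJoinBy` and the argument shape are assumptions; on RDDs this is roughly `rdd1.keyBy(f1).leftOuterJoin(rdd2.keyBy(f2))`):

```scala
// Sketch: keep every left element; pair it with each matching right element,
// or with None when the derived key has no match on the right.
def leftOuterJoinBy[A, B, K](left: Seq[A], right: Seq[B])(
    keyLeft: A => K, keyRight: B => K): Seq[(A, Option[B])] = {
  val r = right.groupBy(keyRight)
  left.flatMap { a =>
    r.get(keyLeft(a)) match {
      case Some(bs) => bs.map(b => (a, Some(b)))
      case None     => Seq((a, None))
    }
  }
}

val employees = Seq(("alice", 10), ("bob", 20))
val depts     = Seq((10, "eng"))
val joined = leftOuterJoinBy(employees, depts)(_._2, _._1)
// joined == Seq((("alice", 10), Some((10, "eng"))), (("bob", 20), None))
```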
Map over the RDD with a long accumulator
Map over the RDD with a long accumulator
map function
accumulator name
rdd.map(f)
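The accumulator-backed map can be sketched without Spark by threading a counter through the map function (in the real method a named Spark `LongAccumulator` plays this role across tasks; `mapWithCounter` is a name assumed for this sketch):

```scala
import java.util.concurrent.atomic.AtomicLong

// Sketch: map over a collection while counting processed records, the way a
// named long accumulator would count records across Spark tasks.
def mapWithCounter[A, B](xs: Seq[A], counter: AtomicLong)(f: A => B): Seq[B] =
  xs.map { x => counter.incrementAndGet(); f(x) }

val processed = new AtomicLong(0)
val doubled = mapWithCounter(Seq(1, 2, 3), processed)(_ * 2)
// doubled == Seq(2, 4, 6); processed.get == 3
```

Naming the accumulator is what makes the count visible in the Spark UI; note that Spark accumulator updates from re-executed tasks can inflate the count when used inside transformations.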
A more general method to do full outer joins.
A more general method to do full outer joins. Allows you to specify the transformations on the RDDs that will produce the join keys
rdd to join with
transform function to generate key for first rdd
transform function to generate key for second rdd
rdd1 fullOuterJoin rdd2
Repartition to the min of floor(rdd.count / recordsPerPartition) + 1 and the previous number of partitions.
Repartition to the min of floor(rdd.count / recordsPerPartition) + 1 and the previous number of partitions. Attention: this involves executing the rdd.count() operation.
number of records per partition
rdd with same number of partitions or floor(rdd.count / recordsPerPartition) + 1 partitions, whichever is less
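The target-partition arithmetic can be shown as a pure function (the name `targetPartitions` is an assumption for this sketch):

```scala
// Sketch: never grow the partition count; aim for roughly
// recordsPerPartition records in each partition.
def targetPartitions(recordCount: Long, recordsPerPartition: Long,
                     currentPartitions: Int): Int =
  math.min(currentPartitions, (recordCount / recordsPerPartition + 1).toInt)

// 1,000,000 records at 100,000 per partition -> 11 candidate partitions,
// capped at the current 8.
val t = targetPartitions(1000000L, 100000L, 8)
// t == 8
```

Because the count is capped by the current number of partitions, this method only ever coalesces, never splits.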
Repartition to the min of maxPartitionCount and the previous number of partitions.
Repartition to the min of maxPartitionCount and the previous number of partitions. Usually you would want maxPartitionCount to be 200 to get around the 200 bug
max number of partitions (default 200)
shuffled rdd with same number of partitions or maxPartitionCount, whichever is less
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class supporting the key and value types K and V in this RDD.
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class supporting the key and value types K and V in this RDD.
path to write the data
optional codec to use
optional hadoop configuration
We should make sure our tasks are idempotent when speculation is enabled, i.e. do not use an output committer that writes data directly. There is an example in https://issues.apache.org/jira/browse/SPARK-10063 showing the bad result of using a direct output committer with speculation enabled.
Splits this RDD into two RDDs according to a predicate.
Splits this RDD into two RDDs according to a predicate.
the predicate to split by.
a pair of RDDs: the RDD that satisfies the predicate p and the RDD that does not.
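The split can be sketched with two complementary filters, which mirrors how an RDD version is typically implemented since RDDs have no built-in `partition` (the name `splitBy` is an assumption for this sketch):

```scala
// Sketch: split a collection into (matching, non-matching) with two filters,
// one pass per side, as an RDD implementation would do with two rdd.filter calls.
def splitBy[A](xs: Seq[A])(p: A => Boolean): (Seq[A], Seq[A]) =
  (xs.filter(p), xs.filterNot(p))

val (evens, odds) = splitBy(Seq(1, 2, 3, 4, 5))(_ % 2 == 0)
// evens == Seq(2, 4); odds == Seq(1, 3, 5)
```

On plain Scala collections this is exactly `xs.partition(p)`; on RDDs, caching the input before the two filters avoids recomputing it for each side.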
An enhanced RDD with more general join methods.