Class com.salesforce.op.utils.spark.RichRDD

implicit class RichRDD[U] extends AnyRef

An enhanced RDD with more general join methods.

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new RichRDD(rdd: RDD[U])(implicit arg0: ClassTag[U])

Value Members

  6. def cogroupBy[K, V](rdd2: RDD[V])(key1: (U) ⇒ K, key2: (V) ⇒ K)(implicit arg0: ClassTag[K], arg1: ClassTag[V]): RDD[(K, (Iterable[U], Iterable[V]))]

    A more general cogroup method. Allows you to specify the transformations on the RDDs that will produce the join keys.

    rdd2
      rdd to cogroup with
    key1
      transform function to generate the key for the first rdd
    key2
      transform function to generate the key for the second rdd
    returns
      rdd1 cogroup rdd2

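    As an illustration, a sketch of cogrouping two RDDs of unrelated record types on a shared id. The `User`/`Order` case classes are invented here, and the wildcard import of the implicit class is an assumed path:

    ```scala
    import org.apache.spark.SparkContext
    import com.salesforce.op.utils.spark.RichRDD._ // assumed import path for the implicit

    case class User(id: Int, name: String)
    case class Order(userId: Int, total: Double)

    def cogroupExample(sc: SparkContext) = {
      val users  = sc.parallelize(Seq(User(1, "Ann"), User(2, "Bob")))
      val orders = sc.parallelize(Seq(Order(1, 9.99), Order(1, 5.00)))
      // Each key maps to all users and all orders sharing that id;
      // key 2 appears with an empty Iterable of orders
      users.cogroupBy(orders)(_.id, _.userId)
    }
    ```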
  11. def groupAndCount[K](keyFn: (U) ⇒ K)(implicit arg0: ClassTag[K]): RDD[(K, Long)]

    An efficient implementation of count by key.

    keyFn
      function to group by
    returns
      rdd of keys and their counts

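    A minimal sketch of counting records per key; the sample data and the import path of the implicit are assumptions:

    ```scala
    import org.apache.spark.SparkContext
    import com.salesforce.op.utils.spark.RichRDD._ // assumed import path

    def wordCount(sc: SparkContext) = {
      val words = sc.parallelize(Seq("a", "b", "a", "a"))
      // Counts per key without materializing the groups themselves
      words.groupAndCount(identity).collect() // e.g. ("a", 3L) and ("b", 1L), in any order
    }
    ```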
  12. def groupAndReduce[K, V](keyFn: (U) ⇒ K, valueFn: (U) ⇒ V, reduceFn: (V, V) ⇒ V)(implicit arg0: ClassTag[K], arg1: ClassTag[V]): RDD[(K, V)]

    A more efficient implementation of groupBy followed by reduceByKey.

    keyFn
      function to group by
    valueFn
      function to generate the value from
    reduceFn
      combiner function
    returns
      rdd of keys and the reduced values

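    For instance, the maximum value per key can be computed in one pass; the sample records and import path are assumptions:

    ```scala
    import org.apache.spark.SparkContext
    import com.salesforce.op.utils.spark.RichRDD._ // assumed import path

    def maxPerKey(sc: SparkContext) = {
      val sales = sc.parallelize(Seq(("us", 10.0), ("uk", 7.5), ("us", 3.0)))
      // Key by country, extract the amount, and reduce with max; values are
      // combined per partition rather than collected into full groups first
      sales.groupAndReduce[String, Double](_._1, _._2, math.max)
    }
    ```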
  13. def groupAndSum[K, V](keyFn: (U) ⇒ K, valueFn: (U) ⇒ V)(implicit ctk: ClassTag[K], ctv: ClassTag[V], sg: Semigroup[V]): RDD[(K, V)]

    An efficient implementation of sum by key.

    keyFn
      function to group by
    valueFn
      function to generate the value from
    returns
      rdd of keys and their sums

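    A sketch of summing by key. This assumes the `Semigroup` here is Algebird's, whose companion object already provides an implicit instance for `Double`; the import path of the implicit class is also an assumption:

    ```scala
    import org.apache.spark.SparkContext
    import com.salesforce.op.utils.spark.RichRDD._ // assumed import path

    def totalsPerKey(sc: SparkContext) = {
      val sales = sc.parallelize(Seq(("us", 10.0), ("uk", 7.5), ("us", 3.0)))
      // Semigroup[Double] resolves implicitly and supplies the + operation
      sales.groupAndSum[String, Double](_._1, _._2)
    }
    ```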
  16. def joinBy[K, V](rdd2: RDD[V])(key1: (U) ⇒ K, key2: (V) ⇒ K)(implicit arg0: ClassTag[K], arg1: ClassTag[V]): RDD[(K, (U, V))]

    A more general method to do inner joins. Allows you to specify the transformations on the RDDs that will produce the join keys.

    rdd2
      rdd to join with
    key1
      transform function to generate the key for the first rdd
    key2
      transform function to generate the key for the second rdd
    returns
      rdd1 join rdd2

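    For illustration, joining two invented record types without first mapping either RDD into a pair RDD; the case classes and import path are assumptions:

    ```scala
    import org.apache.spark.SparkContext
    import com.salesforce.op.utils.spark.RichRDD._ // assumed import path

    case class User(id: Int, name: String)
    case class Order(userId: Int, total: Double)

    def joinExample(sc: SparkContext) = {
      val users  = sc.parallelize(Seq(User(1, "Ann"), User(2, "Bob")))
      val orders = sc.parallelize(Seq(Order(1, 9.99)))
      // Inner join on user id: only users with at least one order survive
      users.joinBy(orders)(_.id, _.userId) // RDD[(Int, (User, Order))]
    }
    ```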
  17. def leftJoinBy[K, V](rdd2: RDD[V])(key1: (U) ⇒ K, key2: (V) ⇒ K)(implicit arg0: ClassTag[K], arg1: ClassTag[V]): RDD[(K, (U, Option[V]))]

    A more general method to do left outer joins. Allows you to specify the transformations on the RDDs that will produce the join keys.

    rdd2
      rdd to join with
    key1
      transform function to generate the key for the first rdd
    key2
      transform function to generate the key for the second rdd
    returns
      rdd1 leftOuterJoin rdd2

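    A sketch of the left outer variant, again with invented case classes and an assumed import path:

    ```scala
    import org.apache.spark.SparkContext
    import com.salesforce.op.utils.spark.RichRDD._ // assumed import path

    case class User(id: Int, name: String)
    case class Order(userId: Int, total: Double)

    def leftJoinExample(sc: SparkContext) = {
      val users  = sc.parallelize(Seq(User(1, "Ann"), User(2, "Bob")))
      val orders = sc.parallelize(Seq(Order(1, 9.99)))
      // Every user is kept; users without orders pair with None
      users.leftJoinBy(orders)(_.id, _.userId) // RDD[(Int, (User, Option[Order]))]
    }
    ```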
  18. def mapWithCount[T](f: (U) ⇒ T, counterName: String)(implicit arg0: ClassTag[T]): RDD[T]

    Map over the RDD with a long accumulator.

    f
      map function
    counterName
      accumulator name
    returns
      rdd.map(f)

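    A sketch of mapping while tracking throughput; the function and accumulator name are illustrative, and the import path is an assumption:

    ```scala
    import org.apache.spark.SparkContext
    import com.salesforce.op.utils.spark.RichRDD._ // assumed import path

    def parseWithCounter(sc: SparkContext) = {
      val lines = sc.parallelize(Seq("1", "2", "3"))
      // Behaves like lines.map(_.toInt), with a long accumulator named
      // "parsedLines" registered under that name (visible in the Spark UI)
      lines.mapWithCount(_.toInt, "parsedLines")
    }
    ```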
  22. def outerJoinBy[K, V](rdd2: RDD[V])(key1: (U) ⇒ K, key2: (V) ⇒ K)(implicit arg0: ClassTag[K], arg1: ClassTag[V]): RDD[(K, (Option[U], Option[V]))]

    A more general method to do full outer joins. Allows you to specify the transformations on the RDDs that will produce the join keys.

    rdd2
      rdd to join with
    key1
      transform function to generate the key for the first rdd
    key2
      transform function to generate the key for the second rdd
    returns
      rdd1 fullOuterJoin rdd2

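    A sketch of the full outer variant, with invented case classes and an assumed import path:

    ```scala
    import org.apache.spark.SparkContext
    import com.salesforce.op.utils.spark.RichRDD._ // assumed import path

    case class User(id: Int, name: String)
    case class Order(userId: Int, total: Double)

    def outerJoinExample(sc: SparkContext) = {
      val users  = sc.parallelize(Seq(User(1, "Ann")))
      val orders = sc.parallelize(Seq(Order(2, 4.20)))
      // Keys present on only one side pair with None on the other
      users.outerJoinBy(orders)(_.id, _.userId) // RDD[(Int, (Option[User], Option[Order]))]
    }
    ```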
  23. def repartitionByRecords(recordsPerPartition: Int): RDD[U]

    Repartition to the minimum of floor(rdd.count / recordsPerPartition) + 1 and the previous number of partitions. Attention: this involves executing an rdd.count() operation.

    recordsPerPartition
      number of records per partition
    returns
      rdd with the same number of partitions or floor(rdd.count / recordsPerPartition) + 1 partitions, whichever is less

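    A sketch of sizing partitions by record count; the target of 10000 records per partition is an arbitrary choice and the import path is an assumption:

    ```scala
    import org.apache.spark.SparkContext
    import com.salesforce.op.utils.spark.RichRDD._ // assumed import path

    def resize(sc: SparkContext) = {
      val big = sc.parallelize(1 to 1000000)
      // Triggers big.count(), a full pass over the data, then repartitions
      // to min(previous partitions, count / 10000 + 1)
      big.repartitionByRecords(recordsPerPartition = 10000)
    }
    ```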
  24. def repartitionToMaxPartitions(maxPartitionCount: Int = 200): RDD[U]

    Repartition to the minimum of maxPartitionCount and the previous number of partitions. Usually you would want maxPartitionCount to be 200 to get around the 200 bug.

    maxPartitionCount
      max number of partitions (default 200)
    returns
      shuffled rdd with the same number of partitions or maxPartitionCount, whichever is less

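    A sketch of capping the partition count; the data and import path are assumptions:

    ```scala
    import org.apache.spark.SparkContext
    import com.salesforce.op.utils.spark.RichRDD._ // assumed import path

    def capPartitions(sc: SparkContext) = {
      val rdd = sc.parallelize(1 to 1000, numSlices = 500)
      // Shuffles down to at most the default cap of 200 partitions;
      // an rdd that already has fewer partitions keeps its count
      rdd.repartitionToMaxPartitions()
    }
    ```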
  25. def saveAsTextFile(path: String, codec: Option[Class[_ <: CompressionCodec]], jobConf: JobConf): Unit

    Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class supporting the key and value types K and V in this RDD.

    path
      path to write the data
    codec
      optional compression codec to use
    jobConf
      hadoop job configuration

    Note

    We should make sure our tasks are idempotent when speculation is enabled, i.e. do not use an output committer that writes data directly. There is an example in https://issues.apache.org/jira/browse/SPARK-10063 showing the bad result of using a direct output committer with speculation enabled.

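    A sketch of writing gzip-compressed text output; the output path is illustrative and the import path of the implicit is an assumption:

    ```scala
    import org.apache.hadoop.io.compress.GzipCodec
    import org.apache.hadoop.mapred.JobConf
    import org.apache.spark.SparkContext
    import com.salesforce.op.utils.spark.RichRDD._ // assumed import path

    def writeCompressed(sc: SparkContext): Unit = {
      val rdd = sc.parallelize(Seq("a", "b", "c"))
      // Each output part file is gzip-compressed
      rdd.saveAsTextFile("/tmp/rich-rdd-out", Some(classOf[GzipCodec]), new JobConf(sc.hadoopConfiguration))
    }
    ```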
  26. def split(p: (U) ⇒ Boolean): (RDD[U], RDD[U])

    Splits this RDD into two RDDs according to a predicate.

    p
      the predicate to split by
    returns
      a pair of RDDs: the RDD that satisfies the predicate p and the RDD that does not

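    For illustration, separating even and odd numbers; the data and import path are assumptions:

    ```scala
    import org.apache.spark.SparkContext
    import com.salesforce.op.utils.spark.RichRDD._ // assumed import path

    def splitExample(sc: SparkContext) = {
      val nums = sc.parallelize(1 to 10)
      // evens satisfies the predicate, odds does not
      val (evens, odds) = nums.split(_ % 2 == 0)
      (evens.count(), odds.count()) // (5, 5)
    }
    ```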