Wrap an RDD and expose a cappedGroupByKey method, which behaves like
org.apache.spark.rdd.PairRDDFunctions.groupByKey but with a cap on the number
of values that will be accumulated for each key.
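
As a rough illustration of the capped grouping described above, the following sketch uses Spark's aggregateByKey to stop accumulating once a key has reached the cap. The object name CappedGroupByKeySketch, the parameter name maxPerKey, and the Vector result type are assumptions made for this example; it is not necessarily how the wrapped method is implemented.

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical helper for illustration only; names and result type are assumed.
object CappedGroupByKeySketch {
  def cappedGroupByKey[K: ClassTag, V: ClassTag](
      rdd: RDD[(K, V)],
      maxPerKey: Int
  ): RDD[(K, Vector[V])] =
    rdd.aggregateByKey(Vector.empty[V])(
      // Within a partition: append a value only while the key is under the cap.
      (acc, v) => if (acc.size < maxPerKey) acc :+ v else acc,
      // Across partitions: concatenate partial results, then re-apply the cap.
      (a, b) => (a ++ b).take(maxPerKey)
    )
}
```

Which values survive the cap depends on partitioning and iteration order, so under this sketch they should be treated as an arbitrary subset of at most maxPerKey values per key. Unlike a plain groupByKey, no more than maxPerKey values per key are held in any one accumulator.
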
Add a splitByKey method to any pair RDD: it returns a Map from each key (type K) to an RDD[V] containing all the values that had that key in the original RDD (in arbitrary order).
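
To make the shape of that result concrete, here is a deliberately naive sketch: it collects the distinct keys to the driver and builds one filtered RDD per key. The object name SplitByKeySketch is hypothetical, and the approach (one filter pass per key) is an assumption chosen for brevity; the actual method may well use a more efficient strategy.

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical helper for illustration only; not the library's implementation.
object SplitByKeySketch {
  def splitByKey[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]): Map[K, RDD[V]] = {
    // Runs a Spark job: the set of distinct keys must fit on the driver.
    val keys: Array[K] = rdd.keys.distinct().collect()
    // One lazily evaluated RDD per key, each holding only that key's values.
    keys.map { k =>
      k -> rdd.filter { case (key, _) => key == k }.values
    }.toMap
  }
}
```

Under this sketch, each returned RDD rescans the original RDD when it is evaluated, so caching the input first is likely worthwhile when there are many keys.
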