GenomicRDD

Abstract Value Members

abstract def buildTree(rdd: RDD[(ReferenceRegion, T)])(implicit tTag: ClassTag[T]): IntervalArray[ReferenceRegion, T]

Attributes
protected
abstract def getReferenceRegions(elem: T): Seq[ReferenceRegion]

Attributes
protected
abstract val rdd: RDD[T]

The RDD of genomic data that we are wrapping.
abstract def replaceRdd(newRdd: RDD[T]): U

Attributes
protected
abstract val sequences: SequenceDictionary

The sequence dictionary describing the reference assembly this data is aligned to.

Concrete Value Members

final def !=(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def !=(arg0: Any): Boolean

Definition Classes
Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def ==(arg0: Any): Boolean

Definition Classes
Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def broadcastRegionJoin[X, Y <: GenomicRDD[X, Y], Z <: GenomicRDD[(T, X), Z]](genomicRdd: GenomicRDD[X, Y])(implicit tTag: ClassTag[T], xTag: ClassTag[X]): GenomicRDD[(T, X), Z]

Performs a broadcast inner join between this RDD and another RDD.
In a broadcast join, the left RDD (this RDD) is collected to the driver and broadcast to all the nodes in the cluster. The key equality function used for this join is the reference region overlap function. Since this is an inner join, all values that do not overlap a value from the other RDD are dropped.
genomicRdd
The right RDD in the join.
returns
Returns a new genomic RDD containing all pairs of keys that overlapped in the genomic coordinate space.
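The overlap-based inner join described for broadcastRegionJoin can be illustrated without Spark. The following is a minimal sketch, not the library's implementation: Region is a hypothetical stand-in for ReferenceRegion (assumed here to be a half-open interval on a contig), and plain Seq values stand in for the two RDDs.

```scala
object BroadcastJoinSketch {
  // Hypothetical stand-in for ReferenceRegion: half-open interval [start, end) on a contig.
  final case class Region(contig: String, start: Long, end: Long) {
    def overlaps(other: Region): Boolean =
      contig == other.contig && start < other.end && other.start < end
  }

  // The left side plays the role of the collected, broadcast RDD;
  // each right-side record is checked against every left-side record.
  def innerJoin[T, X](left: Seq[(Region, T)], right: Seq[(Region, X)]): Seq[(T, X)] =
    for {
      (rRegion, x) <- right
      (lRegion, t) <- left
      if lRegion.overlaps(rRegion)
    } yield (t, x)

  val left = Seq(Region("chr1", 100L, 200L) -> "geneA", Region("chr2", 0L, 50L) -> "geneB")
  val right = Seq(Region("chr1", 150L, 160L) -> "read1", Region("chr1", 500L, 510L) -> "read2")

  // Only geneA/read1 overlap; geneB and read2 are dropped by the inner join.
  val joined: Seq[(String, String)] = innerJoin(left, right)
}
```

In this sketch the unmatched records on both sides disappear from the result, which mirrors the inner join behaviour described above.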
def broadcastRegionJoinAndGroupByRight[X, Y <: GenomicRDD[X, Y], Z <: GenomicRDD[(Iterable[T], X), Z]](genomicRdd: GenomicRDD[X, Y])(implicit tTag: ClassTag[T], xTag: ClassTag[X]): GenomicRDD[(Iterable[T], X), Z]

Performs a broadcast inner join between this RDD and another RDD, followed by a groupBy on the right value.
In a broadcast join, the left RDD (this RDD) is collected to the driver and broadcast to all the nodes in the cluster. The key equality function used for this join is the reference region overlap function. Since this is an inner join, all values that do not overlap a value from the other RDD are dropped.
genomicRdd
The right RDD in the join.
returns
Returns a new genomic RDD containing all pairs of keys that overlapped in the genomic coordinate space, grouped together by the value they overlapped in the right RDD.
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def filterByOverlappingRegion(query: ReferenceRegion): U

Runs a filter that selects data in the underlying RDD that overlaps a single genomic region.
query
The region to query for.
returns
Returns a new GenomicRDD containing only data that overlaps the query region.
def filterByOverlappingRegions(querys: List[ReferenceRegion]): U

Runs a filter that selects data in the underlying RDD that overlaps several genomic regions.
querys
The regions to query for.
returns
Returns a new GenomicRDD containing only data that overlaps the query regions.
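The two overlap filters can be pictured with a small Spark-free sketch. Everything here is an assumption made for illustration: Region stands in for ReferenceRegion, Record.regions stands in for the result of getReferenceRegions, and a record is assumed to be kept when any of its regions overlaps any query region.

```scala
object OverlapFilterSketch {
  // Hypothetical stand-in for ReferenceRegion: half-open interval [start, end) on a contig.
  final case class Region(contig: String, start: Long, end: Long) {
    def overlaps(other: Region): Boolean =
      contig == other.contig && start < other.end && other.start < end
  }

  // Hypothetical record type; `regions` plays the role of getReferenceRegions(elem).
  final case class Record(name: String, regions: Seq[Region])

  // Keeps a record if at least one of its regions overlaps at least one query region.
  def filterByOverlappingRegions(data: Seq[Record], querys: List[Region]): Seq[Record] =
    data.filter(rec => rec.regions.exists(r => querys.exists(_.overlaps(r))))

  // The single-region filter is the one-element case of the multi-region filter.
  def filterByOverlappingRegion(data: Seq[Record], query: Region): Seq[Record] =
    filterByOverlappingRegions(data, List(query))

  val data = Seq(
    Record("r1", Seq(Region("chr1", 100L, 150L))),
    Record("r2", Seq(Region("chr1", 300L, 350L))),
    Record("r3", Seq(Region("chr2", 0L, 10L), Region("chr1", 400L, 410L)))
  )
  val kept: Seq[String] =
    filterByOverlappingRegions(data, List(Region("chr1", 120L, 145L), Region("chr2", 5L, 6L))).map(_.name)
}
```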
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def flattenRddByRegions(): RDD[(ReferenceRegion, T)]

Attributes
protected
def fullOuterShuffleRegionJoin[X, Y <: GenomicRDD[X, Y], Z <: GenomicRDD[(Option[T], Option[X]), Z]](genomicRdd: GenomicRDD[X, Y], optPartitions: Option[Int] = None)(implicit tTag: ClassTag[T], xTag: ClassTag[X]): GenomicRDD[(Option[T], Option[X]), Z]

Performs a sort-merge full outer join between this RDD and another RDD.
In a sort-merge join, both RDDs are co-partitioned and sorted. The partitions are then zipped, and we do a merge join on each partition. The key equality function used for this join is the reference region overlap function. Since this is a full outer join, if a value from either RDD does not overlap any values in the other RDD, it will be paired with a None in the product of the join.
genomicRdd
The right RDD in the join.
returns
Returns a new genomic RDD containing all pairs of keys that overlapped in the genomic coordinate space, and values that did not overlap will be paired with a None.
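The shape of the full outer join result, with None filling in for the missing side, can be sketched as follows. This is a naive nested-loop illustration on plain Seq values rather than the sort-merge strategy the method uses, and Region is a hypothetical stand-in for ReferenceRegion.

```scala
object FullOuterJoinSketch {
  // Hypothetical stand-in for ReferenceRegion: half-open interval [start, end) on a contig.
  final case class Region(contig: String, start: Long, end: Long) {
    def overlaps(other: Region): Boolean =
      contig == other.contig && start < other.end && other.start < end
  }

  def fullOuterJoin[T, X](
      left: Seq[(Region, T)],
      right: Seq[(Region, X)]): Seq[(Option[T], Option[X])] = {
    // Pairs whose regions overlap.
    val matched: Seq[(Option[T], Option[X])] =
      for {
        (lRegion, t) <- left
        (rRegion, x) <- right
        if lRegion.overlaps(rRegion)
      } yield (Some(t), Some(x))

    // Left values with no overlapping right value are paired with None.
    val leftOnly: Seq[(Option[T], Option[X])] =
      left.collect { case (lRegion, t) if !right.exists(_._1.overlaps(lRegion)) => (Some(t), None) }

    // Right values with no overlapping left value are paired with None.
    val rightOnly: Seq[(Option[T], Option[X])] =
      right.collect { case (rRegion, x) if !left.exists(_._1.overlaps(rRegion)) => (None, Some(x)) }

    matched ++ leftOnly ++ rightOnly
  }

  val left = Seq(Region("chr1", 100L, 200L) -> "a", Region("chr2", 0L, 50L) -> "b")
  val right = Seq(Region("chr1", 150L, 160L) -> "x", Region("chr3", 0L, 10L) -> "y")
  val joined = fullOuterJoin(left, right)
}
```

Dropping the rightOnly part of this sketch gives the left outer shape (T paired with Option[X]), and dropping leftOnly gives the right outer shape.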
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
lazy val jrdd: JavaRDD[T]

The underlying RDD of genomic data, as a JavaRDD.
def leftOuterShuffleRegionJoin[X, Y <: GenomicRDD[X, Y], Z <: GenomicRDD[(T, Option[X]), Z]](genomicRdd: GenomicRDD[X, Y], optPartitions: Option[Int] = None)(implicit tTag: ClassTag[T], xTag: ClassTag[X]): GenomicRDD[(T, Option[X]), Z]

Performs a sort-merge left outer join between this RDD and another RDD.
In a sort-merge join, both RDDs are co-partitioned and sorted. The partitions are then zipped, and we do a merge join on each partition. The key equality function used for this join is the reference region overlap function. Since this is a left outer join, all values in the right RDD that do not overlap a value from the left RDD are dropped. If a value from the left RDD does not overlap any values in the right RDD, it will be paired with a None in the product of the join.
genomicRdd
The right RDD in the join.
returns
Returns a new genomic RDD containing all pairs of keys that overlapped in the genomic coordinate space, and all keys from the left RDD that did not overlap a key in the right RDD.
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def pipe[X, Y <: GenomicRDD[X, Y], V <: InFormatter[T, U, V]](cmd: String, files: Seq[String] = Seq.empty, environment: Map[String, String] = Map.empty, flankSize: Int = 0)(implicit tFormatterCompanion: InFormatterCompanion[T, U, V], xFormatter: OutFormatter[X], convFn: (U, RDD[X]) ⇒ Y, tManifest: ClassTag[T], xManifest: ClassTag[X]): Y

Pipes genomic data to a subprocess that runs in parallel using Spark.
Files are substituted into the command with a $x syntax. E.g., to invoke a command that uses the first file from the files Seq, use $0.
Pipes require the presence of an InFormatterCompanion and an OutFormatter as implicit values. The InFormatterCompanion should be a singleton whose apply method builds an InFormatter given a specific type of GenomicRDD. The implicit InFormatterCompanion yields an InFormatter which is used to format the input to the pipe, and the implicit OutFormatter is used to parse the output from the pipe.
X
The type of the record created by the piped command.
Y
A GenomicRDD containing X's.
V
The InFormatter to use for formatting the data being piped to the command.
cmd
Command to run.
files
Files to make locally available to the commands being run. Default is empty.
environment
A map containing environment variable/value pairs to set in the environment for the newly created process. Default is empty.
flankSize
Number of bases to flank each command invocation by.
returns
Returns a new GenomicRDD of type Y.
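The $x file substitution convention can be illustrated with a small helper. This is only a sketch of the convention as described above: substituteFiles is a hypothetical name, and the way the library itself rewrites the command (for example, how it maps entries to locally available paths) is not shown in this documentation and may differ.

```scala
import scala.util.matching.Regex

object PipeFileSubstitutionSketch {
  // Matches placeholders such as $0, $1, $12.
  private val placeholder: Regex = """\$(\d+)""".r

  // Replaces each $x with the x-th entry of `files`; placeholders whose
  // index is out of range are left untouched (an assumption of this sketch).
  def substituteFiles(cmd: String, files: Seq[String]): String =
    placeholder.replaceAllIn(cmd, m => {
      val idx = m.group(1).toInt
      val replacement = if (idx < files.length) files(idx) else m.matched
      // quoteReplacement keeps '$' and '\' in the replacement literal.
      Regex.quoteReplacement(replacement)
    })
}
```

For example, with files = Seq("/tmp/ref.fa", "/tmp/ref.fa.fai"), the command "tool --ref $0 --idx $1" would become "tool --ref /tmp/ref.fa --idx /tmp/ref.fa.fai" under this sketch.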
def rightOuterBroadcastRegionJoin[X, Y <: GenomicRDD[X, Y], Z <: GenomicRDD[(Option[T], X), Z]](genomicRdd: GenomicRDD[X, Y])(implicit tTag: ClassTag[T], xTag: ClassTag[X]): GenomicRDD[(Option[T], X), Z]

Performs a broadcast right outer join between this RDD and another RDD.
In a broadcast join, the left RDD (this RDD) is collected to the driver, and broadcast to all the nodes in the cluster. The key equality function used for this join is the reference region overlap function. Since this is a right outer join, all values in the left RDD that do not overlap a value from the right RDD are dropped. If a value from the right RDD does not overlap any values in the left RDD, it will be paired with a None in the product of the join.
genomicRdd
The right RDD in the join.
returns
Returns a new genomic RDD containing all pairs of keys that overlapped in the genomic coordinate space, and all keys from the right RDD that did not overlap a key in the left RDD.
def rightOuterBroadcastRegionJoinAndGroupByRight[X, Y <: GenomicRDD[X, Y], Z <: GenomicRDD[(Iterable[T], X), Z]](genomicRdd: GenomicRDD[X, Y])(implicit tTag: ClassTag[T], xTag: ClassTag[X]): GenomicRDD[(Iterable[T], X), Z]

Performs a broadcast right outer join between this RDD and another RDD, followed by a groupBy on the right value.
In a broadcast join, the left RDD (this RDD) is collected to the driver, and broadcast to all the nodes in the cluster. The key equality function used for this join is the reference region overlap function. Since this is a right outer join, all values in the left RDD that do not overlap a value from the right RDD are dropped. If a value from the right RDD does not overlap any values in the left RDD, it will be paired with a None in the product of the join.
genomicRdd
The right RDD in the join.
returns
Returns a new genomic RDD containing all pairs of keys that overlapped in the genomic coordinate space, and all keys from the right RDD that did not overlap a key in the left RDD.
def rightOuterShuffleRegionJoin[X, Y <: GenomicRDD[X, Y], Z <: GenomicRDD[(Option[T], X), Z]](genomicRdd: GenomicRDD[X, Y], optPartitions: Option[Int] = None)(implicit tTag: ClassTag[T], xTag: ClassTag[X]): GenomicRDD[(Option[T], X), Z]

Performs a sort-merge right outer join between this RDD and another RDD.
In a sort-merge join, both RDDs are co-partitioned and sorted. The partitions are then zipped, and we do a merge join on each partition. The key equality function used for this join is the reference region overlap function. Since this is a right outer join, all values in the left RDD that do not overlap a value from the right RDD are dropped. If a value from the right RDD does not overlap any values in the left RDD, it will be paired with a None in the product of the join.
genomicRdd
The right RDD in the join.
returns
Returns a new genomic RDD containing all pairs of keys that overlapped in the genomic coordinate space, and all keys from the right RDD that did not overlap a key in the left RDD.
def rightOuterShuffleRegionJoinAndGroupByLeft[X, Y <: GenomicRDD[X, Y], Z <: GenomicRDD[(Option[T], Iterable[X]), Z]](genomicRdd: GenomicRDD[X, Y], optPartitions: Option[Int] = None)(implicit tTag: ClassTag[T], xTag: ClassTag[X]): GenomicRDD[(Option[T], Iterable[X]), Z]

Performs a sort-merge right outer join between this RDD and another RDD, followed by a groupBy on the left value, if not null.
In a sort-merge join, both RDDs are co-partitioned and sorted. The partitions are then zipped, and we do a merge join on each partition. The key equality function used for this join is the reference region overlap function. In the same operation, we group all values by the left item in the RDD. Since this is a right outer join, all values from the right RDD that did not overlap a value from the left RDD are placed into a length-1 Iterable with a None key.
genomicRdd
The right RDD in the join.
returns
Returns a new genomic RDD containing all pairs of keys that overlapped in the genomic coordinate space, grouped together by the value they overlapped in the left RDD, and all values from the right RDD that did not overlap an item in the left RDD.
def shuffleRegionJoin[X, Y <: GenomicRDD[X, Y], Z <: GenomicRDD[(T, X), Z]](genomicRdd: GenomicRDD[X, Y], optPartitions: Option[Int] = None)(implicit tTag: ClassTag[T], xTag: ClassTag[X]): GenomicRDD[(T, X), Z]

Performs a sort-merge inner join between this RDD and another RDD.
In a sort-merge join, both RDDs are co-partitioned and sorted. The partitions are then zipped, and we do a merge join on each partition. The key equality function used for this join is the reference region overlap function. Since this is an inner join, all values that do not overlap a value from the other RDD are dropped.
genomicRdd
The right RDD in the join.
returns
Returns a new genomic RDD containing all pairs of keys that overlapped in the genomic coordinate space.
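The merge step of a sort-merge overlap join can be sketched on a single pair of already co-located collections. This is an illustrative sweep under stated assumptions, not the library's code: Region is a hypothetical stand-in for ReferenceRegion, contigs are compared by name, and co-partitioning across a cluster is not modelled.

```scala
object SortMergeJoinSketch {
  // Hypothetical stand-in for ReferenceRegion: half-open interval [start, end) on a contig.
  final case class Region(contig: String, start: Long, end: Long) {
    def overlaps(other: Region): Boolean =
      contig == other.contig && start < other.end && other.start < end
  }

  // True if `a` lies entirely before `b` in (contig, position) order.
  private def isBefore(a: Region, b: Region): Boolean =
    a.contig < b.contig || (a.contig == b.contig && a.end <= b.start)

  // True if `a` starts at or after the end of `b` in (contig, position) order.
  private def isAfter(a: Region, b: Region): Boolean =
    a.contig > b.contig || (a.contig == b.contig && a.start >= b.end)

  def mergeJoin[T, X](left: Seq[(Region, T)], right: Seq[(Region, X)]): Seq[(T, X)] = {
    val sortedLeft = left.sortBy { case (reg, _) => (reg.contig, reg.start) }
    val sortedRight = right.sortBy { case (reg, _) => (reg.contig, reg.start) }.toVector
    val out = Seq.newBuilder[(T, X)]
    var lo = 0 // right records before index lo can no longer overlap any remaining left record
    for ((lRegion, t) <- sortedLeft) {
      // Skip right records that end before this (and hence every later) left record starts.
      while (lo < sortedRight.length && isBefore(sortedRight(lo)._1, lRegion)) lo += 1
      // Scan forward until the right records start past the end of this left record.
      var j = lo
      while (j < sortedRight.length && !isAfter(sortedRight(j)._1, lRegion)) {
        if (sortedRight(j)._1.overlaps(lRegion)) out += ((t, sortedRight(j)._2))
        j += 1
      }
    }
    out.result()
  }

  val left = Seq(
    Region("chr1", 100L, 200L) -> "a",
    Region("chr1", 150L, 300L) -> "b",
    Region("chr2", 0L, 50L) -> "c")
  val right = Seq(
    Region("chr1", 0L, 120L) -> "x",
    Region("chr1", 190L, 210L) -> "y",
    Region("chr1", 400L, 500L) -> "z",
    Region("chr2", 10L, 20L) -> "w")
  val joined = mergeJoin(left, right)
}
```

Because both inputs are sorted, the sweep never rescans right records that ended before the current left record, which is the main benefit over a nested loop.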
def shuffleRegionJoinAndGroupByLeft[X, Y <: GenomicRDD[X, Y], Z <: GenomicRDD[(T, Iterable[X]), Z]](genomicRdd: GenomicRDD[X, Y], optPartitions: Option[Int] = None)(implicit tTag: ClassTag[T], xTag: ClassTag[X]): GenomicRDD[(T, Iterable[X]), Z]

Performs a sort-merge inner join between this RDD and another RDD, followed by a groupBy on the left value.
In a sort-merge join, both RDDs are co-partitioned and sorted. The partitions are then zipped, and we do a merge join on each partition. The key equality function used for this join is the reference region overlap function. Since this is an inner join, all values that do not overlap a value from the other RDD are dropped. In the same operation, we group all values by the left item in the RDD.
genomicRdd
The right RDD in the join.
returns
Returns a new genomic RDD containing all pairs of keys that overlapped in the genomic coordinate space, grouped together by the value they overlapped in the left RDD.
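The grouped result shape, where each left value carries all of the right values it overlapped, can be sketched as below. As with any Spark-free illustration, Region is a hypothetical stand-in for ReferenceRegion and plain Seq values replace the RDDs; the actual method performs the grouping inside the sort-merge join.

```scala
object GroupByLeftSketch {
  // Hypothetical stand-in for ReferenceRegion: half-open interval [start, end) on a contig.
  final case class Region(contig: String, start: Long, end: Long) {
    def overlaps(other: Region): Boolean =
      contig == other.contig && start < other.end && other.start < end
  }

  // Inner join grouped by the left value: left values with no hits are dropped.
  def innerJoinAndGroupByLeft[T, X](
      left: Seq[(Region, T)],
      right: Seq[(Region, X)]): Seq[(T, Iterable[X])] =
    left.flatMap { case (lRegion, t) =>
      val hits: Iterable[X] = right.collect { case (rRegion, x) if lRegion.overlaps(rRegion) => x }
      if (hits.isEmpty) None else Some((t, hits))
    }

  val left = Seq(Region("chr1", 100L, 200L) -> "a", Region("chr2", 0L, 50L) -> "b")
  val right = Seq(
    Region("chr1", 150L, 160L) -> "x",
    Region("chr1", 190L, 250L) -> "y",
    Region("chr3", 0L, 10L) -> "z")
  val grouped = innerJoinAndGroupByLeft(left, right)
}
```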
def sort(): U

Sorts our genome-aligned data by reference positions, with contigs ordered by index.
returns
Returns a new RDD containing sorted data.

Note
Does not support data that is unaligned or where objects align to multiple positions.
See also
sortLexicographically
def sortLexicographically(): U

Sorts our genome-aligned data by reference positions, with contigs ordered lexicographically.
returns
Returns a new RDD containing sorted data.

Note
Does not support data that is unaligned or where objects align to multiple positions.
See also
sort
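The difference between the two contig orderings can be seen in a small sketch. The dictionary below is a hypothetical list of contig names in reference order, standing in for the indices a SequenceDictionary would supply, and Region is a hypothetical stand-in for ReferenceRegion.

```scala
object ContigOrderSketch {
  // Hypothetical stand-in for ReferenceRegion.
  final case class Region(contig: String, start: Long, end: Long)

  // Hypothetical contig order, as a sequence dictionary might index it.
  val dictionary: Vector[String] = Vector("chr1", "chr2", "chr10")

  val regions = Seq(
    Region("chr10", 5L, 10L),
    Region("chr2", 0L, 4L),
    Region("chr1", 7L, 9L),
    Region("chr1", 1L, 3L))

  // Contigs ordered by their index in the dictionary, then by start position.
  val byIndex: Seq[Region] = regions.sortBy(r => (dictionary.indexOf(r.contig), r.start))

  // Contigs ordered by name, so "chr10" sorts before "chr2".
  val lexicographic: Seq[Region] = regions.sortBy(r => (r.contig, r.start))
}
```

Here byIndex yields chr1, chr1, chr2, chr10, while lexicographic yields chr1, chr1, chr10, chr2.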
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
def transform(tFn: (RDD[T]) ⇒ RDD[T]): U

Applies a function that transforms the underlying RDD into a new RDD.
tFn
A function that transforms the underlying RDD.
returns
A new RDD where the RDD of genomic data has been replaced, but the metadata (sequence dictionary, etc.) is copied without modification.
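The relationship between transform, replaceRdd, and the self-referential type parameter U can be sketched with plain collections. This is a simplified model under stated assumptions: Dataset, replaceData, and Reads are hypothetical names, Seq replaces RDD, and transform is assumed to delegate to the replace method.

```scala
object TransformSketch {
  // U is the concrete subtype, so transform can return it rather than the base trait.
  trait Dataset[T, U <: Dataset[T, U]] {
    val data: Seq[T]              // stands in for rdd: RDD[T]
    val sequences: Seq[String]    // stands in for the SequenceDictionary metadata
    protected def replaceData(newData: Seq[T]): U
    // Swaps the wrapped data while leaving the metadata to the subtype's copy logic.
    def transform(tFn: Seq[T] => Seq[T]): U = replaceData(tFn(data))
  }

  final case class Reads(data: Seq[String], sequences: Seq[String])
      extends Dataset[String, Reads] {
    protected def replaceData(newData: Seq[String]): Reads = copy(data = newData)
  }

  val reads = Reads(Seq("r1", "r2", "r3"), Seq("chr1"))
  // The result is statically typed as Reads, and the metadata is carried over.
  val filtered: Reads = reads.transform(_.filter(_ != "r2"))
}
```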
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

trait GenomicRDD[T, U <: GenomicRDD[T, U]] extends AnyRef
