org.bdgenomics.adam.rdd

class ADAMContext extends Serializable with Logging

The ADAMContext provides functions on top of a SparkContext for loading genomic data.

Linear Supertypes
Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new ADAMContext(sc: SparkContext)

     sc
       The SparkContext to wrap.
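
Example: a minimal sketch of constructing and using an ADAMContext in a shell session (spark-shell with ADAM on the classpath, or adam-shell). The file name and local master are hypothetical, and the last line assumes the implicit SparkContext-to-ADAMContext conversion provided via the ADAMContext companion object (import ADAMContext._).

    import org.apache.spark.{ SparkConf, SparkContext }
    import org.bdgenomics.adam.rdd.ADAMContext
    import org.bdgenomics.adam.rdd.ADAMContext._ // implicit conversions (assumed import)

    val sc = new SparkContext(
      new SparkConf().setAppName("adam-example").setMaster("local[2]"))

    // Wrap the SparkContext explicitly...
    val ac = new ADAMContext(sc)
    val reads = ac.loadAlignments("sample.reads.adam") // hypothetical path

    // ...or call the load methods on the SparkContext directly via the implicit.
    val sameReads = sc.loadAlignments("sample.reads.adam")

The sketches below reuse this sc and these imports.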

Value Members

  1. final def !=(arg0: Any): Boolean
     Definition Classes: AnyRef → Any
  2. final def ##(): Int
     Definition Classes: AnyRef → Any
  3. final def ==(arg0: Any): Boolean
     Definition Classes: AnyRef → Any
  4. final def asInstanceOf[T0]: T0
     Definition Classes: Any
  5. def clone(): AnyRef
     Attributes: protected[java.lang]
     Definition Classes: AnyRef
     Annotations: @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean
     Definition Classes: AnyRef
  7. def equals(arg0: Any): Boolean
     Definition Classes: AnyRef → Any
  8. def finalize(): Unit
     Attributes: protected[java.lang]
     Definition Classes: AnyRef
     Annotations: @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]
     Definition Classes: AnyRef → Any
  10. def hashCode(): Int
      Definition Classes: AnyRef → Any
  11. final def isInstanceOf[T0]: Boolean
      Definition Classes: Any
  12. def isTraceEnabled(): Boolean
      Attributes: protected
      Definition Classes: Logging
  13. def loadAlignments(filePath: String, projection: Option[Schema] = None, filePath2Opt: Option[String] = None, recordGroupOpt: Option[String] = None, stringency: ValidationStringency = ValidationStringency.STRICT): AlignmentRecordRDD

    Loads alignments from a given path, and infers the input type.

    This method can load:

    * AlignmentRecords via Parquet (default)
    * SAM/BAM/CRAM (.sam, .bam, .cram)
    * FASTQ (interleaved, single end, paired end) (.ifq, .fq/.fastq)
    * FASTA (.fa, .fasta)
    * NucleotideContigFragments via Parquet (.contig.adam)

    As hinted above, the input type is inferred from the file path extension; see the sketch after this entry.

    filePath
      Path to load data from.
    projection
      The fields to project; ignored if not Parquet.
    filePath2Opt
      The path to load a second end of FASTQ data from. Ignored if not FASTQ.
    recordGroupOpt
      Optional record group name to set if loading FASTQ.
    stringency
      Validation stringency used on FASTQ import/merging.
    returns
      Returns an AlignmentRecordRDD which wraps the RDD of reads, the sequence dictionary representing the contigs these reads are aligned to (if the reads are aligned), and the record group dictionary for the reads (if one is available).

    See also: loadFasta, loadFastq, loadInterleavedFastq, loadParquetAlignments, loadBam
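
Example: a sketch of the extension-based dispatch, reusing sc from the constructor example. File names are hypothetical.

    // Format is picked from the extension: .bam as BAM, .fq as FASTQ,
    // otherwise Parquet of AlignmentRecords.
    val fromBam     = sc.loadAlignments("sample.bam")
    val fromParquet = sc.loadAlignments("sample.reads.adam")

    // Paired FASTQ: pass the second end through filePath2Opt.
    val fromFastq = sc.loadAlignments(
      "sample_1.fq",
      filePath2Opt = Some("sample_2.fq"),
      recordGroupOpt = Some("sample"))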

  14. def loadBam(filePath: String, validationStringency: ValidationStringency = ValidationStringency.STRICT): AlignmentRecordRDD

    Loads a SAM/BAM file.

    This reads the sequence and record group dictionaries from the SAM/BAM file header. SAMRecords are read from the file and converted to the AlignmentRecord schema.

    filePath
      Path to the file on disk.
    returns
      Returns an AlignmentRecordRDD which wraps the RDD of reads, the sequence dictionary representing the contigs these reads are aligned to (if the reads are aligned), and the record group dictionary for the reads (if one is available).

    See also: loadAlignments
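
Example: relaxing validation for a BAM with nonconforming records. ValidationStringency here is htsjdk's; the file name is hypothetical.

    import htsjdk.samtools.ValidationStringency

    // LENIENT logs validation failures instead of throwing.
    val lenientReads = sc.loadBam("sample.bam", ValidationStringency.LENIENT)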

  15. def loadBed(filePath: String, minPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD

    Loads features stored in BED6/12 format.

    filePath
      The path to the file to load.
    minPartitions
      An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism.
    stringency
      Optional stringency to pass. LENIENT stringency will warn when a malformed line is encountered, SILENT will ignore the malformed line, STRICT will throw an exception.
    returns
      Returns a FeatureRDD.

  16. def loadCoverage(filePath: String): CoverageRDD

    Loads a file of Features to a CoverageRDD. Coverage is stored in the score attribute of Feature.

    filePath
      File path to load coverage from.
    returns
      CoverageRDD containing an RDD of Coverage.

  17. def loadFasta(filePath: String, fragmentLength: Long): NucleotideContigFragmentRDD

    Loads a FASTA file.

    filePath
      The path to load from.
    fragmentLength
      The length to split contigs into. This sets the achievable parallelism.
    returns
      Returns a NucleotideContigFragmentRDD containing the contigs.
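
Example: the fragment length trades fragment size against parallelism; the file name is hypothetical.

    // 10 kbp fragments; smaller fragments yield more, smaller partitions of work.
    val contigs = sc.loadFasta("reference.fa", fragmentLength = 10000L)
    println(contigs.rdd.count()) // counts fragments, not whole contigs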

  18. def loadFastq(filePath1: String, filePath2Opt: Option[String], recordGroupOpt: Option[String] = None, stringency: ValidationStringency = ValidationStringency.STRICT): AlignmentRecordRDD

    Loads (possibly paired) FASTQ data.

    filePath1
      The path where the first set of reads are.
    filePath2Opt
      The path where the second set of reads are, if provided.
    recordGroupOpt
      The optional record group name to associate to the reads.
    stringency
      The validation stringency to use when validating the reads.
    returns
      Returns the reads as an unaligned AlignmentRecordRDD.

    See also: loadUnpairedFastq, loadPairedFastq
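
Example: paired-end reads split across two files, and single-end reads in one file. File and record group names are hypothetical.

    val paired   = sc.loadFastq("sample_1.fq", Some("sample_2.fq"), Some("sample"))
    val unpaired = sc.loadFastq("sample.fq", None)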

  19. def loadFeatures(filePath: String, projection: Option[Schema] = None, minPartitions: Option[Int] = None): FeatureRDD

    Loads Features from a file, autodetecting the file type.

    Loads files ending in .bed as BED6/12, .gff3 as GFF3, .gtf/.gff as GTF/GFF2, .narrowPeak/.narrowpeak as NarrowPeak, and .interval_list as IntervalList. If none of these match, we fall back to Parquet; see the sketch after this entry.

    filePath
      The path to the file to load.
    projection
      An optional projection to push down.
    minPartitions
      An optional minimum number of partitions to use. For textual formats, if this is None, we fall back to the Spark default parallelism.
    returns
      Returns a FeatureRDD.

    See also: loadParquetFeatures, loadIntervalList, loadNarrowPeak, loadGff3, loadGtf, loadBed
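
Example: a sketch of the autodetection, with hypothetical file names.

    val bed = sc.loadFeatures("peaks.bed")     // parsed as BED6/12
    val gtf = sc.loadFeatures("genes.gtf")     // parsed as GTF/GFF2
    val pq  = sc.loadFeatures("features.adam") // no match, falls back to Parquet

    // Textual formats accept a minimum partition count.
    val wide = sc.loadFeatures("peaks.bed", minPartitions = Some(128))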

  20. def loadFragments(filePath: String): FragmentRDD

    Auto-detects the file type and loads a FragmentRDD.

    This method can load:

    * Fragments via Parquet (default)
    * SAM/BAM/CRAM (.sam, .bam, .cram)
    * FASTQ (interleaved only, .ifq)
    * AlignmentRecords via Parquet, autodetected by the .reads.adam extension

    filePath
      Path to load data from.
    returns
      Returns the loaded data as a FragmentRDD.

  21. def loadGenotypes(filePath: String, projection: Option[Schema] = None): GenotypeRDD

    Auto-detects the file type and loads a GenotypeRDD.

    If the file has a .vcf/.vcf.gz/.vcf.bgzf/.vcf.bgz extension, loads as VCF. Else, falls back to Parquet.

    filePath
      The path to load.
    projection
      An optional subset of fields to load.
    returns
      Returns a GenotypeRDD.

    See also: loadParquetGenotypes, loadVcf
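
Example: the same call handles both storage formats; file names are hypothetical.

    val fromVcf     = sc.loadGenotypes("calls.vcf.gz")         // VCF by extension
    val fromParquet = sc.loadGenotypes("calls.genotypes.adam") // otherwise Parquet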

  22. def loadGff3(filePath: String, minPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD

    Loads features stored in GFF3 format.

    filePath
      The path to the file to load.
    minPartitions
      An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism.
    stringency
      Optional stringency to pass. LENIENT stringency will warn when a malformed line is encountered, SILENT will ignore the malformed line, STRICT will throw an exception.
    returns
      Returns a FeatureRDD.

  23. def loadGtf(filePath: String, minPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD

    Loads features stored in GFF2/GTF format.

    filePath
      The path to the file to load.
    minPartitions
      An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism.
    stringency
      Optional stringency to pass. LENIENT stringency will warn when a malformed line is encountered, SILENT will ignore the malformed line, STRICT will throw an exception.
    returns
      Returns a FeatureRDD.

  24. def loadIndexedBam(filePath: String, viewRegion: ReferenceRegion): AlignmentRecordRDD

    Functions like loadBam, but uses BAM index files to look at fewer blocks, and only returns records within a specified ReferenceRegion. A BAM index file is required.

    filePath
      The path to the input data. Currently this path must correspond to a single BAM file. The associated BAM index file must have the same name.
    viewRegion
      The ReferenceRegion we are filtering on.
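
Example: a region-restricted load, assuming ReferenceRegion(name, start, end) from org.bdgenomics.adam.models. The path is hypothetical, and sample.bam.bai must sit alongside sample.bam.

    import org.bdgenomics.adam.models.ReferenceRegion

    // Only touches index blocks overlapping chr1:100000-200000.
    val slice = sc.loadIndexedBam(
      "sample.bam",
      ReferenceRegion("chr1", 100000L, 200000L))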

  25. def loadIndexedBam(filePath: String, viewRegions: Iterable[ReferenceRegion])(implicit s: DummyImplicit): AlignmentRecordRDD

    As above, but only returns records within the specified ReferenceRegions.
  26. def loadIndexedBam(filePath: String, parsedLoci: ParsedLoci, includeUnmappedMates: Boolean = false)(implicit s: DummyImplicit): AlignmentRecordRDD

    Functions like loadBam, but uses BAM index files to look at fewer blocks, and only returns records within the specified ReferenceRegions. A BAM index file is required.

    filePath
      The path to the input data. Currently this path must correspond to a single BAM file. The associated BAM index file must have the same name.
    parsedLoci
      The parsed loci (regions) we are filtering on.

  27. def loadIndexedVcf(filePath: String, viewRegions: Iterable[ReferenceRegion], stringency: ValidationStringency = ValidationStringency.STRICT)(implicit s: DummyImplicit): VariantContextRDD

    Loads a VCF file indexed by a tabix (tbi) file into an RDD.

    filePath
      The file to load.
    viewRegions
      Iterable of ReferenceRegions we are filtering on.
    stringency
      The validation stringency to use when validating the VCF.
    returns
      Returns a VariantContextRDD.

  28. def loadIndexedVcf(filePath: String, viewRegion: ReferenceRegion): VariantContextRDD

    Loads a VCF file indexed by a tabix (tbi) file into an RDD.

    filePath
      The file to load.
    viewRegion
      The ReferenceRegion we are filtering on.
    returns
      Returns a VariantContextRDD.
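
Example: a tabix-backed region query; the path and region are hypothetical, and calls.vcf.gz.tbi must sit next to the bgzipped VCF.

    import org.bdgenomics.adam.models.ReferenceRegion

    val vcs = sc.loadIndexedVcf(
      "calls.vcf.gz",
      ReferenceRegion("chr20", 0L, 1000000L))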

  29. def loadInterleavedFastq(filePath: String): AlignmentRecordRDD

    Loads reads from interleaved FASTQ.

    In interleaved FASTQ, the two reads from a paired sequencing protocol are interleaved in a single file. This is a zipped representation of the typical paired FASTQ.

    filePath
      Path to load.
    returns
      Returns the file as an unaligned AlignmentRecordRDD.

  30. def loadInterleavedFastqAsFragments(filePath: String): FragmentRDD

    Loads interleaved FASTQ data as Fragments.

    Fragments represent all of the reads from a single sequenced fragment as a single object, which is a useful representation for some tasks.

    filePath
      The path to load.
    returns
      Returns a FragmentRDD containing the paired reads grouped by sequencing fragment.
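
Example: with a hypothetical interleaved file, both mates of each pair arrive as one Fragment.

    val frags = sc.loadInterleavedFastqAsFragments("sample.ifq")
    println(frags.rdd.count()) // counts fragments, not individual reads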

  31. def loadIntervalList(filePath: String, minPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD

    Loads features stored in IntervalList format.

    filePath
      The path to the file to load.
    minPartitions
      An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism.
    stringency
      Optional stringency to pass. LENIENT stringency will warn when a malformed line is encountered, SILENT will ignore the malformed line, STRICT will throw an exception.
    returns
      Returns a FeatureRDD.

  32. def loadNarrowPeak(filePath: String, minPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD

    Loads features stored in NarrowPeak format.

    filePath
      The path to the file to load.
    minPartitions
      An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism.
    stringency
      Optional stringency to pass. LENIENT stringency will warn when a malformed line is encountered, SILENT will ignore the malformed line, STRICT will throw an exception.
    returns
      Returns a FeatureRDD.

  33. def loadPairedFastq(filePath1: String, filePath2: String, recordGroupOpt: Option[String], stringency: ValidationStringency): AlignmentRecordRDD

    Loads paired FASTQ data from two files.

    filePath1
      The path where the first set of reads are.
    filePath2
      The path where the second set of reads are.
    recordGroupOpt
      The optional record group name to associate to the reads.
    stringency
      The validation stringency to use when validating the reads.
    returns
      Returns the reads as an unaligned AlignmentRecordRDD.

    See also: loadFastq

  34. def loadParquet[T](filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None)(implicit ev1: (T) ⇒ SpecificRecord, ev2: Manifest[T]): RDD[T]

    Creates a new RDD by loading Parquet-stored records of the given Avro type.

    T
      The type of records to return.
    filePath
      The path to the input data.
    predicate
      An optional pushdown predicate to use when reading the data.
    projection
      An optional projection schema to use when reading the data.
    returns
      An RDD with records of the specified type.

  35. def loadParquetAlignments(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): AlignmentRecordRDD

    Loads alignment data from a Parquet file.

    filePath
      The path of the file to load.
    predicate
      An optional predicate to push down into the file.
    projection
      An optional schema designating the fields to project.
    returns
      Returns an AlignmentRecordRDD which wraps the RDD of reads, the sequence dictionary representing the contigs these reads are aligned to (if the reads are aligned), and the record group dictionary for the reads (if one is available).

    Note: The sequence dictionary is read from an avro file stored at filePath/_seqdict.avro and the record group dictionary is read from an avro file stored at filePath/_rgdict.avro. These files are pure avro, not Parquet.

    See also: loadAlignments
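
Example: a sketch of predicate and projection pushdown using Parquet's FilterApi and ADAM's Projection helper; the contigName column name is an assumption about the AlignmentRecord schema.

    import org.apache.parquet.filter2.predicate.FilterApi
    import org.apache.parquet.io.api.Binary
    import org.bdgenomics.adam.projections.{ AlignmentRecordField, Projection }

    // Skip row groups whose contigName cannot equal "chr1" (assumed column name),
    // and only materialize the projected fields.
    val predicate = FilterApi.eq(
      FilterApi.binaryColumn("contigName"),
      Binary.fromString("chr1"))
    val projection = Projection(
      AlignmentRecordField.contigName,
      AlignmentRecordField.start,
      AlignmentRecordField.sequence)

    val reads = sc.loadParquetAlignments(
      "sample.reads.adam",
      predicate = Some(predicate),
      projection = Some(projection))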

  36. def loadParquetContigFragments(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): NucleotideContigFragmentRDD

    Loads NucleotideContigFragments stored in Parquet, with metadata.

    filePath
      The path to load files from.
    predicate
      An optional predicate to push down into the file.
    projection
      An optional projection to use for reading.
    returns
      Returns a NucleotideContigFragmentRDD.

  37. def loadParquetCoverage(filePath: String, predicate: Option[FilterPredicate] = None): CoverageRDD

    Loads a Parquet file of Features to a CoverageRDD. Coverage is stored in the score attribute of Feature.

    filePath
      File path to load coverage from.
    predicate
      An optional predicate to push down into the file.
    returns
      CoverageRDD containing an RDD of Coverage.

  38. def loadParquetFeatures(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): FeatureRDD

    Loads Features stored in Parquet, with accompanying metadata.

    filePath
      The path to load files from.
    predicate
      An optional predicate to push down into the file.
    projection
      An optional projection to use for reading.
    returns
      Returns a FeatureRDD.

  39. def loadParquetFragments(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): FragmentRDD

    Loads Fragments stored in Parquet, with accompanying metadata.

    filePath
      The path to load files from.
    predicate
      An optional predicate to push down into the file.
    projection
      An optional projection to use for reading.
    returns
      Returns a FragmentRDD.

  40. def loadParquetGenotypes(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): GenotypeRDD

    Loads Genotypes stored in Parquet with accompanying metadata.

    filePath
      The path to load files from.
    predicate
      An optional predicate to push down into the file.
    projection
      An optional projection to use for reading.
    returns
      Returns a GenotypeRDD.

  41. def loadParquetVariantAnnotations(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): VariantAnnotationRDD

    Loads VariantAnnotations stored in Parquet, with metadata.

    filePath
      The path to load files from.
    predicate
      An optional predicate to push down into the file.
    projection
      An optional projection to use for reading.
    returns
      Returns VariantAnnotationRDD.

  42. def loadParquetVariants(filePath: String, predicate: Option[FilterPredicate] = None, projection: Option[Schema] = None): VariantRDD

    Loads Variants stored in Parquet with accompanying metadata.

    filePath
      The path to load files from.
    predicate
      An optional predicate to push down into the file.
    projection
      An optional projection to use for reading.
    returns
      Returns a VariantRDD.

  43. def loadReferenceFile(filePath: String, fragmentLength: Long): ReferenceFile

    Auto-detects the file type and loads a broadcastable ReferenceFile.

    If the file type is 2bit, loads a 2bit file. Else, uses loadSequences to load the reference as an RDD, which is then collected to the driver.

    filePath
      The path to load.
    fragmentLength
      The length of fragment to use for splitting.
    returns
      Returns a broadcastable ReferenceFile.

    See also: loadSequences
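
Example: loading a reference for broadcast; the 2bit path is hypothetical, and the extract call assumes ADAM's ReferenceFile API.

    import org.bdgenomics.adam.models.ReferenceRegion

    // A 2bit file loads directly as a compact ReferenceFile; FASTA would go
    // through loadSequences and be collected to the driver instead.
    val ref = sc.loadReferenceFile("reference.2bit", fragmentLength = 10000L)

    // Pull the bases covering a region (extract is assumed).
    val bases = ref.extract(ReferenceRegion("chr1", 0L, 100L))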

  44. def loadSequences(filePath: String, projection: Option[Schema] = None, fragmentLength: Long = 10000): NucleotideContigFragmentRDD

    Auto-detects the file type and loads contigs as a NucleotideContigFragmentRDD.

    Loads files ending in .fa/.fasta/.fa.gz/.fasta.gz as FASTA; otherwise, falls back to Parquet.

    filePath
      The path to load.
    projection
      An optional subset of fields to load.
    fragmentLength
      The length of fragment to use for splitting.
    returns
      Returns a NucleotideContigFragmentRDD.

    See also: loadReferenceFile, loadParquetContigFragments, loadFasta

  45. def loadUnpairedFastq(filePath: String, recordGroupOpt: Option[String] = None, setFirstOfPair: Boolean = false, setSecondOfPair: Boolean = false, stringency: ValidationStringency = ValidationStringency.STRICT): AlignmentRecordRDD

    Loads unpaired FASTQ data from a single file.

    filePath
      The path where the reads are.
    recordGroupOpt
      The optional record group name to associate to the reads.
    setFirstOfPair
      If true, sets the read as first from the fragment.
    setSecondOfPair
      If true, sets the read as second from the fragment.
    stringency
      The validation stringency to use when validating the reads.
    returns
      Returns the reads as an unaligned AlignmentRecordRDD.

    See also: loadFastq

  46. def loadVariantAnnotations(filePath: String, projection: Option[Schema] = None): VariantAnnotationRDD

    Loads VariantAnnotations into an RDD, and automatically detects the underlying storage format.

    Can load variant annotations from either Parquet or VCF.

    filePath
      The path to load files from.
    projection
      An optional projection to use for reading.
    returns
      Returns VariantAnnotationRDD.

    See also: loadParquetVariantAnnotations, loadVcfAnnotations

  47. def loadVariants(filePath: String, projection: Option[Schema] = None): VariantRDD

    Auto-detects the file type and loads a VariantRDD.

    If the file has a .vcf/.vcf.gz/.vcf.bgzf/.vcf.bgz extension, loads as VCF. Else, falls back to Parquet.

    filePath
      The path to load.
    projection
      An optional subset of fields to load.
    returns
      Returns a VariantRDD.

    See also: loadParquetVariants, loadVcf

  48. def loadVcf(filePath: String, stringency: ValidationStringency = ValidationStringency.STRICT): VariantContextRDD

    Loads a VCF file into an RDD.

    filePath
      The file to load.
    stringency
      The validation stringency to use when validating the VCF.
    returns
      Returns a VariantContextRDD.

    See also: loadVcfAnnotations
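
Example: loading VCF with relaxed validation; the file name is hypothetical.

    import htsjdk.samtools.ValidationStringency

    // VariantContextRDD pairs variants with their called genotypes.
    val vcs = sc.loadVcf("calls.vcf", ValidationStringency.LENIENT)
    println(vcs.rdd.count())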

  49. def loadVcfAnnotations(filePath: String): VariantAnnotationRDD

    Loads variant annotations stored in VCF format.

    filePath
      The path to the VCF file(s) to load annotations from.
    returns
      Returns VariantAnnotationRDD.

  50. def log: Logger
      Attributes: protected
      Definition Classes: Logging
  51. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
      Attributes: protected
      Definition Classes: Logging
  52. def logDebug(msg: ⇒ String): Unit
      Attributes: protected
      Definition Classes: Logging
  53. def logError(msg: ⇒ String, throwable: Throwable): Unit
      Attributes: protected
      Definition Classes: Logging
  54. def logError(msg: ⇒ String): Unit
      Attributes: protected
      Definition Classes: Logging
  55. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
      Attributes: protected
      Definition Classes: Logging
  56. def logInfo(msg: ⇒ String): Unit
      Attributes: protected
      Definition Classes: Logging
  57. def logName: String
      Attributes: protected
      Definition Classes: Logging
  58. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
      Attributes: protected
      Definition Classes: Logging
  59. def logTrace(msg: ⇒ String): Unit
      Attributes: protected
      Definition Classes: Logging
  60. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
      Attributes: protected
      Definition Classes: Logging
  61. def logWarning(msg: ⇒ String): Unit
      Attributes: protected
      Definition Classes: Logging
  62. final def ne(arg0: AnyRef): Boolean
      Definition Classes: AnyRef
  63. final def notify(): Unit
      Definition Classes: AnyRef
  64. final def notifyAll(): Unit
      Definition Classes: AnyRef
  65. val sc: SparkContext

    The SparkContext to wrap.

  66. final def synchronized[T0](arg0: ⇒ T0): T0
      Definition Classes: AnyRef
  67. def toString(): String
      Definition Classes: AnyRef → Any
  68. final def wait(): Unit
      Definition Classes: AnyRef
      Annotations: @throws( ... )
  69. final def wait(arg0: Long, arg1: Int): Unit
      Definition Classes: AnyRef
      Annotations: @throws( ... )
  70. final def wait(arg0: Long): Unit
      Definition Classes: AnyRef
      Annotations: @throws( ... )
