Class/Object

org.bdgenomics.adam.rdd

ADAMContext


class ADAMContext extends Serializable with Logging

The ADAMContext provides functions on top of a SparkContext for loading genomic data.

Linear Supertypes
Logging, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new ADAMContext(sc: SparkContext)


    sc

    The SparkContext to wrap.
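
    Example

    A minimal usage sketch, not part of the Scaladoc itself; the file path is a placeholder. Importing the ADAMContext companion object is the usual way to get the load methods as implicit extensions on a SparkContext.

      import org.apache.spark.{ SparkConf, SparkContext }
      import org.bdgenomics.adam.rdd.ADAMContext
      import org.bdgenomics.adam.rdd.ADAMContext._

      val sc = new SparkContext(
        new SparkConf().setAppName("adam-example").setMaster("local[2]"))

      // Wrap the SparkContext explicitly...
      val ac = new ADAMContext(sc)

      // ...or rely on the implicit conversion imported above, which lets the
      // load methods be called directly on the SparkContext.
      val reads = sc.loadAlignments("sample.bam") // placeholder path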

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def debug(mkr: Marker, msg: ⇒ Any, t: ⇒ Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  7. def debug(msg: ⇒ Any, t: ⇒ Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  8. def debug(msg: ⇒ Any): Unit

    Attributes
    protected
    Definition Classes
    Logging
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def error(mkr: Marker, msg: ⇒ Any, t: ⇒ Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  12. def error(msg: ⇒ Any, t: ⇒ Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  13. def error(msg: ⇒ Any): Unit

    Attributes
    protected
    Definition Classes
    Logging
  14. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  16. def getFiles(path: Path, fs: FileSystem): Array[Path]

    Elaborates out a directory/glob/plain path.

    path

    Path to elaborate.

    fs

    The underlying file system that this path is on.

    returns

    Returns an array of Paths to load.

    Attributes
    protected
    Exceptions thrown

    FileNotFoundException if the path does not match any files.

    See also

    getFsAndFiles

  17. def getFsAndFiles(path: Path): Array[Path]

    Elaborates out a directory/glob/plain path.

    path

    Path to elaborate.

    returns

    Returns an array of Paths to load.

    Attributes
    protected
    Exceptions thrown

    FileNotFoundException if the path does not match any files.

    See also

    getFiles

  18. def getFsAndFilesWithFilter(pathName: String, filter: PathFilter): Array[Path]

    Elaborates out a directory/glob/plain path name.

    pathName

    Path name to elaborate.

    filter

    Filter to discard paths.

    returns

    Returns an array of Paths to load.

    Attributes
    protected
    Exceptions thrown

    FileNotFoundException if the path does not match any files.

    See also

    getFiles

  19. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  20. def info(mkr: Marker, msg: ⇒ Any, t: ⇒ Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  21. def info(msg: ⇒ Any, t: ⇒ Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  22. def info(msg: ⇒ Any): Unit

    Attributes
    protected
    Definition Classes
    Logging
  23. def isDebugEnabled: Boolean

    Attributes
    protected
    Definition Classes
    Logging
  24. def isErrorEnabled: Boolean

    Attributes
    protected
    Definition Classes
    Logging
  25. def isInfoEnabled: Boolean

    Attributes
    protected
    Definition Classes
    Logging
  26. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  27. def isPartitioned(pathName: String): Boolean

    Return true if the specified path of Parquet + Avro files is partitioned.

    pathName

    Path in which to look for partitioned flag.

    returns

    True if the specified path of Parquet + Avro files is partitioned. Behavior is undefined if some paths in the glob contain the _partitionedByStartPos flag file and some do not.
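
    Example

    A hedged sketch of using this flag to pick a loader; the path is a placeholder and assumes the SparkContext and imports from the constructor example above.

      // Prefer the partition-aware loader when the dataset was written with
      // range-binned partitioning.
      val alignments =
        if (sc.isPartitioned("alignments.adam")) {
          sc.loadPartitionedParquetAlignments("alignments.adam")
        } else {
          sc.loadParquetAlignments("alignments.adam")
        }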

  28. def isTraceEnabled: Boolean

    Attributes
    protected
    Definition Classes
    Logging
  29. def isWarnEnabled: Boolean

    Attributes
    protected
    Definition Classes
    Logging
  30. def loadAlignments(df: DataFrame, references: SequenceDictionary, readGroups: ReadGroupDictionary, processingSteps: Seq[ProcessingStep]): AlignmentDataset

    Load the specified data frame, references, read groups, and processing steps into an AlignmentDataset.

    df

    Data frame to load from.

    references

    References for the AlignmentDataset, may be empty.

    readGroups

    Read groups for the AlignmentDataset, may be empty.

    processingSteps

    Processing steps for the AlignmentDataset, may be empty.

    returns

    Returns a new AlignmentDataset loaded from the specified data frame, references, read groups, and processing steps.

  31. def loadAlignments(df: DataFrame, metadataPathName: String): AlignmentDataset

    Load the specified data frame into an AlignmentDataset, with metadata loaded from the specified metadata path name.

    df

    Data frame to load from.

    metadataPathName

    Path name to load metadata from.

    returns

    Returns a new AlignmentDataset loaded from the specified data frame, with metadata loaded from the specified metadata path name.

  32. def loadAlignments(df: DataFrame): AlignmentDataset

    Load the specified data frame into an AlignmentDataset, with empty metadata.

    df

    Data frame to load from.

    returns

    Returns a new AlignmentDataset loaded from the specified data frame, with empty metadata.
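
    Example

    A sketch of a DataFrame round trip, assuming an existing ADAM Parquet directory (placeholder path "alignments.adam") and the implicits from the constructor example; mappingQuality is assumed here to be a column of the Alignment schema.

      val original = sc.loadAlignments("alignments.adam")

      // Drop to Spark SQL, filter, and rebuild the dataset, reusing the
      // metadata (references, read groups) stored alongside the Parquet data.
      val df       = original.dataset.toDF()
      val filtered = df.filter(df("mappingQuality") >= 30)
      val rebuilt  = sc.loadAlignments(filtered, "alignments.adam")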

  33. def loadAlignments(pathName: String, optPathName2: Option[String] = None, optReadGroup: Option[String] = None, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None, stringency: ValidationStringency = ValidationStringency.STRICT): AlignmentDataset

    Load alignments into an AlignmentDataset.

    Loads path names ending in:

      * .bam/.cram/.sam as BAM/CRAM/SAM format,
      * .fa/.fasta as FASTA format,
      * .fq/.fastq as FASTQ format, and
      * .ifq as interleaved FASTQ format.

    If none of these match, fall back to Parquet + Avro.

    For FASTA, FASTQ, and interleaved FASTQ formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.

    pathName

    The path name to load alignments from. Globs/directories are supported, although file extension must be present for BAM/CRAM/SAM, FASTA, and FASTQ formats.

    optPathName2

    The optional path name to load the second set of alignment records from, if loading paired FASTQ format. Globs/directories are supported, although file extension must be present. Defaults to None.

    optReadGroup

    The optional read group identifier to associate to the alignment records. Defaults to None.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    stringency

    The validation stringency to use when validating BAM/CRAM/SAM or FASTQ formats. Defaults to ValidationStringency.STRICT.

    returns

    Returns an AlignmentDataset which wraps the genomic dataset of alignments, sequence dictionary representing reference sequences the alignments may be aligned to, and the read group dictionary for the alignments if one is available.

    See also

    loadParquetAlignments

    loadInterleavedFastq

    loadFastaDna(String, Long)

    loadFastq

    loadBam
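
    Example

    A minimal sketch of the extension-driven dispatch; paths are placeholders.

      import htsjdk.samtools.ValidationStringency

      // BAM: format is chosen from the .bam extension.
      val bamReads = sc.loadAlignments("NA12878.bam")

      // Paired FASTQ, with a read group and lenient validation.
      val fastqReads = sc.loadAlignments(
        "reads_1.fq",
        optPathName2 = Some("reads_2.fq"),
        optReadGroup = Some("rg1"),
        stringency = ValidationStringency.LENIENT)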

  34. def loadBam(pathName: String, stringency: ValidationStringency = ValidationStringency.STRICT): AlignmentDataset

    Load alignments from BAM/CRAM/SAM into an AlignmentDataset.

    This reads the sequence and read group dictionaries from the BAM/CRAM/SAM file header. SAMRecords are read from the file and converted to the Alignment schema.

    pathName

    The path name to load BAM/CRAM/SAM formatted alignments from. Globs/directories are supported.

    stringency

    The validation stringency to use when validating the BAM/CRAM/SAM format header. Defaults to ValidationStringency.STRICT.

    returns

    Returns an AlignmentDataset which wraps the genomic dataset of alignments, sequence dictionary representing reference sequences the alignments may be aligned to, and the read group dictionary for the alignments if one is available.

  35. def loadBed(pathName: String, optSequenceDictionary: Option[SequenceDictionary] = None, optMinPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.STRICT): FeatureDataset

    Load a path name in BED6/12 format into a FeatureDataset.

    pathName

    The path name to load features in BED6/12 format from. Globs/directories are supported.

    optSequenceDictionary

    Optional sequence dictionary. Defaults to None.

    optMinPartitions

    An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism. Defaults to None.

    stringency

    The validation stringency to use when validating BED6/12 format. Defaults to ValidationStringency.STRICT.

    returns

    Returns a FeatureDataset.

  36. def loadCoverage(pathName: String, optSequenceDictionary: Option[SequenceDictionary] = None, optMinPartitions: Option[Int] = None, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None, stringency: ValidationStringency = ValidationStringency.STRICT): CoverageDataset

    Load features into a FeatureDataset and convert to a CoverageDataset. Coverage is stored in the score field of Feature.

    Loads path names ending in:

      * .bed as BED6/12 format,
      * .gff3 as GFF3 format,
      * .gtf/.gff as GTF/GFF2 format,
      * .narrow[pP]eak as NarrowPeak format, and
      * .interval_list as IntervalList format.

    If none of these match, fall back to Parquet + Avro.

    For BED6/12, GFF3, GTF/GFF2, NarrowPeak, and IntervalList formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.

    pathName

    The path name to load features from. Globs/directories are supported, although file extension must be present for BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.

    optSequenceDictionary

    Optional sequence dictionary. Defaults to None.

    optMinPartitions

    An optional minimum number of partitions to use. For textual formats, if this is None, fall back to the Spark default parallelism. Defaults to None.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    stringency

    The validation stringency to use when validating BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats. Defaults to ValidationStringency.STRICT.

    returns

    Returns a FeatureDataset converted to a CoverageDataset.

    See also

    loadParquetFeatures

    loadIntervalList

    loadNarrowPeak

    loadGff3

    loadGtf

    loadBed
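
    Example

    A short sketch; the BED path is a placeholder.

      // Features become coverage records; the coverage value is taken from
      // the Feature score field.
      val coverage = sc.loadCoverage("scores.bed")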

  37. def loadDnaSequences(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None): SequenceDataset

    Load DNA sequences into a SequenceDataset.

    If the path name has a .fa/.fasta extension, load as FASTA format. Else, fall back to Parquet + Avro.

    For FASTA format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.

    pathName

    The path name to load sequences from. Globs/directories are supported, although file extension must be present for FASTA format.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    Returns a SequenceDataset containing DNA sequences.

    See also

    loadParquetSequences

    loadFastaDna

  38. def loadFastaDna(pathName: String, maximumLength: Long): SliceDataset

    Load DNA slices from FASTA into a SliceDataset.

    pathName

    The path name to load slices from. Globs/directories are supported.

    maximumLength

    Maximum fragment length. Values greater than 1e9 should be avoided.

    returns

    Returns a SliceDataset containing DNA slices.

  39. def loadFastaDna(pathName: String): SequenceDataset

    Load DNA sequences from FASTA into a SequenceDataset.

    pathName

    The path name to load sequences from. Globs/directories are supported.

    returns

    Returns a SequenceDataset containing DNA sequences.

  40. def loadFastaProtein(pathName: String): SequenceDataset

    Load protein sequences from FASTA into a SequenceDataset.

    pathName

    The path name to load sequences from. Globs/directories are supported.

    returns

    Returns a SequenceDataset containing protein sequences.

  41. def loadFastaRna(pathName: String): SequenceDataset

    Load RNA sequences from FASTA into a SequenceDataset.

    pathName

    The path name to load sequences from. Globs/directories are supported.

    returns

    Returns a SequenceDataset containing RNA sequences.

  42. def loadFastq(pathName1: String, optPathName2: Option[String], optReadGroup: Option[String] = None, stringency: ValidationStringency = ValidationStringency.STRICT): AlignmentDataset

    Load unaligned alignments from (possibly paired) FASTQ into an AlignmentDataset.

    pathName1

    The path name to load the first set of unaligned alignments from. Globs/directories are supported.

    optPathName2

    The path name to load the second set of unaligned alignments from, if provided. Globs/directories are supported.

    optReadGroup

    The optional read group identifier to associate to the unaligned alignment records. Defaults to None.

    stringency

    The validation stringency to use when validating (possibly paired) FASTQ format. Defaults to ValidationStringency.STRICT.

    returns

    Returns an unaligned AlignmentDataset.

    See also

    loadUnpairedFastq

    loadPairedFastq
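
    Example

    A sketch for paired FASTQ; paths are placeholders. Note that optPathName2 has no default in this overload, so None must be passed explicitly for unpaired input.

      val unaligned = sc.loadFastq(
        "sample_1.fastq",
        Some("sample_2.fastq"),
        optReadGroup = Some("sampleA"))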

  43. def loadFeatures(df: DataFrame, references: SequenceDictionary, samples: Seq[Sample]): FeatureDataset

    Load the specified data frame, references, and samples into a FeatureDataset.

    df

    Data frame to load from.

    references

    References for the FeatureDataset, may be empty.

    samples

    Samples for the FeatureDataset, may be empty.

    returns

    Returns a new FeatureDataset loaded from the specified data frame, references, and samples.

  44. def loadFeatures(df: DataFrame, metadataPathName: String): FeatureDataset

    Load the specified data frame into a FeatureDataset, with metadata loaded from the specified metadata path name.

    df

    Data frame to load from.

    metadataPathName

    Path name to load metadata from.

    returns

    Returns a new FeatureDataset loaded from the specified data frame, with metadata loaded from the specified metadata path name.

  45. def loadFeatures(df: DataFrame): FeatureDataset

    Load the specified data frame into a FeatureDataset, with empty metadata.

    df

    Data frame to load from.

    returns

    Returns a new FeatureDataset loaded from the specified data frame, with empty metadata.

  46. def loadFeatures(pathName: String, optSequenceDictionary: Option[SequenceDictionary] = None, optMinPartitions: Option[Int] = None, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None, stringency: ValidationStringency = ValidationStringency.STRICT): FeatureDataset

    Load features into a FeatureDataset.

    Loads path names ending in:

      * .bed as BED6/12 format,
      * .gff3 as GFF3 format,
      * .gtf/.gff as GTF/GFF2 format,
      * .narrow[pP]eak as NarrowPeak format, and
      * .interval_list as IntervalList format.

    If none of these match, fall back to Parquet + Avro.

    For BED6/12, GFF3, GTF/GFF2, NarrowPeak, and IntervalList formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.

    pathName

    The path name to load features from. Globs/directories are supported, although file extension must be present for BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.

    optSequenceDictionary

    Optional sequence dictionary. Defaults to None.

    optMinPartitions

    An optional minimum number of partitions to use. For textual formats, if this is None, fall back to the Spark default parallelism. Defaults to None.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    stringency

    The validation stringency to use when validating BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats. Defaults to ValidationStringency.STRICT.

    returns

    Returns a FeatureDataset.

    See also

    loadParquetFeatures

    loadIntervalList

    loadNarrowPeak

    loadGff3

    loadGtf

    loadBed
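
    Example

    A sketch; the GTF path is a placeholder.

      // Format is chosen from the .gtf extension.
      val features = sc.loadFeatures("genes.gtf")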

  47. def loadFragments(df: DataFrame, references: SequenceDictionary, readGroups: ReadGroupDictionary, processingSteps: Seq[ProcessingStep]): FragmentDataset

    Load the specified data frame, references, read groups, and processing steps into a FragmentDataset.

    df

    Data frame to load from.

    references

    References for the FragmentDataset, may be empty.

    readGroups

    Read groups for the FragmentDataset, may be empty.

    processingSteps

    Processing steps for the FragmentDataset, may be empty.

    returns

    Returns a new FragmentDataset loaded from the specified data frame, references, read groups, and processing steps.

  48. def loadFragments(df: DataFrame, metadataPathName: String): FragmentDataset

    Load the specified data frame into a FragmentDataset, with metadata loaded from the specified metadata path name.

    df

    Data frame to load from.

    metadataPathName

    Path name to load metadata from.

    returns

    Returns a new FragmentDataset loaded from the specified data frame, with metadata loaded from the specified metadata path name.

  49. def loadFragments(df: DataFrame): FragmentDataset

    Load the specified data frame into a FragmentDataset, with empty metadata.

    df

    Data frame to load from.

    returns

    Returns a new FragmentDataset loaded from the specified data frame, with empty metadata.

  50. def loadFragments(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None, stringency: ValidationStringency = ValidationStringency.STRICT): FragmentDataset

    Load fragments into a FragmentDataset.

    Loads path names ending in:

      * .bam/.cram/.sam as BAM/CRAM/SAM format and
      * .ifq as interleaved FASTQ format.

    If none of these match, fall back to Parquet + Avro.

    For interleaved FASTQ format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.

    pathName

    The path name to load fragments from. Globs/directories are supported, although file extension must be present for BAM/CRAM/SAM and FASTQ formats.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    stringency

    The validation stringency to use when validating BAM/CRAM/SAM or FASTQ formats. Defaults to ValidationStringency.STRICT.

    returns

    Returns a FragmentDataset.

    See also

    loadParquetFragments

    loadInterleavedFastqAsFragments

    loadAlignments

    loadBam

  51. def loadGenotypes(df: DataFrame, references: SequenceDictionary, samples: Seq[Sample], headerLines: Seq[VCFHeaderLine]): GenotypeDataset

    Load the specified data frame, references, samples, and header lines into a GenotypeDataset.

    df

    Data frame to load from.

    references

    References for the GenotypeDataset, may be empty.

    samples

    Samples for the GenotypeDataset, may be empty.

    headerLines

    Header lines for the GenotypeDataset, may be empty.

    returns

    Returns a new GenotypeDataset loaded from the specified data frame, references, samples, and header lines.

  52. def loadGenotypes(df: DataFrame, references: SequenceDictionary, samples: Seq[Sample]): GenotypeDataset

    Load the specified data frame, references, and samples into a GenotypeDataset, with the default header lines.

    df

    Data frame to load from.

    references

    References for the GenotypeDataset, may be empty.

    samples

    Samples for the GenotypeDataset, may be empty.

    returns

    Returns a new GenotypeDataset loaded from the specified data frame, references, and samples, with the default header lines.

  53. def loadGenotypes(df: DataFrame, metadataPathName: String): GenotypeDataset

    Load the specified data frame into a GenotypeDataset, with metadata loaded from the specified metadata path name.

    df

    Data frame to load from.

    metadataPathName

    Path name to load metadata from.

    returns

    Returns a new GenotypeDataset loaded from the specified data frame, with metadata loaded from the specified metadata path name.

  54. def loadGenotypes(df: DataFrame): GenotypeDataset

    Load the specified data frame into a GenotypeDataset, with empty metadata and the default header lines.

    df

    Data frame to load from.

    returns

    Returns a new GenotypeDataset loaded from the specified data frame, with empty metadata and the default header lines.

  55. def loadGenotypes(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None, stringency: ValidationStringency = ValidationStringency.STRICT): GenotypeDataset

    Load genotypes into a GenotypeDataset.

    If the path name has a .vcf/.vcf.gz/.vcf.bgz extension, load as VCF format. Else, fall back to Parquet + Avro.

    pathName

    The path name to load genotypes from. Globs/directories are supported, although file extension must be present for VCF format.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    stringency

    The validation stringency to use when validating VCF format. Defaults to ValidationStringency.STRICT.

    returns

    Returns a GenotypeDataset.

    See also

    loadParquetGenotypes

    loadVcf
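
    Example

    A sketch; the VCF path is a placeholder.

      // .vcf/.vcf.gz/.vcf.bgz extensions load as VCF; anything else falls
      // back to Parquet + Avro.
      val genotypes = sc.loadGenotypes("sample.vcf.gz")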

  56. def loadGff3(pathName: String, optSequenceDictionary: Option[SequenceDictionary] = None, optMinPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.STRICT): FeatureDataset

    Load a path name in GFF3 format into a FeatureDataset.

    pathName

    The path name to load features in GFF3 format from. Globs/directories are supported.

    optSequenceDictionary

    Optional sequence dictionary. Defaults to None.

    optMinPartitions

    An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism. Defaults to None.

    stringency

    The validation stringency to use when validating GFF3 format. Defaults to ValidationStringency.STRICT.

    returns

    Returns a FeatureDataset.

  57. def loadGtf(pathName: String, optSequenceDictionary: Option[SequenceDictionary] = None, optMinPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.STRICT): FeatureDataset

    Load a path name in GTF/GFF2 format into a FeatureDataset.

    pathName

    The path name to load features in GTF/GFF2 format from. Globs/directories are supported.

    optSequenceDictionary

    Optional sequence dictionary. Defaults to None.

    optMinPartitions

    An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism. Defaults to None.

    stringency

    The validation stringency to use when validating GTF/GFF2 format. Defaults to ValidationStringency.STRICT.

    returns

    Returns a FeatureDataset.

  58. def loadIndexedBam(pathName: String, viewRegions: Iterable[ReferenceRegion], stringency: ValidationStringency = ValidationStringency.STRICT)(implicit s: DummyImplicit): AlignmentDataset

    Functions like loadBam, but uses BAM index files to look at fewer blocks, and only returns records within the specified ReferenceRegions. BAM index file required.

    pathName

    The path name to load indexed BAM formatted alignments from. Globs/directories are supported.

    viewRegions

    Iterable of ReferenceRegion we are filtering on.

    stringency

    The validation stringency to use when validating the BAM/CRAM/SAM format header. Defaults to ValidationStringency.STRICT.

    returns

    Returns an AlignmentDataset which wraps the genomic dataset of alignments, sequence dictionary representing reference sequences the alignments may be aligned to, and the read group dictionary for the alignments if one is available.

  59. def loadIndexedBam(pathName: String, viewRegion: ReferenceRegion): AlignmentDataset

    Functions like loadBam, but uses BAM index files to look at fewer blocks, and only returns records within a specified ReferenceRegion. BAM index file required.

    pathName

    The path name to load indexed BAM formatted alignments from. Globs/directories are supported.

    viewRegion

    The ReferenceRegion we are filtering on.

    returns

    Returns an AlignmentDataset which wraps the genomic dataset of alignments, sequence dictionary representing reference sequences the alignments may be aligned to, and the read group dictionary for the alignments if one is available.
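
    Example

    A sketch; the path is a placeholder and a .bai index must sit next to the BAM.

      import org.bdgenomics.adam.models.ReferenceRegion

      // Only records overlapping chr1:1,000,000-2,000,000 are returned.
      val region = ReferenceRegion("chr1", 1000000L, 2000000L)
      val slice  = sc.loadIndexedBam("NA12878.bam", region)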

  60. def loadIndexedVcf(pathName: String, viewRegions: Iterable[ReferenceRegion], stringency: ValidationStringency = ValidationStringency.STRICT)(implicit s: DummyImplicit): VariantContextDataset

    Load variant context records from VCF indexed by tabix (tbi) into a VariantContextDataset.

    pathName

    The path name to load VCF variant context records from. Globs/directories are supported.

    viewRegions

    Iterable of ReferenceRegions we are filtering on.

    stringency

    The validation stringency to use when validating VCF format. Defaults to ValidationStringency.STRICT.

    returns

    Returns a VariantContextDataset.

  61. def loadIndexedVcf(pathName: String, viewRegion: ReferenceRegion): VariantContextDataset

    Load variant context records from VCF indexed by tabix (tbi) into a VariantContextDataset.

    pathName

    The path name to load VCF variant context records from. Globs/directories are supported.

    viewRegion

    ReferenceRegion we are filtering on.

    returns

    Returns a VariantContextDataset.

  62. def loadInterleavedFastq(pathName: String): AlignmentDataset

    Load unaligned alignments from interleaved FASTQ into an AlignmentDataset.

    In interleaved FASTQ, the two reads from a paired sequencing protocol are interleaved in a single file. This is a zipped representation of the typical paired FASTQ.

    pathName

    The path name to load unaligned alignments from. Globs/directories are supported.

    returns

    Returns an unaligned AlignmentDataset.

  63. def loadInterleavedFastqAsFragments(pathName: String): FragmentDataset

    Load paired unaligned alignments grouped by sequencing fragment from interleaved FASTQ into a FragmentDataset.

    In interleaved FASTQ, the two reads from a paired sequencing protocol are interleaved in a single file. This is a zipped representation of the typical paired FASTQ.

    Fragments represent all of the reads from a single sequenced fragment as a single object, which is a useful representation for some tasks.

    pathName

    The path name to load unaligned alignments from. Globs/directories are supported.

    returns

    Returns a FragmentDataset containing the paired reads grouped by sequencing fragment.

  64. def loadIntervalList(pathName: String, optMinPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.STRICT): FeatureDataset

    Load a path name in IntervalList format into a FeatureDataset.

    pathName

    The path name to load features in IntervalList format from. Globs/directories are supported.

    optMinPartitions

    An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism. Defaults to None.

    stringency

    The validation stringency to use when validating IntervalList format. Defaults to ValidationStringency.STRICT.

    returns

    Returns a FeatureDataset.

  65. def loadNarrowPeak(pathName: String, optSequenceDictionary: Option[SequenceDictionary] = None, optMinPartitions: Option[Int] = None, stringency: ValidationStringency = ValidationStringency.STRICT): FeatureDataset

    Load a path name in NarrowPeak format into a FeatureDataset.

    pathName

    The path name to load features in NarrowPeak format from. Globs/directories are supported.

    optSequenceDictionary

    Optional sequence dictionary. Defaults to None.

    optMinPartitions

    An optional minimum number of partitions to load. If not set, falls back to the configured Spark default parallelism. Defaults to None.

    stringency

    The validation stringency to use when validating NarrowPeak format. Defaults to ValidationStringency.STRICT.

    returns

    Returns a FeatureDataset.

  66. def loadPairedFastq(pathName1: String, pathName2: String, optReadGroup: Option[String] = None, persistLevel: Option[StorageLevel] = Some(StorageLevel.MEMORY_ONLY), stringency: ValidationStringency = ValidationStringency.STRICT): AlignmentDataset

    Load unaligned alignments from paired FASTQ into an AlignmentDataset.

    pathName1

    The path name to load the first set of unaligned alignments from. Globs/directories are supported.

    pathName2

    The path name to load the second set of unaligned alignments from. Globs/directories are supported.

    optReadGroup

    The optional read group identifier to associate to the unaligned alignment records. Defaults to None.

    persistLevel

    An optional persistence level to set. If this level is set, then reads will be cached at the given persistence level as part of validation. Defaults to StorageLevel.MEMORY_ONLY.

    stringency

    The validation stringency to use when validating paired FASTQ format. Defaults to ValidationStringency.STRICT.

    returns

    Returns an unaligned AlignmentDataset.

  67. def loadPairedFastqAsFragments(pathName1: String, pathName2: String, optReadGroup: Option[String] = None, persistLevel: Option[StorageLevel] = Some(StorageLevel.MEMORY_ONLY), stringency: ValidationStringency = ValidationStringency.STRICT): FragmentDataset

    Load paired unaligned alignments grouped by sequencing fragment from paired FASTQ files into a FragmentDataset.

    Fragments represent all of the reads from a single sequenced fragment as a single object, which is a useful representation for some tasks.

    pathName1

    The path name to load the first set of unaligned alignments from. Globs/directories are supported.

    pathName2

    The path name to load the second set of unaligned alignments from. Globs/directories are supported.

    optReadGroup

    The optional read group identifier to associate to the unaligned alignment records. Defaults to None.

    persistLevel

    An optional persistence level to set. If this level is set, then reads will be cached at the given persistence level as part of validation. Defaults to StorageLevel.MEMORY_ONLY.

    stringency

    The validation stringency to use when validating paired FASTQ format. Defaults to ValidationStringency.STRICT.

    returns

    Returns a FragmentDataset containing the paired reads grouped by sequencing fragment.

  68. def loadParquet[T](pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None)(implicit ev1: (T) ⇒ SpecificRecord, ev2: Manifest[T]): RDD[T]

    Load a path name in Parquet + Avro format into an RDD.

    T

    The type of records to return.

    pathName

    The path name to load Parquet + Avro formatted data from. Globs/directories are supported.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    An RDD with records of the specified type.

  69. def loadParquetAlignments(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None): AlignmentDataset

    Load a path name in Parquet + Avro format into an AlignmentDataset.

    pathName

    The path name to load alignments from. Globs/directories are supported.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    Returns an AlignmentDataset which wraps the genomic dataset of alignments, sequence dictionary representing reference sequences the alignments may be aligned to, and the read group dictionary for the alignments if one is available.

    Note

    The sequence dictionary is read from an Avro file stored at pathName/_references.avro and the read group dictionary is read from an Avro file stored at pathName/_readGroups.avro. These files are pure Avro, not Parquet + Avro.
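
    Example

    A hedged sketch of predicate and projection pushdown. The path is a placeholder, and the column and field names (referenceName, readName, sequence) are assumed from the Alignment schema.

      import org.apache.parquet.filter2.predicate.FilterApi
      import org.apache.parquet.io.api.Binary
      import org.bdgenomics.adam.projections.{ AlignmentField, Projection }

      // Only read rows on chr1, and only materialize two fields.
      val pred = FilterApi.eq(
        FilterApi.binaryColumn("referenceName"),
        Binary.fromString("chr1"))
      val proj = Projection(AlignmentField.readName, AlignmentField.sequence)

      val reads = sc.loadParquetAlignments(
        "alignments.adam",
        optPredicate = Some(pred),
        optProjection = Some(proj))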

  70. def loadParquetCoverage(pathName: String, optPredicate: Option[FilterPredicate] = None, forceRdd: Boolean = false): CoverageDataset

    Load a path name in Parquet + Avro format into a FeatureDataset and convert to a CoverageDataset. Coverage is stored in the score field of Feature.

    pathName

    The path name to load features from. Globs/directories are supported.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    forceRdd

    Forces loading the RDD.

    returns

    Returns a FeatureDataset converted to a CoverageDataset.

  71. def loadParquetFeatures(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None): FeatureDataset

    Load a path name in Parquet + Avro format into a FeatureDataset.

    pathName

    The path name to load features from. Globs/directories are supported.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    Returns a FeatureDataset.

  72. def loadParquetFragments(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None): FragmentDataset

    Load a path name in Parquet + Avro format into a FragmentDataset.

    pathName

    The path name to load fragments from. Globs/directories are supported.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    Returns a FragmentDataset.

  73. def loadParquetGenotypes(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None): GenotypeDataset

    Load a path name in Parquet + Avro format into a GenotypeDataset.

    pathName

    The path name to load genotypes from. Globs/directories are supported.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    Returns a GenotypeDataset.

  74. def loadParquetReads(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None): ReadDataset

    Load a path name in Parquet + Avro format into a ReadDataset.

    pathName

    The path name to load reads from. Globs/directories are supported.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    Returns a ReadDataset.

  75. def loadParquetSequences(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None): SequenceDataset

    Load a path name in Parquet + Avro format into a SequenceDataset.

    pathName

    The path name to load sequences from. Globs/directories are supported.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    Returns a SequenceDataset.

  76. def loadParquetSlices(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None): SliceDataset

    Load a path name in Parquet + Avro format into a SliceDataset.

    pathName

    The path name to load slices from. Globs/directories are supported.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    Returns a SliceDataset.

  77. def loadParquetVariantContexts(pathName: String): VariantContextDataset

    Load a path name in Parquet + Avro format into a VariantContextDataset.

    pathName

    The path name to load variant context records from. Globs/directories are supported.

    returns

    Returns a VariantContextDataset.

  78. def loadParquetVariants(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None): VariantDataset

    Load a path name in Parquet + Avro format into a VariantDataset.

    pathName

    The path name to load variants from. Globs/directories are supported.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    Returns a VariantDataset.

  79. def loadPartitionedParquetAlignments(pathName: String, regions: Iterable[ReferenceRegion] = Iterable.empty, optLookbackPartitions: Option[Int] = Some(1)): AlignmentDataset

    Load a path name with range binned partitioned Parquet format into an AlignmentDataset.

    pathName

    The path name to load alignments from. Globs/directories are supported.

    regions

    Optional list of genomic regions to load.

    optLookbackPartitions

    Number of partitions to look back through to find the beginning of an overlapping region when using the filterByOverlappingRegions function on the returned dataset. Defaults to one partition.

    returns

    Returns an AlignmentDataset.

    Note

    The sequence dictionary is read from an Avro file stored at pathName/_references.avro and the read group dictionary is read from an Avro file stored at pathName/_readGroups.avro. These files are pure Avro, not Parquet + Avro.
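
    Example

    A sketch, assuming the dataset was written with ADAM's partitioned Parquet writer; the path and regions are placeholders.

      import org.bdgenomics.adam.models.ReferenceRegion

      val regions = Iterable(
        ReferenceRegion("chr1", 0L, 100000L),
        ReferenceRegion("chr2", 0L, 100000L))
      val reads = sc.loadPartitionedParquetAlignments("alignments.adam", regions)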

  80. def loadPartitionedParquetFeatures(pathName: String, regions: Iterable[ReferenceRegion] = Iterable.empty, optLookbackPartitions: Option[Int] = Some(1)): FeatureDataset

    Load a path name with range binned partitioned Parquet format into a FeatureDataset.

    pathName

    The path name to load features from. Globs/directories are supported.

    regions

    Optional list of genomic regions to load.

    optLookbackPartitions

    Number of partitions to look back through to find the beginning of an overlapping region when using the filterByOverlappingRegions function on the returned dataset. Defaults to one partition.

    returns

    Returns a FeatureDataset.

  81. def loadPartitionedParquetGenotypes(pathName: String, regions: Iterable[ReferenceRegion] = Iterable.empty, optLookbackPartitions: Option[Int] = Some(1)): GenotypeDataset

    Load a path name with range binned partitioned Parquet format into a GenotypeDataset.

    pathName

    The path name to load genotypes from. Globs/directories are supported.

    regions

    Optional list of genomic regions to load.

    optLookbackPartitions

    Number of partitions to look back through to find the beginning of an overlapping region when using the filterByOverlappingRegions function on the returned dataset. Defaults to one partition.

    returns

    Returns a GenotypeDataset.

  82. def loadPartitionedParquetVariantContexts(pathName: String, regions: Iterable[ReferenceRegion] = Iterable.empty, optLookbackPartitions: Option[Int] = Some(1)): VariantContextDataset

    Load a path name with range binned partitioned Parquet format into a VariantContextDataset.

    pathName

    The path name to load variant context records from. Globs/directories are supported.

    regions

    Optional list of genomic regions to load.

    optLookbackPartitions

    Number of partitions to look back through to find the beginning of an overlapping region when using the filterByOverlappingRegions function on the returned dataset. Defaults to one partition.

    returns

    Returns a VariantContextDataset.

  83. def loadPartitionedParquetVariants(pathName: String, regions: Iterable[ReferenceRegion] = Iterable.empty, optLookbackPartitions: Option[Int] = Some(1)): VariantDataset

    Load a path name with range binned partitioned Parquet format into a VariantDataset.

    pathName

    The path name to load variants from. Globs/directories are supported.

    regions

    Optional list of genomic regions to load.

    optLookbackPartitions

    Number of partitions to look back through to find the beginning of an overlapping region when using the filterByOverlappingRegions function on the returned dataset. Defaults to one partition.

    returns

    Returns a VariantDataset.

  84. def loadProteinSequences(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None): SequenceDataset

    Load protein sequences into a SequenceDataset.

    If the path name has a .fa/.fasta extension, load as FASTA format. Else, fall back to Parquet + Avro.

    For FASTA format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.

    pathName

    The path name to load sequences from. Globs/directories are supported, although file extension must be present for FASTA format.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    Returns a SequenceDataset containing protein sequences.

    See also

    loadParquetSequences

    loadFastaProtein

  85. def loadReads(df: DataFrame, references: SequenceDictionary): ReadDataset

    Load the specified data frame and references into a ReadDataset.

    df

    Data frame to load from.

    references

    References for the ReadDataset, may be empty.

    returns

    Returns a new ReadDataset loaded from the specified data frame and references.

  86. def loadReads(df: DataFrame, metadataPathName: String): ReadDataset

    Load the specified data frame into a ReadDataset, with metadata loaded from the specified metadata path name.

    df

    Data frame to load from.

    metadataPathName

    Path name to load metadata from.

    returns

    Returns a new ReadDataset loaded from the specified data frame, with metadata loaded from the specified metadata path name.

  87. def loadReads(df: DataFrame): ReadDataset

    Load the specified data frame into a ReadDataset, with empty metadata.

    df

    Data frame to load from.

    returns

    Returns a new ReadDataset loaded from the specified data frame, with empty metadata.

  88. def loadReferenceFile(pathName: String, maximumLength: Long): ReferenceFile

    Load reference sequences into a broadcastable ReferenceFile.

    If the path name has a .2bit extension, loads a 2bit file. Else, uses loadSlices to load the reference as an RDD, which is then collected to the driver.

    pathName

    The path name to load reference sequences from. Globs/directories for 2bit format are not supported.

    maximumLength

    Maximum fragment length. Defaults to 10000L. Values greater than 1e9 should be avoided.

    returns

    Returns a broadcastable ReferenceFile.

    See also

    loadSlices
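
    Example

    A sketch; the 2bit path is a placeholder.

      // Loads a 2bit file into a ReferenceFile that can be broadcast to
      // executors, e.g. for consulting the reference during read processing.
      val ref = sc.loadReferenceFile("hg38.2bit", maximumLength = 10000L)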

  89. def loadRnaSequences(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None): SequenceDataset

    Load RNA sequences into a SequenceDataset.

    If the path name has a .fa/.fasta extension, load as FASTA format. Else, fall back to Parquet + Avro.

    For FASTA format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.

    pathName

    The path name to load sequences from. Globs/directories are supported, although file extension must be present for FASTA format.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    Returns a SequenceDataset containing RNA sequences.

    See also

    loadParquetSequences

    loadFastaRna

  90. def loadSequenceDictionary(pathName: String): SequenceDictionary

    Load a sequence dictionary.

    Loads path names ending in:

      * .dict as HTSJDK sequence dictionary format,
      * .genome as Bedtools genome file format, and
      * .txt as UCSC Genome Browser chromInfo files.

    Compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.

    pathName

    The path name to load a sequence dictionary from.

    returns

    Returns a sequence dictionary.

    Exceptions thrown

    IllegalArgumentException if the pathName file extension is not one of .dict, .genome, or .txt.
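
    Example

    A sketch; the path is a placeholder.

      // .dict, .genome, and .txt extensions are accepted.
      val dict = sc.loadSequenceDictionary("hg38.dict")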

  91. def loadSequences(df: DataFrame, references: SequenceDictionary): SequenceDataset

    Load the specified data frame and references into a SequenceDataset.

    df

    Data frame to load from.

    references

    References for the SequenceDataset, may be empty.

    returns

    Returns a new SequenceDataset loaded from the specified data frame and references.

  92. def loadSequences(df: DataFrame, metadataPathName: String): SequenceDataset

    Load the specified data frame into a SequenceDataset, with metadata loaded from the specified metadata path name.

    df

    Data frame to load from.

    metadataPathName

    Path name to load metadata from.

    returns

    Returns a new SequenceDataset loaded from the specified data frame, with metadata loaded from the specified metadata path name.

  93. def loadSequences(df: DataFrame): SequenceDataset

    Load the specified data frame into a SequenceDataset, with empty metadata.

    df

    Data frame to load from.

    returns

    Returns a new SequenceDataset loaded from the specified data frame, with empty metadata.

  94. def loadSlices(df: DataFrame, references: SequenceDictionary): SliceDataset

    Load the specified data frame and references into a SliceDataset.

    df

    Data frame to load from.

    references

    References for the SliceDataset, may be empty.

    returns

    Returns a new SliceDataset loaded from the specified data frame and references.

  95. def loadSlices(df: DataFrame, metadataPathName: String): SliceDataset

    Load the specified data frame into a SliceDataset, with metadata loaded from the specified metadata path name.

    df

    Data frame to load from.

    metadataPathName

    Path name to load metadata from.

    returns

    Returns a new SliceDataset loaded from the specified data frame, with metadata loaded from the specified metadata path name.

  96. def loadSlices(df: DataFrame): SliceDataset

    Load the specified data frame into a SliceDataset, with empty metadata.

    df

    Data frame to load from.

    returns

    Returns a new SliceDataset loaded from the specified data frame, with empty metadata.

  97. def loadSlices(pathName: String, maximumLength: Long = 10000L, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None): SliceDataset

    Load slices into a SliceDataset.

    If the path name has a .fa/.fasta extension, load as DNA in FASTA format. Else, fall back to Parquet + Avro.

    For FASTA format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.

    pathName

    The path name to load DNA slices from. Globs/directories are supported, although a file extension must be present for FASTA format.

    maximumLength

    Maximum slice length. Defaults to 10000L. Values greater than 1e9 should be avoided.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    returns

    Returns a SliceDataset.

    See also

    loadParquetSlices

    loadFastaDna(String, Long)
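
    For example, a minimal sketch (file names hypothetical; assumes sc is in scope and the implicit SparkContext-to-ADAMContext conversion is imported from the ADAMContext companion object):

        import org.bdgenomics.adam.rdd.ADAMContext._

        // A .fasta extension selects FASTA parsing; sequences are split
        // into slices of at most 5000 bases each.
        val fastaSlices = sc.loadSlices("sample.fasta", maximumLength = 5000L)

        // Any other path name falls back to Parquet + Avro.
        val parquetSlices = sc.loadSlices("slices.adam")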

  98. def loadUnpairedFastq(pathName: String, setFirstOfPair: Boolean = false, setSecondOfPair: Boolean = false, optReadGroup: Option[String] = None, stringency: ValidationStringency = ValidationStringency.STRICT): AlignmentDataset

    Permalink

    Load unaligned alignments from unpaired FASTQ into an AlignmentDataset.

    pathName

    The path name to load unaligned alignments from. Globs/directories are supported.

    setFirstOfPair

    If true, marks the unaligned alignment as the first read of the fragment. Defaults to false.

    setSecondOfPair

    If true, marks the unaligned alignment as the second read of the fragment. Defaults to false.

    optReadGroup

    The optional read group identifier to associate with the unaligned alignment records. Defaults to None.

    stringency

    The validation stringency to use when validating unpaired FASTQ format. Defaults to ValidationStringency.STRICT.

    returns

    Returns an unaligned AlignmentDataset.
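
    For example, a minimal sketch (file name and read group hypothetical; assumes sc is in scope):

        import htsjdk.samtools.ValidationStringency
        import org.bdgenomics.adam.rdd.ADAMContext._

        // Load single-end reads, tag them with a read group, and relax
        // validation so malformed records are logged rather than fatal.
        val reads = sc.loadUnpairedFastq(
          "reads.fq.gz",
          optReadGroup = Some("sampleA-rg1"),
          stringency = ValidationStringency.LENIENT)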

  99. def loadVariantContexts(df: DataFrame, references: SequenceDictionary, samples: Seq[Sample], headerLines: Seq[VCFHeaderLine]): VariantContextDataset

    Permalink

    Load the specified data frame, references, samples, and header lines into a VariantContextDataset.

    df

    Data frame to load from.

    references

    References for the VariantContextDataset, may be empty.

    samples

    Samples for the VariantContextDataset, may be empty.

    headerLines

    Header lines for the VariantContextDataset, may be empty.

    returns

    Returns a new VariantContextDataset loaded from the specified data frame, references, samples, and header lines.

  100. def loadVariantContexts(df: DataFrame, references: SequenceDictionary, samples: Seq[Sample]): VariantContextDataset

    Permalink

    Load the specified data frame, references, and samples into a VariantContextDataset, with the default header lines.

    df

    Data frame to load from.

    references

    References for the VariantContextDataset, may be empty.

    samples

    Samples for the VariantContextDataset, may be empty.

    returns

    Returns a new VariantContextDataset loaded from the specified data frame, references, and samples, with the default header lines.

  101. def loadVariantContexts(df: DataFrame, metadataPathName: String): VariantContextDataset

    Permalink

    Load the specified data frame into a VariantContextDataset, with metadata loaded from the specified metadata path name.

    df

    Data frame to load from.

    metadataPathName

    Path name to load metadata from.

    returns

    Returns a new VariantContextDataset loaded from the specified data frame, with metadata loaded from the specified metadata path name.

  102. def loadVariantContexts(df: DataFrame): VariantContextDataset

    Permalink

    Load the specified data frame into a VariantContextDataset, with empty metadata and the default header lines.

    df

    Data frame to load from.

    returns

    Returns a new VariantContextDataset loaded from the specified data frame, with empty metadata and the default header lines.
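
    For example, a sketch of these data frame overloads (file name hypothetical; assumes sc is in scope):

        import org.bdgenomics.adam.rdd.ADAMContext

        val ac = new ADAMContext(sc)
        val df = ac.spark.read.parquet("variant.contexts.adam")

        // Restore references, samples, and header lines from the sidecar
        // metadata written next to the Parquet files...
        val vcs = ac.loadVariantContexts(df, "variant.contexts.adam")

        // ...or start from empty metadata and the default header lines.
        val bare = ac.loadVariantContexts(df)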

  103. def loadVariantContexts(pathName: String): VariantContextDataset

    Permalink

    Load a path name in VCF or Parquet format into a VariantContextDataset.

    pathName

    The path name to load variant context records from. Globs/directories are supported.

    returns

    Returns a VariantContextDataset.

  104. def loadVariantContexts(pathName: String, stringency: ValidationStringency): VariantContextDataset

    Permalink

    Load a path name in VCF or Parquet format into a VariantContextDataset.

    pathName

    The path name to load variant context records from. Globs/directories are supported.

    stringency

    The validation stringency to use when validating VCF format.

    returns

    Returns a VariantContextDataset.
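
    For example (file name hypothetical; assumes sc is in scope and ADAMContext._ imported):

        import htsjdk.samtools.ValidationStringency
        import org.bdgenomics.adam.rdd.ADAMContext._

        // The input format is chosen from the path name; both calls
        // return a VariantContextDataset.
        val strict = sc.loadVariantContexts("calls.vcf")
        val lenient = sc.loadVariantContexts("calls.vcf", ValidationStringency.LENIENT)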

  105. def loadVariants(df: DataFrame, references: SequenceDictionary, headerLines: Seq[VCFHeaderLine]): VariantDataset

    Permalink

    Load the specified data frame, references, and header lines into a VariantDataset.

    df

    Data frame to load from.

    references

    References for the VariantDataset, may be empty.

    headerLines

    Header lines for the VariantDataset, may be empty.

    returns

    Returns a new VariantDataset loaded from the specified data frame, references, and header lines.

  106. def loadVariants(df: DataFrame, references: SequenceDictionary): VariantDataset

    Permalink

    Load the specified data frame and references into a VariantDataset, with the default header lines.

    df

    Data frame to load from.

    references

    References for the VariantDataset, may be empty.

    returns

    Returns a new VariantDataset loaded from the specified data frame and references, with the default header lines.

  107. def loadVariants(df: DataFrame, metadataPathName: String): VariantDataset

    Permalink

    Load the specified data frame into a VariantDataset, with metadata loaded from the specified metadata path name.

    df

    Data frame to load from.

    metadataPathName

    Path name to load metadata from.

    returns

    Returns a new VariantDataset loaded from the specified data frame, with metadata loaded from the specified metadata path name.

  108. def loadVariants(df: DataFrame): VariantDataset

    Permalink

    Load the specified data frame into a VariantDataset, with empty metadata and the default header lines.

    df

    Data frame to load from.

    returns

    Returns a new VariantDataset loaded from the specified data frame, with empty metadata and the default header lines.
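
    For example, a sketch of these data frame overloads (file name hypothetical; assumes sc is in scope):

        import org.bdgenomics.adam.models.SequenceDictionary
        import org.bdgenomics.adam.rdd.ADAMContext

        val ac = new ADAMContext(sc)
        val df = ac.spark.read.parquet("variants.adam")

        // Empty metadata and the default header lines...
        val variants = ac.loadVariants(df)

        // ...or supply a sequence dictionary explicitly.
        val withRefs = ac.loadVariants(df, SequenceDictionary.empty)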

  109. def loadVariants(pathName: String, optPredicate: Option[FilterPredicate] = None, optProjection: Option[Schema] = None, stringency: ValidationStringency = ValidationStringency.STRICT): VariantDataset

    Permalink

    Load variants into a VariantDataset.

    If the path name has a .vcf/.vcf.gz/.vcf.bgz extension, load as VCF format. Else, fall back to Parquet + Avro.

    pathName

    The path name to load variants from. Globs/directories are supported, although a file extension must be present for VCF format.

    optPredicate

    An optional pushdown predicate to use when reading Parquet + Avro. Defaults to None.

    optProjection

    An optional projection schema to use when reading Parquet + Avro. Defaults to None.

    stringency

    The validation stringency to use when validating VCF format. Defaults to ValidationStringency.STRICT.

    returns

    Returns a VariantDataset.

    See also

    loadParquetVariants

    loadVcf
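
    For example, a sketch of pushing a predicate down into the Parquet reader (file name hypothetical; the referenceName field name is an assumption about the Variant schema; assumes sc is in scope and ADAMContext._ imported):

        import org.apache.parquet.filter2.predicate.FilterApi
        import org.apache.parquet.io.api.Binary
        import org.bdgenomics.adam.rdd.ADAMContext._

        // Row groups that cannot contain referenceName == "1" are
        // skipped at read time instead of being filtered afterwards.
        val pred = FilterApi.eq(
          FilterApi.binaryColumn("referenceName"),
          Binary.fromString("1"))
        val onChr1 = sc.loadVariants("variants.adam", optPredicate = Some(pred))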

  110. def loadVcf(pathName: String, stringency: ValidationStringency = ValidationStringency.STRICT): VariantContextDataset

    Permalink

    Load variant context records from VCF into a VariantContextDataset.

    pathName

    The path name to load VCF variant context records from. Globs/directories are supported.

    stringency

    The validation stringency to use when validating VCF format. Defaults to ValidationStringency.STRICT.

    returns

    Returns a VariantContextDataset.
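
    For example (file name hypothetical; assumes sc is in scope; the toGenotypes conversion on the returned VariantContextDataset is assumed here):

        import org.bdgenomics.adam.rdd.ADAMContext._

        // htsjdk handles plain, gzipped, and bgzipped VCF input.
        val vcs = sc.loadVcf("cohort.vcf.bgz")

        // Downstream, split the variant contexts into genotype records.
        val genotypes = vcs.toGenotypes()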

  111. def loadVcfWithProjection(pathName: String, infoFields: Set[String], formatFields: Set[String], stringency: ValidationStringency = ValidationStringency.STRICT): VariantContextDataset

    Permalink

    Load variant context records from VCF into a VariantContextDataset.

    Only converts the core Genotype/Variant fields, and the fields set in the requested projection. Core variant fields include:

    * Names (ID)
    * Filters (FILTER)

    Core genotype fields include:

    * Allelic depth (AD)
    * Read depth (DP)
    * Min read depth (MIN_DP)
    * Genotype quality (GQ)
    * Genotype likelihoods (GL/PL)
    * Strand bias components (SB)
    * Phase info (PS, PQ)

    pathName

    The path name to load VCF variant context records from. Globs/directories are supported.

    infoFields

    The info fields to include, in addition to the ID and FILTER attributes.

    formatFields

    The format fields to include, in addition to the core fields listed above.

    stringency

    The validation stringency to use when validating VCF format. Defaults to ValidationStringency.STRICT.

    returns

    Returns a VariantContextDataset.
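
    For example (file name and attribute names hypothetical; assumes sc is in scope):

        import org.bdgenomics.adam.rdd.ADAMContext._

        // Convert only the core fields plus the DP4 info attribute and
        // the HQ format attribute; other annotations are dropped on load.
        val vcs = sc.loadVcfWithProjection(
          "cohort.vcf.gz",
          infoFields = Set("DP4"),
          formatFields = Set("HQ"))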

  112. def logger: Logger

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  113. def loggerName: String

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  114. final def ne(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  115. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  116. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  117. val sc: SparkContext

    Permalink

    The SparkContext to wrap.

  118. lazy val spark: SparkSession

    Permalink
  119. final def synchronized[T0](arg0: ⇒ T0): T0

    Permalink
    Definition Classes
    AnyRef
  120. def toString(): String

    Permalink
    Definition Classes
    AnyRef → Any
  121. def trace(mkr: Marker, msg: ⇒ Any, t: ⇒ Throwable): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  122. def trace(msg: ⇒ Any, t: ⇒ Throwable): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  123. def trace(msg: ⇒ Any): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  124. final def wait(): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  125. final def wait(arg0: Long, arg1: Int): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  126. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  127. def warn(mkr: Marker, msg: ⇒ Any, t: ⇒ Throwable): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  128. def warn(msg: ⇒ Any, t: ⇒ Throwable): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  129. def warn(msg: ⇒ Any): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging

Inherited from Logging

Inherited from Serializable

Inherited from Serializable

Inherited from AnyRef

Inherited from Any

Ungrouped