The ADAMContext to wrap.
Returns the Java Spark Context associated with this Java ADAM Context.
Load alignment records into an AlignmentRecordRDD (Java-friendly method).
Loads path names ending in:
* .bam/.cram/.sam as BAM/CRAM/SAM format,
* .fa/.fasta as FASTA format,
* .fq/.fastq as FASTQ format, and
* .ifq as interleaved FASTQ format.
If none of these match, fall back to Parquet + Avro.
For FASTA, FASTQ, and interleaved FASTQ formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load alignment records from. Globs/directories are supported, although file extension must be present for BAM/CRAM/SAM, FASTA, and FASTQ formats.
The validation stringency to use when validating BAM/CRAM/SAM or FASTQ formats.
Returns an AlignmentRecordRDD that wraps the RDD of alignment records, the sequence dictionary representing the contigs the alignment records may be aligned to, and the record group dictionary for the alignment records, if one is available.
ADAMContext#loadAlignments
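The extension-based dispatch described above can be sketched in plain Java. This is an illustration only, not ADAM's implementation: the class `AlignmentFormatSketch`, the `Format` enum, and the `detect` method are hypothetical names. The sketch also assumes that matching is case-sensitive and that only the default `.gz` and `.bz2` suffixes need to be stripped for the text formats.

```java
/** Hypothetical sketch of extension-based format detection; not ADAM's code. */
final class AlignmentFormatSketch {

    /** Formats named in the documentation above; PARQUET is the fallback. */
    enum Format { BAM_CRAM_SAM, FASTA, FASTQ, INTERLEAVED_FASTQ, PARQUET }

    /** Strips a trailing .gz or .bz2 suffix (assumed default Hadoop codecs). */
    static String stripCompression(String pathName) {
        if (pathName.endsWith(".gz")) {
            return pathName.substring(0, pathName.length() - ".gz".length());
        }
        if (pathName.endsWith(".bz2")) {
            return pathName.substring(0, pathName.length() - ".bz2".length());
        }
        return pathName;
    }

    /** Picks a format from the path name, falling back to Parquet + Avro. */
    static Format detect(String pathName) {
        if (pathName.endsWith(".bam") || pathName.endsWith(".cram") || pathName.endsWith(".sam")) {
            return Format.BAM_CRAM_SAM;
        }
        // Compression is only considered for the FASTA/FASTQ/interleaved FASTQ text formats.
        String base = stripCompression(pathName);
        if (base.endsWith(".fa") || base.endsWith(".fasta")) {
            return Format.FASTA;
        }
        if (base.endsWith(".fq") || base.endsWith(".fastq")) {
            return Format.FASTQ;
        }
        if (base.endsWith(".ifq")) {
            return Format.INTERLEAVED_FASTQ;
        }
        return Format.PARQUET;
    }

    public static void main(String[] args) {
        for (String p : new String[] {"reads.bam", "reads.fq.gz", "reads.ifq", "reads.adam"}) {
            System.out.println(p + " -> " + detect(p));
        }
    }
}
```

Because the fallback is Parquet + Avro, a path with a missing or misspelled extension is not rejected by this kind of dispatch; it is simply treated as Parquet.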
Load alignment records into an AlignmentRecordRDD (Java-friendly method).
Loads path names ending in:
* .bam/.cram/.sam as BAM/CRAM/SAM format,
* .fa/.fasta as FASTA format,
* .fq/.fastq as FASTQ format, and
* .ifq as interleaved FASTQ format.
If none of these match, fall back to Parquet + Avro.
For FASTA, FASTQ, and interleaved FASTQ formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load alignment records from. Globs/directories are supported, although file extension must be present for BAM/CRAM/SAM, FASTA, and FASTQ formats.
Returns an AlignmentRecordRDD that wraps the RDD of alignment records, the sequence dictionary representing the contigs the alignment records may be aligned to, and the record group dictionary for the alignment records, if one is available.
ADAMContext#loadAlignments
Load nucleotide contig fragments into a NucleotideContigFragmentRDD (Java-friendly method).
If the path name has a .fa/.fasta extension, load as FASTA format. Otherwise, fall back to Parquet + Avro.
For FASTA format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load nucleotide contig fragments from. Globs/directories are supported, although file extension must be present for FASTA format.
Returns a NucleotideContigFragmentRDD.
ADAMContext#loadContigFragments
Load features into a FeatureRDD and convert to a CoverageRDD (Java-friendly method). Coverage is stored in the score field of Feature.
Loads path names ending in:
* .bed as BED6/12 format,
* .gff3 as GFF3 format,
* .gtf/.gff as GTF/GFF2 format,
* .narrow[pP]eak as NarrowPeak format, and
* .interval_list as IntervalList format.
If none of these match, fall back to Parquet + Avro.
For BED6/12, GFF3, GTF/GFF2, NarrowPeak, and IntervalList formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load features from. Globs/directories are supported, although file extension must be present for BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.
The validation stringency to use when validating BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.
Returns a FeatureRDD converted to a CoverageRDD.
ADAMContext#loadCoverage
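To illustrate the statement that coverage is stored in the score field of Feature, here is a minimal sketch of such a conversion in plain Java. `SimpleFeature`, `SimpleCoverage`, and `toCoverage` are hypothetical stand-ins, not ADAM's Feature or Coverage classes. Rejecting a feature with no score is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: reading coverage out of a feature's score field. */
final class CoverageSketch {

    /** Simplified stand-in for a genomic feature; score may be absent (null). */
    static final class SimpleFeature {
        final String contigName;
        final long start;
        final long end;
        final Double score;

        SimpleFeature(String contigName, long start, long end, Double score) {
            this.contigName = contigName;
            this.start = start;
            this.end = end;
            this.score = score;
        }
    }

    /** Simplified stand-in for a coverage record over a region. */
    static final class SimpleCoverage {
        final String contigName;
        final long start;
        final long end;
        final double count;

        SimpleCoverage(String contigName, long start, long end, double count) {
            this.contigName = contigName;
            this.start = start;
            this.end = end;
            this.count = count;
        }
    }

    /** Copies the region and uses the feature's score as the coverage count. */
    static SimpleCoverage toCoverage(SimpleFeature feature) {
        if (feature.score == null) {
            // Assumption for this sketch: a feature without a score cannot carry coverage.
            throw new IllegalArgumentException("feature has no score to use as coverage");
        }
        return new SimpleCoverage(feature.contigName, feature.start, feature.end, feature.score);
    }

    /** Converts every feature in the list, preserving order. */
    static List<SimpleCoverage> toCoverage(List<SimpleFeature> features) {
        List<SimpleCoverage> result = new ArrayList<>();
        for (SimpleFeature feature : features) {
            result.add(toCoverage(feature));
        }
        return result;
    }
}
```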
Load features into a FeatureRDD and convert to a CoverageRDD (Java-friendly method). Coverage is stored in the score field of Feature.
Loads path names ending in:
* .bed as BED6/12 format,
* .gff3 as GFF3 format,
* .gtf/.gff as GTF/GFF2 format,
* .narrow[pP]eak as NarrowPeak format, and
* .interval_list as IntervalList format.
If none of these match, fall back to Parquet + Avro.
For BED6/12, GFF3, GTF/GFF2, NarrowPeak, and IntervalList formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load features from. Globs/directories are supported, although file extension must be present for BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.
Returns a FeatureRDD converted to a CoverageRDD.
ADAMContext#loadCoverage
Load features into a FeatureRDD (Java-friendly method).
Loads path names ending in:
* .bed as BED6/12 format,
* .gff3 as GFF3 format,
* .gtf/.gff as GTF/GFF2 format,
* .narrow[pP]eak as NarrowPeak format, and
* .interval_list as IntervalList format.
If none of these match, fall back to Parquet + Avro.
For BED6/12, GFF3, GTF/GFF2, NarrowPeak, and IntervalList formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load features from. Globs/directories are supported, although file extension must be present for BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.
The validation stringency to use when validating BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.
Returns a FeatureRDD.
ADAMContext#loadFeatures
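The feature extensions above, including the `narrow[pP]eak` pattern and the optional compression suffix, lend themselves to an ordered table of regular expressions. The following is a sketch under assumptions, not ADAM's implementation: `FeatureFormatSketch`, its `detect` method, and the returned format labels are hypothetical, and only the default `.gz` and `.bz2` suffixes are recognized.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based feature format detection; not ADAM's code. */
final class FeatureFormatSketch {

    // Optional compression suffix, then end of path (assumed default Hadoop codecs).
    private static final String COMPRESSED = "(\\.gz|\\.bz2)?$";

    // Insertion-ordered so rules are tried in the order they are listed.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();

    static {
        RULES.put(Pattern.compile("\\.bed" + COMPRESSED), "BED6/12");
        RULES.put(Pattern.compile("\\.gff3" + COMPRESSED), "GFF3");
        RULES.put(Pattern.compile("\\.(gtf|gff)" + COMPRESSED), "GTF/GFF2");
        RULES.put(Pattern.compile("\\.narrow[pP]eak" + COMPRESSED), "NarrowPeak");
        RULES.put(Pattern.compile("\\.interval_list" + COMPRESSED), "IntervalList");
    }

    /** Returns the label of the first matching rule, or the Parquet + Avro fallback. */
    static String detect(String pathName) {
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            if (rule.getKey().matcher(pathName).find()) {
                return rule.getValue();
            }
        }
        return "Parquet + Avro";
    }

    public static void main(String[] args) {
        for (String p : new String[] {"peaks.narrowPeak", "genes.gff3.gz", "genes.gff", "features.adam"}) {
            System.out.println(p + " -> " + detect(p));
        }
    }
}
```

Anchoring each pattern at the end of the path keeps `.gff` from matching a `.gff3` file, so the two GFF variants stay distinct regardless of rule order.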
Load features into a FeatureRDD (Java-friendly method).
Loads path names ending in:
* .bed as BED6/12 format,
* .gff3 as GFF3 format,
* .gtf/.gff as GTF/GFF2 format,
* .narrow[pP]eak as NarrowPeak format, and
* .interval_list as IntervalList format.
If none of these match, fall back to Parquet + Avro.
For BED6/12, GFF3, GTF/GFF2, NarrowPeak, and IntervalList formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load features from. Globs/directories are supported, although file extension must be present for BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.
Returns a FeatureRDD.
ADAMContext#loadFeatures
Load fragments into a FragmentRDD (Java-friendly method).
Loads path names ending in:
* .bam/.cram/.sam as BAM/CRAM/SAM format and
* .ifq as interleaved FASTQ format.
If none of these match, fall back to Parquet + Avro.
For interleaved FASTQ format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load fragments from. Globs/directories are supported, although file extension must be present for BAM/CRAM/SAM and interleaved FASTQ formats.
The validation stringency to use when validating BAM/CRAM/SAM or interleaved FASTQ formats.
Returns a FragmentRDD.
ADAMContext#loadFragments
Load fragments into a FragmentRDD (Java-friendly method).
Loads path names ending in:
* .bam/.cram/.sam as BAM/CRAM/SAM format and
* .ifq as interleaved FASTQ format.
If none of these match, fall back to Parquet + Avro.
For interleaved FASTQ format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load fragments from. Globs/directories are supported, although file extension must be present for BAM/CRAM/SAM and interleaved FASTQ formats.
Returns a FragmentRDD.
ADAMContext#loadFragments
Load genotypes into a GenotypeRDD (Java-friendly method).
If the path name has a .vcf/.vcf.gz/.vcf.bgzf/.vcf.bgz extension, load as VCF format. Otherwise, fall back to Parquet + Avro.
The path name to load genotypes from. Globs/directories are supported, although file extension must be present for VCF format.
The validation stringency to use when validating VCF format.
Returns a GenotypeRDD.
ADAMContext#loadGenotypes
Load genotypes into a GenotypeRDD (Java-friendly method).
If the path name has a .vcf/.vcf.gz/.vcf.bgzf/.vcf.bgz extension, load as VCF format. Otherwise, fall back to Parquet + Avro.
The path name to load genotypes from. Globs/directories are supported, although file extension must be present for VCF format.
Returns a GenotypeRDD.
ADAMContext#loadGenotypes
Load reference sequences into a broadcastable ReferenceFile (Java-friendly method).
If the path name has a .2bit extension, loads a 2bit file. Otherwise, uses loadContigFragments to load the reference as an RDD, which is then collected to the driver. Uses a maximum fragment length of 10 kbp.
The path name to load reference sequences from. Globs/directories for 2bit format are not supported.
Returns a broadcastable ReferenceFile.
loadContigFragments
Load reference sequences into a broadcastable ReferenceFile (Java-friendly method).
If the path name has a .2bit extension, loads a 2bit file. Otherwise, uses loadContigFragments to load the reference as an RDD, which is then collected to the driver.
The path name to load reference sequences from. Globs/directories for 2bit format are not supported.
Maximum fragment length. Defaults to 10000L. Values greater than 1e9 should be avoided.
Returns a broadcastable ReferenceFile.
loadContigFragments
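The maximum fragment length parameter caps how long each loaded contig fragment may be. As a rough illustration of that idea, the sketch below cuts an in-memory sequence into consecutive pieces no longer than a given maximum. `FragmentSplitSketch` and `split` are hypothetical names, and this is not how ADAM reads FASTA input; it only shows the effect of the length cap.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splitting a sequence into fragments of bounded length. */
final class FragmentSplitSketch {

    /** Splits the sequence into consecutive pieces of at most maximumLength bases. */
    static List<String> split(String sequence, long maximumLength) {
        if (maximumLength <= 0) {
            throw new IllegalArgumentException("maximumLength must be positive");
        }
        List<String> fragments = new ArrayList<>();
        long length = sequence.length();
        for (long start = 0; start < length; start += maximumLength) {
            // The last fragment may be shorter than maximumLength.
            long end = Math.min(length, start + maximumLength);
            fragments.add(sequence.substring((int) start, (int) end));
        }
        return fragments;
    }

    public static void main(String[] args) {
        // Ten bases with a cap of four gives fragments of length 4, 4, and 2.
        System.out.println(split("ACGTACGTAC", 4L));
    }
}
```

With the documented default of 10000L, a contig shorter than 10 kbp would stay in a single fragment under this scheme.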
Load variants into a VariantRDD (Java-friendly method).
If the path name has a .vcf/.vcf.gz/.vcf.bgzf/.vcf.bgz extension, load as VCF format. Otherwise, fall back to Parquet + Avro.
The path name to load variants from. Globs/directories are supported, although file extension must be present for VCF format.
The validation stringency to use when validating VCF format.
Returns a VariantRDD.
ADAMContext#loadVariants
Load variants into a VariantRDD (Java-friendly method).
If the path name has a .vcf/.vcf.gz/.vcf.bgzf/.vcf.bgz extension, load as VCF format. Otherwise, fall back to Parquet + Avro.
The path name to load variants from. Globs/directories are supported, although file extension must be present for VCF format.
Returns a VariantRDD.
ADAMContext#loadVariants
The JavaADAMContext provides Java-friendly functions on top of ADAMContext.