The ADAMContext to wrap.
Returns the Java Spark Context associated with this Java ADAM Context.
(Java-specific) Load alignments into an AlignmentDataset.
(Java-specific) Load alignments into an AlignmentDataset.
Loads path names ending in:
* .bam/.cram/.sam as BAM/CRAM/SAM format,
* .fa/.fasta as FASTA format,
* .fq/.fastq as FASTQ format, and
* .ifq as interleaved FASTQ format.
If none of these match, fall back to Parquet + Avro.
For FASTA, FASTQ, and interleaved FASTQ formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load alignments from. Globs/directories are supported, although file extension must be present for BAM/CRAM/SAM, FASTA, and FASTQ formats.
The validation stringency to use when validating BAM/CRAM/SAM or FASTQ formats.
Returns an AlignmentDataset which wraps the genomic dataset of alignments, sequence dictionary representing contigs the alignments may be aligned to, and the read group dictionary for the alignments if one is available.
ADAMContext#loadAlignments
(Java-specific) Load alignments into an AlignmentDataset.
(Java-specific) Load alignments into an AlignmentDataset.
Loads path names ending in:
* .bam/.cram/.sam as BAM/CRAM/SAM format,
* .fa/.fasta as FASTA format,
* .fq/.fastq as FASTQ format, and
* .ifq as interleaved FASTQ format.
If none of these match, fall back to Parquet + Avro.
For FASTA, FASTQ, and interleaved FASTQ formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load alignments from. Globs/directories are supported, although file extension must be present for BAM/CRAM/SAM, FASTA, and FASTQ formats.
Returns an AlignmentDataset which wraps the genomic dataset of alignments, sequence dictionary representing contigs the alignments may be aligned to, and the read group dictionary for the alignments if one is available.
ADAMContext#loadAlignments
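A minimal usage sketch, assuming ADAM and Spark are on the classpath; the import paths follow recent ADAM releases and may differ in older versions, and the file name is a placeholder:

```java
import htsjdk.samtools.ValidationStringency;
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.ds.read.AlignmentDataset;

public class LoadAlignmentsExample {
    // jac wraps an active ADAMContext, e.g. new JavaADAMContext(new ADAMContext(sc)).
    static AlignmentDataset load(JavaADAMContext jac) {
        // The .bam extension selects BAM format; a path with no recognized
        // extension would fall back to Parquet + Avro.
        return jac.loadAlignments("reads.bam", ValidationStringency.LENIENT);
    }
}
```

The overload without a stringency argument behaves the same but uses the default validation stringency.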
(Java-specific) Load features into a FeatureDataset and convert to a CoverageDataset.
(Java-specific) Load features into a FeatureDataset and convert to a CoverageDataset. Coverage is stored in the score field of Feature.
Loads path names ending in:
* .bed as BED6/12 format,
* .gff3 as GFF3 format,
* .gtf/.gff as GTF/GFF2 format,
* .narrow[pP]eak as NarrowPeak format, and
* .interval_list as IntervalList format.
If none of these match, fall back to Parquet + Avro.
For BED6/12, GFF3, GTF/GFF2, NarrowPeak, and IntervalList formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load features from. Globs/directories are supported, although file extension must be present for BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.
The validation stringency to use when validating BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.
Returns a FeatureDataset converted to a CoverageDataset.
ADAMContext#loadCoverage
(Java-specific) Load features into a FeatureDataset and convert to a CoverageDataset.
(Java-specific) Load features into a FeatureDataset and convert to a CoverageDataset. Coverage is stored in the score field of Feature.
Loads path names ending in:
* .bed as BED6/12 format,
* .gff3 as GFF3 format,
* .gtf/.gff as GTF/GFF2 format,
* .narrow[pP]eak as NarrowPeak format, and
* .interval_list as IntervalList format.
If none of these match, fall back to Parquet + Avro.
For BED6/12, GFF3, GTF/GFF2, NarrowPeak, and IntervalList formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load features from. Globs/directories are supported, although file extension must be present for BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.
Returns a FeatureDataset converted to a CoverageDataset.
ADAMContext#loadCoverage
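A usage sketch, with the import paths and file name as assumptions (paths follow recent ADAM releases):

```java
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.ds.feature.CoverageDataset;

public class LoadCoverageExample {
    // jac wraps an active ADAMContext, e.g. new JavaADAMContext(new ADAMContext(sc)).
    static CoverageDataset load(JavaADAMContext jac) {
        // .bed selects BED6/12 format; the per-feature coverage value is
        // taken from the score field of each Feature.
        return jac.loadCoverage("coverage.bed");
    }
}
```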
(Java-specific) Load DNA sequences into a SequenceDataset.
(Java-specific) Load DNA sequences into a SequenceDataset.
If the path name has a .fa/.fasta extension, load as FASTA format. Else, fall back to Parquet + Avro.
For FASTA format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load sequences from. Globs/directories are supported, although file extension must be present for FASTA format.
Returns a SequenceDataset containing DNA sequences.
ADAMContext#loadParquetSequences
ADAMContext#loadFastaDna
(Java-specific) Load features into a FeatureDataset.
(Java-specific) Load features into a FeatureDataset.
Loads path names ending in:
* .bed as BED6/12 format,
* .gff3 as GFF3 format,
* .gtf/.gff as GTF/GFF2 format,
* .narrow[pP]eak as NarrowPeak format, and
* .interval_list as IntervalList format.
If none of these match, fall back to Parquet + Avro.
For BED6/12, GFF3, GTF/GFF2, NarrowPeak, and IntervalList formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load features from. Globs/directories are supported, although file extension must be present for BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.
The validation stringency to use when validating BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.
Returns a FeatureDataset.
ADAMContext#loadFeatures
(Java-specific) Load features into a FeatureDataset.
(Java-specific) Load features into a FeatureDataset.
Loads path names ending in:
* .bed as BED6/12 format,
* .gff3 as GFF3 format,
* .gtf/.gff as GTF/GFF2 format,
* .narrow[pP]eak as NarrowPeak format, and
* .interval_list as IntervalList format.
If none of these match, fall back to Parquet + Avro.
For BED6/12, GFF3, GTF/GFF2, NarrowPeak, and IntervalList formats, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load features from. Globs/directories are supported, although file extension must be present for BED6/12, GFF3, GTF/GFF2, NarrowPeak, or IntervalList formats.
Returns a FeatureDataset.
ADAMContext#loadFeatures
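A usage sketch under the same assumptions (ADAM on the classpath, recent-release import paths, placeholder file name):

```java
import htsjdk.samtools.ValidationStringency;
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.ds.feature.FeatureDataset;

public class LoadFeaturesExample {
    // jac wraps an active ADAMContext, e.g. new JavaADAMContext(new ADAMContext(sc)).
    static FeatureDataset load(JavaADAMContext jac) {
        // .gff3 selects GFF3 format; STRICT fails on malformed records,
        // where LENIENT would log and skip them.
        return jac.loadFeatures("genes.gff3", ValidationStringency.STRICT);
    }
}
```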
(Java-specific) Load fragments into a FragmentDataset.
(Java-specific) Load fragments into a FragmentDataset.
Loads path names ending in:
* .bam/.cram/.sam as BAM/CRAM/SAM format and
* .ifq as interleaved FASTQ format.
If none of these match, fall back to Parquet + Avro.
For interleaved FASTQ format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load fragments from. Globs/directories are supported, although file extension must be present for BAM/CRAM/SAM and FASTQ formats.
The validation stringency to use when validating BAM/CRAM/SAM or FASTQ formats.
Returns a FragmentDataset.
ADAMContext#loadFragments
(Java-specific) Load fragments into a FragmentDataset.
(Java-specific) Load fragments into a FragmentDataset.
Loads path names ending in:
* .bam/.cram/.sam as BAM/CRAM/SAM format and
* .ifq as interleaved FASTQ format.
If none of these match, fall back to Parquet + Avro.
For interleaved FASTQ format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load fragments from. Globs/directories are supported, although file extension must be present for BAM/CRAM/SAM and FASTQ formats.
Returns a FragmentDataset.
ADAMContext#loadFragments
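A usage sketch (import paths follow recent ADAM releases; the file name is a placeholder):

```java
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.ds.fragment.FragmentDataset;

public class LoadFragmentsExample {
    // jac wraps an active ADAMContext, e.g. new JavaADAMContext(new ADAMContext(sc)).
    static FragmentDataset load(JavaADAMContext jac) {
        // .ifq selects interleaved FASTQ: consecutive paired reads are
        // grouped together into fragments.
        return jac.loadFragments("paired_reads.ifq");
    }
}
```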
(Java-specific) Load genotypes into a GenotypeDataset.
(Java-specific) Load genotypes into a GenotypeDataset.
If the path name has a .vcf/.vcf.gz/.vcf.bgzf/.vcf.bgz extension, load as VCF format. Else, fall back to Parquet + Avro.
The path name to load genotypes from. Globs/directories are supported, although file extension must be present for VCF format.
The validation stringency to use when validating VCF format.
Returns a GenotypeDataset.
ADAMContext#loadGenotypes
(Java-specific) Load genotypes into a GenotypeDataset.
(Java-specific) Load genotypes into a GenotypeDataset.
If the path name has a .vcf/.vcf.gz/.vcf.bgzf/.vcf.bgz extension, load as VCF format. Else, fall back to Parquet + Avro.
The path name to load genotypes from. Globs/directories are supported, although file extension must be present for VCF format.
Returns a GenotypeDataset.
ADAMContext#loadGenotypes
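A usage sketch under the same assumptions as the other examples (recent-release import paths, placeholder file name):

```java
import htsjdk.samtools.ValidationStringency;
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.ds.variant.GenotypeDataset;

public class LoadGenotypesExample {
    // jac wraps an active ADAMContext, e.g. new JavaADAMContext(new ADAMContext(sc)).
    static GenotypeDataset load(JavaADAMContext jac) {
        // .vcf.gz selects gzip-compressed VCF format.
        return jac.loadGenotypes("calls.vcf.gz", ValidationStringency.LENIENT);
    }
}
```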
(Java-specific) Functions like loadBam, but uses BAM index files to look at fewer blocks, and only returns records within the specified ReferenceRegions.
(Java-specific) Functions like loadBam, but uses BAM index files to look at fewer blocks, and only returns records within the specified ReferenceRegions. BAM index file required.
The path name to load indexed BAM formatted alignments from. Globs/directories are supported.
Iterable of ReferenceRegion we are filtering on.
The validation stringency to use when validating the BAM/CRAM/SAM format header. Defaults to ValidationStringency.STRICT.
Returns an AlignmentDataset which wraps the genomic dataset of alignments, sequence dictionary representing contigs the alignments may be aligned to, and the read group dictionary for the alignments if one is available.
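A usage sketch. The import paths follow recent ADAM releases, the file name is a placeholder, and the ReferenceRegion constructor shape (reference name, start, end, strand) is an assumption that may differ across versions:

```java
import java.util.Arrays;
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.ds.read.AlignmentDataset;
import org.bdgenomics.adam.models.ReferenceRegion;
import org.bdgenomics.formats.avro.Strand;

public class LoadIndexedBamExample {
    // jac wraps an active ADAMContext, e.g. new JavaADAMContext(new ADAMContext(sc)).
    static AlignmentDataset load(JavaADAMContext jac) {
        // Only BAM blocks overlapping this region are read; the index file
        // (e.g. reads.bam.bai) must exist alongside the BAM.
        ReferenceRegion region =
            new ReferenceRegion("chr1", 100000L, 200000L, Strand.INDEPENDENT);
        return jac.loadIndexedBam("reads.bam", Arrays.asList(region));
    }
}
```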
(Java-specific) Load protein sequences into a SequenceDataset.
(Java-specific) Load protein sequences into a SequenceDataset.
If the path name has a .fa/.fasta extension, load as FASTA format. Else, fall back to Parquet + Avro.
For FASTA format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load sequences from. Globs/directories are supported, although file extension must be present for FASTA format.
Returns a SequenceDataset containing protein sequences.
ADAMContext#loadParquetSequences
ADAMContext#loadFastaProtein
(Java-specific) Load reference sequences into a broadcastable ReferenceFile.
(Java-specific) Load reference sequences into a broadcastable ReferenceFile.
If the path name has a .2bit extension, loads a 2bit file. Else, uses loadSlices to load the reference as an RDD, which is then collected to the driver. Uses a maximum fragment length of 10kbp.
The path name to load reference sequences from. Globs/directories for 2bit format are not supported.
Returns a broadcastable ReferenceFile.
ADAMContext#loadSlices
(Java-specific) Load reference sequences into a broadcastable ReferenceFile.
(Java-specific) Load reference sequences into a broadcastable ReferenceFile.
If the path name has a .2bit extension, loads a 2bit file. Else, uses loadSlices to load the reference as an RDD, which is then collected to the driver.
The path name to load reference sequences from. Globs/directories for 2bit format are not supported.
Maximum fragment length. Defaults to 10000L. Values greater than 1e9 should be avoided.
Returns a broadcastable ReferenceFile.
ADAMContext#loadSlices
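A usage sketch (recent-release import paths; the file name, region, and the `extract` call on ReferenceFile are assumptions to be checked against your ADAM version):

```java
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.models.ReferenceRegion;
import org.bdgenomics.adam.util.ReferenceFile;
import org.bdgenomics.formats.avro.Strand;

public class LoadReferenceExample {
    // jac wraps an active ADAMContext, e.g. new JavaADAMContext(new ADAMContext(sc)).
    static String basesAt(JavaADAMContext jac) {
        // .2bit loads a 2bit file directly; any other path goes through
        // loadSlices and is collected to the driver.
        ReferenceFile ref = jac.loadReferenceFile("genome.2bit");
        // Extract the bases covering a region from the broadcastable reference.
        return ref.extract(new ReferenceRegion("chr1", 0L, 100L, Strand.INDEPENDENT));
    }
}
```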
(Java-specific) Load RNA sequences into a SequenceDataset.
(Java-specific) Load RNA sequences into a SequenceDataset.
If the path name has a .fa/.fasta extension, load as FASTA format. Else, fall back to Parquet + Avro.
For FASTA format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load sequences from. Globs/directories are supported, although file extension must be present for FASTA format.
Returns a SequenceDataset containing RNA sequences.
ADAMContext#loadParquetSequences
ADAMContext#loadFastaRna
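A usage sketch covering the three sequence loaders (recent-release import paths; the file name is a placeholder):

```java
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.ds.sequence.SequenceDataset;

public class LoadSequencesExample {
    // jac wraps an active ADAMContext, e.g. new JavaADAMContext(new ADAMContext(sc)).
    static SequenceDataset load(JavaADAMContext jac) {
        // The method chosen fixes the alphabet: loadDnaSequences,
        // loadProteinSequences, and loadRnaSequences all follow the same
        // extension rules (.fa/.fasta as FASTA, else Parquet + Avro).
        return jac.loadDnaSequences("reference.fa");
    }
}
```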
(R-specific) Load slices into a SliceDataset.
(R-specific) Load slices into a SliceDataset.
If the path name has a .fa/.fasta extension, load as DNA in FASTA format. Else, fall back to Parquet + Avro.
For FASTA format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load DNA slices from. Globs/directories are supported, although file extension must be present for FASTA format.
Maximum slice length, passed as a Double to support dispatch from SparkR.
Returns a SliceDataset.
(Java/Python-specific) Load slices into a SliceDataset.
(Java/Python-specific) Load slices into a SliceDataset.
If the path name has a .fa/.fasta extension, load as DNA in FASTA format. Else, fall back to Parquet + Avro.
For FASTA format, compressed files are supported through compression codecs configured in Hadoop, which by default include .gz and .bz2, but can include more.
The path name to load DNA slices from. Globs/directories are supported, although file extension must be present for FASTA format.
Maximum slice length, narrowed to an Integer to support dispatch from Python.
Returns a SliceDataset.
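A usage sketch of the Java/Python-specific overload (recent-release import paths; the file name is a placeholder):

```java
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.ds.sequence.SliceDataset;

public class LoadSlicesExample {
    // jac wraps an active ADAMContext, e.g. new JavaADAMContext(new ADAMContext(sc)).
    static SliceDataset load(JavaADAMContext jac) {
        // Cuts the FASTA sequences into slices of at most 10000 bases;
        // the maximum length is an Integer in this overload.
        return jac.loadSlices("reference.fa", 10000);
    }
}
```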
(Java-specific) Load variants into a VariantDataset.
(Java-specific) Load variants into a VariantDataset.
If the path name has a .vcf/.vcf.gz/.vcf.bgzf/.vcf.bgz extension, load as VCF format. Else, fall back to Parquet + Avro.
The path name to load variants from. Globs/directories are supported, although file extension must be present for VCF format.
The validation stringency to use when validating VCF format.
Returns a VariantDataset.
ADAMContext#loadVariants
(Java-specific) Load variants into a VariantDataset.
(Java-specific) Load variants into a VariantDataset.
If the path name has a .vcf/.vcf.gz/.vcf.bgzf/.vcf.bgz extension, load as VCF format. Else, fall back to Parquet + Avro.
The path name to load variants from. Globs/directories are supported, although file extension must be present for VCF format.
Returns a VariantDataset.
ADAMContext#loadVariants
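A usage sketch (recent-release import paths; the file name is a placeholder):

```java
import org.bdgenomics.adam.api.java.JavaADAMContext;
import org.bdgenomics.adam.ds.variant.VariantDataset;

public class LoadVariantsExample {
    // jac wraps an active ADAMContext, e.g. new JavaADAMContext(new ADAMContext(sc)).
    static VariantDataset load(JavaADAMContext jac) {
        // .vcf.bgz selects block-gzipped VCF format; sites are loaded
        // without per-sample genotypes (use loadGenotypes for those).
        return jac.loadVariants("calls.vcf.bgz");
    }
}
```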
The JavaADAMContext provides Java-friendly functions on top of ADAMContext.