Loads and extracts sequences directly from indexed fasta or fa files.
Implements a traversable collection that is backed by a Parquet file.
A broadcastable ReferenceFile backed by a map containing contig name -> Seq[NucleotideContigFragment] pairs.
a map containing a Seq of contig fragments per contig.
A file that contains a reference assembly and can be broadcast.
Represents a set of reference sequences backed by a .2bit file.
See http://genome.ucsc.edu/FAQ/FAQformat.html#format7 for the spec.
AttributeUtils is a utility object for parsing optional fields from a BAM file, or the attributes column from an ADAM file.
Object for reading Bedtools genome files from disk. Also supports UCSC Genome Browser chromInfo files.
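A Bedtools genome file lists one contig per line as a tab-separated name and length; UCSC chromInfo files carry an extra trailing column. The following is a minimal sketch of parsing such lines, not the reader this object actually implements. The names GenomeFileSketch and parse are hypothetical and chosen only for illustration.

```scala
// Hypothetical helper for illustration; not part of the documented API.
object GenomeFileSketch {
  // Parses lines of the form "<name>\t<length>[\t<extra columns>]".
  // Blank lines are skipped; columns after the second are ignored,
  // which is how the extra chromInfo column is tolerated here.
  def parse(lines: Iterator[String]): Seq[(String, Long)] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .map { line =>
        val fields = line.split("\t")
        require(fields.length >= 2, s"expected at least 2 tab-separated fields: $line")
        (fields(0), fields(1).toLong)
      }
      .toList
}
```

Calling GenomeFileSketch.parse on the lines of a file yields (contig name, length) pairs in file order, which could then be used to build a sequence dictionary.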
Helper object for setting the logging level for Parquet.
Helper singleton for converting Phred scores to/from probabilities.
As a reminder, given an error probability \epsilon, the Phred score q is:
q = -10 \log_{10} \epsilon
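The relation above and its inverse, \epsilon = 10^{-q/10}, can be sketched in a few lines of Scala. This is an illustrative sketch only: the object and method names below are hypothetical and are not the signatures of the helper singleton described here, and rounding to the nearest integer score is an assumption.

```scala
// Hypothetical names for illustration; not the documented helper's API.
object PhredSketch {
  // Phred score -> error probability: epsilon = 10^(-q / 10)
  def phredToErrorProbability(q: Int): Double =
    math.pow(10.0, -q / 10.0)

  // Error probability -> Phred score: q = -10 log10(epsilon),
  // rounded to the nearest integer (an assumption of this sketch).
  def errorProbabilityToPhred(epsilon: Double): Int = {
    require(epsilon > 0.0 && epsilon <= 1.0, "epsilon must be in (0, 1]")
    math.round(-10.0 * math.log10(epsilon)).toInt
  }
}
```

For example, a Phred score of 20 corresponds to an error probability of about 0.01, and an error probability of 0.001 corresponds to a Phred score of 30.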
Companion object for creating a ReferenceContigMap from an RDD of contig fragments.
Object for reading sequence dictionary files (.dict) from disk.
Writes an RDD to disk as text and optionally merges the output.
Loads and extracts sequences directly from indexed fasta or fa files. filePath requires a .fai index in the same directory, with the same naming convention.
path to fasta or fa index