All Classes and Interfaces
Class
Description
Abstract class that is designed to be extended and specialized to provide an asynchronous
wrapper around any kind of Writer class that takes an object and writes it out somehow.
Provides basic, generic capabilities for reading BAM index files.
Simple basic class providing much of the basic functionality of codecs
Every concrete subclass must implement FeatureCodec.canDecode(String) to indicate whether it can decode the file.
jrobinso
An abstract implementation of the index class.
Base class of implementing iterators.
Javascript filter with HEADER type containing TYPE records.
The unit of iteration for AbstractLocusIterator.
Iterator that traverses a SAM File, accumulating information on a per-locus basis.
Abstract implementation of a little progress logging class to facilitate consistent output of useful information when progressing
through a stream of SAM records.
Holds a SAMRecord plus the zero-based offset into that SAMRecord's bases and quality scores that corresponds
to the base and quality at the genomic position described by the containing AbstractLocusInfo.
Classifies whether the given event is a match, insertion, or deletion.
Base class for the various concrete records in a SAM header, providing uniform
access to the attributes.
Aggregates multiple filters and provides a method for applying them all to a given record with
one method call.
Filter to either include or exclude aligned reads
Represents the contiguous alignment of a subset of read bases to a reference
sequence.
An AlignmentContext represents mapping information related to a collection of reads, or a single CRAMCompressionRecord, Slice, or Container.
A span over a single reference.
Immutable representation of an allele.
This is a copy of TabixIndexCreator, except sequence names are populated from the header, not from the ones that are seen.
A convenience base class for codecs that want to read in features from ASCII lines.
A simple class that provides AsciiLineReader.readLine() functionality around a PositionalBufferedStream. BufferedReader and its BufferedReader.readLine() method should be used in preference to this class (when the LocationAware functionality is not required) because it offers greater performance.
A class that iterates over the lines and line positions in an AsciiLineReader.
Fast (I hope) buffered Writer that converts char to byte merely by casting, rather than charset conversion.
Asynchronous read-ahead implementation of BlockCompressedInputStream.
Iterator that uses a dedicated background thread to perform read-ahead to improve throughput at the expense of increased latency.
Implementation of a FastqWriter that provides asynchronous output.
AsyncVariantContextWriter that can be wrapped around an underlying VariantContextWriter to provide asynchronous output.
Implementation of an asynchronous writer pool.
Class used to construct a BAI index for a CRAM file.
InternalAPI
Base class for BundleResourceType.FMT_READS_BAM codecs.
BAM v1.0 codec.
InternalAPI
Base class for BundleResourceType.FMT_READS_BAM decoders.
Decoder options specific to BAM decoders.
BAM v1.0 decoder.
InternalAPI
Base class for BundleResourceType.FMT_READS_BAM encoders.
Encoder options specific to BAM encoders.
BAM v1.0 encoder.
Class for reading and querying BAM files.
An ordered list of chunks, capable of representing a set of discontiguous
regions in the BAM file.
Concrete implementation of SAMFileWriter for writing gzipped BAM files.
A basic interface for querying BAM indices.
Class for both constructing BAM index content and writing it out.
Merges BAM index files for (headerless) parts of a BAM file into a single
index file.
Metadata about the BAM index contained within the BAM index.
Class to validate (at two different levels of thoroughness) the index for a BAM file.
Filters out records that do not match any of the given intervals and query type.
Wrapper class for binary BAM records.
Class for translating between in-memory and disk representation of BAMRecord.
Writes SBI files for BAM files, as understood by SBIIndex.
A decorating iterator that filters out records that do not match the given reference and start position.
Class for writing SAMRecords in BAM format to an output stream.
A read feature representing a single quality score in a read.
In general FastqWriterFactory should be used so that AsyncFastqWriter can be enabled, but there are some
cases in which that behavior is explicitly not wanted.
Decode BCF2 files
See #BCFWriter for documentation on this class's role in encoding BCF2 files
See #BCFWriter for documentation on this class's role in encoding BCF2 files
Specialized int encoder for atomic (non-list) integers
See #BCFWriter for documentation on this class's role in encoding BCF2 files
See #BCFWriter for documentation on this class's role in encoding BCF2 files
An efficient scheme for building and obtaining specialized
genotype field decoders.
Decode a field (implicit from creation) encoded as typeDescriptor in the decoder object in the GenotypeBuilders, one for each sample in order.
Lazy version of genotypes decoder for BCF2 genotypes
BCF2 types and associated information
Common utilities for working with BCF2 files
Includes convenience methods for encoding, decoding BCF2 type descriptors (size + type)
Simple holder for BCF version information
User: depristo
Date: 8/2/12
Time: 2:16 PM
Codec for parsing BED file, as described by UCSC
See https://genome.ucsc.edu/FAQ/FAQformat.html#format1
Indicates whether coordinates are 0-based or 1-based.
Annotation indicating that a package, class, method, or type is release level "BETA", and is not part
of the stable public API.
Encodes integers by adding a constant offset value to a range of values in order to reduce
the necessary number of bits needed to store each value.
An individual bin in a BAM file.
Encapsulates file representation of various primitive data types.
Implements common methods of FeatureCodecs that read from PositionalBufferedStreams.
Converter between disk and in-memory representation of a SAMRecord tag.
Provides a list of all bins which could exist in the BAM file.
Builder for a BinningIndexContent object.
coordinates are 1-based, inclusive
In-memory representation of the binning index for a single reference.
This class is used to encapsulate the list of Bins stored in the BAMIndexContent.
While it is currently represented as an array, we may decide to change it to an ArrayList or other structure.
A modified version of the Apache Math implementation of binomial
coefficient calculation
Derived from code within the CombinatoricsUtils and FastMath classes
within Commons Math3 (https://commons.apache.org/proper/commons-math/)
Included here for use in Genotype Likelihoods calculation, instead
of adding Commons Math3 as a dependency
Commons Math3 is licensed using the Apache License 2.0
Full text of this license can be found here:
https://www.apache.org/licenses/LICENSE-2.0.txt
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
An individual bin of a CSI index for BAM files.
An interface to describe the requirements for reading bit data as opposed to bytes.
An interface to describe the requirements for writing out bits as opposed to bytes.
Class representing CRAM block concept and some methods to operate with block content.
Represents a contiguous block of bytes in a file, defined by a start position and size (in bytes)
Static methods for manipulating virtual file pointers in BGZF files.
A block-compressed FASTA file driven by an index for fast lookups.
Utility class for reading BGZF block compressed files.
Writer for a file that is a series of gzip blocks (BGZF format).
Constants shared by BlockCompressed{Input,Output}Stream classes
The block compression methods specified by Section 8 of the CRAM spec.
The block content types specified by Section 8.1 of the CRAM spec
Alternative to GZIPInputStream, for decompressing GZIP blocks that are already loaded into a byte[].
An index interface with additional functionality for querying and inspecting the structure of a BAM index.
Implementation of LineReader that is a thin wrapper around BufferedReader.
An immutable collection of related resources, including a (single, required) primary resource, such as "reads",
"variants", "features", or "reference", plus zero or more related secondary resources ("index", "dictionary",
"MD5", etc.).
A builder class for Bundles.
Methods for serializing and deserializing Bundles to and from JSON strings.
Interface defined for bundle resource objects that may be included in a Bundle.
Base class for BundleResource implementations.
Constants for specifying standard content types and formats for resources contained in a Bundle.
NOTE: this encoding can be a hybrid encoding in that it ALLOWS for the possibility to split its data
between the core block and an external block (i.e., if lenEncoding is CORE and byteEncoding is EXTERNAL).
This has implications for data access, since some of its data is interleaved with other data in the
core block.
Created by vadim on 23/03/2015.
CRAMEncoding class for Huffman byte values.
CRAMEncoding class for Huffman integer values.
Represents an index on a specific chromosome
A [start,stop) file pointer pairing into the BAM file, stored
as a BAM file index.
A list of CigarElements, which describes how a read aligns with the reference.
One component of a cigar string.
The operators that can appear in a cigar string, and information about their disk representations.
Utility class that can scan for classes in the classpath and find all the ones
annotated with a particular annotation.
This interface is used by iterators that use releasable resources during iteration.
The basic iterator we use in Tribble, which allows closing and basic iteration.
Utility to close things that implement Closeable
WARNING: This should only be used for Closeable things open for read, because it ignores exceptions, and
the caller will probably want to know about exceptions when closing a file being written to, because
this may indicate a failure to flush.
Class CodecLineParsingException
A generic exception we use if the codec has trouble parsing the line it's given.
Miscellaneous util methods that don't fit anywhere else.
Small utility methods for dealing with collection classes.
A defaulting map, which returns a default value when a value that does not exist in the map is looked up.
Simple multi-map for convenience of storing collections in map values.
Deprecated.
use Collectors.groupingBy instead
Common utility routines for VariantContext and Genotype
A simple extension of the Tuple class that, for comparable Types, allows comparing Tuples of non-null elements.
A Predicate on VariantContexts that returns true when either all its sub-predicates are true, or none are false.
Maintains a map of DataSeries to EncodingDescriptor, and a second map that contains the compressor to use
for each EncodingDescriptor that represents an EXTERNAL encoding.
Factory for creating CRAM compression headers for containers when writing to a CRAM stream.
Maintain a cache of reusable compressor instances in order to reduce the need to repeatedly
instantiate them, since some, like the RANS de/compressor, allocate large numbers (~256k) of
small temporary objects every time they're instantiated.
Notes: Container will construct a container out of as many CRAMCompressionRecords as it is handed, respecting only
the maximum number of slices.
Holds info about a mate pair for use when processing a coordinate sorted file.
Client must implement this class, which defines the way in which records are written to and
read from file.
Static methods that encapsulate the standard SAM way of storing ranges: one-based, with both ends
inclusive.
An input stream that wraps a SeekableStream to produce only bytes specified within coordinates.
Superclass of Codecs which operate on Core Block bit streams. Contrast with htsjdk.samtools.cram.encoding.external.ExternalCodec<T> for External Block byte streams.
An input stream that counts the bytes read from it.
An example of how to index a feature file, and then count all the records in the file.
A class representing CRAI index entry: file and alignment offsets for each slice.
CRAI index used for CRAM files.
Merges CRAM index files for (headerless) parts of a CRAM file into a single index file.
Class for both constructing BAM index content and writing it out.
InternalAPI
Base class for BundleResourceType.FMT_READS_CRAM codecs.
An interface that defines requirements for serializing/deserializing objects into and from a stream.
CRAM v2.1 codec
CRAM v3.0 codec
A CRAMRecord represents a SAMRecord that has been transformed to CRAM-style representation.
Iterate over CRAM containers from an input stream, and unlike CramContainerIterator only the header of each container is read, rather than the whole stream.
An iterator of CRAM containers read from an InputStream.
Class for writing SAMRecords into a series of CRAM containers on an output stream, with an optional index.
Indexer for creating/reading/writing a CRAIIndex for a CRAM file/stream.
InternalAPI
Base class for BundleResourceType.FMT_READS_CRAM decoders.
Decoder options specific to CRAM decoders.
CRAM v2.1 decoder.
CRAM v3.0 decoder.
InternalAPI
Base class for BundleResourceType.FMT_READS_CRAM encoders.
Encoder options specific to CRAM encoders.
CRAM v2.1 encoder.
CRAM v3.0 encoder.
A base class for the various CRAM encodings.
Parameters that can be set to control the encoding strategy used when writing CRAM.
Created by edwardk on 8/13/15.
BAMFileReader analogue for CRAM files.
A CRAM file header, including the file format definition (including CRAM version and content id), and the SAMFileHeader.
Interface for indexing CRAM.
Methods to read and write CRAM int values as given in the file format specification.
Methods to read and write CRAM array of integers data type.
A collection of methods to read and write special values to/from CRAM files.
A lazy CRAMReferenceSource implementation, for use when no explicit reference source has been provided
by the user.
A reader used to consume and populate encoded CRAMCompressionRecords from a set of streams representing data series/blocks in a Slice.
Class for handling the read features for a CRAMCompressionRecord.
A writer that emits CRAMCompressionRecord into the various streams that represent a Slice's data series blocks.
Holds a region/fragment of a reference contig.
Interface used to supply a reference source when reading CRAM files.
An iterator of CRAM containers read from locations in a SeekableStream.
A class to represent version information as three numbers: major, minor, and build number.
An input stream that calculates CRC32 of all the bytes passed through it.
An output stream that calculates CRC32 checksum of all the bytes written through the stream.
Implementation of the CSI index for BAM files.
Hacky little class used to allow us to set the compression level on a GZIP output stream which, for some
bizarre reason, is not exposed in the standard API.
Factory for creating custom readers for accessing API based resources,
e.g.
Interface to be implemented by custom factory classes that register
themselves with this factory and are loaded dynamically.
Represents a specific CRAM record data series and its associated type and unique Content ID.
A CRAM Data Series reader for a particular (Encoding, DataSeriesType) and associated parameters
Data series types known to CRAM.
A CRAM Data Series writer for a particular Encoding, DataSeriesType and associated parameters
NOTE: This code has been taken from w3.org, and modified slightly to handle timezones of the form [-+]DDDD,
and also to fix a bug in the application of time zone to the parsed date.
Must not read from delegate unless no bits left in the buffer!!!
Embodies defaults for global values that affect how the SAM JDK operates.
Default factory for creating SAM and BAM records used by the SamReader classes.
Factory for Deflater objects used by BlockCompressedOutputStream.
Simple iterator class that delegates all method calls to an underlying iterator.
Class to hold a set of Paths to be deleted on JVM exit through a shutdown hook.
A read feature representing a deletion of one or more bases similar to CigarOperator.D.
A single-ended FIFO queue.
A class for reading BAM file indices, hitting the disk once per query.
Abstract base class for all DownsamplingIterators that provides a uniform interface for recording
and reporting statistics about how many records have been kept and discarded.
A factory for creating DownsamplingIterators that uses a number of different strategies to achieve downsampling while
meeting various criteria.
Describes the available downsampling strategies.
Filter out SAMRecords with DuplicateRead flag set
This class helps us compute and compare duplicate scores, which are used for selecting the non-duplicate
during duplicate marking (see MarkDuplicates).
Stores a set of records that are duplicates of each other.
An iterator of sets of duplicates.
A DynamicIndexCreator creates the proper index based on an IndexFactory.IndexBalanceApproach and the characteristics of the file.
Iterator that traverses a SAM File, accumulating information on a per-locus basis.
Holds a SAMRecord plus the zero-based offset into that SAMRecord's bases and quality scores that corresponds
to the base and quality for the start of alignment block at the genomic position described by the AbstractLocusInfo.
Describes the type of TypedRecordAndOffset, whether it represents the start or the end of an alignment block.
A token that represents a fragment of a read name that has been tokenised.
A class for representing an encoding, including encoding-specific parameters, suitable for
serialization to/from a stream.
A helper class to choose and instantiate an appropriate CRAMEncoding given a DataSeriesType and an EncodingDescriptor.
Encoding ID as specified by Section 3 of the CRAM spec.
An example binary codec that encodes / decodes contig / start / stop values via DataInputStreams
Encode Byte Arrays using an External Data Block
Filter for filtering out reads that do not pass the quality filter
The v1.0 FASTA codec.
The v1.0 FASTA decoder.
Writes a FASTA formatted reference file.
Builder for a FastaReferenceWriter.
Implementation of ReferenceSequenceFile for reading from FASTA files.
Reads/writes a fasta index file (.fai), as generated by `samtools faidx`.
Static methods to create a FastaSequenceIndex.
Holds an individual entry in a fasta sequence index file.
This class encompasses all the basic information about a genotype.
Line-oriented InputStream reader that uses one buffer for disk buffering and line-termination-finding,
in order to improve performance.
Codec for encoding records into FASTQ format.
Enumeration for FastQ quality score formats.
Reads a FASTQ file with four lines per record.
Enum of the types of lines we see in Fastq.
Simple representation of a FASTQ record, without any conversion
Simple interface for a class that can write out fastq records.
Factory class for creating FastqWriter objects.
Marker interface for Locatables with Tribble support.
The base interface for classes that read in features.
A class to represent a header of a feature containing file.
the basic interface that feature sources need to match
LRU cache of OutputStreams to handle situation in which it is necessary to have more FileOutputStreams
than resource limits will allow.
Contains file extension constants for read, alignment, variant and annotation files
Thrown when it is possible to detect that a SAM or BAM file is truncated.
Deprecated.
use FilteringSamIterator instead
Deprecated.
since 2/29/16 use FilteringVariantContextIterator instead
Filtering Iterator which takes a filter and an iterator and iterates through only those records which are not rejected by the filter.
A filtering iterator for VariantContexts that takes a base iterator and a VariantContextFilter.
Simple class used to format object values into a standard format for printing.
Decoder for the CRAM 3.1 FQZComp codec, used for compressing quality scores.
Placeholder for the (not yet implemented) CRAM 3.1 FQZComp quality score encoder.
A "non-seekable" ftp stream.
Object for full BED file.
Constants and utility methods used throughout the VCF/BCF/VariantContext classes
Constants and methods used by BAM and Tribble indices
This class encompasses all the basic information about a genotype.
A builder class for genotypes
Provides convenience setter methods for all of the Genotype field
values.
A Predicate on VariantContexts that returns true at sites that are either unfiltered, or passing (as variants).
Represents an ordered collection of Genotype objects
Summary types for Genotype objects
Codec for parsing Gff3 files, as defined in https://github.com/The-Sequence-Ontology/Specifications/blob/31f62ad469b31769b43af42e0903448db1826925/gff3.md
Note that while spec states that all feature types must be defined in sequence ontology, this implementation makes no check on feature types, and allows any string as feature type
Each feature line in the Gff3 file will be emitted as a separate feature.
Enum for parsing directive lines.
Gff3 format spec is defined at https://github.com/The-Sequence-Ontology/Specifications/blob/31f62ad469b31769b43af42e0903448db1826925/gff3.md
Discontinuous features which are split between multiple lines in the gff files are implemented as separate features linked as "co-features"
Gff3 format spec is defined at https://github.com/The-Sequence-Ontology/Specifications/blob/31f62ad469b31769b43af42e0903448db1826925/gff3.md
Discontinuous features which are split between multiple lines in the gff files are implemented as separate features linked as "co-features"
A class to write out gff3 files.
Represents a .gzi index of a block-compressed file.
Helper class for constructing the GZIindex.
Index entry mapping the block-offset (compressed offset) to the uncompressed offset where the
block starts.
Base class for all HtsContentType.HAPLOID_REFERENCE codecs.
Base class for all HtsContentType.HAPLOID_REFERENCE decoders.
Class for haploid reference decoder options.
Base class for all HtsContentType.HAPLOID_REFERENCE encoders.
Class for haploid reference encoder options.
Class with string constants representing the file formats supported by haploid reference codecs.
Class with methods for resolving inputs and outputs to haploid reference encoders and decoders.
Class for haploid reference decoder options.
A read feature representing a hard clip similar to CigarOperator.H.
This class calculates a HardyWeinberg p-value given three values representing
the observed frequencies of homozygous and heterozygous genotypes in the
test population.
A header for a metrics file.
A Predicate on VariantContexts that either returns true at heterozygous sites (invertible to false).
Class for computing and accessing histogram type data.
Represents a bin in the Histogram.
Base interface implemented by all htsjdk.beta.plugin codecs.
A registry for tracking HtsCodec instances.
Base class for content-type-specific resolvers, with methods for resolving an input or output resource
to a codec that can supply an encoder or decoder for that resource.
The plugin framework defines a set of supported content types, each of which represents a type
of HTS data such as "aligned reads".
Base interface for decoders.
Base tag interface for options for HtsDecoders.
A global, static, immutable, public registry for HtsCodec instances.
Base interface for encoders.
Base tag interface for options for HtsEncoders.
Base class for concrete implementations of reads codecs that handle BundleResourceType.FMT_READS_HTSGET_BAM codecs.
Version 1.2 of BundleResourceType.FMT_READS_HTSGET_BAM codec.
Base class for concrete implementations of reads decoders that handle BundleResourceType.FMT_READS_HTSGET_BAM decoding.
Version 1.2 of BundleResourceType.FMT_READS_HTSGET_BAM decoder.
Class for reading and querying BAM files from an htsget source
Filters out records that do not match any of the given intervals and query type.
Classes of data that can be requested in an htsget request as defined by the spec
Class allowing deserialization from json htsget error response, as defined in https://samtools.github.io/hts-specs/htsget.html
An example response could be as follows
{
"htsget": {
"error": "NotFound",
"message": "No such accession 'ENS16232164'"
}
}
Formats currently supported by htsget as defined by spec
Builder for an htsget POST request that allows opening a connection
using the request after validating that it is properly formed.
Builder for an htsget GET request that allows opening a connection
using the request after validating that it is properly formed.
Fields which can be used to filter a htsget request as defined by the spec
Class allowing deserialization from json htsget response, as defined in https://samtools.github.io/hts-specs/htsget.html
Tagging interface used as a type-bound for codec/encoder/decoder header type params.
A common interface for 1-based, closed genomic intervals.
Methods for interconverting between HtsQueryInterval and existing htsjdk types such as Locatable/QueryInterval
Exception type for all exceptions caused at runtime by HTSJDK.
A RuntimeException-derived class for propagating IOExceptions caught and rethrown by the plugin framework.
Base class for unexpected conditions caused by codec plugins.
Exception thrown when a requested operation is not supported by a plugin codec.
Default implementation for IOPath.
Common query interface for HtsDecoders.
A concrete query interval implementation of HtsInterval used for random access queries on
file formats represented by HtsDecoders that support random access.
Query rule values used to determine whether a query should match overlapping or contained records.
Tagging interface used as a type-bound for codec/encoder/decoder record type params.
A class for representing 3-part versions with major, minor and patch segments.
Simple implementation of URLHelper based on the JDK URL and HttpURLConnection classes.
User: jrobinso
Date: Sep 23, 2009
Huffman bit code word consisting of a symbol, the corresponding codeword and codeword bit length.
Given a set of HuffmanParams, creates the set of canonical codes that are used to
read/write symbols from/to an input/output stream.
A class for carrying around encoding parameters for a canonical Huffman encoder.
A utility class that calculates Huffman encoding parameters based on the frequencies of the symbols to be
encoded.
Interface for all index implementations.
An interface for creating indexes
A fasta file driven by an index for fast, concurrent lookups.
Factory class for creating indexes.
We can optimize index-file-creation for different factors.
an enum that contains all of the information about the index types, and how to create them
Merges index files for (headerless) parts of a partitioned main file into a single index file.
Checks whether two index files are equal.
Factory for Inflater objects used by BlockGunzipper.
A BundleResource backed by an InputStream.
Convenience methods to read from InputStream.
A read feature representing a single insert base.
A read feature representing a multi-base insertion.
Filter things that fall outside a specified range of insert sizes.
Annotation indicating that a package, class, method, or type is release level "internal", even if the
access modifier is "public".
Represents a simple interval on a sequence.
Quick and dirty interval class
Describes a genomic interval and where in a file information for that
interval can be obtained
Comparator that orders intervals based on their sequence index, by coordinate
then by strand and finally by name.
Filter SAMRecords so that only those that overlap the given list of intervals are retained.
Creates interval indexes from a stream of features
Filter out SAMRecords where neither record of a pair overlaps a given set of
intervals.
Represents a list of intervals against a reference sequence that can be written to
and read from a file.
A tribble codec for IntervalLists.
Serve up loci of interest based on an interval list.
Writes out the list of intervals to the supplied file.
A Red-Black tree with intervals for keys.
An implementation of an interval tree, following the explanation.
Index based on an interval tree
Utility class that implements an interval map.
A convenient way to provide a single view on the many int and int[] field values we work with,
for writing out the values.
Wraps an existing filter and inverts it.
Interface for htsjdk input/output paths/URIs.
A BundleResource backed by an IOPath.
Miscellaneous stateless static IO-oriented methods.
Factory for creating SeekableStreams based on URLs/paths.
Use this type rather than java.util.Date in command-line options in order to get ISO 8601 parsing.
Provides an adapter to wrap an Iterator with an Iterable, allowing it to be run through a foreach loop.
Abstract implementation of an iterator that also implements Iterable (to return itself)
so that it can be used in for() loops.
Methods to read and write int values as per ITF8 specification in CRAM.
JavaScript-based read filter. The script puts the following variables in the script context:
- 'record' a SAMRecord (https://github.com/samtools/htsjdk/blob/master/src/java/htsjdk/samtools/SAMRecord.java)
- 'header' (https://github.com/samtools/htsjdk/blob/master/src/java/htsjdk/samtools/SAMFileHeader.java)
JavaScript-based variant filter. The script puts the following variables in the script context:
- 'header' a htsjdk.variant.vcf.VCFHeader
- 'variant' a htsjdk.variant.variantcontext.VariantContext
How to treat values that appear in a jexl expression but are missing in the context it's applied to
Simple utility for building an on-demand (lazy) object-initializer.
Deprecated.
since 1/2017 use a Supplier instead
Lazy-loading GenotypesContext.
Returns the data used in the full GenotypesContext constructor GenotypesContext(java.util.ArrayList, java.util.Map, java.util.List)
Simple lazy parser interface.
Java port of UCSC liftOver.
Represents a portion of a liftover operation, for use in diagnosing liftover failures.
The linear index associated with a given reference in a BAM index.
Index defined by dividing the genome by chromosome, then each chromosome into bins of fixed width (in
genomic coordinates).
Blocks are organized as a simple flat list:
For creating a LinearIndex from a stream of features.
A very simple descriptor for line-iterables.
A simple iterator over the elements in LineReader.
Interface allows for implementations that read lines from a String, an ASCII file, or somewhere else.
Interface for line-oriented readers.
A Map class that holds a list of entries under each key instead of a single entry, and
provides utility methods for adding an entry under a key.
Input stream with methods to convert byte arrays to numeric values using "little endian" order.
Any class that has a single logical mapping onto the genome should implement Locatable.
Positions should be reported as 1-based and closed at both ends.
Describes API for getting current position in a stream, writer, or underlying file.
Location info about a locus.
compares first by sequence index then by position
Simple implementation of Locus interface for ease of passing as an arg and comparing with other Locus implementations.
A wafer thin wrapper around System.err that uses var-args to make it
much more efficient to call the logging methods without having to
surround every call site with calls to Log.isXXXEnabled().
Enumeration for setting log levels.
A variant of BufferedReader with improved performance reading files with long lines.
Methods to read and write LTF8 as per CRAM specs.
Filter things with low mapping quality.
a collection of functions and classes for various common calculations
a class for calculating moving statistics - this class returns the
mean, variance, and std dev after accumulating any number of records.
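The entry above does not say how the statistics are accumulated. One common way to keep a running mean, variance, and standard deviation without storing the records is Welford's online algorithm, sketched below. The class name, the choice of Welford's method, and the use of the sample (n - 1) variance are all assumptions made for illustration.

```java
// Hypothetical accumulator, not part of the documented library.
final class RunningStatsSketch {
    private long count;
    private double mean;
    private double sumSquaredDeviations; // running sum of (x - mean)^2

    /** Folds one more value into the running statistics. */
    void add(double value) {
        count++;
        double delta = value - mean;
        mean += delta / count;
        sumSquaredDeviations += delta * (value - mean);
    }

    double mean() {
        return mean;
    }

    /** Sample variance (divides by n - 1); 0.0 with fewer than two values. */
    double variance() {
        return count > 1 ? sumSquaredDeviations / (count - 1) : 0.0;
    }

    double standardDeviation() {
        return Math.sqrt(variance());
    }

    public static void main(String[] args) {
        RunningStatsSketch stats = new RunningStatsSketch();
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.add(v);
        }
        System.out.println(stats.mean() + " " + stats.variance());
    }
}
```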
Class to generate an MD5 string for a file as it is being read
Class to generate an MD5 string for a file as it is being read
An iterator over Iterators that return Ts.
Provides an iterator interface for merging multiple underlying iterators into a single
iterable stream.
A base class from which all Metric classes should inherit.
Contains a set of metrics that can be written to a file and parsed back
again.
Provides an implementation of the Murmur3_32 hash algorithm that has desirable properties in terms of randomness
and uniformity of the distribution of output values that make it a useful hashing algorithm for downsampling.
A mutable concrete Feature.
Some Index implementations can be modified in memory.
Mutable integer class suitable for use with collection classes that take a type parameter.
Created by IntelliJ IDEA.
An interface for features provided via an interval file, e.g.
CRAM 3.1 NameTokenisation decoder, used to compress read names in CRAM files.
A very naive implementation of a name tokenization encoder.
Deprecated.
use SecondaryAlignmentFilter instead.
Deprecated.
use SecondaryAlignmentSkippingIterator instead.
Available writer options for VariantContextWriters
A BundleResource backed by an OutputStream.
Filters out reads with very few unclipped bases, likely due to the read coming
from a foreign organism, e.g.
Utility class to efficiently do in memory overlap detection between a large
set of mapping like objects, and one or more candidate mappings.
A read feature representing padding, similar to CigarOperator.P.
A Predicate on VariantContexts that returns true at sites that are either unfiltered, or passing (as variants).
Generic Closeable Iterator that allows you to peek at the next value before calling next
Wrapper around an iterator that enables non-destructive peeking at the next element that would
be returned by next()
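Non-destructive peeking, as described in the two entries above, is usually done by buffering one element from the wrapped iterator. The following is a minimal sketch of that idea. The class name `PeekingSketch` is hypothetical and the code does not reproduce the library's own implementation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical wrapper, not part of the documented library.
final class PeekingSketch<T> implements Iterator<T> {
    private final Iterator<T> underlying;
    private T buffered;          // element fetched ahead of time, if any
    private boolean hasBuffered; // tracked separately so null elements work

    PeekingSketch(Iterator<T> underlying) {
        this.underlying = underlying;
    }

    /** Returns the element that next() would return, without consuming it. */
    T peek() {
        if (!hasBuffered) {
            if (!underlying.hasNext()) {
                throw new NoSuchElementException();
            }
            buffered = underlying.next();
            hasBuffered = true;
        }
        return buffered;
    }

    @Override
    public boolean hasNext() {
        return hasBuffered || underlying.hasNext();
    }

    @Override
    public T next() {
        T result = peek();
        buffered = null;
        hasBuffered = false;
        return result;
    }

    public static void main(String[] args) {
        PeekingSketch<String> it = new PeekingSketch<>(Arrays.asList("a", "b").iterator());
        System.out.println(it.peek() + " " + it.next() + " " + it.next());
    }
}
```

Calling `peek()` repeatedly returns the same element until `next()` consumes it.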
Minimal interface for an object that supports getting the current position in the stream / writer / file, as well as a handful of other
reader-like features.
A wrapper around an InputStream which performs its own buffering, and keeps track of the position.
Wraps output stream in a manner which keeps track of the position within the file and allows writes
at arbitrary points
This is an example program showing how to use SAM readers and (optionally) writers.
This is an example program showing how to use Feature readers and (optionally) writers.
Utility class that will execute sub processes via Runtime.getRuntime().exec(...) and read
off the output from stderr and stdout of the sub process.
Little progress logging class to facilitate consistent output of useful information when progressing
through a stream of SAM records.
An interface defining the record() methods of the Picard-public ProgressLogger implementation.
Utility for determining the type of quality encoding/format (see FastqQualityFormat) used in a SAM/BAM or Fastq.
Utility class for working with quality scores and error probabilities.
Interval relative to a reference, for querying a BAM file.
Decoder for the CRAM 3.1 arithmetic (range) codec.
Encoder for the CRAM 3.1 arithmetic (range) codec.
Decoder for the CRAM 3.1 rANSNx16 codec with 16-bit state renormalization (as opposed to the rAns4x8 codec,
which uses 8-bit state renormalization), and order-0 or order-1 context.
Encoder for the CRAM 3.1 rANSNx16 codec with 16-bit state renormalization (as opposed to the rAns4x8 codec,
which uses 8-bit state renormalization), and order-0 or order-1 context.
Raw compressor that does no compression.
A class to wrap a SeekableStream in a read-only SeekableByteChannel.
A read feature representing a single base with associated quality score.
An interface to capture data in read coordinates.
Filter by a set of specified readnames
A class for creating a Bundle for reads and reads-related resources.
Base class for all HtsContentType.ALIGNED_READS codecs.
InternalAPI
Utilities for use by reads encoder/decoder implementations.
Base class for all HtsContentType.ALIGNED_READS decoders.
Reads decoder options (shared/common).
Base interface for HtsContentType.ALIGNED_READS encoders.
General reads encoder options.
Class with string constants for known formats supported by reads codecs.
Query methods specific to ReadsDecoders.
Class with methods for resolving inputs and outputs to reads encoders and decoders.
CRAM counterpart of SAMTag.
A thread-safe wrapper for a list of cache Reference objects.
ReferenceContext defines how a given Slice or Container relates to a reference sequence.
Is this ReferenceContext Single Reference, Multiple Reference, or Unmapped?
Section 8.5 of the CRAM spec defines the following values for the Slice Header sequence ID field:
-2: Multiple Reference Slice
-1: Unmapped-Unplaced Slice
Any non-negative integer (zero or greater): Single Reference Slice
Wrapper around a reference sequence that has been read from a reference file.
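The Slice Header sequence ID values listed for the ReferenceContext type above (-2, -1, and zero or greater) map onto three cases. The sketch below shows one way to classify an ID accordingly. The class, enum, and method names are hypothetical, and rejecting values below -2 is an assumption rather than something the entry states.

```java
// Hypothetical classifier, not part of the documented library.
final class ReferenceContextSketch {
    enum Type { MULTIPLE_REFERENCE, UNMAPPED_UNPLACED, SINGLE_REFERENCE }

    /** Maps a Slice Header sequence ID to the kind of reference context it denotes. */
    static Type classify(int sequenceId) {
        if (sequenceId == -2) {
            return Type.MULTIPLE_REFERENCE;
        }
        if (sequenceId == -1) {
            return Type.UNMAPPED_UNPLACED;
        }
        if (sequenceId >= 0) {
            return Type.SINGLE_REFERENCE;
        }
        // Assumption: anything below -2 is treated as invalid input.
        throw new IllegalArgumentException("Invalid sequence ID: " + sequenceId);
    }

    public static void main(String[] args) {
        System.out.println(classify(-2) + " " + classify(-1) + " " + classify(0));
    }
}
```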
An interface for working with files of reference sequences regardless of the file format
being used.
Factory class for creating ReferenceSequenceFile instances for reading reference
sequences stored in various formats.
Manages a ReferenceSequenceFile.
Interface for specifying loci of interest for genotype calling and other operations.
Used to represent a CRAM reference, the backing source for which can either be
a file or the EBI ENA reference service.
A read feature representing a reference skip similar to CigarOperator.N.
Like Iso8601Date, but also comes in a "lazy now" flavor.
Implementation of URLHelper designed for remote resources.
Constants for tags used in our SAM/BAM files
LRU collection class for managing objects that place some resource burden such that not too many of them
can exist in the VM at one time, but they can be reconstructed as necessary.
c.f.
Thrown by various codecs to indicate EOF without having to clutter the API with throws clauses
Thrown by various IO classes to indicate IOException without having to clutter the API with throws clauses
Thrown by classes handling script engines like the javascript-based filters for SAM/VCF
Simple extension to SAMBinaryTagAndValue in order to distinguish unsigned array values, because
signedness cannot be determined by introspection of value.
Holds a SAMRecord attribute and the tagname (in binary form) for that attribute.
InternalAPI
Base class for BundleResourceType.FMT_READS_SAM codecs.
SAM v1.0 codec.
A set of constants defined in the sam-spec (https://github.com/samtools/hts-specs) that need
to be referenced in code.
InternalAPI
Base class for BundleResourceType.FMT_READS_SAM decoders.
SAM v1.0 decoder.
InternalAPI
Base class for BundleResourceType.FMT_READS_SAM encoders.
SAM v1.0 encoder.
Header information from a SAM or BAM file.
Little class to generate program group IDs
Ways in which a SAM or BAM may be sorted.
Merges SAMFileHeaders that have the same sequences into a single merged header
object while providing read group translation for cases where read groups
clash across input headers.
Represents the origin of a SAM record.
An interface representing a collection of (possibly) discontinuous segments in the
BAM file, possibly representing the results of an index query.
Validates SAM files as follows:
checks sam file header for sequence dictionary
checks sam file header for read groups
for each sam record
reports error detected by SAMRecord.isValid()
validates NM (nucleotide differences) exists and matches reality
validates mate fields agree with data in the mate record
Interface for SAMText and BAM file writers.
Create a writer for writing SAM, BAM, or CRAM files.
Base class for implementing SAM writer with any underlying format.
SAM flags as enum, to be used in GUI, menu, etc...
This determines how flag fields are represented in the SAM file.
Thrown when a SAM file being read or decoded (text or binary) looks bad.
Provides ordering based on SAM header records' attribute values.
A helper class to read BAI and CRAI indexes.
Describes a SAM-like resource, including its data (where the records are), and optionally an index.
This class enables creation of a SAMRecord object from a String in SAM text format.
Iterator that traverses a SAM File and a ReferenceFile, accumulating information on a per-locus basis.
Small class to hold together a SamLocusIterator.LocusInfo and the reference base over that locus.
Iterator that traverses a SAM File, accumulating information on a per-locus basis.
The unit of iteration.
Implementation of AbstractRecordAndOffset class for SamLocusIterator.
Utility methods for pairs of SAMRecords
The possible orientations of paired reads.
A class to iterate through SAMRecords and set mate information on the given records, and optionally
set the mate cigar tag (true by default).
In-memory representation of @PG SAM header record.
Describes functionality for objects that produce SAMRecords and associated information.
Facet for index-related operations.
The minimal subset of functionality needed for a SAMRecord data source.
Decorator for a SamReader.PrimitiveSamReader that expands its functionality into a SamReader,
given the backing SamInputResource.
Internal interface for SAM/BAM/CRAM file reader implementations,
as distinct from non-file-based readers.
Describes a type of SAM file.
Describes the functionality for producing SamReader, and offers a
handful of static generators.
A collection of binary SamReaderFactory options.
Header information about a read group.
Java binding for a SAM file record.
Tag name and value of an attribute, for getAttributes() method.
Interface for comparators that define the various SAM sort orders.
Comparator for sorting SAMRecords by coordinate.
Compares records based on if they should be considered PCR Duplicates (see MarkDuplicates).
Factory interface which allows plugging in of different classes for generating instances of
SAMRecord and BAMRecord when reading from SAM/BAM files.
API for filtering SAMRecords
Create an iterator over a SamReader that only returns reads that overlap one of the intervals
in an interval list.
A general interface that adds functionality to a CloseableIterator of
SAMRecords.
Iterator that uses a dedicated background thread to prefetch SAMRecords,
reading ahead by a set number of bases to improve throughput.
SAMRecord comparator that provides an ordering based on a hash of the queryname.
Comparator for "queryname" ordering of SAMRecords.
Factory class for creating SAMRecords for testing purposes.
This class stores SAMRecords for return.
A little class to store the unique index associated with this record.
Collection of SAMSequenceRecords.
"On the fly" codec SAMSequenceDictionaryCodec.
Small class for loading a SAMSequenceDictionary from a file
Header information about a reference sequence.
Encapsulates simple check for SAMRecord order.
Utilities related to processing of InputStreams encoding SAM data
The standard tags for a SAM record that are defined in the SAM spec.
Deprecated.
as of 11/2018, the functions in this class have been absorbed by the SAMTag enum.
Misc methods for SAM-related unit tests.
Indicates that a required sanity-check condition was not met.
Parser for a SAM text header, and a generator of SAM text header.
Writer for text-format SAM files.
Utility methods.
Class that encapsulates a validation error message as well as a type code so that
errors can be aggregated by type.
SBI is an index into BGZF-compressed data files, which has an entry for the file position of the start of every
nth record.
Merges SBI files for parts of a file that have been concatenated.
Writes SBI files as understood by SBIIndex.
A read feature representing a contiguous stretch of quality scores in a read.
SamRecordFilter that filters out secondary alignments, but not supplemental alignments.
Wrapper around SAMRecord iterator that skips over secondary elements.
Filter out SAMRecords with Secondary or Supplementary flag set
This class should be viewed as a replacement for NotPrimarySkippingIterator,
in that we did not want to change the functionality of NPSI to no longer match its name.
Wrapper around SAMRecord iterator that skips over secondary and supplementary elements.
A wrapper class to provide buffered read access to a SeekableStream.
Unfortunately the seekable stream classes exist for both Tribble and Picard, and we need both.
An implementation of SeekableStream for Path.
InputStream with random access support (seek).
Singleton class for getting SeekableStreams from URL/paths.
Applications using this library can set their own factory.
A BundleResource backed by a SeekableStream.
Represents a sequence region feature in a gff3 file.
An input stream over the first signaturePrefixLength bytes of another input stream, used to
allow multiple codecs to probe those bytes for a file format/version signature.
An implementation of Allele which includes a byte[] of the bases in the allele or the symbolic name.
Feature from a BED file without exon blocks.
A simple concrete Feature.
A CRAM slice is a logical construct that is just a subset of the blocks in a Container.
Provides a layer over a SliceBlocks object and acts as a bridge between the DataSeries codecs
and their underlying blocks when reading a CRAM stream by presenting a bit (core) or byte (external) stream
for each block.
Provides a layer over a SliceBlocks object and acts as a bridge between the DataSeries codecs
and their underlying blocks when writing a CRAM stream by presenting a bit (core) or byte (external) stream
for each block.
Factory for creating Slices when writing a CRAM stream.
Checks if Snappy is available, and provides methods for wrapping InputStreams and OutputStreams with Snappy if it is.
A Predicate on VariantContexts that returns true at sites that are SNPs
A read feature representing a soft clip similar to CigarOperator.S.
Filter to determine whether a read is "noisy" due to a poly-A run that is a sequencing artifact.
Optimized method for converting Solexa ASCII qualities into Phred scores.
Collection to which many records can be added.
Client must implement this class, which defines the way in which records are written to and
read from file.
Accumulate a list of longs that can then be sorted in natural order and iterated over.
Deprecated.
9/2017: this class is completely untested and unsupported, and there is no replacement at this time.
If you use this class, please file an issue on GitHub or it will be removed at some point in the future.
Deprecated.
since 11/2018.
The ordinals of these are stored in the high-order 2 bits of each byte of the SQ tag.
Describes a single SRA accession for SRA read collection
Also provides app string functionality and allows to check if working SRA is supported on the running platform
Important: due to checks performed in SRAAccession.isValid(), we won't recognise any accessions other
than ones that follow the pattern "^[SED]RR[0-9]{6,9}$", e.g.
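The accession pattern quoted above can be tried out directly with `java.util.regex`. The sketch below applies only that regular expression. The class and method names are hypothetical, and it does not reproduce any other checks that SRAAccession.isValid() may perform.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the documented library.
final class SraAccessionSketch {
    // Pattern copied from the entry above: S, E, or D, then "RR", then 6 to 9 digits.
    private static final Pattern RUN_ACCESSION = Pattern.compile("^[SED]RR[0-9]{6,9}$");

    /** Returns true if the string has the shape of a run accession. */
    static boolean looksLikeRunAccession(String candidate) {
        return RUN_ACCESSION.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeRunAccession("SRR000123")); // prints true
        System.out.println(looksLikeRunAccession("SRX000123")); // prints false
    }
}
```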
Iterator for aligned reads.
Emulates BAM index so that we can request chunks of records from SRAFileReader
Here is how it works:
SRA allows reading of alignments by Reference position fast, so we divide our "file" range for alignments as
a length of all references.
Allows reading Reference data from SRA
SRA iterator which returns SAMRecords for requested list of chunks
Describes record ranges info needed for emulating BAM index
Extends SAMRecord so that any of the fields will be loaded only when needed.
Iterator for unaligned reads.
Provides some functionality which can be used by other classes
Created by andrii.nikitiuk on 10/28/15.
Utility to help in performance testing.
Enum for strand, which can be encoded as a string
A TabixIndexCreator that can write to an output stream.
A simple header whose data type is a single String.
Deprecated.
Grab-bag of stateless String-oriented utilities.
Type of Structural Variant as defined in the VCF spec 4.2
A substitution event captured in read coordinates.
Substitution matrix, used to represent base substitutions for reference-based CRAM
compression.
Implementation of LineReader that reads lines directly from the underlying stream or reader.
The values in a Tabix header that define the format of the file being indexed, e.g.
This class represents a Tabix index that has been built in memory or read from a file.
IndexCreator for Tabix.
Merges tabix files for parts of a VCF file that have been concatenated.
classes that have anything to do with tabix
Filter class for matching tag attributes in SAMRecords
CVO to use as a method return value.
Factory class for wrapping input and output streams for temporary files.
An extension of BlockCompressedOutputStream that doesn't write an empty BGZF block at the
end of the stream.
Convert between String and Cigar class representations of CIGAR.
Converter between SAM text representation of a tag, and in-memory Object representation.
Deprecated.
This is deprecated with no replacement.
Deprecated.
This is deprecated with no replacement.
Common, tribble wide constants and static functions
Base class for Tribble-specific index creators.
A reader for text feature files (i.e.
Utility code for performing quality trimming.
A simple tuple class.
an exception for when we've discovered that an input file is unsorted; sorted files are required by Tribble
Placeholder interface for methods for upgrading one version of a format to a newer version.
Interface defining a helper class for dealing with URL resources.
A factory for creating URLHelper instances.
How strict to be when reading a SAM or BAM, beyond bare minimum validation.
Simple functions that streamline the checking of values.
High-level overview
Builder class for VariantContext.
A Comparator that orders VariantContexts by the ordering of the contigs/chromosomes in the List
provided at construction time, then by start position within each contig/chromosome.
API for filtering VariantContexts.
A simple but common wrapper for matching VariantContext objects using JEXL expressions
this class writes VCF files
A Bundle for variants and variants-related resources that are backed by on disk files.
Base class for all HtsContentType.VARIANT_CONTEXTS codecs.
Utilities for VCF codec implementations.
Base class for all HtsContentType.VARIANT_CONTEXTS decoders.
Base class for all HtsContentType.VARIANT_CONTEXTS encoders.
Class with string constants representing known formats supported by variants codecs.
Class with methods for resolving inputs and outputs to variants encoders and decoders.
A feature codec for the VCF3 specification, to read older VCF files.
A class representing ALT fields in the VCF header
InternalAPI
Base class for concrete implementations of HtsContentType.VARIANT_CONTEXTS codecs.
A feature codec for the VCF 4 specification
VCF V3.2 codec.
VCF V3.3 codec.
VCF V4.0 codec.
VCF V4.1 codec.
VCF V4.2 codec.
VCF V4.3 codec.
a base class for compound header lines, which include info lines and format lines (so far)
A special class representing a contig VCF header line.
InternalAPI
Base class for concrete implementations of HtsContentType.VARIANT_CONTEXTS decoders.
VCF V3.2 decoder.
VCF V3.3 decoder.
VCF V4.0 decoder.
VCF V4.1 decoder.
VCF V4.2 decoder.
VCF V4.3 decoder.
InternalAPI
Base class for concrete implementations of HtsContentType.VARIANT_CONTEXTS encoders.
Functions specific to encoding VCF records.
VCF V3.2 encoder.
VCF V3.3 encoder.
VCF V4.0 encoder.
VCF V4.1 encoder.
VCF V4.2 encoder.
Simplified interface for reading from VCF/BCF files.
A class to represent a VCF header
the count encodings we use for fields in VCF header lines
A class for translating between vcf header versions
the type encodings we use for fields in VCF header lines
Utility class to read a VCF header without being told beforehand whether the input is VCF or BCF.
information that identifies each header version
an interface for ID-based header lines
An iterator of `VariantContext`.
A class for building a VCFIterator. Example:
A class representing META fields in the VCF header
A no-op implementation of VCFTextTransformer for pre-v43 VCFs, when such encodings are not supported and
no transformation need be done.
A class representing PEDIGREE fields in the VCF header
Text transformer for attribute values embedded in VCF.
Interface for reading VCF/BCF files.
Writes VariantContext instances to an OutputStream without headers or metadata.
A class representing SAMPLE fields in the VCF header
Manages header lines for standard VCF
Interface for transforming attribute values embedded in VCF.
Header that stores information about the version of some piece of software or
data used to create the metrics file.
Implementation of ReferenceSequenceMask that indicates that all the loci in the sequence dictionary are of interest.
Filter SAMRecords so that only those that have at least one un-clipped base are
returned.