D - the decoder options type for this codec
E - the encoder options type for this codec
public interface HtsCodec<D extends HtsDecoderOptions,E extends HtsEncoderOptions> extends Upgradeable
htsjdk.beta.plugin
codecs.
Each version of a file format supported by the htsjdk.beta.plugin framework is represented by a trio of components:
HtsCodec
HtsEncoder
HtsDecoder
The HtsCodec is a lightweight and long-lived object that resides in an HtsCodecRegistry. A registry is used to resolve requests for an HtsEncoder or HtsDecoder that matches a given resource. The HtsEncoder and HtsDecoder objects do the work of actually writing and reading records to and from underlying resources.
A default, static, immutable HtsCodecRegistry is populated with HtsCodecs that are discovered and instantiated statically via a ServiceLoader, and can be accessed using HtsDefaultRegistry. A private, mutable registry can be created at runtime via HtsCodecRegistry.createPrivateRegistry(), and populated dynamically by calls to HtsCodecRegistry.registerCodec(HtsCodec).
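To make the two kinds of registry concrete, here is a minimal sketch that uses simplified stand-in types rather than the htsjdk classes. The names RegistrySketch and Registry are hypothetical, codecs are represented as plain strings, the default registry is hard-coded instead of being populated through ServiceLoader discovery, and the sketch assumes (without confirmation from the text above) that a private registry starts out empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-ins, NOT the htsjdk types. Codecs are plain strings here.
final class RegistrySketch {

    static final class Registry {
        private final List<String> codecs = new ArrayList<>();
        private final boolean mutable;

        private Registry(List<String> initial, boolean mutable) {
            this.codecs.addAll(initial);
            this.mutable = mutable;
        }

        // Loosely mirrors createPrivateRegistry(); assumed to start empty.
        static Registry createPrivate() {
            return new Registry(List.of(), true);
        }

        // Loosely mirrors registerCodec(HtsCodec); rejected on the immutable default.
        void register(String codec) {
            if (!mutable) {
                throw new UnsupportedOperationException("default registry is immutable");
            }
            codecs.add(codec);
        }

        // Read-only view of the registered codecs.
        List<String> codecs() {
            return Collections.unmodifiableList(codecs);
        }
    }

    // Hard-coded in place of ServiceLoader discovery to keep the sketch self-contained.
    static final Registry DEFAULT = new Registry(List.of("BAM 1.0"), false);
}
```

The point of the split is that code sharing the default registry can rely on it never changing, while tests or applications that need extra codecs work against their own private instance.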
The primary responsibility of an HtsCodec is to satisfy requests made by the framework during codec resolution, inspecting and recognizing input URIs and stream resources that match the supported format and version, and providing an HtsEncoder or HtsDecoder on demand, once a match is made.
HtsContentType.ALIGNED_READS
HtsContentType.HAPLOID_REFERENCE
HtsContentType.VARIANT_CONTEXTS
HtsContentType.FEATURES
For each content type, there is a corresponding set of codec/decoder/encoder interfaces that are implemented by components that support that content type. These interfaces extend generic base interfaces, and provide generic parameter type instantiations appropriate for that content type. As an example, see ReadsDecoder, which defines the interface for all HtsDecoders for the HtsContentType.ALIGNED_READS content type. The different implementations of component trios for a given content type all use the same content-type-specific interfaces, but each over a different combination of underlying file format and version.
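The layering described above can be sketched with stand-in declarations. Everything in the block is hypothetical and heavily simplified (ContentTypeSketch, the nested Codec and ReadsCodec interfaces, and the BamLikeCodec and CramLikeCodec classes are not the htsjdk types); it only illustrates how a content-type-specific interface can fix the generic parameters of a base interface so that several format implementations share one API.

```java
// Stand-in declarations that mirror the shape described above; NOT copied from htsjdk.
final class ContentTypeSketch {

    enum ContentType { ALIGNED_READS, HAPLOID_REFERENCE, VARIANT_CONTEXTS, FEATURES }

    interface DecoderOptions {}
    interface EncoderOptions {}

    // Generic base layer: parameterized on the two options types.
    interface Codec<D extends DecoderOptions, E extends EncoderOptions> {
        ContentType getContentType();
        String getFileFormat();
    }

    // Content-type-specific layer for reads: fixes the generic parameters once.
    static final class ReadsDecoderOptions implements DecoderOptions {}
    static final class ReadsEncoderOptions implements EncoderOptions {}

    interface ReadsCodec extends Codec<ReadsDecoderOptions, ReadsEncoderOptions> {
        @Override
        default ContentType getContentType() {
            return ContentType.ALIGNED_READS;
        }
    }

    // Two hypothetical implementations: same interfaces, different file formats.
    static final class BamLikeCodec implements ReadsCodec {
        @Override
        public String getFileFormat() {
            return "BAM";
        }
    }

    static final class CramLikeCodec implements ReadsCodec {
        @Override
        public String getFileFormat() {
            return "CRAM";
        }
    }
}
```

Callers that only depend on ReadsCodec can work with either implementation without knowing which file format sits underneath.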
The generic, base interfaces that are common to all codecs, encoders, and decoders are:
HtsCodec: base codec interface
HtsEncoder: base encoder interface
HtsEncoderOptions: base options interface for encoders
HtsDecoder: base decoder interface
HtsDecoderOptions: base options interface for decoders
Bundle: an optional type-specific Bundle implementation
The packages containing the content type-specific interface definitions for each of the four different content types are:
HtsContentType.ALIGNED_READS codecs, see the htsjdk.beta.plugin.reads package
HtsContentType.HAPLOID_REFERENCE codecs, see the htsjdk.beta.plugin.hapref package
HtsContentType.VARIANT_CONTEXTS codecs, see the htsjdk.beta.plugin.variants package
HtsContentType.FEATURES codecs, see the htsjdk.beta.plugin.features package
As an example, the htsjdk.beta.plugin.reads package defines the following interfaces that extend the generic base interfaces for codecs with content type HtsContentType.ALIGNED_READS:
ReadsCodec: reads codec interface, extends the generic HtsCodec interface
ReadsEncoder: reads encoder interface, extends the generic HtsEncoder interface
ReadsEncoderOptions: reads encoder options, extends the generic HtsEncoderOptions interface
ReadsDecoder: reads decoder interface, extends the generic HtsDecoder interface
ReadsDecoderOptions: reads decoder options, extends the generic HtsDecoderOptions interface
ReadsFormats: a class with string constants for each possible supported reads file format
The plugin framework uses registered codecs to conduct a series of probes into the structure and format of an input or output resource in order to find a matching codec that can produce an encoder or decoder for that resource. The values returned from the codec methods are used by the framework to prune the list of candidate codecs until a match is found. During codec resolution, the codec methods are called in the following order:
See the HtsCodecResolver
methods for more detail on the resolution
protocol:
HtsCodecResolver.resolveForDecoding(Bundle)
HtsCodecResolver.resolveForEncoding(Bundle)
HtsCodecResolver.resolveForEncoding(Bundle, HtsVersion)
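As a rough illustration of how candidate pruning for decoding might work, the sketch below uses stand-in types only. ResolutionSketch, FileCodec, SchemeCodec, the "myproto" scheme, and the ".xyz" extension are all hypothetical, and the ordering shown (URI ownership first, then URI filtering, then signature probing) is an assumption inferred from the method descriptions on this page, not a transcription of the HtsCodecResolver implementation.

```java
import java.net.URI;
import java.util.List;
import java.util.stream.Collectors;

// Stand-in types for illustration only; NOT the htsjdk resolver.
final class ResolutionSketch {

    interface Codec {
        String name();
        boolean ownsURI(URI uri);
        boolean canDecodeURI(URI uri);
        boolean canDecodeSignature(byte[] probe);
    }

    // Hypothetical single-file codec: extension filter plus signature check.
    static final class FileCodec implements Codec {
        private final String name;
        private final String extension;
        private final byte[] signature;

        FileCodec(String name, String extension, byte[] signature) {
            this.name = name;
            this.extension = extension;
            this.signature = signature.clone();
        }

        @Override public String name() { return name; }
        @Override public boolean ownsURI(URI uri) { return false; }

        @Override public boolean canDecodeURI(URI uri) {
            String path = uri.getPath();
            return path != null && path.endsWith(extension);
        }

        @Override public boolean canDecodeSignature(byte[] probe) {
            if (probe.length < signature.length) {
                return false;
            }
            for (int i = 0; i < signature.length; i++) {
                if (probe[i] != signature[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    // Hypothetical custom-URI codec: recognizes its resources by scheme alone.
    static final class SchemeCodec implements Codec {
        private final String name;
        private final String scheme;

        SchemeCodec(String name, String scheme) {
            this.name = name;
            this.scheme = scheme;
        }

        @Override public String name() { return name; }
        @Override public boolean ownsURI(URI uri) { return scheme.equals(uri.getScheme()); }
        @Override public boolean canDecodeURI(URI uri) { return ownsURI(uri); }
        @Override public boolean canDecodeSignature(byte[] probe) { return false; }
    }

    // Prune the registered codecs down to those that match the resource.
    static List<Codec> resolveForDecoding(List<Codec> registered, URI uri, byte[] probe) {
        List<Codec> owners = registered.stream()
                .filter(c -> c.ownsURI(uri))
                .collect(Collectors.toList());
        if (!owners.isEmpty()) {
            // A codec claimed the URI: skip stream probing entirely.
            return owners.stream()
                    .filter(c -> c.canDecodeURI(uri))
                    .collect(Collectors.toList());
        }
        // Otherwise filter by URI first, then by the signature bytes.
        return registered.stream()
                .filter(c -> c.canDecodeURI(uri))
                .filter(c -> c.canDecodeSignature(probe))
                .collect(Collectors.toList());
    }
}
```

In this sketch a URI that is claimed by a scheme codec never reaches the file codecs, even when its path happens to end in an extension they accept.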
Many file formats consist of a single file that resides on a file system that is supported by a java.nio file system provider. Codecs that support such formats are generally agnostic about the IOPath or URI protocol scheme used to identify their resources, and assume that file contents can be accessed directly via a single stream created via a java.nio file system provider.
However, some file formats use a specific, well-known URI format or protocol scheme, often to identify a remote or otherwise specially formatted resource, such as a local database that is distributed across multiple physical files. These codecs may bypass direct java.nio file system access, and instead use specialized code to access their underlying resources.
For example, the BAMCodecV1_0 assumes that IOPath resources can be accessed as a stream on a single file via either the "file://" protocol, or other protocols such as gs:// or hdfs:// that have java.nio file system providers. It does not require or assume a particular URI format, and is agnostic about URI scheme.
In contrast, the HtsgetBAMCodecV1_2 codec is a specialized codec that handles remote resources via the "http://" protocol. It uses http to access the underlying resource, and bypasses direct java.nio file system access.
Codecs for formats that use a custom URI format or protocol scheme such as htsget must be able to determine if they can decode or encode a resource purely by inspecting the IOPath/URI, and should follow these guidelines:
return true when ownsURI(IOPath) is presented with an IOPath with a conforming URI
return true when canDecodeURI(IOPath) is presented with an IOPath with a conforming URI
never return inconsistent values: ownsURI(IOPath) == canDecodeURI(IOPath) must hold for any given IOPath
return 0 from the getSignatureProbeLength() method
return 0 from the getSignatureLength() method
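The guidelines above might look like this in a codec that recognizes its resources by URI alone. This is a sketch under stated assumptions: CustomUriCodecSketch is a hypothetical stand-in rather than an htsjdk codec, it works on java.net.URI instead of IOPath, and the "myproto" scheme and the rule for what counts as a conforming URI are invented for illustration.

```java
import java.net.URI;

// Illustrative stand-in; NOT an htsjdk codec. The "myproto" scheme is hypothetical.
final class CustomUriCodecSketch {
    private static final String SCHEME = "myproto";

    // Pure inspection of the URI text: no network access, nothing that throws.
    private static boolean conforms(URI uri) {
        String path = uri.getPath();
        return SCHEME.equalsIgnoreCase(uri.getScheme()) && path != null && path.length() > 1;
    }

    // Both URI methods delegate to the same check, so they can never disagree.
    boolean ownsURI(URI uri) {
        return conforms(uri);
    }

    boolean canDecodeURI(URI uri) {
        return conforms(uri);
    }

    // No stream is ever probed for a custom URI handler, so both lengths are 0.
    int getSignatureProbeLength() {
        return 0;
    }

    int getSignatureLength() {
        return 0;
    }
}
```

Routing both URI methods through one private predicate is a simple way to guarantee the consistency requirement by construction.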
For file formats that use a separate index resource, the getDecoder(Bundle, HtsDecoderOptions) implementation should not attempt to automatically resolve the companion index in order to satisfy index queries, if the index resource is not provided in the input bundle. HtsDecoders for such file formats should only satisfy index queries if the input bundle explicitly specifies the index resource. For file formats that do not require a separate index resource to be specified (such as those that rely on a remote access mechanism), it is permissible to satisfy index queries without requiring the index resource to be included in the bundle.
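A minimal sketch of the index policy described above, using stand-in types: a plain Map plays the role of the Bundle, and IndexPolicySketch, its Decoder class, and the "reads" and "index" keys are hypothetical names, not part of htsjdk. The decoder answers index queries only when the caller supplied an index, and never guesses a sibling index file.

```java
import java.util.Map;

// Stand-in types; a Map plays the role of the Bundle. Keys are hypothetical.
final class IndexPolicySketch {
    static final String READS_KEY = "reads";
    static final String INDEX_KEY = "index";

    static final class Decoder {
        private final String readsPath;
        private final String indexPath; // null when the bundle carried no index

        Decoder(String readsPath, String indexPath) {
            this.readsPath = readsPath;
            this.indexPath = indexPath;
        }

        boolean isQueryable() {
            return indexPath != null;
        }

        // Index queries are refused unless an index was explicitly provided.
        String queryInterval(String interval) {
            if (!isQueryable()) {
                throw new IllegalStateException("no index resource was provided in the bundle");
            }
            return "query " + interval + " on " + readsPath + " using " + indexPath;
        }
    }

    static Decoder getDecoder(Map<String, String> bundle) {
        String reads = bundle.get(READS_KEY);
        if (reads == null) {
            throw new IllegalArgumentException("bundle has no reads resource");
        }
        // Deliberately no guessing of a companion index derived from the reads path.
        return new Decoder(reads, bundle.get(INDEX_KEY));
    }
}
```

Keeping index discovery out of the decoder means the caller always knows exactly which resources are being read.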
getDecoder(Bundle, HtsDecoderOptions)
and
getEncoder(Bundle, HtsEncoderOptions)
).
Modifier and Type | Method and Description |
---|---|
boolean | canDecodeSignature(SignatureStream signatureStream, java.lang.String sourceName) Determine if the codec can decode an input stream by inspecting a signature embedded within the stream. |
boolean | canDecodeURI(IOPath ioPath) Determine if the URI for ioPath (obtained via IOPath.getURI()) conforms to the expected URI format for this codec's file format. |
HtsContentType | getContentType() Get the HtsContentType for this codec. |
HtsDecoder<?,? extends HtsRecord> | getDecoder(Bundle inputBundle, D decoderOptions) Get an HtsDecoder to decode the provided inputs. |
default java.lang.String | getDisplayName() Get a user-friendly display name for this codec. |
HtsEncoder<?,? extends HtsRecord> | getEncoder(Bundle outputBundle, E encoderOptions) Get an HtsEncoder to encode to the provided outputs. |
java.lang.String | getFileFormat() Get the name of the file format supported by this codec. |
int | getSignatureLength() Get the number of bytes in the format and version signature used by the file format supported by this codec. |
default int | getSignatureProbeLength() Get the number of bytes needed by this codec to probe an input stream for a format/version signature, and determine if it can supply a decoder for the stream. |
HtsVersion | getVersion() Get the version of the file format returned by getFileFormat() that is supported by this codec. |
default boolean | ownsURI(IOPath ioPath) Determine if this codec "owns" the URI contained in ioPath (see IOPath.getURI()). |
runVersionUpgrade
HtsContentType getContentType()
Get the HtsContentType for this codec.
Returns the HtsContentType for this codec. The HtsContentType determines the interfaces, including the HEADER and RECORD types, used by this codec's HtsEncoder and HtsDecoder. Each implementation of a given content type exposes the same interfaces, but over a different file format or version. For example, both the BAM and HTSGET_BAM codecs have content type HtsContentType.ALIGNED_READS and are derived from ReadsCodec, but the serialized file formats and access mechanisms for the two codecs are different.
java.lang.String getFileFormat()
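Get the name of the file format supported by this codec (see BundleResourceType and BundleResource.getFileFormat()).
HtsVersion getVersion()
Get the version of the file format returned by getFileFormat() that is supported by this codec.
Returns the version (HtsVersion) supported by this codec.
default java.lang.String getDisplayName()
Get a user-friendly display name for this codec.
default boolean ownsURI(IOPath ioPath)
Determine if this codec "owns" the URI contained in ioPath (see IOPath.getURI()).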
A codec "owns" the URI only if it has specific requirements on the URI protocol scheme, URI format,
or query parameters that go beyond a simple file extension, AND it explicitly recognizes the URI
as conforming to those requirements. File formats that only require a specific file extension should
always return false from ownsURI(htsjdk.io.IOPath)
, and should instead use the extension as a filter in
canDecodeURI(IOPath)
.
Returning true from this method will cause the framework to bypass the stream-oriented signature
probing that is used to resolve inputs to a codec handler. During codec resolution, if any registered
codec returns true for this method on ioPath
, the signature probing protocol will instead:
ioPath
Any codec that returns true from ownsURI(IOPath)
for a given IOPath must also return true
from canDecodeURI(IOPath)
for the same IOPath.
For custom URI handlers, codecs should avoid making remote calls to determine the suitability
or accessibility of the input resource; the return value for this method should be based only on the format
of the URI that is presented. Operations that require remote access that can fail, such as validating
server connectivity, authentication, or authorization, should be deferred until data is requested by the
caller via the codec's HtsEncoder
or HtsDecoder
.
Since this method is used during codec resolution, implementations should avoid calling methods that
may throw exceptions.
ioPath - the ioPath to inspect
boolean canDecodeURI(IOPath ioPath)
Determine if the URI for ioPath (obtained via IOPath.getURI()) conforms to the expected URI format for this codec's file format.
Most implementations only look at the file extension (see IOPath.hasExtension(java.lang.String)
).
For codecs that implement formats that use specific, well known file extensions, the codec should
reject inputs that do not conform to any of the accepted extensions. If the format does not use a
specific extension, or if the codec cannot determine if it can decode the underlying resource
without inspecting the underlying stream, it is safe to return true, after which the framework will
subsequently call this codec's canDecodeSignature(SignatureStream, String)
method, at
which time the codec can inspect the actual underlying stream via the SignatureStream
.
Implementations should generally not inspect the URI's protocol scheme unless the file format supported by the codec requires the use of a specific protocol scheme. For codecs that do own a specific scheme or URI format, the return values for ownsURI(IOPath) and canDecodeURI(IOPath) must always be the same (both true or both false) for a given IOPath. For codecs that do not use a custom URI (and rely on NIO access), ownsURI(IOPath) should always return false, with only the value returned from canDecodeURI(IOPath) varying based on features such as file extension probes.
It is never safe to attempt to directly inspect the underlying stream for ioPath
in this method. If the stream needs to be inspected, it should be done using the signature stream
when the canDecodeSignature(SignatureStream, String)
method is called.
For custom URI handlers (codecs that return true for ownsURI(IOPath)), codecs should avoid making remote calls to determine the suitability of the input resource; the return value for this method should be based only on the format of the URI that is presented.
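Since this method is used during codec resolution, implementations should avoid calling methods that may throw exceptions.
ioPath - the ioPath to be decoded
boolean canDecodeSignature(SignatureStream signatureStream, java.lang.String sourceName)
Determine if the codec can decode an input stream by inspecting a signature embedded within the stream.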
getSignatureProbeLength()
getSignatureProbeLength()
Codecs that handle custom URIs that reference remote resources (those that return true for ownsURI(htsjdk.io.IOPath)
)
should generally not inspect the stream, and should return false from this method, since the method
will never be called with any resource for which ownsURI(htsjdk.io.IOPath)
returned true.
Since this method is used during codec resolution, implementations should avoid calling methods that
may throw exceptions.
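signatureStream - the stream to be inspected for the resource's embedded signature and version
sourceName - a display name describing the source of the input stream, for use in error messages
int getSignatureLength()
Get the number of bytes in the format and version signature used by the file format supported by this codec.
Note: Codecs that are custom URI handlers (those that return true for ownsURI(htsjdk.io.IOPath)), should always return 0 from this method.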
Since this method is used during codec resolution, implementations should avoid calling methods that may throw exceptions.
default int getSignatureProbeLength()
Get the number of bytes needed by this codec to probe an input stream for a format/version signature, and determine if it can supply a decoder for the stream. This number can differ from the value returned by getSignatureLength() for codecs that support compressed or encrypted streams, since they may require a larger and more semantically meaningful input fragment (such as an entire encrypted or compressed block) in order to inspect the plaintext signature. Therefore signatureProbeLength should be expressed in "compressed/encrypted" space rather than "plaintext" space. The length returned from this method is used to determine the size of the SignatureStream that is subsequently passed to canDecodeSignature(SignatureStream, String).
Note: Codecs that are custom URI handlers (those that return true for ownsURI(IOPath)
),
should always return 0 from this method when it is called.
Since this method is used during codec resolution, implementations should avoid calling methods that
may throw exceptions.
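The distinction between plaintext signature length and probe length can be illustrated with gzip from the Java standard library. This is a sketch under assumptions: ProbeLengthSketch, the four-byte SIGNATURE, and the PROBE_LENGTH of 64 are hypothetical, a byte array stands in for the SignatureStream, and the example assumes the first compressed block fits within the probe length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative stand-in; NOT an htsjdk codec. Signature and lengths are hypothetical.
final class ProbeLengthSketch {
    static final byte[] SIGNATURE = {'S', 'K', 'T', 1};
    static final int PROBE_LENGTH = 64; // measured in compressed bytes

    // Length of the signature in plaintext space.
    static int getSignatureLength() {
        return SIGNATURE.length;
    }

    // Length of the fragment needed in compressed space to recover the signature.
    static int getSignatureProbeLength() {
        return PROBE_LENGTH;
    }

    // Decompress the probe and compare the first plaintext bytes to the signature.
    // I/O failures are swallowed so that resolution never sees an exception.
    static boolean canDecodeSignature(byte[] probe) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(probe))) {
            byte[] plain = in.readNBytes(SIGNATURE.length);
            return Arrays.equals(plain, SIGNATURE);
        } catch (IOException e) {
            return false;
        }
    }

    // Helper for the sketch: gzip-compress a plaintext buffer.
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Handing the codec only getSignatureLength() compressed bytes is not enough here, because the gzip header alone is longer than the four-byte plaintext signature; that is why the probe length is stated in compressed space.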
HtsDecoder<?,? extends HtsRecord> getDecoder(Bundle inputBundle, D decoderOptions)
Get an HtsDecoder to decode the provided inputs. The input bundle must contain resources of the type required by this codec. To find a codec appropriate for decoding a given resource, use an HtsCodecResolver obtained from an HtsCodecRegistry.
The framework will never call this method unless either ownsURI(IOPath), or canDecodeURI(IOPath) and canDecodeSignature(SignatureStream, String), return true for inputBundle.
inputBundle - input to be decoded. To get a decoder for use with index queries that use HtsQuery methods, the bundle must contain an index resource.
decoderOptions - options for the decoder to use
Returns an HtsDecoder that can decode the provided inputs.
HtsEncoder<?,? extends HtsRecord> getEncoder(Bundle outputBundle, E encoderOptions)
Get an HtsEncoder to encode to the provided outputs. The output bundle must contain resources of the type required by this codec. To find a codec appropriate for encoding a given resource, use an HtsCodecResolver obtained from an HtsCodecRegistry.
The framework will never call this method unless either ownsURI(IOPath) or canDecodeURI(IOPath) returned true for outputBundle.
outputBundle - target output for the encoder
encoderOptions - encoder options to use
Returns an HtsEncoder suitable for writing to the provided outputs.