Package htsjdk.samtools.cram.build
Class ContainerFactory
java.lang.Object
htsjdk.samtools.cram.build.ContainerFactory
Aggregates SAMRecord objects into one or more Containers, each composed of one or more Slices, based on a set of
rules implemented by this class in combination with the parameter values provided via a CRAMEncodingStrategy object.
The general call pattern is to pass records in one at a time, and process Containers as they are returned:
long containerOffset = initialOffset; // after writing header, etc.
ContainerFactory containerFactory = new ContainerFactory(...);
// retrieve input records and obtain/emit Containers as they are produced by the factory...
while (inputSAM.hasNext()) {
    // null is returned while records are still being accumulated
    Container container = containerFactory.getNextContainer(inputSAM.next(), containerOffset);
    if (container != null) {
        containerOffset = writeContainer(container...);
    }
}
// if there is a final Container, retrieve and emit it
Container finalContainer = containerFactory.getFinalContainer(containerOffset);
if (finalContainer != null) {
    writeContainer(finalContainer...);
}
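The call pattern above relies on two behaviours: the per-record call returns null until enough input has accumulated, and a final call flushes whatever is left. The following is a minimal, self-contained sketch of that pattern only. `BatchingSketch`, `next`, `finish`, `drain`, and the fixed `batchSize` are hypothetical stand-ins and do not reflect how ContainerFactory decides when a Container is complete.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the accumulate-then-emit pattern; not htsjdk code. */
final class BatchingSketch {
    private final int batchSize;
    private List<String> pending = new ArrayList<>();

    BatchingSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Returns a full batch, or null while records are still accumulating. */
    List<String> next(String record) {
        pending.add(record);
        if (pending.size() < batchSize) {
            return null;
        }
        List<String> full = pending;
        pending = new ArrayList<>();
        return full;
    }

    /** Returns the leftover records, or null if nothing remains. */
    List<String> finish() {
        if (pending.isEmpty()) {
            return null;
        }
        List<String> rest = pending;
        pending = new ArrayList<>();
        return rest;
    }

    /** Drives the sketch with the same loop shape as the call pattern above. */
    static List<List<String>> drain(Iterator<String> input, int batchSize) {
        BatchingSketch factory = new BatchingSketch(batchSize);
        List<List<String>> emitted = new ArrayList<>();
        while (input.hasNext()) {
            List<String> batch = factory.next(input.next());
            if (batch != null) {
                emitted.add(batch);
            }
        }
        List<String> last = factory.finish();
        if (last != null) {
            emitted.add(last);
        }
        return emitted;
    }
}
```

Skipping the final `finish()` call in this sketch would silently drop the trailing partial batch, which is the same reason the pattern above ends with getFinalContainer.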
Multiple slices are only aggregated into a single container if slices/container is > 1, *and* all of the
slices are SINGLE_REFERENCE and have the same (mapped) reference context. MULTI_REFERENCE slices are never
aggregated with other slices into a single container, no matter how many slices/container are requested,
since it can be very inefficient to do so (the spec requires that if any slice in a container is
multiple-reference, all slices in the container must also be MULTI_REFERENCE).
For coordinate sorted inputs, a MULTI_REFERENCE slice is only created when there are not enough reads mapped
to a single reference sequence to reach the MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD. This usually only happens
near the end of the reads mapped to a given sequence. When that happens, the remaining reads mapped to the
previous sequence, plus some subsequent records, are accumulated into a small MULTI_REFERENCE slice until
MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD is reached, and the resulting MULTI_REFERENCE slice is emitted into
its own container.
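The coordinate-sorted behaviour described above can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the htsjdk logic: `SliceContextSketch`, `slices`, `readsPerSlice`, `minSingleRefThreshold`, and the `MULTI_REF` marker value are all hypothetical, and the rules are reduced to "close a single-reference slice on a reference change only if it already holds at least the threshold number of records; otherwise turn it into a multi-reference slice and keep accumulating until the threshold is reached".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of slice reference contexts for coordinate-sorted input; not htsjdk code. */
final class SliceContextSketch {
    // Illustrative marker for a multi-reference slice (assumed value).
    static final int MULTI_REF = -2;

    /** Returns one label per slice, such as "ref0 x5" or "MULTI x3". */
    static List<String> slices(int[] refIds, int readsPerSlice, int minSingleRefThreshold) {
        List<String> out = new ArrayList<>();
        int context = Integer.MIN_VALUE; // placeholder until the first record arrives
        int count = 0;
        for (int refId : refIds) {
            if (count > 0) {
                // A multi-ref slice closes at the threshold; a single-ref slice at readsPerSlice.
                boolean full = (context == MULTI_REF)
                        ? count >= minSingleRefThreshold
                        : count >= readsPerSlice;
                boolean refChange = context != MULTI_REF && refId != context;
                if (full || (refChange && count >= minSingleRefThreshold)) {
                    out.add(label(context, count));
                    count = 0;
                } else if (refChange) {
                    // Too few records for a single-ref slice: switch to multi-ref.
                    context = MULTI_REF;
                }
            }
            if (count == 0) {
                context = refId; // a new slice starts as single-reference
            }
            count++;
        }
        if (count > 0) {
            out.add(label(context, count)); // flush the final slice
        }
        return out;
    }

    private static String label(int context, int count) {
        return (context == MULTI_REF ? "MULTI" : "ref" + context) + " x" + count;
    }
}
```

For example, with `readsPerSlice` = 5 and a threshold of 3, seven records on reference 0 followed by five on reference 1 yield the slices "ref0 x5", "MULTI x3", "ref1 x4" in this model: the two leftover reads on reference 0 are combined with the first read on reference 1 into one small multi-reference slice.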
Constructor Summary
Constructor    Description
ContainerFactory(SAMFileHeader samFileHeader, CRAMEncodingStrategy encodingStrategy, CRAMReferenceSource referenceSource)
Method Summary
Modifier and Type    Method    Description
Container    getFinalContainer(long containerByteOffset)    Obtain a Container from any remaining accumulated SAMRecords, if any.
final Container    getNextContainer(SAMRecord samRecord, long containerByteOffset)
boolean    shouldEmitContainer(int currentReferenceContextID, int nextRecordIndex, int numberOfSliceEntries)    Determine if a Container should be emitted based on the current reference context and the reference context for the next record to be processed, and the encoding strategy parameters.
Constructor Details
ContainerFactory
public ContainerFactory(SAMFileHeader samFileHeader, CRAMEncodingStrategy encodingStrategy, CRAMReferenceSource referenceSource)
Parameters:
samFileHeader - the SAMFileHeader (used to determine sort order and resolve read groups)
encodingStrategy - the CRAMEncodingStrategy parameters to use
referenceSource - the CRAMReferenceSource to use for containers created by this factory
Method Details
getNextContainer
getFinalContainer
Obtain a Container from any remaining accumulated SAMRecords, if any.
shouldEmitContainer
public boolean shouldEmitContainer(int currentReferenceContextID, int nextRecordIndex, int numberOfSliceEntries)
Determine if a Container should be emitted based on the current reference context and the reference context for the next record to be processed, and the encoding strategy parameters. A container is emitted if:
- the requested number of slices per container has been reached, or
- a multi-reference slice has been accumulated (a multi-ref slice is always emitted into its own container as soon as it's generated, since we don't want to confer multi-ref-ness on the next slice, which might otherwise be single-ref), or
- we haven't reached the requested number of slices, but we're changing reference contexts. We don't want to create a MULTI-REF container out of two or more SINGLE_REF slices with different contexts, since by the spec we'd be forced to call that container MULTI-REF, and thus the slices would have to be multi-ref. So instead a single-ref container is emitted.
Parameters:
currentReferenceContextID -
nextRecordIndex -
numberOfSliceEntries -
Returns:
true if a Container should be emitted, otherwise false
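The three emit conditions for shouldEmitContainer can be sketched as a small decision function. This is an illustration of the rules as stated, not the htsjdk implementation: `EmitRuleSketch`, `shouldEmit`, the `MULTI_REF` marker value, and the parameters `slicesSoFar` and `slicesPerContainer` are hypothetical, and the real method takes different arguments (including numberOfSliceEntries) whose exact semantics are not described on this page.

```java
/** Hypothetical model of the container emit rules; not htsjdk code. */
final class EmitRuleSketch {
    // Illustrative marker for a multi-reference context (assumed value).
    static final int MULTI_REF = -2;

    /**
     * @param currentContextId   reference context of the slice(s) accumulated so far
     * @param nextContextId      reference context of the next record to be processed
     * @param slicesSoFar        number of slices accumulated for the pending container
     * @param slicesPerContainer requested number of slices per container
     */
    static boolean shouldEmit(int currentContextId, int nextContextId,
                              int slicesSoFar, int slicesPerContainer) {
        // Nothing accumulated yet, so there is nothing to emit.
        if (slicesSoFar == 0) {
            return false;
        }
        // Rule 1: the requested number of slices per container has been reached.
        if (slicesSoFar >= slicesPerContainer) {
            return true;
        }
        // Rule 2: a multi-reference slice always goes into its own container.
        if (currentContextId == MULTI_REF) {
            return true;
        }
        // Rule 3: the reference context is changing; emit now rather than mix
        // single-reference slices with different contexts in one container.
        return currentContextId != nextContextId;
    }
}
```

In this model, `shouldEmit(0, 1, 1, 4)` is true even though only one of four requested slices exists, because continuing would combine slices for references 0 and 1 in one container.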