public final class ContainerFactory
extends java.lang.Object
Aggregates SAMRecord objects into one or more Containers, each composed of one or more Slices, based on a set of
rules implemented by this class in combination with the parameter values provided via a CRAMEncodingStrategy object.
The general call pattern is to pass records in one at a time, and process Containers as they are returned:
long containerOffset = initialOffset; // after writing header, etc
ContainerFactory containerFactory = new ContainerFactory(...);
// retrieve input records and obtain/emit Containers as they are produced by the factory...
while (inputSAM.hasNext()) {
    Container container = containerFactory.getNextContainer(inputSAM.next(), containerOffset);
    if (container != null) {
        containerOffset = writeContainer(container...);
    }
}
// if there is a final Container, retrieve and emit it
Container finalContainer = containerFactory.getFinalContainer(containerOffset);
if (finalContainer != null) {
    containerOffset = writeContainer(finalContainer...);
}
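The null-until-ready contract of the call pattern above can be illustrated with a toy stand-in. This is a minimal sketch, not the htsjdk implementation: `BatchingFactorySketch`, `getNext`, `getFinal`, and `drain` are hypothetical names, the "records" are plain strings, and emission is assumed to depend only on a fixed batch size, whereas the real factory also weighs reference contexts and encoding strategy parameters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ContainerFactory: accumulates records and hands
// back a batch only when one is complete; otherwise it returns null.
final class BatchingFactorySketch {
    private final int batchSize;                       // assumed analogue of records per container
    private List<String> pending = new ArrayList<>();  // records accumulated so far

    BatchingFactorySketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Analogue of getNextContainer: null means "nothing to emit yet".
    List<String> getNext(String record) {
        pending.add(record);
        if (pending.size() < batchSize) {
            return null;
        }
        List<String> full = pending;
        pending = new ArrayList<>();
        return full;
    }

    // Analogue of getFinalContainer: flush whatever is left, or null if nothing remains.
    List<String> getFinal() {
        if (pending.isEmpty()) {
            return null;
        }
        List<String> rest = pending;
        pending = new ArrayList<>();
        return rest;
    }

    // Drives the sketch with the same loop shape as the call pattern above.
    static List<List<String>> drain(List<String> records, int batchSize) {
        BatchingFactorySketch factory = new BatchingFactorySketch(batchSize);
        List<List<String>> emitted = new ArrayList<>();
        for (String record : records) {
            List<String> batch = factory.getNext(record);
            if (batch != null) {
                emitted.add(batch);
            }
        }
        List<String> last = factory.getFinal();
        if (last != null) {
            emitted.add(last);
        }
        return emitted;
    }
}
```

The point of the sketch is that the caller must always make the final flush call: with five records and a batch size of two, the fifth record is only emitted by `getFinal`.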
Multiple slices are only aggregated into a single container if slices/container is > 1, *and* all of the
slices are SINGLE_REFERENCE and have the same (mapped) reference context. MULTI_REFERENCE slices are never
aggregated with other slices into a single container, no matter how many slices/container are requested,
since it can be very inefficient to do so (the spec requires that if any slice in a container is
multiple-reference, all slices in the container must also be MULTI_REFERENCE).
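The aggregation rule above can be sketched as a single predicate. This is a minimal illustration under stated assumptions, not the class's actual code: `SliceAggregationSketch`, `ContextType`, `SliceContext`, and `canShareContainer` are hypothetical names, and a slice's reference context is reduced to a type plus a sequence id.

```java
// Hypothetical model of the rule: slices share a container only when more than
// one slice per container is requested AND both slices are single-reference
// slices mapped to the same reference sequence.
final class SliceAggregationSketch {
    enum ContextType { SINGLE_REFERENCE, MULTI_REFERENCE, UNMAPPED }

    // Assumed minimal description of a slice's reference context.
    static final class SliceContext {
        final ContextType type;
        final int sequenceId; // only meaningful for SINGLE_REFERENCE

        SliceContext(ContextType type, int sequenceId) {
            this.type = type;
            this.sequenceId = sequenceId;
        }
    }

    static boolean canShareContainer(int slicesPerContainer, SliceContext first, SliceContext second) {
        return slicesPerContainer > 1
                && first.type == ContextType.SINGLE_REFERENCE
                && second.type == ContextType.SINGLE_REFERENCE
                && first.sequenceId == second.sequenceId;
    }
}
```

Under this model a MULTI_REFERENCE slice never satisfies the predicate, so it always ends up alone in its container regardless of the requested slices/container.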
For coordinate sorted inputs, a MULTI_REFERENCE slice is only created when there are not enough reads mapped
to a single reference sequence to reach the MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD. This usually only happens
near the end of the reads mapped to a given sequence. When that happens, the remaining reads mapped to the
previous sequence, plus some subsequent records, are accumulated into a small MULTI_REFERENCE slice until
MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD is hit, and the resulting MULTI_REFERENCE slice is emitted into
its own container.
Constructor and Description |
---|
ContainerFactory(SAMFileHeader samFileHeader, CRAMEncodingStrategy encodingStrategy, CRAMReferenceSource referenceSource) |
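The coordinate-sorted rule from the class description (a MULTI_REFERENCE slice is created only when too few reads are mapped to one reference sequence) can be sketched as two small decisions. This is one reading of that description, not the factory's actual logic: `MultiRefThresholdSketch`, `kindAtReferenceBoundary`, and `multiRefSliceIsFull` are hypothetical names, and the threshold is passed in as a plain parameter rather than read from a CRAMEncodingStrategy.

```java
// Hypothetical model of the threshold rule for coordinate-sorted input.
final class MultiRefThresholdSketch {
    enum SliceKind { SINGLE_REFERENCE, MULTI_REFERENCE }

    // Called when the next record maps to a different reference sequence than
    // the records accumulated so far. With enough records the slice stays
    // single-reference; otherwise accumulation continues as multi-reference.
    static SliceKind kindAtReferenceBoundary(int recordsOnCurrentReference,
                                             int minimumSingleReferenceSliceThreshold) {
        return recordsOnCurrentReference >= minimumSingleReferenceSliceThreshold
                ? SliceKind.SINGLE_REFERENCE
                : SliceKind.MULTI_REFERENCE;
    }

    // Once accumulating as multi-reference, the slice is treated as complete
    // when the accumulated record count reaches the threshold.
    static boolean multiRefSliceIsFull(int accumulatedRecords,
                                       int minimumSingleReferenceSliceThreshold) {
        return accumulatedRecords >= minimumSingleReferenceSliceThreshold;
    }
}
```

For example, with an assumed threshold of 10, five leftover reads at the end of one sequence would switch to MULTI_REFERENCE, and five more records from the following sequence would complete that slice.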
Modifier and Type | Method and Description |
---|---|
Container | getFinalContainer(long containerByteOffset): Obtain a Container from any remaining accumulated SAMRecords, if any. |
Container | getNextContainer(SAMRecord samRecord, long containerByteOffset) |
boolean | shouldEmitContainer(int currentReferenceContextID, int nextRecordIndex, int numberOfSliceEntries): Determine if a Container should be emitted based on the current reference context and the reference context for the next record to be processed, and the encoding strategy parameters. |
public ContainerFactory(SAMFileHeader samFileHeader, CRAMEncodingStrategy encodingStrategy, CRAMReferenceSource referenceSource)
samFileHeader - the SAMFileHeader (used to determine sort order and resolve read groups)
encodingStrategy - the CRAMEncodingStrategy parameters to use
referenceSource - the CRAMReferenceSource to use for containers created by this factory
public final Container getNextContainer(SAMRecord samRecord, long containerByteOffset)
public Container getFinalContainer(long containerByteOffset)
Obtain a Container from any remaining accumulated SAMRecords, if any.
public boolean shouldEmitContainer(int currentReferenceContextID, int nextRecordIndex, int numberOfSliceEntries)
currentReferenceContextID -
nextRecordIndex -
numberOfSliceEntries -
Returns true if a Container should be emitted, otherwise false