Class ContainerFactory

java.lang.Object
htsjdk.samtools.cram.build.ContainerFactory

public final class ContainerFactory extends Object
Aggregates SAMRecord objects into one or more Containers, each composed of one or more Slices, based on a set of rules implemented by this class in combination with the parameter values provided via a CRAMEncodingStrategy object. The general call pattern is to pass records in one at a time and process Containers as they are returned:

  long containerOffset = initialOffset; // after writing the header, etc.
  ContainerFactory containerFactory = new ContainerFactory(...);
  // retrieve input records and obtain/emit Containers as they are produced by the factory...
  while (inputSAM.hasNext()) {
      Container container = containerFactory.getNextContainer(inputSAM.next(), containerOffset);
      if (container != null) {
          containerOffset = writeContainer(container, ...);
      }
  }

  // if there is a final Container, retrieve and emit it
  Container finalContainer = containerFactory.getFinalContainer(containerOffset);
  if (finalContainer != null) {
      containerOffset = writeContainer(finalContainer, ...);
  }
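As a rough, self-contained illustration of this push/flush pattern, the toy class below accepts one record at a time, returns null while it is still accumulating, and can leave a partial batch behind that only the final call retrieves. Everything in it is hypothetical and not part of htsjdk: a "record" is just its reference index, a "container" is a List<Integer>, containerByteOffset is omitted, and the emission rule (batch full, or reference index changes) is a simplification that ignores slices and MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD. In this toy, the record that triggers an emission starts the next batch, so a returned container does not include it; that is an assumption of the sketch, not documented behavior of ContainerFactory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the push/flush protocol; NOT the htsjdk API.
// A "record" is just its reference index; a "container" is a list of records.
final class ToyContainerFactory {
    private final int maxRecordsPerContainer;            // hypothetical size limit
    private final List<Integer> pending = new ArrayList<>();

    ToyContainerFactory(int maxRecordsPerContainer) {
        this.maxRecordsPerContainer = maxRecordsPerContainer;
    }

    /** Mirrors getNextContainer: returns a finished batch, or null while still accumulating. */
    List<Integer> getNextContainer(int recordReferenceIndex) {
        List<Integer> emitted = null;
        boolean full = pending.size() >= maxRecordsPerContainer;
        boolean referenceChanged = !pending.isEmpty()
                && pending.get(pending.size() - 1).intValue() != recordReferenceIndex;
        if (full || referenceChanged) {
            emitted = new ArrayList<>(pending);           // emit what was accumulated so far
            pending.clear();
        }
        pending.add(recordReferenceIndex);                // the new record starts or joins the next batch
        return emitted;
    }

    /** Mirrors getFinalContainer: flushes anything left, or returns null if nothing is pending. */
    List<Integer> getFinalContainer() {
        if (pending.isEmpty()) {
            return null;
        }
        List<Integer> emitted = new ArrayList<>(pending);
        pending.clear();
        return emitted;
    }

    public static void main(String[] args) {
        ToyContainerFactory factory = new ToyContainerFactory(3);
        List<List<Integer>> written = new ArrayList<>();
        int[] input = {0, 0, 0, 0, 1, 1};
        for (int ref : input) {
            List<Integer> container = factory.getNextContainer(ref);
            if (container != null) {
                written.add(container);
            }
        }
        List<Integer> last = factory.getFinalContainer();
        if (last != null) {
            written.add(last);
        }
        System.out.println(written);
    }
}
```

Running main prints [[0, 0, 0], [0], [1, 1]]. The last batch is only written because of the final getFinalContainer call, which is why the pattern above always ends with it.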
  
 
Multiple slices are only aggregated into a single container if slices/container is > 1 *and* all of the slices are SINGLE_REFERENCE with the same (mapped) reference context. MULTI_REFERENCE slices are never aggregated with other slices into a single container, no matter how many slices/container are requested, since doing so can be very inefficient: the spec requires that if any slice in a container is MULTI_REFERENCE, all slices in the container must also be MULTI_REFERENCE.

For coordinate-sorted inputs, a MULTI_REFERENCE slice is only created when there are not enough reads mapped to a single reference sequence to reach the MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD. This usually happens only near the end of the reads mapped to a given sequence. In that case, the remaining reads mapped to the previous sequence, plus some subsequent records, are accumulated into a small MULTI_REFERENCE slice until MINIMUM_SINGLE_REFERENCE_SLICE_THRESHOLD is reached, and the resulting MULTI_REFERENCE slice is emitted into its own container.
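The aggregation rule can be sketched with plain integers standing in for htsjdk types. In this hypothetical sketch (none of these names come from htsjdk), a slice's reference context is either the single reference index shared by all of its records or a MULTI_REFERENCE sentinel of -2 chosen just for the sketch. Only mapped records (reference index >= 0) are modeled, and the threshold logic that decides where slice boundaries fall is not.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the slice aggregation rule using plain ints in place of htsjdk types.
// All names here are hypothetical; only mapped records (index >= 0) are modeled.
final class SliceAggregationSketch {
    // Hypothetical sentinel meaning "this slice spans more than one reference".
    static final int MULTI_REFERENCE = -2;

    /** Reference context of a slice, given the (non-empty) reference indices of its records. */
    static int sliceContext(List<Integer> recordReferenceIndices) {
        int first = recordReferenceIndices.get(0);
        for (int index : recordReferenceIndices) {
            if (index != first) {
                return MULTI_REFERENCE;    // records from two or more references
            }
        }
        return first;                      // SINGLE_REFERENCE: every record on the same sequence
    }

    /** Whether two slices may share one container under the rule described above. */
    static boolean canShareContainer(int contextA, int contextB, int slicesPerContainer) {
        return slicesPerContainer > 1
                && contextA != MULTI_REFERENCE   // MULTI_REFERENCE slices always get their own container
                && contextA == contextB;         // both slices on the same single reference
    }

    public static void main(String[] args) {
        int a = sliceContext(Arrays.asList(0, 0, 0));
        int b = sliceContext(Arrays.asList(0, 0, 1)); // tail of reference 0 plus start of reference 1
        int c = sliceContext(Arrays.asList(1, 1, 1));
        int d = sliceContext(Arrays.asList(1, 1, 1));
        System.out.println(canShareContainer(a, b, 4)); // false: b is MULTI_REFERENCE
        System.out.println(canShareContainer(c, d, 4)); // true: both on reference 1
        System.out.println(canShareContainer(c, d, 1)); // false: one slice per container requested
    }
}
```

The slice built from the tail of reference 0 plus the start of reference 1 is MULTI_REFERENCE, so it cannot share a container with its neighbors, while two slices on reference 1 can, provided more than one slice per container is requested.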
  • Constructor Details

  • Method Details

    • getNextContainer

      public final Container getNextContainer(SAMRecord samRecord, long containerByteOffset)
      Add a new SAMRecord to the factory, returning a Container if one is ready to be emitted.
      Parameters:
      samRecord - the next SAMRecord to be written
      containerByteOffset - the byte offset to record in the Container if one is created
      Returns:
      a Container if the threshold for emitting a Container has been reached, otherwise null
    • getFinalContainer

      public Container getFinalContainer(long containerByteOffset)
      Obtain a Container from the remaining accumulated SAMRecords, if there are any.
      Parameters:
      containerByteOffset - the byte offset to record in the newly emitted Container if one is created
      Returns:
      a Container if any records have been accumulated, otherwise null
    • shouldEmitContainer

      public boolean shouldEmitContainer(int currentReferenceContextID, int nextRecordIndex, int numberOfSliceEntries)
      Determine if a Container should be emitted based on the current reference context, the reference context of the next record to be processed, and the encoding strategy parameters. A container is emitted if:
        - the requested number of slices per container has been reached, or
        - a MULTI_REFERENCE slice has been accumulated (a MULTI_REFERENCE slice is always emitted into its own container as soon as it is generated, since we don't want to confer MULTI_REFERENCE-ness on the next slice, which might otherwise be SINGLE_REFERENCE), or
        - the requested number of slices has not been reached, but the reference context is changing. Adding a SINGLE_REFERENCE slice with a different context would, by the spec, force the container (and therefore all of its slices) to be MULTI_REFERENCE, so a SINGLE_REFERENCE container is emitted instead.
      Parameters:
      currentReferenceContextID -
      nextRecordIndex -
      numberOfSliceEntries -
      Returns:
      true if a Container should be emitted, otherwise false
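A minimal sketch of the three emission rules above, under stated assumptions: plain ints replace htsjdk types, -2 is a hypothetical sentinel for a multi-reference context, numberOfSliceEntries is taken to be the number of slices accumulated for the current container, and the next record's reference context is passed directly as a hypothetical nextReferenceContextID (this page does not document what nextRecordIndex holds). The slicesPerContainer parameter stands in for the encoding strategy value, and returning false when nothing has been accumulated is an added assumption.

```java
// Sketch of the three emission rules described for shouldEmitContainer.
// Hypothetical: plain ints replace htsjdk types, -2 marks a multi-reference
// context, and the next record's reference context is passed in directly.
final class EmitDecisionSketch {
    static final int MULTI_REFERENCE = -2; // hypothetical sentinel

    static boolean shouldEmitContainer(int currentReferenceContextID,
                                       int nextReferenceContextID,
                                       int numberOfSliceEntries,
                                       int slicesPerContainer) {
        if (numberOfSliceEntries == 0) {
            return false;                  // assumption: nothing accumulated, nothing to emit
        }
        if (numberOfSliceEntries >= slicesPerContainer) {
            return true;                   // rule 1: requested slices per container reached
        }
        if (currentReferenceContextID == MULTI_REFERENCE) {
            return true;                   // rule 2: a multi-reference slice goes out on its own
        }
        // Rule 3: the reference context is changing, so emit now rather than mix
        // single-reference slices with different contexts in one container.
        return currentReferenceContextID != nextReferenceContextID;
    }

    public static void main(String[] args) {
        System.out.println(shouldEmitContainer(0, 0, 1, 3));               // false: keep accumulating
        System.out.println(shouldEmitContainer(0, 0, 3, 3));               // true: slice limit reached
        System.out.println(shouldEmitContainer(MULTI_REFERENCE, 1, 1, 3)); // true: multi-reference slice
        System.out.println(shouldEmitContainer(0, 1, 1, 3));               // true: reference context changes
    }
}
```

Checking the slice limit first means a full container is emitted regardless of the other two conditions; the order of the remaining checks does not change the result, since each independently triggers an emission.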