Class SliceFactory

java.lang.Object
htsjdk.samtools.cram.build.SliceFactory

public final class SliceFactory extends Object
Factory for creating Slices when writing a CRAM stream. Determines when to emit a Slice based on three inputs: the rules implemented by this class, the accumulated SliceEntry state objects, and the parameter values in the provided CRAMEncodingStrategy object.
  • Constructor Details

  • Method Details

    • createNewSliceEntry

      public long createNewSliceEntry(int currentReferenceContextID, List<SAMRecord> sliceSAMRecords)
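      Adds a new slice entry and returns the number of slice entries.
      Parameters:
      currentReferenceContextID - the reference context ID for the new slice entry
      sliceSAMRecords - the SAM records for the new slice entry
      Returns:
      the number of slice entries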
    • getCRAMRecordsForAllSlices

      public List<CRAMCompressionRecord> getCRAMRecordsForAllSlices()
      Get all CRAM records accumulated by the factory. These are the records that will be used to create one or more slices when createSlices(htsjdk.samtools.cram.structure.CompressionHeader, long) is called.
      Returns:
      the list of all CRAMRecords
    • getNumberOfSliceEntries

      public int getNumberOfSliceEntries()
    • createSlices

      public List<Slice> createSlices(CompressionHeader compressionHeader, long containerByteOffset)
      Returns a list of Slices created from the records accumulated by the factory, then resets the factory state.
      Parameters:
      compressionHeader - the compression header to use to create the Slices
      containerByteOffset - the container byte offset to use for the newly created Slices
      Returns:
      List of Slices created from the accumulated state of this SliceFactory
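      The accumulate-then-flush life cycle described above can be sketched with a small stand-in class. This is a simplified model, not the htsjdk implementation: the class name AccumulatingFactorySketch, its method names, and the use of String in place of SAMRecord and Slice are all hypothetical, chosen only to show that entries build up until a flush returns them and resets the state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SliceFactory; models only the accumulate/flush life cycle.
final class AccumulatingFactorySketch {
    // One inner list per pending slice entry (Strings stand in for SAM records).
    private final List<List<String>> entries = new ArrayList<>();

    // Analogous to createNewSliceEntry: store one batch and report how many are pending.
    long addEntry(List<String> records) {
        entries.add(new ArrayList<>(records));
        return entries.size();
    }

    // Analogous to getNumberOfSliceEntries.
    int pendingEntries() {
        return entries.size();
    }

    // Analogous to createSlices: return one result per pending entry, then reset.
    List<List<String>> flush() {
        List<List<String>> out = new ArrayList<>(entries);
        entries.clear();
        return out;
    }
}
```

      In this sketch, a caller would invoke addEntry once per batch of records and then call flush once per container; after flush returns, pendingEntries is zero again.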
    • getUpdatedReferenceContext

      public int getUpdatedReferenceContext(int currentReferenceContext, int nextReferenceIndex, int numberOfSAMRecords)
      Decide if the current records should be flushed based on the current reference context, the reference context for the next record to be written, and the number of records seen so far.

      Slices with the Multiple Reference flag (-2) set as the sequence ID in the header may contain reads mapped to multiple external references, including unmapped reads (placed on these references or unplaced), but multiple embedded references cannot be combined in this way. When multiple references are used, the RI data series will be used to determine the reference sequence ID for each record. This data series is not present when only a single reference is used within a slice.

      The Unmapped (-1) sequence ID in the header is for slices containing only unplaced unmapped reads.

      A slice containing data that does not use the external reference in any sequence may set the reference MD5 sum to zero. This can happen because the data is unmapped or the sequence has been stored verbatim instead of via reference-differencing. This latter scenario is recommended for unsorted or non-coordinate-sorted data.
      Parameters:
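      currentReferenceContext - the current reference context
      nextReferenceIndex - reference index of the next record to be emitted
      numberOfSAMRecords - the number of records seen so far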
      Returns:
      ReferenceContext.UNINITIALIZED_REFERENCE_ID if the current slice should be flushed and subsequent records should go into a new slice; otherwise the updated reference context.
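      To make the flush-or-continue contract concrete, here is a minimal sketch of one possible decision policy. It is not the rule set htsjdk implements. The class name ReferenceContextSketch, the two thresholds, and the value -3 for the uninitialized sentinel are assumptions made for illustration; only the -2 (multiple reference) and -1 (unmapped) IDs come from the description above.

```java
// Illustrative sketch only; a simplified policy, not htsjdk's actual rules.
final class ReferenceContextSketch {
    // -2 and -1 are taken from the method description; -3 is an assumed sentinel value.
    static final int MULTIPLE_REFERENCE_ID = -2;
    static final int UNMAPPED_UNPLACED_ID = -1;
    static final int UNINITIALIZED_REFERENCE_ID = -3;

    // Hypothetical thresholds; real values would come from the encoding strategy.
    static final int READS_PER_SLICE = 10_000;
    static final int MIN_SINGLE_REF_RECORDS = 1_000;

    // Returns UNINITIALIZED_REFERENCE_ID to signal "flush", else the context to continue with.
    static int updatedReferenceContext(int current, int nextReferenceIndex, int numberOfRecords) {
        if (current == UNINITIALIZED_REFERENCE_ID) {
            // Nothing accumulated yet: the next record defines the context.
            return nextReferenceIndex;
        }
        if (numberOfRecords >= READS_PER_SLICE) {
            // The slice is full regardless of which reference comes next.
            return UNINITIALIZED_REFERENCE_ID;
        }
        if (current == MULTIPLE_REFERENCE_ID || current == nextReferenceIndex) {
            // A multi-reference slice accepts any record; otherwise the reference is unchanged.
            return current;
        }
        // The reference changes: fold a small slice into multi-reference mode,
        // but flush a slice that is already large enough to stand alone.
        return numberOfRecords < MIN_SINGLE_REF_RECORDS
                ? MULTIPLE_REFERENCE_ID
                : UNINITIALIZED_REFERENCE_ID;
    }
}
```

      Under these assumed thresholds, a caller would invoke updatedReferenceContext before adding each record, flush the accumulated records when the sentinel is returned, and otherwise carry the returned value forward as the new current context.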