Class CompressionHeaderEncodingMap

java.lang.Object
htsjdk.samtools.cram.structure.CompressionHeaderEncodingMap

public class CompressionHeaderEncodingMap extends Object
Maintains a map of DataSeries to EncodingDescriptor, and a second map that contains the compressor to use for each EncodingDescriptor that represents an EXTERNAL encoding. There are two constructors: one populates the map from scratch using the default encodings chosen by this (htsjdk) implementation, and is used when writing a new CRAM; the other populates the map from a serialized CRAM stream, resulting in the encodings chosen by the implementation that wrote that CRAM.

Although the CRAM spec defines a fixed list of data series, individual CRAM implementations may choose to use only a subset of these. Therefore, the actual set of encodings that are instantiated can vary depending on the source.

Notes on the htsjdk CRAM write implementation: this implementation encodes ALL DataSeries to external blocks (although some of the external encodings split the data between core and external; see ByteArrayLenEncoding), and does not use the 'BB' or 'QQ' DataSeries when writing CRAM at all. It relies heavily on GZIP and RANS for compression.

See EncodingFactory for details on how an EncodingDescriptor is mapped to the codec that actually transfers data to and from underlying Slice blocks.
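The two-map layout described above can be pictured with a minimal, self-contained sketch. Every type and method in it (EncodingMapSketch, Series, ExternalDescriptor, and the String compressor names) is a hypothetical stand-in chosen for illustration, not an htsjdk class; the real class stores EncodingDescriptor and ExternalCompressor objects, and its internals may differ.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two-map layout; none of these types are htsjdk classes.
final class EncodingMapSketch {
    // A small illustrative subset of CRAM data series names.
    enum Series { BF, RL, QS }

    // Stand-in for the descriptor of an EXTERNAL encoding: it only records the content ID.
    static final class ExternalDescriptor {
        final int contentId;

        ExternalDescriptor(int contentId) {
            this.contentId = contentId;
        }
    }

    // Map 1: data series -> encoding descriptor.
    private final Map<Series, ExternalDescriptor> encodings = new EnumMap<>(Series.class);
    // Map 2: external content ID -> name of the compressor used for that block.
    private final Map<Integer, String> compressors = new LinkedHashMap<>();

    // Registers an external encoding and its compressor in one step.
    void putExternalEncoding(Series series, int contentId, String compressorName) {
        encodings.put(series, new ExternalDescriptor(contentId));
        compressors.put(contentId, compressorName);
    }

    // Tag blocks have a compressor but no data series entry.
    void putTagBlockCompression(int tagId, String compressorName) {
        compressors.put(tagId, compressorName);
    }

    // Returns null when the series has no encoding in this map.
    ExternalDescriptor descriptorFor(Series series) {
        return encodings.get(series);
    }

    String compressorFor(int contentId) {
        return compressors.get(contentId);
    }

    // All content IDs that have a compressor, in insertion order.
    List<Integer> externalIds() {
        return new ArrayList<>(compressors.keySet());
    }
}
```

Keying the second map by content ID rather than by data series lets tag blocks, which have no data series, share the same compressor lookup.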
  • Field Details

    • DATASERIES_NOT_READ_BY_HTSJDK

      public static final Set<DataSeries> DATASERIES_NOT_READ_BY_HTSJDK
  • Constructor Details

    • CompressionHeaderEncodingMap

      public CompressionHeaderEncodingMap(CRAMEncodingStrategy encodingStrategy)
      Constructor used to create the default encoding map for writing CRAMs. The encoding strategy parameter values are used to set compression levels, etc., but any encoding map embedded in the strategy is ignored, since this constructor uses the default strategy.
      Parameters:
      encodingStrategy - CRAMEncodingStrategy containing parameter values to use when creating the encoding map
    • CompressionHeaderEncodingMap

      public CompressionHeaderEncodingMap(InputStream inputStream)
      Constructor used to discover an encoding map from a serialized CRAM stream.
      Parameters:
      inputStream - the CRAM input stream to be consumed
  • Method Details

    • putTagBlockCompression

      public void putTagBlockCompression(int tagId, ExternalCompressor compressor)
      Add an external compressor for a tag block.
      Parameters:
      tagId - the tag as a content ID
      compressor - compressor to be used for this tag block
    • getEncodingDescriptorForDataSeries

      public EncodingDescriptor getEncodingDescriptorForDataSeries(DataSeries dataSeries)
      Get the encoding params that should be used for a given DataSeries.
      Parameters:
      dataSeries - the DataSeries whose encoding descriptor is requested
      Returns:
      EncodingDescriptor for the DataSeries
    • getExternalIDs

      public List<Integer> getExternalIDs()
      Get a list of all external IDs for this encoding map
      Returns:
      list of all external IDs for this encoding map
    • createCompressedBlockForStream

      public Block createCompressedBlockForStream(Integer contentId, ByteArrayOutputStream outputStream)
      Given a content ID, return a Block for that ID by obtaining the contents of the stream, compressing it using the compressor for that contentID, and converting the result to a Block.
      Parameters:
      contentId - contentID to use
      outputStream - stream to compress
      Returns:
      Block containing the compressed contents of the stream
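The three steps described for createCompressedBlockForStream (obtain the stream contents, compress them with the compressor registered for the content ID, wrap the result) can be sketched as follows. This is an illustration under stated assumptions, not the htsjdk implementation: BlockSketch, CompressedBlock, register, and the UnaryOperator-based compressor are hypothetical names, GZIP from java.util.zip stands in for an ExternalCompressor, and throwing on an unregistered ID is a choice made for this sketch only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; these are not htsjdk's Block or ExternalCompressor types.
final class BlockSketch {
    // Minimal stand-in for a block: content ID, original size, compressed payload.
    static final class CompressedBlock {
        final int contentId;
        final int rawSize;
        final byte[] payload;

        CompressedBlock(int contentId, int rawSize, byte[] payload) {
            this.contentId = contentId;
            this.rawSize = rawSize;
            this.payload = payload;
        }
    }

    private final Map<Integer, UnaryOperator<byte[]>> compressorsById = new HashMap<>();

    void register(int contentId, UnaryOperator<byte[]> compressor) {
        compressorsById.put(contentId, compressor);
    }

    CompressedBlock createCompressedBlock(int contentId, ByteArrayOutputStream stream) {
        UnaryOperator<byte[]> compressor = compressorsById.get(contentId);
        if (compressor == null) {
            // Assumption of this sketch: an unregistered content ID is an error.
            throw new IllegalArgumentException("No compressor for content ID " + contentId);
        }
        byte[] raw = stream.toByteArray();           // 1. obtain the stream contents
        byte[] compressed = compressor.apply(raw);   // 2. compress with that ID's compressor
        return new CompressedBlock(contentId, raw.length, compressed); // 3. wrap as a block
    }

    // GZIP helper built on java.util.zip, used here as an example compressor.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

A caller would register a compressor once per content ID, for example `sketch.register(7, BlockSketch::gzip)`, and then pass each filled stream together with its ID.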
    • write

      public void write(OutputStream outputStream) throws IOException
      Write the encoding map out to a CRAM stream.
      Parameters:
      outputStream - stream to write to
      Throws:
      IOException
    • getBestExternalCompressor

      public ExternalCompressor getBestExternalCompressor(byte[] data, CRAMEncodingStrategy encodingStrategy)
      Return the best external compressor to use for the provided byte array (the compressor that results in the smallest compressed size). Note that this is not necessarily the best compression to use for the source data series: because it only chooses from EXTERNAL compressors, it does not consider the size of the alphabet (2 byte int, 4 byte int).
      Parameters:
      data - byte array to compress
      encodingStrategy - encoding strategy parameters to use
      Returns:
      the best ExternalCompressor to use for this data
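The selection rule stated for getBestExternalCompressor (try each candidate and keep the one with the smallest output) can be sketched as follows. This is a hedged illustration, not htsjdk's code: SmallestOutputSelector, defaultCandidates, and bestCandidate are hypothetical names, and the candidate set (no compression versus GZIP from java.util.zip) is an assumption, since RANS is not available in the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of "pick the smallest output"; not htsjdk's ExternalCompressor API.
final class SmallestOutputSelector {

    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Illustrative candidate set: leave the bytes untouched, or GZIP them.
    static Map<String, UnaryOperator<byte[]>> defaultCandidates() {
        Map<String, UnaryOperator<byte[]>> candidates = new LinkedHashMap<>();
        candidates.put("identity", data -> data);
        candidates.put("gzip", SmallestOutputSelector::gzip);
        return candidates;
    }

    // Runs every candidate on the data and returns the name of the one with the
    // smallest output. Ties go to the earlier candidate; returns null if there are none.
    static String bestCandidate(byte[] data, Map<String, UnaryOperator<byte[]>> candidates) {
        String best = null;
        int bestSize = Integer.MAX_VALUE;
        for (Map.Entry<String, UnaryOperator<byte[]>> entry : candidates.entrySet()) {
            int size = entry.getValue().apply(data).length;
            if (size < bestSize) {
                bestSize = size;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

Because the choice is made purely on output size for one byte array, it shares the limitation noted above: it says nothing about whether a different, non-EXTERNAL encoding of the source values would have been smaller.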
    • putExternalEncoding

      public void putExternalEncoding(DataSeries dataSeries, EncodingDescriptor encodingDescriptor, ExternalCompressor compressor)
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object