Package org.apache.cassandra.db.streaming

File transfer

When transferring all or part of an sstable, only the DATA component is shipped. There are three "modes" of sstable transfer, and each needs to be handled somewhat differently:

1) uncompressed sstable - data needs to be read into user space so it can be manipulated: checksum validation, stream compression (see next section), and/or TLS encryption.
2) compressed sstable, transferred with SSL/TLS - data needs to be read into user space, as that is where the TLS encryption happens. Netty does not support zero-copy transfers when TLS is in the pipeline; data must be explicitly pulled into user-space memory for TLS encryption to work.
3) compressed sstable, transferred without SSL/TLS - data can be streamed via zero-copy transfer, as it does not need to be manipulated (it can be sent "as-is").
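The choice between the three modes depends on only two facts about a transfer: whether the sstable is compressed on disk, and whether TLS is in the pipeline. The following is a minimal sketch of that decision. The class, enum, and method names are hypothetical and exist only for illustration; they are not the names used in this package.

```java
// Hypothetical sketch: these names are illustrative, not Cassandra's actual classes.
final class TransferModeSketch {

    enum Mode {
        USER_SPACE_UNCOMPRESSED,   // mode 1: uncompressed sstable
        USER_SPACE_COMPRESSED_TLS, // mode 2: compressed sstable, with TLS
        ZERO_COPY                  // mode 3: compressed sstable, without TLS
    }

    /** Picks a transfer mode from the two properties described above. */
    static Mode choose(boolean sstableCompressed, boolean tlsEnabled) {
        if (!sstableCompressed) {
            // Data must be manipulated (checksum, stream compression, maybe TLS),
            // so TLS alone does not change the mode here.
            return Mode.USER_SPACE_UNCOMPRESSED;
        }
        return tlsEnabled ? Mode.USER_SPACE_COMPRESSED_TLS : Mode.ZERO_COPY;
    }

    /** Only the zero-copy mode avoids pulling data into user-space memory. */
    static boolean needsUserSpaceCopy(Mode mode) {
        return mode != Mode.ZERO_COPY;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean compressed : flags) {
            for (boolean tls : flags) {
                Mode mode = choose(compressed, tls);
                System.out.println("compressed=" + compressed + " tls=" + tls
                                   + " -> " + mode
                                   + " (user-space copy: " + needsUserSpaceCopy(mode) + ")");
            }
        }
    }
}
```

Note that an uncompressed sstable lands in mode 1 whether or not TLS is enabled, since the data has to pass through user space either way.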

Compressing the data

We always want to transfer as few bytes as possible over the wire when streaming a file. If the sstable is not already compressed via table compression options, we apply an on-the-fly stream compression to the data. The stream compression format is documented in StreamCompressionSerializer.

You may be wondering: why implement your own compression scheme? Why not use netty's built-in compression codecs, like Lz4FrameEncoder? That makes complete sense if none of the sstables to be streamed are using sstable compression (and obviously you wouldn't use stream compression when the sstables are using sstable compression). The problem is when you have a mix of files, some using sstable compression and some not. You can either:

- send the files of one type over one kind of socket, and the others over another socket
- send them both over the same socket, but then auto-adjust per file type.

I've opted for the latter to keep socket/channel management simpler and cleaner.
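To illustrate what "auto-adjust per file type" can look like on a single channel, here is a minimal sketch in which the sender decides per file whether to compress the bytes before writing them. Everything in it is an assumption made for illustration: the class and method names are hypothetical, and java.util.zip's Deflater/Inflater stand in for the real stream compressor so that the example needs only the JDK. It does not reproduce the format documented in StreamCompressionSerializer.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: per-file choice between "send as-is" and "compress on the fly".
final class PerFileStreamCompressionSketch {

    /** Sender side: returns the bytes to put on the wire for one file's data. */
    static byte[] prepareForWire(byte[] data, boolean sstableCompressed) {
        // Already compressed on disk: compressing again would waste CPU for little gain.
        return sstableCompressed ? data : deflate(data);
    }

    /** Receiver side: the inverse of prepareForWire, keyed on the same per-file flag. */
    static byte[] restoreFromWire(byte[] wire, boolean sstableCompressed) {
        return sstableCompressed ? wire : inflate(wire);
    }

    // Stand-in compressor (assumption): DEFLATE from the JDK, not the real stream codec.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid compressed input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid compressed input", e);
        } finally {
            inflater.end(); // release native resources
        }
    }
}
```

Both sides key off the same per-file flag, so a mix of sstable-compressed and uncompressed files can share one socket without a pipeline-wide codec such as Lz4FrameEncoder compressing everything that passes through.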