This class acts as a DataSource provider for column format tables provided by Snappy.
Base trait for iterators that are capable of reading and returning the entire set of columns of a column batch. These can be local region iterators or those fetching entries from remote nodes.
Encapsulates a delta for update to be applied to column table and also is stored in the region. The key for a delta is a negative columnIndex evaluated as (ColumnFormatEntry.DELTA_STATROW_COL_INDEX - 1 + MAX_DEPTH * -columnIndex), where columnIndex is the 0-based index of the underlying table column.
Note that this delta is for carrying the delta update and applying it to the existing delta, if any, while the actual value that is stored in the region is a ColumnFormatValue. This ensures clean working of the delta mechanism, where store-layer code checks whether an object is a Delta and makes assumptions about it (for example, that it is a temporary value that should not go into the region).
For a description of column delta format see the class comments in org.apache.spark.sql.execution.columnar.encoding.ColumnDeltaEncoder.
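The key formula above can be sketched as a small function. This is only an illustration of the arithmetic as written in the description: the object and method names are hypothetical, and the two constants are passed in as parameters because their actual values in ColumnFormatEntry and ColumnDelta are not stated here. The values used in the usage lines are placeholders, not the real constants.

```scala
object DeltaKeySketch {
  // Evaluates (DELTA_STATROW_COL_INDEX - 1 + MAX_DEPTH * -columnIndex)
  // exactly as written in the description above.
  // deltaStatRowColIndex and maxDepth stand in for the real constants,
  // whose values are not given in this document.
  def deltaKeyIndex(deltaStatRowColIndex: Int, maxDepth: Int, columnIndex: Int): Int = {
    require(columnIndex >= 0, "columnIndex is the 0-based table column index")
    deltaStatRowColIndex - 1 + maxDepth * -columnIndex
  }

  def main(args: Array[String]): Unit = {
    // Placeholder constants chosen only to show the arithmetic.
    val assumedStatRowIndex = -2
    val assumedMaxDepth = 3
    (0 until 3).foreach { col =>
      println(s"column $col -> key index " +
        deltaKeyIndex(assumedStatRowIndex, assumedMaxDepth, col))
    }
  }
}
```

With any negative base index and positive depth, each table column maps to its own negative key index, spaced maxDepth apart.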
A RowEncoder implementation for ColumnFormatValue and child classes.
A customized iterator for column store tables that projects out the required columns and returns first those column batches that have all their columns in memory. In addition, it uses DiskBlockSortManager to allow concurrent partition iterators to do cross-partition disk block sorting and fault-in for best disk read performance (SNAP-2012).
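The ordering idea described above (in-memory batches first, then on-disk batches in disk order) can be sketched in a few lines. This is a simplified illustration, not the actual iterator: BatchRef, its fields, and scanOrder are hypothetical names, and sorting by a single diskOffset stands in for whatever DiskBlockSortManager really does across partitions.

```scala
object ScanOrderSketch {
  // Hypothetical handle for a column batch: whether all of its projected
  // columns are already in memory, and where it lives on disk otherwise.
  final case class BatchRef(id: Long, inMemory: Boolean, diskOffset: Long)

  // Returns in-memory batches first (in their original order), followed by
  // on-disk batches sorted by disk offset so reads are closer to sequential.
  def scanOrder(batches: Seq[BatchRef]): Iterator[BatchRef] = {
    val (memory, disk) = batches.partition(_.inMemory)
    memory.iterator ++ disk.sortBy(_.diskOffset).iterator
  }
}
```

Deferring the on-disk batches means a scan can start returning rows immediately while the disk reads are arranged in a friendlier order.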
Key object in the column store.
Value object in the column store simply encapsulates binary data as a ByteBuffer. This can be either a direct buffer or a heap buffer depending on the system off-heap configuration. A separate type makes it easy to store data off-heap without major changes to the rest of the engine, and to serialize/deserialize values efficiently and directly to or from the Oplog or a socket.
This class extends SerializedDiskBuffer to avoid a copy when reading from or writing to the Oplog. Consequently it writes the serialization header itself (typeID + classID + size) into the stream, as would be written by DataSerializer.writeObject. This helps it avoid additional byte writes when transferring data to the channels.
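As a rough illustration of the two points above (heap versus direct buffers, and a header written ahead of the payload), here is a minimal sketch using only java.nio.ByteBuffer. The field widths are an assumption: the description names typeID, classID and size but not their sizes, so one byte each for the IDs and a 4-byte size are used here. The object and method names are hypothetical and this is not the ColumnFormatValue implementation.

```scala
import java.nio.ByteBuffer

object BufferValueSketch {
  // Assumed header layout: 1-byte type ID, 1-byte class ID, 4-byte payload size.
  val HeaderSize: Int = 6

  // Picks a direct (off-heap) or heap buffer based on a configuration flag.
  def allocate(size: Int, offHeap: Boolean): ByteBuffer =
    if (offHeap) ByteBuffer.allocateDirect(size) else ByteBuffer.allocate(size)

  // Writes header and payload into one buffer so that the whole value can be
  // handed to a channel in a single write, without a separate header write.
  def withHeader(typeId: Byte, classId: Byte, payload: Array[Byte],
      offHeap: Boolean): ByteBuffer = {
    val buffer = allocate(HeaderSize + payload.length, offHeap)
    buffer.put(typeId).put(classId).putInt(payload.length).put(payload)
    buffer.flip() // switch from writing to reading; position 0, limit = bytes written
    buffer
  }
}
```

Keeping the header in the same buffer as the data is what lets a value be sent with one channel write instead of several small ones.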
Partition resolver for the column store.
Currently this is the same as ColumnFormatRelation, but it is kept as a separate class to allow index-specific functionality to be added in the future.
Column Store implementation for GemFireXD.
A ClusteredColumnIterator that fetches entries from a remote bucket.
Utility methods for column format storage keys and values.
This class acts as a DataSource provider for column format tables provided by Snappy. It uses GemFireXD as the actual datastore to physically locate the tables. Column tables can be used for storing data in a compressed columnar format. An example usage is given below.
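// Data is a hypothetical case class matching the three integer columns; sc (SparkContext),
// snc (SnappyContext), tableName and props are assumed to be defined by the caller.
case class Data(col1: Int, col2: Int, col3: Int)
val data = Seq(Seq(1, 2, 3), Seq(7, 8, 9), Seq(9, 2, 3), Seq(4, 2, 3), Seq(5, 6, 7))
val rdd = sc.parallelize(data, data.length).map(s => new Data(s(0), s(1), s(2)))
val dataDF = snc.createDataFrame(rdd)
// create the column table, then bulk insert the DataFrame into it
snc.createTable(tableName, "column", dataDF.schema, props)
dataDF.write.insertInto(tableName)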
This provider scans the underlying tables in parallel and is aware of the data partitioning. It does not introduce a shuffle when a simple table query is run. One can insert a single row or multiple rows into this table, as well as do a bulk insert from a Spark DataFrame. A bulk insert example is shown above.