Class TrieMemtable

  • All Implemented Interfaces:
    java.lang.Comparable<Memtable>, Memtable, UnfilteredSource

    public class TrieMemtable
    extends AbstractShardedMemtable
    Trie memtable implementation. Improves memory usage, garbage collection efficiency and lookup performance. The implementation is described in detail in the paper: https://www.vldb.org/pvldb/vol15/p3359-lambov.pdf
    The configuration takes a single parameter:
    - shards: the number of shards to split into, defaulting to the number of CPU cores.
    Also see Memtable_API.md.
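The default for the shards parameter can be illustrated with a minimal sketch. The class `ShardOptionSketch` and its method are hypothetical and not part of Cassandra; only the option name `shards` and the CPU-core default come from the description above, and the positive-value check is an added assumption.

```java
import java.util.Map;

// Hypothetical helper, not Cassandra code: it only illustrates the documented default.
final class ShardOptionSketch
{
    // Option name taken from the class description.
    static final String SHARDS_OPTION = "shards";

    // Returns the configured shard count, or the number of CPU cores when the option is absent.
    static int shardCount(Map<String, String> options)
    {
        String value = options.get(SHARDS_OPTION);
        if (value == null)
            return Runtime.getRuntime().availableProcessors();

        int shards = Integer.parseInt(value.trim());
        // Assumption: a shard count below 1 is rejected.
        if (shards < 1)
            throw new IllegalArgumentException("shards must be positive, got " + shards);
        return shards;
    }
}
```

Calling `ShardOptionSketch.shardCount(Map.of("shards", "4"))` yields 4, while an empty map yields the core count of the current machine.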
    • Field Detail

      • BUFFER_TYPE

        public static final BufferType BUFFER_TYPE
        Buffer type to use for memtable tries (on- vs off-heap)
      • MAX_RECURSIVE_KEY_LENGTH

        public static final int MAX_RECURSIVE_KEY_LENGTH
        If a key is shorter than this length, a recursive procedure is used to insert its data into the memtable trie.
        See Also:
        Constant Field Values
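A toy illustration of choosing the insertion procedure by key length follows. This is a sketch under stated assumptions, not Cassandra's in-memory trie: the class `KeyLengthDispatchSketch`, its node layout and the threshold value 128 are all hypothetical. A likely motivation, not stated here, is that recursion depth grows with key length, so long keys are better served by a loop.

```java
import java.util.HashMap;
import java.util.Map;

// Toy byte trie: it only illustrates switching between a recursive
// and an iterative insert based on the key length.
final class KeyLengthDispatchSketch
{
    // Placeholder threshold; the real value is listed under Constant Field Values.
    static final int MAX_RECURSIVE_KEY_LENGTH = 128;

    static final class Node
    {
        final Map<Byte, Node> children = new HashMap<>();
        String value;
    }

    final Node root = new Node();

    // Inserts the value and returns true when the recursive path was taken.
    boolean put(byte[] key, String value)
    {
        if (key.length < MAX_RECURSIVE_KEY_LENGTH)
        {
            putRecursive(root, key, 0, value);
            return true;
        }
        putIterative(key, value);
        return false;
    }

    // One stack frame per key byte, so the stack depth grows with the key length.
    private static void putRecursive(Node node, byte[] key, int depth, String value)
    {
        if (depth == key.length)
        {
            node.value = value;
            return;
        }
        Node child = node.children.computeIfAbsent(key[depth], b -> new Node());
        putRecursive(child, key, depth + 1, value);
    }

    // Constant stack depth regardless of the key length.
    private void putIterative(byte[] key, String value)
    {
        Node node = root;
        for (byte b : key)
            node = node.children.computeIfAbsent(b, k -> new Node());
        node.value = value;
    }

    // Returns the stored value, or null when the key is absent.
    String get(byte[] key)
    {
        Node node = root;
        for (byte b : key)
        {
            node = node.children.get(b);
            if (node == null)
                return null;
        }
        return node.value;
    }
}
```

Both paths produce the same trie; only the way the key is walked differs.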
      • BYTE_COMPARABLE_VERSION

        public static final ByteComparable.Version BYTE_COMPARABLE_VERSION
        The byte-ordering conversion version to use for memtables.
    • Method Detail

      • isClean

        public boolean isClean()
        Description copied from interface: Memtable
        True if the memtable contains no data
      • discard

        public void discard()
        Description copied from interface: Memtable
        This memtable is no longer in use or required for outstanding flushes or operations. All held memory must be released.
        Specified by:
        discard in interface Memtable
        Overrides:
        discard in class AbstractAllocatorMemtable
      • put

        public long put(PartitionUpdate update,
                        UpdateTransaction indexer,
                        OpOrder.Group opGroup)
        Should only be called by ColumnFamilyStore.apply via Keyspace.apply, which supplies the appropriate OpOrdering. commitLogSegmentPosition should only be null if this is a secondary index, in which case it is *expected* to be null.
        Parameters:
        update - the partition update, may be a new partition or an update to an existing one
        indexer - receives information about the update's effect
        opGroup - write operation group, used to permit the operation to complete if it is needed to complete a flush to free space.
        Returns:
        the smallest timestamp delta between corresponding rows from existing and update. The timestamp delta is computed as the difference between the cells and DeletionTimes of any existing partition and those in update. See CASSANDRA-7979.
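The returned value can be pictured with a simplified sketch. Everything here is hypothetical: the class `TimestampDeltaSketch` reduces rows, cells and DeletionTimes to plain timestamps paired by index, and it assumes the delta is an absolute difference with Long.MAX_VALUE as the result when nothing can be compared.

```java
// Hypothetical sketch: Cassandra's merge logic works on rows, cells and deletion times,
// whereas this reduces each side to an array of timestamps paired by index.
final class TimestampDeltaSketch
{
    // Returns the smallest absolute difference between timestamps at matching positions,
    // or Long.MAX_VALUE when there is nothing to compare (an assumption of this sketch).
    static long minTimestampDelta(long[] existing, long[] update)
    {
        long minDelta = Long.MAX_VALUE;
        int pairs = Math.min(existing.length, update.length);
        for (int i = 0; i < pairs; i++)
            minDelta = Math.min(minDelta, Math.abs(existing[i] - update[i]));
        return minDelta;
    }
}
```

For existing timestamps {100, 250, 400} and update timestamps {130, 245, 900}, the per-pair deltas are 30, 5 and 500, so the sketch returns 5.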
      • getLiveDataSize

        public long getLiveDataSize()
        Technically, we should scatter-gather across all the core threads, because the sizes read by the following calls do not use volatile variables; for metrics purposes, however, this should be good enough.
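The trade-off described above can be sketched as follows. The class `ShardedSizeSketch` and its `Shard` type are hypothetical stand-ins, not Cassandra's shard classes; the point is only that the sum is taken from the calling thread over plain, non-volatile fields, so under concurrent writes it is an approximation.

```java
// Hypothetical sketch, not Cassandra code: each shard tracks its own size in a plain field.
final class ShardedSizeSketch
{
    static final class Shard
    {
        // Deliberately not volatile: a reader on another thread may see a slightly stale value.
        long liveDataSize;

        void add(long bytes)
        {
            liveDataSize += bytes;
        }
    }

    final Shard[] shards;

    ShardedSizeSketch(int shardCount)
    {
        shards = new Shard[shardCount];
        for (int i = 0; i < shardCount; i++)
            shards[i] = new Shard();
    }

    // Sums the per-shard sizes from the calling thread, without coordinating with writers.
    long getLiveDataSize()
    {
        long total = 0L;
        for (Shard shard : shards)
            total += shard.liveDataSize;
        return total;
    }
}
```

An exact figure would require asking each shard's owning thread for its value and gathering the answers, which costs more than a metric usually justifies.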
      • partitionCount

        public long partitionCount()
        Description copied from interface: Memtable
        Number of partitions stored in the memtable
      • getMinTimestamp

        public long getMinTimestamp()
        Returns the minimum timestamp (minTS) if one is available, otherwise NO_MIN_TIMESTAMP. EncodingStats uses a synthetic epoch timestamp set in 2015. We do not want to leak that value (CASSANDRA-18118), so we return NO_MIN_TIMESTAMP instead.
        Specified by:
        getMinTimestamp in interface UnfilteredSource
        Overrides:
        getMinTimestamp in class AbstractMemtable
        Returns:
        The minTS, or NO_MIN_TIMESTAMP if none is available
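The substitution described for getMinTimestamp can be sketched as a small pure function. The class `MinTimestampSketch` is hypothetical, and both constant values are placeholders chosen for illustration; the real values are defined by EncodingStats and Memtable.

```java
// Hypothetical sketch: both constants are placeholders, not the values used by Cassandra.
final class MinTimestampSketch
{
    // Stand-in for the synthetic 2015 epoch timestamp (here in microseconds).
    static final long SYNTHETIC_EPOCH_TIMESTAMP = 1_442_880_000_000_000L;

    // Stand-in for the "no minimum timestamp" marker.
    static final long NO_MIN_TIMESTAMP = -1L;

    // Hides the synthetic epoch from callers by mapping it to NO_MIN_TIMESTAMP;
    // any other tracked value is returned unchanged.
    static long exposedMinTimestamp(long trackedMin)
    {
        return trackedMin != SYNTHETIC_EPOCH_TIMESTAMP ? trackedMin : NO_MIN_TIMESTAMP;
    }
}
```

Callers can then test the result against NO_MIN_TIMESTAMP instead of having to know the synthetic epoch value.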
      • partitionIterator

        public org.apache.cassandra.db.memtable.TrieMemtable.MemtableUnfilteredPartitionIterator partitionIterator(ColumnFilter columnFilter,
                                                                                                                   DataRange dataRange,
                                                                                                                   SSTableReadsListener readsListener)
        Description copied from interface: UnfilteredSource
        Returns a partition iterator for the given data range.
        Parameters:
        columnFilter - filter to apply to all returned partitions
        dataRange - the partition and clustering range queried
        readsListener - a listener used to handle internal read events
      • rowIterator

        public UnfilteredRowIterator rowIterator(DecoratedKey key,
                                                 Slices slices,
                                                 ColumnFilter selectedColumns,
                                                 boolean reversed,
                                                 SSTableReadsListener listener)
        Description copied from interface: UnfilteredSource
        Returns a row iterator for the given partition, applying the specified row and column filters.
        Parameters:
        key - the partition key
        slices - the row ranges to return
        selectedColumns - column filter to apply to the returned rows
        reversed - true if the content should be returned in reverse order
        listener - a listener used to handle internal read events
      • factory

        public static org.apache.cassandra.db.memtable.TrieMemtable.Factory factory(java.util.Map<java.lang.String, java.lang.String> optionsCopy)
      • unusedReservedMemory

        public long unusedReservedMemory()