Class SSTableReader

  • All Implemented Interfaces:
    UnfilteredSource, RefCounted<SSTableReader>, SelfRefCounted<SSTableReader>
    Direct Known Subclasses:
    SSTableReaderWithFilter

    public abstract class SSTableReader
    extends SSTable
    implements UnfilteredSource, SelfRefCounted<SSTableReader>
    An SSTableReader can be constructed in a number of places, but typically it is either read from disk at startup, constructed from a flushed memtable, or created after compaction to replace some existing sstables. However, once created, an SSTableReader may also be modified.

    A reader's SSTableReader.OpenReason describes its current stage in its lifecycle. Note that, in parallel to this, there are two different Descriptor types, TMPLINK and FINAL. The latter corresponds to SSTableReader.OpenReason.NORMAL state readers and all readers that replace a SSTableReader.OpenReason.NORMAL one. TMPLINK is used for SSTableReader.OpenReason.EARLY state readers and no others.

    When a reader is being compacted, if the result is large its replacement may be opened as SSTableReader.OpenReason.EARLY before compaction completes in order to present the result to consumers earlier. In this case the reader will itself be changed to a SSTableReader.OpenReason.MOVED_START state, where its start no longer represents its on-disk minimum key. This is to permit reads to be directed to only one reader when the two represent the same data. The SSTableReader.OpenReason.EARLY file can represent a compaction result that is either partially complete and still in-progress, or a complete and immutable sstable that is part of a larger macro compaction action that has not yet fully completed.
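    The effect of moving the original reader's start can be illustrated with a small sketch. It is a simplified model under stated assumptions, not Cassandra's implementation: MovedStartSketch, Bounds and readersFor are hypothetical names, and plain long values stand in for partition keys.

```java
import java.util.ArrayList;
import java.util.List;

final class MovedStartSketch {
    // Hypothetical key bounds of one reader; real readers use DecoratedKey, not long.
    static final class Bounds {
        final String name;
        final long first;
        final long last;

        Bounds(String name, long first, long last) {
            this.name = name;
            this.first = first;
            this.last = last;
        }

        boolean contains(long key) {
            return first <= key && key <= last;
        }
    }

    // Names of the readers whose bounds cover the key.
    static List<String> readersFor(long key, List<Bounds> view) {
        List<String> hits = new ArrayList<>();
        for (Bounds b : view) {
            if (b.contains(key)) hits.add(b.name);
        }
        return hits;
    }
}
```

    With an early reader covering [0, 49] and the original reader's start moved from 0 to 50, a lookup for key 30 matches only the early reader. Had the original kept its start at 0, both readers would match the same key.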

    Currently ALL compaction results at least briefly go through an SSTableReader.OpenReason.EARLY open state prior to completion, regardless of whether early opening is enabled.

    Since a reader can be created multiple times over the same shared underlying resources, and the exact resources it shares between each instance differ subtly, we track the lifetime of any underlying resource with its own reference count, which each instance takes a Ref to. Each instance then tracks references to itself, and once these all expire it releases all of its Refs to these underlying resources.
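    The two levels of reference counting described above can be sketched roughly as follows. This is a minimal illustration, not Cassandra's Ref or RefCounted code: SharedResource, ReaderInstance and their methods are hypothetical names invented for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an underlying resource (e.g. an open file) shared by several readers.
final class SharedResource {
    private final AtomicInteger refs = new AtomicInteger();
    private volatile boolean closed;

    void acquire() {
        refs.incrementAndGet();
    }

    // Close the resource once the last holder lets go of it.
    void release() {
        if (refs.decrementAndGet() == 0) closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical stand-in for one reader instance over the shared resource.
final class ReaderInstance {
    private final SharedResource resource;
    private final AtomicInteger selfRefs = new AtomicInteger(1); // the creator's reference

    ReaderInstance(SharedResource resource) {
        this.resource = resource;
        resource.acquire(); // each instance holds exactly one reference to the resource
    }

    void ref() {
        selfRefs.incrementAndGet();
    }

    // When the last reference to this instance expires, drop its reference to the resource.
    void unref() {
        if (selfRefs.decrementAndGet() == 0) resource.release();
    }
}
```

    In this model the resource stays open while any instance over it is still referenced, and is closed only after the last instance has expired.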

    There is some shared cleanup behaviour needed only once all readers in a certain stage of their lifecycle (i.e. SSTableReader.OpenReason.EARLY or SSTableReader.OpenReason.NORMAL opening) have expired, and some that must only occur once all readers of any kind over a single logical sstable have expired. These are managed by the SSTableReader.InstanceTidier and SSTableReader.GlobalTidy classes at the bottom, and are effectively managed as another resource to which each instance tracks its own Ref, to ensure all of these resources are cleaned up safely and can be debugged otherwise.

    TODO: fill in details about Tracker and lifecycle interactions for tools, and for compaction strategies

    • Field Detail

      • maxTimestampAscending

        public static final java.util.Comparator<SSTableReader> maxTimestampAscending
      • maxTimestampDescending

        public static final java.util.Comparator<SSTableReader> maxTimestampDescending
      • firstKeyComparator

        public static final java.util.Comparator<SSTableReader> firstKeyComparator
      • firstKeyOrdering

        public static final com.google.common.collect.Ordering<SSTableReader> firstKeyOrdering
      • lastKeyComparator

        public static final java.util.Comparator<SSTableReader> lastKeyComparator
      • idComparator

        public static final java.util.Comparator<SSTableReader> idComparator
      • idReverseComparator

        public static final java.util.Comparator<SSTableReader> idReverseComparator
      • sizeComparator

        public static final java.util.Comparator<SSTableReader> sizeComparator
      • maxDataAge

        public final long maxDataAge
        maxDataAge is a timestamp in local server time (e.g. Global.currentTimeMillis) which represents an upper bound on the newest piece of data stored in the sstable. In other words, this sstable does not contain items created later than maxDataAge.

        The field is not serialized to disk, so relying on it for more than what truncate does is not advised.

        When a new sstable is flushed, maxDataAge is set to the time of creation. When an sstable is created from compaction, maxDataAge is set to the maximum maxDataAge of all merged sstables.

        The age is in milliseconds since epoch and is local to this host.
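        The rules for deriving maxDataAge can be sketched as below. MaxDataAgeSketch and its methods are hypothetical helpers written for illustration. The strict comparison in newSince is an assumption based on the wording "newer than" in that method's description.

```java
import java.util.List;

final class MaxDataAgeSketch {
    // A flushed sstable takes the flush (creation) time as its upper bound.
    static long forFlush(long creationTimeMillis) {
        return creationTimeMillis;
    }

    // A compacted sstable takes the largest bound among the sstables merged into it.
    // For an empty input this sketch returns Long.MIN_VALUE.
    static long forCompaction(List<Long> mergedMaxDataAges) {
        long max = Long.MIN_VALUE;
        for (long age : mergedMaxDataAges) {
            max = Math.max(max, age);
        }
        return max;
    }

    // Assumed "newer than" check against the bound (strict comparison).
    static boolean newSince(long maxDataAge, long timestampMillis) {
        return maxDataAge > timestampMillis;
    }
}
```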

      • isSuspect

        public final java.util.concurrent.atomic.AtomicBoolean isSuspect
      • sstableMetadata

        protected volatile StatsMetadata sstableMetadata
    • Method Detail

      • getApproximateKeyCount

        public static long getApproximateKeyCount​(java.lang.Iterable<SSTableReader> sstables)
        Calculate approximate key count. If a cardinality estimator is available on all given sstables, then this method uses them to estimate the key count. If not, then it uses index summaries.
        Parameters:
        sstables - SSTables to calculate key count
        Returns:
        estimated key count
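        The all-or-nothing fallback described above might look like the following sketch. KeyCountSketch and Stats are hypothetical types, and summing per-sstable estimates is a simplification made for the example: a real cardinality estimator would typically be merged so that keys shared between sstables are not counted twice.

```java
import java.util.List;
import java.util.OptionalLong;

final class KeyCountSketch {
    // Hypothetical per-sstable statistics.
    static final class Stats {
        final OptionalLong cardinalityEstimate; // empty when no estimator is available
        final long summaryEstimate;             // estimate derived from the index summary

        Stats(OptionalLong cardinalityEstimate, long summaryEstimate) {
            this.cardinalityEstimate = cardinalityEstimate;
            this.summaryEstimate = summaryEstimate;
        }
    }

    static long approximateKeyCount(List<Stats> sstables) {
        // The estimator path is used only if every sstable provides one.
        boolean allHaveEstimator = true;
        for (Stats s : sstables) {
            allHaveEstimator &= s.cardinalityEstimate.isPresent();
        }

        long total = 0;
        for (Stats s : sstables) {
            total += allHaveEstimator ? s.cardinalityEstimate.getAsLong() : s.summaryEstimate;
        }
        return total;
    }
}
```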
      • open

        public static SSTableReader open​(SSTable.Owner owner,
                                         Descriptor descriptor,
                                         java.util.Set<Component> components,
                                         TableMetadataRef metadata,
                                         boolean validate,
                                         boolean isOffline)
        Open an SSTable for reading
        Parameters:
        owner - owning entity
        descriptor - SSTable to open
        components - Components included with this SSTable
        metadata - metadata for this SSTable's CF
        validate - Check SSTable for corruption (limited)
        isOffline - Whether we are opening this SSTable "offline", for example from an external tool, or not for inclusion in queries (validations). This stops regeneration of the Bloom filter and summaries, and also disables tracking of hotness for the SSTable.
        Returns:
        SSTableReader
      • getTotalBytes

        public static long getTotalBytes​(java.lang.Iterable<SSTableReader> sstables)
      • getTotalUncompressedBytes

        public static long getTotalUncompressedBytes​(java.lang.Iterable<SSTableReader> sstables)
      • equals

        public boolean equals​(java.lang.Object that)
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • getFilename

        public java.lang.String getFilename()
        Overrides:
        getFilename in class SSTable
      • setupOnline

        public void setupOnline()
      • runWithLock

        public <R,​E extends java.lang.Exception> R runWithLock​(CheckedFunction<Descriptor,​R,​E> task)
                                                              throws E extends java.lang.Exception
        Execute the provided task while holding the sstable lock, to avoid racing with index summary redistribution; see CASSANDRA-15861.
        Parameters:
        task - to be guarded by sstable lock
        Throws:
        E extends java.lang.Exception
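        The shape of this method, a task that may throw a caller-chosen checked exception and runs under a lock, can be sketched as follows. LockedTaskSketch and its nested CheckedFunction are hypothetical stand-ins; a String plays the role of the Descriptor and a plain monitor plays the role of the sstable lock.

```java
final class LockedTaskSketch {
    // Hypothetical equivalent of a function that may throw a checked exception.
    @FunctionalInterface
    interface CheckedFunction<T, R, E extends Exception> {
        R apply(T input) throws E;
    }

    private final Object lock = new Object();
    private final String descriptor; // stands in for the sstable Descriptor

    LockedTaskSketch(String descriptor) {
        this.descriptor = descriptor;
    }

    // Run the task while holding the lock; the task's exception type propagates to the caller.
    <R, E extends Exception> R runWithLock(CheckedFunction<String, R, E> task) throws E {
        synchronized (lock) {
            return task.apply(descriptor);
        }
    }
}
```

        Because the exception type is a type parameter, a task that throws nothing checked can be called without a try block, while a task that throws IOException forces the caller to handle exactly that type.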
      • setReplaced

        public void setReplaced()
      • isReplaced

        public boolean isReplaced()
      • runOnClose

        public void runOnClose​(java.lang.Runnable runOnClose)
        The runnable passed to this method must not be an anonymous or non-static inner class. It can be a lambda or a method reference provided that it does not retain a reference chain to this reader.
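        The difference between a hook that is safe to register and one that retains the reader can be sketched as below. CloseHookSketch is a hypothetical class, not the real reader. The point it illustrates is a Java language rule: a lambda that reads an instance field captures this, whereas a lambda that reads only local variables does not.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseHookSketch {
    private final List<Runnable> onClose = new ArrayList<>();
    private final String name;

    CloseHookSketch(String name) {
        this.name = name;
    }

    void runOnClose(Runnable hook) {
        onClose.add(hook);
    }

    void close() {
        onClose.forEach(Runnable::run);
    }

    // Safe: the lambda captures only the local String and the log, not `this`.
    void registerSafe(List<String> log) {
        String label = name;
        runOnClose(() -> log.add("closed " + label));
    }

    // Unsafe pattern: reading the field `name` makes the lambda capture `this`,
    // so the hook keeps the whole object reachable.
    void registerCapturing(List<String> log) {
        runOnClose(() -> log.add("closed " + name));
    }
}
```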
      • unbuildTo

        protected final <B extends SSTableReader.Builder<?,​B>> B unbuildTo​(B builder,
                                                                                 boolean sharedCopy)
        The method sets fields specific to this SSTableReader and the parent SSTable on the provided SSTableReader.Builder. The method is intended to be called from the overloaded unbuildTo method in subclasses.
        Parameters:
        builder - the builder on which the fields should be set
        sharedCopy - whether the SharedCloseable resources should be passed as shared copies or directly; note that the method will overwrite the fields representing SharedCloseable only if they are not set in the builder yet (the relevant fields in the builder are null).
        Returns:
        the same instance of builder as provided
      • closeInternalComponent

        protected void closeInternalComponent​(java.lang.AutoCloseable closeable)
        All the resources which should be released upon closing this sstable reader are registered in SSTableReader.GlobalTidy. This method closes the provided resource explicitly at any time and unregisters it from SSTableReader.GlobalTidy, so that no attempt is made to release it twice.
        Parameters:
        closeable - a resource to be closed
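        A minimal sketch of the close-and-unregister idea follows. TidyRegistrySketch is a hypothetical registry written for illustration and does not reflect how SSTableReader.GlobalTidy is implemented.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class TidyRegistrySketch {
    // Identity-based set, so two distinct resources are never confused by equals().
    private final Set<AutoCloseable> registered =
            Collections.newSetFromMap(new IdentityHashMap<>());

    void register(AutoCloseable resource) {
        registered.add(resource);
    }

    // Close one resource early and forget it, so closeAll() will not close it again.
    void closeEarly(AutoCloseable resource) {
        if (registered.remove(resource)) closeUnchecked(resource);
    }

    // Close whatever is still registered.
    void closeAll() {
        for (AutoCloseable resource : registered) {
            closeUnchecked(resource);
        }
        registered.clear();
    }

    private static void closeUnchecked(AutoCloseable resource) {
        try {
            resource.close();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```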
      • releaseInMemoryComponents

        public abstract void releaseInMemoryComponents()
        This method is expected to close the components which occupy memory but are not needed when we just want to stream the components (for example, when the SSTable is opened with SSTableLoader). The method should call closeInternalComponent(AutoCloseable) for each such component. Leaving the implementation empty is valid if there are no such resources to release.
      • validate

        public void validate()
        Perform any validation needed for the reader upon creation before returning it from the SSTableReader.Builder.
      • getCompressionMetadata

        public CompressionMetadata getCompressionMetadata()
        Returns the compression metadata for this sstable. Note that the compression metadata is a resource and should not be closed by the caller. TODO: do not return a closeable resource, or return a shared copy.
        Throws:
        java.lang.IllegalStateException - if the sstable is not compressed
      • getCompressionMetadataOffHeapSize

        public long getCompressionMetadataOffHeapSize()
        Returns the amount of memory in bytes used off heap by the compression meta-data.
        Returns:
        the amount of memory in bytes used off heap by the compression meta-data
      • estimatedKeys

        public abstract long estimatedKeys()
        Calculates an estimate of the number of keys in the sstable represented by this reader.
      • estimatedKeysForRanges

        public abstract long estimatedKeysForRanges​(java.util.Collection<Range<Token>> ranges)
        Calculates an estimate of the number of keys for the given ranges in the sstable represented by this reader.
      • getKeySamples

        public abstract java.lang.Iterable<DecoratedKey> getKeySamples​(Range<Token> range)
        Returns sample keys for the provided token range.
      • getPositionsForRanges

        public java.util.List<SSTableReader.PartitionPositionBounds> getPositionsForRanges​(java.util.Collection<Range<Token>> ranges)
        Determine the minimal set of sections that can be extracted from this SSTable to cover the given ranges.
        Returns:
        A sorted list of (offset,end) pairs that cover the given ranges in the datafile for this SSTable.
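        One way to picture the mapping from token ranges to data-file sections is the sketch below. It rests on several assumptions that are not taken from the source: tokens are plain long values, ranges are already sorted and non-overlapping, each range is left-exclusive and right-inclusive, and wrap-around ranges are ignored. PositionBoundsSketch is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PositionBoundsSketch {
    // offsets: token -> data-file offset where that partition starts (hypothetical index).
    // ranges: sorted, non-overlapping {left, right} pairs, left exclusive and right inclusive.
    static List<long[]> positionsForRanges(TreeMap<Long, Long> offsets,
                                           long dataLength,
                                           List<long[]> ranges) {
        List<long[]> sections = new ArrayList<>();
        for (long[] range : ranges) {
            // First partition strictly after the left bound.
            Map.Entry<Long, Long> first = offsets.higherEntry(range[0]);
            if (first == null || first.getKey() > range[1]) continue; // nothing inside this range

            // The section ends where the first partition after the right bound begins.
            Map.Entry<Long, Long> after = offsets.higherEntry(range[1]);
            long end = after == null ? dataLength : after.getValue();
            sections.add(new long[] { first.getValue(), end });
        }
        return sections;
    }
}
```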
      • getPosition

        public final long getPosition​(PartitionPosition key,
                                      SSTableReader.Operator op)
        Retrieves the position while updating the key cache and the stats.
        Parameters:
        key - The key to apply as the rhs to the given Operator. A 'fake' key is allowed to allow key selection by token bounds but only if op != EQ
        op - The Operator defining matching keys: the nearest key to the target matching the operator wins.
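        The "nearest matching key wins" rule can be sketched with a sorted map. Everything here is illustrative: OperatorSketch is hypothetical, its EQ, GE and GT constants are assumed to mirror the intent of SSTableReader.Operator, integer keys stand in for partition positions, and the -1 value for "no match" is a choice made for this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

final class OperatorSketch {
    // Illustrative operators: exact match, nearest key >= target, nearest key > target.
    enum Operator { EQ, GE, GT }

    // Returns the data-file position of the nearest key satisfying `op` relative to `key`,
    // or -1 if no key matches.
    static long position(TreeMap<Integer, Long> index, int key, Operator op) {
        Map.Entry<Integer, Long> match;
        switch (op) {
            case EQ:
                Long exact = index.get(key);
                return exact == null ? -1L : exact;
            case GE:
                match = index.ceilingEntry(key);
                break;
            default: // GT
                match = index.higherEntry(key);
                break;
        }
        return match == null ? -1L : match.getValue();
    }
}
```

        With keys 10, 20 and 30 in the index, a GE lookup for 15 lands on 20, while an EQ lookup for 15 finds nothing. This is also why a 'fake' key that only marks a token bound makes sense with GE or GT but not with EQ.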
      • getPosition

        protected long getPosition​(PartitionPosition key,
                                   SSTableReader.Operator op,
                                   boolean updateStats,
                                   SSTableReadsListener listener)
        Retrieve a position in data file according to the provided key and operator.
        Parameters:
        key - The key to apply as the rhs to the given Operator. A 'fake' key is allowed to allow key selection by token bounds but only if op != EQ
        op - The Operator defining matching keys: the nearest key to the target matching the operator wins.
        updateStats - true if updating stats and cache
        listener - a listener used to handle internal events
        Returns:
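        The position in the data file corresponding to the key. Note that the return type is a primitive long, so a missing key cannot be reported as null; compare getRowIndexEntry, which returns the index entry or null.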
      • getRowIndexEntry

        protected abstract AbstractRowIndexEntry getRowIndexEntry​(PartitionPosition key,
                                                                  SSTableReader.Operator op,
                                                                  boolean updateStats,
                                                                  SSTableReadsListener listener)
        Retrieve an index entry for the partition found according to the provided key and operator.
        Parameters:
        key - The key to apply as the rhs to the given Operator. A 'fake' key is allowed to allow key selection by token bounds but only if op != EQ
        op - The Operator defining matching keys: the nearest key to the target matching the operator wins.
        updateStats - true if updating stats and cache
        listener - a listener used to handle internal events
        Returns:
        The index entry corresponding to the key, or null if the key is not present
      • keyReader

        public abstract KeyReader keyReader()
                                     throws java.io.IOException
        Returns a KeyReader over all keys in the sstable.
        Throws:
        java.io.IOException
      • keyIterator

        public KeyIterator keyIterator()
                                throws java.io.IOException
        Returns a KeyIterator over all keys in the sstable.
        Throws:
        java.io.IOException
      • firstKeyBeyond

        public abstract DecoratedKey firstKeyBeyond​(PartitionPosition token)
        Finds and returns the first key beyond a given token in this SSTable or null if no such key exists.
      • uncompressedLength

        public long uncompressedLength()
        Returns the length in bytes of the (uncompressed) data for this SSTable. For compressed files, this is not the same thing as the on disk size (see onDiskLength()).
      • tokenSpaceCoverage

        public double tokenSpaceCoverage()
        Returns:
        the fraction of the token space for which this sstable has content. In the simplest case this is just the size of the interval returned by getBounds(), but the sstable may contain "holes" when the locally-owned range is not contiguous (e.g. with vnodes). As this is affected by the local ranges which can change, the token space fraction is calculated at the time of writing the sstable and stored with its metadata. For older sstables that do not contain this metadata field, this method returns NaN.
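        Callers that consume this value need to allow for the NaN case. The helper below is a hypothetical sketch of one way to do so; the choice of fallback value is left to the caller and is not prescribed by the source.

```java
final class CoverageSketch {
    // Returns the stored coverage, or the supplied fallback when the value is NaN
    // (i.e. the sstable predates the metadata field).
    static double coverageOrDefault(double storedCoverage, double fallback) {
        // NaN never compares equal to anything, so use Double.isNaN rather than ==.
        return Double.isNaN(storedCoverage) ? fallback : storedCoverage;
    }
}
```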
      • onDiskLength

        public long onDiskLength()
        The length in bytes of the on disk size for this SSTable. For compressed files, this is not the same thing as the data length (see uncompressedLength()).
      • getCrcCheckChance

        public double getCrcCheckChance()
      • setCrcCheckChance

        public void setCrcCheckChance​(double crcCheckChance)
        Set the value of CRC check chance. The argument supplied is obtained from the property of the owning CFS. Called when either the SSTR is initialized, or the CFS's property is updated via JMX
      • markObsolete

        public void markObsolete​(java.lang.Runnable tidier)
        Mark the sstable as obsolete, i.e., compacted into newer sstables.

        When calling this function, the caller must ensure that the SSTableReader is not referenced anywhere except for threads holding a reference.

        Calling it multiple times is usually buggy.

      • isMarkedCompacted

        public boolean isMarkedCompacted()
      • markSuspect

        public void markSuspect()
      • unmarkSuspect

        public void unmarkSuspect()
      • isMarkedSuspect

        public boolean isMarkedSuspect()
      • getScanner

        public ISSTableScanner getScanner​(Range<Token> range)
        Direct I/O SSTableScanner over a defined range of tokens.
        Parameters:
        range - the range of keys to cover
        Returns:
        A Scanner for seeking over the rows of the SSTable.
      • getScanner

        public abstract ISSTableScanner getScanner()
        Direct I/O SSTableScanner over the entirety of the sstable.
        Returns:
        A Scanner over the full content of the SSTable.
      • getScanner

        public abstract ISSTableScanner getScanner​(java.util.Collection<Range<Token>> ranges)
        Direct I/O SSTableScanner over a defined collection of ranges of tokens.
        Parameters:
        ranges - the ranges of keys to cover
        Returns:
        A Scanner for seeking over the rows of the SSTable.
      • getScanner

        public abstract ISSTableScanner getScanner​(java.util.Iterator<AbstractBounds<PartitionPosition>> rangeIterator)
        Direct I/O SSTableScanner over an iterator of bounds.
        Parameters:
        rangeIterator - the keys to cover
        Returns:
        A Scanner for seeking over the rows of the SSTable.
      • getFileDataInput

        public FileDataInput getFileDataInput​(long position)
        Create a FileDataInput for the data file of the sstable represented by this reader. This method returns a newly opened resource which must be closed by the caller.
        Parameters:
        position - the position at which the data input will be opened (it seeks to this position)
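        Since the caller owns the returned resource, try-with-resources is the natural pattern. The sketch below shows that pattern with java.io.RandomAccessFile standing in for FileDataInput; DataInputSketch and readByteAt are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Path;

final class DataInputSketch {
    // Reads one byte at `position`; RandomAccessFile stands in for the FileDataInput
    // that the reader would return.
    static int readByteAt(Path dataFile, long position) {
        try (RandomAccessFile in = new RandomAccessFile(dataFile.toFile(), "r")) {
            in.seek(position); // analogous to opening the input at `position`
            return in.read();  // the resource is closed when the try block exits
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```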
      • newSince

        public boolean newSince​(long timestampMillis)
        Tests if the sstable contains data newer than the given timestamp (in this host's local currentTimeMillis time). This works in conjunction with maxDataAge, which is an upper bound on the age of the data in the sstable represented by this reader.
        Returns:
        true iff this sstable contains data that's newer than the given timestamp
      • createLinks

        public void createLinks​(java.lang.String snapshotDirectoryPath)
      • createLinks

        public void createLinks​(java.lang.String snapshotDirectoryPath,
                                com.google.common.util.concurrent.RateLimiter rateLimiter)
      • createLinks

        public static void createLinks​(Descriptor descriptor,
                                       java.util.Set<Component> components,
                                       java.lang.String snapshotDirectoryPath)
      • createLinks

        public static void createLinks​(Descriptor descriptor,
                                       java.util.Set<Component> components,
                                       java.lang.String snapshotDirectoryPath,
                                       com.google.common.util.concurrent.RateLimiter limiter)
      • isRepaired

        public boolean isRepaired()
      • keyAtPositionFromSecondaryIndex

        public abstract DecoratedKey keyAtPositionFromSecondaryIndex​(long keyPositionFromSecondaryIndex)
                                                              throws java.io.IOException
        Reads the key stored at the position saved in SASI.

        When SASI is created, it uses key locations retrieved from KeyReader.keyPositionForSecondaryIndex(). This method reads the key stored at such a position. It is up to the concrete SSTable format implementation what that position means and which file it refers to. The only requirement is that it is consistent with what KeyReader.keyPositionForSecondaryIndex() returns.

        Returns:
        key if found, null otherwise
        Throws:
        java.io.IOException
      • isPendingRepair

        public boolean isPendingRepair()
      • getPendingRepair

        public TimeUUID getPendingRepair()
      • getRepairedAt

        public long getRepairedAt()
      • isTransient

        public boolean isTransient()
      • intersects

        public boolean intersects​(java.util.Collection<Range<Token>> ranges)
      • getEstimatedCellPerPartitionCount

        public EstimatedHistogram getEstimatedCellPerPartitionCount()
      • getEstimatedDroppableTombstoneRatio

        public double getEstimatedDroppableTombstoneRatio​(long gcBefore)
      • getDroppableTombstonesBefore

        public double getDroppableTombstonesBefore​(long gcBefore)
      • getCompressionRatio

        public double getCompressionRatio()
      • getMaxTimestamp

        public long getMaxTimestamp()
      • getMaxLocalDeletionTime

        public long getMaxLocalDeletionTime()
      • mayHaveTombstones

        public boolean mayHaveTombstones()
        Whether the sstable may contain tombstones or if it is guaranteed to not contain any.

        Note that this method returning false guarantees the sstable has no tombstones whatsoever (so no cell tombstone, no range tombstone marker and no expiring columns), but returning true doesn't guarantee it contains any, as it may simply have non-expired cells.

      • getMinTTL

        public int getMinTTL()
      • getMaxTTL

        public int getMaxTTL()
      • getTotalColumnsSet

        public long getTotalColumnsSet()
      • getTotalRows

        public long getTotalRows()
      • getAvgColumnSetPerRow

        public int getAvgColumnSetPerRow()
      • getSSTableLevel

        public int getSSTableLevel()
      • mutateLevelAndReload

        public void mutateLevelAndReload​(int newLevel)
                                  throws java.io.IOException
        Mutate sstable level with a lock to avoid racing with entire-sstable-streaming and then reload sstable metadata
        Throws:
        java.io.IOException
      • mutateRepairedAndReload

        public void mutateRepairedAndReload​(long newRepairedAt,
                                            TimeUUID newPendingRepair,
                                            boolean isTransient)
                                     throws java.io.IOException
        Mutate sstable repair metadata with a lock to avoid racing with entire-sstable-streaming and then reload sstable metadata
        Throws:
        java.io.IOException
      • reloadSSTableMetadata

        public void reloadSSTableMetadata()
                                   throws java.io.IOException
        Reloads the sstable metadata from disk.

        Called after level is changed on sstable, for example if the sstable is dropped to L0

        Might be possible to remove in future versions

        Throws:
        java.io.IOException
      • openDataReader

        public RandomAccessReader openDataReader​(com.google.common.util.concurrent.RateLimiter limiter)
      • trySkipFileCacheBefore

        public void trySkipFileCacheBefore​(DecoratedKey key)
      • getDataCreationTime

        public long getDataCreationTime()
        Returns:
        last modified time for data component. 0 if given component does not exist or IO error occurs.
      • incrementReadCount

        public void incrementReadCount()
        Increment the total read count and read rate for this SSTable. This should not be incremented for non-query reads, like compaction.
      • setupInstance

        protected java.util.List<java.lang.AutoCloseable> setupInstance​(boolean trackHotness)
      • setup

        public void setup​(boolean trackHotness)
      • overrideReadMeter

        public void overrideReadMeter​(RestorableMeter readMeter)
      • mayContainAssumingKeyIsInRange

        public abstract boolean mayContainAssumingKeyIsInRange​(DecoratedKey key)
        Checks whether the sstable may contain the provided key. If a Bloom filter is present, the method uses it to give an approximate answer; otherwise it performs an exact check against the index.
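        The two paths, approximate with a filter and exact without one, can be sketched as below. MayContainSketch is a toy model: its two-hash bit set is only a stand-in for a real Bloom filter, and a HashSet stands in for the sstable index.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

final class MayContainSketch {
    private static final int BITS = 1024;

    private final BitSet filter;                        // null models an sstable without a Bloom filter
    private final Set<String> index = new HashSet<>();  // stands in for the on-disk index

    MayContainSketch(boolean withFilter) {
        this.filter = withFilter ? new BitSet(BITS) : null;
    }

    void add(String key) {
        index.add(key);
        if (filter != null) {
            filter.set(slot(key, 17));
            filter.set(slot(key, 31));
        }
    }

    // Two cheap, illustrative hash functions; a real Bloom filter uses better ones.
    private static int slot(String key, int seed) {
        return Math.floorMod(key.hashCode() * seed + seed, BITS);
    }

    boolean mayContain(String key) {
        if (filter != null) {
            // Approximate: may return true for an absent key, never false for a present one.
            return filter.get(slot(key, 17)) && filter.get(slot(key, 31));
        }
        // Exact: consult the index directly.
        return index.contains(key);
    }
}
```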
      • resetTidying

        public static void resetTidying()
      • moveAndOpenSSTable

        public static SSTableReader moveAndOpenSSTable​(ColumnFamilyStore cfs,
                                                       Descriptor oldDescriptor,
                                                       Descriptor newDescriptor,
                                                       java.util.Set<Component> components,
                                                       boolean copyData)
        Moves the sstable in oldDescriptor to a new place (with generation etc) in newDescriptor.

        All components given will be moved/renamed

      • shutdownBlocking

        public static void shutdownBlocking​(long timeout,
                                            java.util.concurrent.TimeUnit unit)
                                     throws java.lang.InterruptedException,
                                            java.util.concurrent.TimeoutException
        Throws:
        java.lang.InterruptedException
        java.util.concurrent.TimeoutException
      • bytesOnDisk

        public long bytesOnDisk()
        Returns:
        the physical size on disk of all components for this SSTable in bytes
      • logicalBytesOnDisk

        public long logicalBytesOnDisk()
        Returns:
        the total logical/uncompressed size in bytes of all components for this SSTable
      • maybePersistSSTableReadMeter

        public void maybePersistSSTableReadMeter()