Class DeltaLogActionUtils

Object
io.delta.kernel.internal.DeltaLogActionUtils

public class DeltaLogActionUtils extends Object
Exposes APIs to read the raw actions within the *commit files* of the _delta_log. These APIs are used for Change Data Feed (CDF), streaming, and more.
  • Method Details

    • getCommitFilesForVersionRange

      public static List<FileStatus> getCommitFilesForVersionRange(Engine engine, Path tablePath, long startVersion, long endVersion)
      Returns the list of commit log files of a table for the provided version range.
      Parameters:
      tablePath - path for the given table
      startVersion - start version of the range (inclusive)
      endVersion - end version of the range (inclusive)
      Returns:
      the list of commit files in increasing version order, from startVersion to endVersion (both inclusive)
      Throws:
      TableNotFoundException - if the table does not exist or if it is not a delta table
      KernelException - if a commit file does not exist for any of the versions in the provided range
      KernelException - if provided an invalid version range
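      The contract above (inclusive range, increasing order, failure when a version is missing or the range is invalid) can be illustrated with a minimal standalone sketch. It does not call the Kernel API: the class CommitRangeSketch, its methods, and the standard Java exceptions are hypothetical stand-ins for the real listing logic and KernelException, and treating a negative start or an end below the start as "invalid" is an assumption. The only fact borrowed from Delta is that commit files are named with the version zero-padded to 20 digits plus ".json".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of io.delta.kernel.
class CommitRangeSketch {

    // Delta commit file name: version zero-padded to 20 digits, ".json" suffix.
    static String commitFileName(long version) {
        return String.format(Locale.ROOT, "%020d.json", version);
    }

    // Returns the commit file names for [startVersion, endVersion], in increasing order.
    // Standard exceptions stand in for KernelException (assumption).
    static List<String> commitFilesForRange(
            List<String> logFileNames, long startVersion, long endVersion) {
        if (startVersion < 0 || endVersion < startVersion) {
            throw new IllegalArgumentException(
                "Invalid version range: " + startVersion + " to " + endVersion);
        }
        Set<String> present = new HashSet<>(logFileNames);
        List<String> result = new ArrayList<>();
        for (long v = startVersion; v <= endVersion; v++) {
            String expected = commitFileName(v);
            if (!present.contains(expected)) {
                // Every version in the range must have a commit file.
                throw new IllegalStateException("Missing commit file for version " + v);
            }
            result.add(expected);
        }
        return result;
    }
}
```

      With files for versions 0, 1, 2 and 4 present, a request for versions 1 to 2 returns two names, while a request for 2 to 4 fails because version 3 has no commit file.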
    • readCommitFiles

      public static CloseableIterator<ColumnarBatch> readCommitFiles(Engine engine, List<FileStatus> commitFiles, StructType readSchema)
      Reads the given commitFiles and returns their contents as an iterator of batches. Two columns, "version" and "timestamp", are added to each batch; they store the commit version and timestamp of the commit file that the batch was read from. They are the first and second columns of the returned schema, respectively, and both are of type LongType.
      Parameters:
      commitFiles - list of delta commit files to read
      readSchema - JSON schema to read
      Returns:
      an iterator over the contents of the files in the same order as the provided files
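      The shape of the output described above can be sketched without the Kernel types. In this hypothetical model, ReadCommitFilesSketch, CommitFile, and the use of plain strings for actions are all illustrative assumptions; the real method returns ColumnarBatch objects, not rows. The sketch only shows that each row carries the version and timestamp of its source file in the first two positions and that output order follows the input file order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of io.delta.kernel.
class ReadCommitFilesSketch {

    // Stand-in for a commit file: its version, timestamp, and raw actions.
    static final class CommitFile {
        final long version;
        final long timestamp;
        final List<String> actions;

        CommitFile(long version, long timestamp, List<String> actions) {
            this.version = version;
            this.timestamp = timestamp;
            this.actions = actions;
        }
    }

    // Emits one row per action: [version, timestamp, action], in file order.
    static List<List<Object>> readAll(List<CommitFile> commitFiles) {
        List<List<Object>> rows = new ArrayList<>();
        for (CommitFile file : commitFiles) {
            for (String action : file.actions) {
                // "version" is column 0 and "timestamp" is column 1, both long values.
                rows.add(List.<Object>of(file.version, file.timestamp, action));
            }
        }
        return rows;
    }
}
```

      A caller can therefore read the commit version from column 0 and the commit timestamp from column 1 of every row before looking at the action columns it asked for.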
    • listDeltaLogFilesAsIter

      public static CloseableIterator<FileStatus> listDeltaLogFilesAsIter(Engine engine, Set<FileNames.DeltaLogFileType> fileTypes, Path tablePath, long startVersion, Optional<Long> endVersionOpt, boolean mustBeRecreatable)
      Returns a CloseableIterator of the files in the _delta_log directory of the given tablePath whose type is in fileTypes, in increasing order from startVersion to the optional endVersionOpt.
      Throws:
      TableNotFoundException - if the table or its _delta_log does not exist
      KernelException - if mustBeRecreatable is true, endVersionOpt is present, and the _delta_log history has been truncated so that the desired end version cannot be loaded
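      The filtering by file type and by an optional end version can be sketched as follows. This is a simplified standalone model, not the Kernel implementation: ListLogFilesSketch and its LogFileType enum are hypothetical stand-ins for FileNames.DeltaLogFileType, only classic single-file checkpoints and JSON commits are recognized, the input is assumed to be already sorted by name, a list is returned instead of a CloseableIterator, and the mustBeRecreatable check is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; not part of io.delta.kernel.
class ListLogFilesSketch {

    // Stand-in for FileNames.DeltaLogFileType (assumed subset).
    enum LogFileType { COMMIT, CHECKPOINT }

    // Classifies a file name; unrecognized names yield an empty Optional.
    static Optional<LogFileType> typeOf(String name) {
        if (name.matches("\\d{20}\\.json")) {
            return Optional.of(LogFileType.COMMIT);
        }
        if (name.matches("\\d{20}\\.checkpoint\\.parquet")) {
            return Optional.of(LogFileType.CHECKPOINT);
        }
        return Optional.empty();
    }

    // Keeps files of the requested types with startVersion <= version <= end (if present).
    static List<String> list(
            List<String> sortedNames,
            Set<LogFileType> fileTypes,
            long startVersion,
            Optional<Long> endVersionOpt) {
        List<String> out = new ArrayList<>();
        for (String name : sortedNames) {
            Optional<LogFileType> type = typeOf(name);
            if (!type.isPresent() || !fileTypes.contains(type.get())) {
                continue; // not a requested file type
            }
            long version = Long.parseLong(name.substring(0, 20));
            if (version < startVersion) {
                continue;
            }
            if (endVersionOpt.isPresent() && version > endVersionOpt.get()) {
                break; // input is sorted, so nothing later can match
            }
            out.add(name);
        }
        return out;
    }
}
```

      When endVersionOpt is empty the listing simply runs to the last file in the directory, which is why only the start version is a required argument.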