Package io.delta.kernel.internal
Class DeltaLogActionUtils
Object
io.delta.kernel.internal.DeltaLogActionUtils
Exposes APIs to read the raw actions within the *commit files* of the _delta_log. This is used
for CDF, streaming, and more.
-
Nested Class Summary
Nested Classes
Modifier and Type: static enum
Description: Represents a Delta action.
-
Method Summary
Modifier and Type, Method, Description

static List<FileStatus>
getCommitFilesForVersionRange(Engine engine, Path tablePath, long startVersion, long endVersion)
For a table, get the list of commit log files for the provided version range.

static CloseableIterator<FileStatus>
listDeltaLogFilesAsIter(Engine engine, Set<FileNames.DeltaLogFileType> fileTypes, Path tablePath, long startVersion, Optional<Long> endVersionOpt, boolean mustBeRecreatable)
Returns a CloseableIterator of files of type $fileTypes in the _delta_log directory of the given $tablePath, in increasing order from $startVersion to the optional $endVersion.

static CloseableIterator<ColumnarBatch>
readCommitFiles(Engine engine, List<FileStatus> commitFiles, StructType readSchema)
Read the given commitFiles and return the contents as an iterator of batches.
-
Method Details
-
getCommitFilesForVersionRange
public static List<FileStatus> getCommitFilesForVersionRange(Engine engine, Path tablePath, long startVersion, long endVersion)
For a table, get the list of commit log files for the provided version range.
Parameters:
tablePath - path for the given table
startVersion - start version of the range (inclusive)
endVersion - end version of the range (inclusive)
Returns:
the list of commit files in increasing order between startVersion and endVersion
Throws:
TableNotFoundException - if the table does not exist or if it is not a delta table
KernelException - if a commit file does not exist for any of the versions in the provided range
KernelException - if provided an invalid version range
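The contract above (inclusive bounds, increasing order, failure on a missing version or an invalid range) can be illustrated with a small standalone sketch. It is not the Kernel implementation: the class CommitRangeSketch and its helpers are hypothetical, it works on plain file names rather than FileStatus objects, it throws standard Java exceptions in place of KernelException, and it assumes that "invalid range" means a negative start or an end below the start. The only Delta-specific fact it relies on is that commit files are named with the version zero-padded to 20 digits followed by .json.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the documented contract; not part of Delta Kernel. */
final class CommitRangeSketch {

    /** Commit files are named <20-digit zero-padded version>.json. */
    static String commitFileName(long version) {
        return String.format("%020d.json", version);
    }

    static boolean isCommitFile(String name) {
        return name.matches("\\d{20}\\.json");
    }

    static long versionOf(String commitFileName) {
        return Long.parseLong(commitFileName.substring(0, 20));
    }

    /**
     * Returns the commit file names for [startVersion, endVersion], both inclusive.
     * The input must already be sorted by version, as a _delta_log listing would be.
     */
    static List<String> commitFilesForRange(
            List<String> sortedLogFileNames, long startVersion, long endVersion) {
        // Assumption: this is what "invalid version range" means.
        if (startVersion < 0 || endVersion < startVersion) {
            throw new IllegalArgumentException(
                "Invalid version range: [" + startVersion + ", " + endVersion + "]");
        }
        List<String> result = new ArrayList<>();
        long expected = startVersion;
        for (String name : sortedLogFileNames) {
            if (!isCommitFile(name)) {
                continue; // skip checkpoints, checksum files, and anything else
            }
            long version = versionOf(name);
            if (version < startVersion || version > endVersion) {
                continue;
            }
            if (version != expected) {
                throw new IllegalStateException("Missing commit file for version " + expected);
            }
            result.add(name);
            expected++;
        }
        if (expected != endVersion + 1) {
            throw new IllegalStateException("Missing commit file for version " + expected);
        }
        return result;
    }
}
```

The gap check runs inside the loop so that a hole in the middle of the range is reported with the first missing version rather than after the whole listing has been consumed.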
-
readCommitFiles
public static CloseableIterator<ColumnarBatch> readCommitFiles(Engine engine, List<FileStatus> commitFiles, StructType readSchema)
Read the given commitFiles and return the contents as an iterator of batches. Also adds two columns, "version" and "timestamp", that store the commit version and timestamp of the commit file that the batch was read from. The "version" and "timestamp" columns are the first and second columns in the returned schema respectively, and both are of type LongType.
Parameters:
commitFiles - list of delta commit files to read
readSchema - JSON schema to read
Returns:
an iterator over the contents of the files in the same order as the provided files
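The column layout described above can be sketched without any Kernel types. The class CommitSchemaSketch below is hypothetical and models a schema as an ordered map from column name to type name; in the real API the schema is a StructType and the two added columns are of LongType. It only shows that "version" and "timestamp" come first, followed by the requested columns in their original order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the returned schema layout; not part of Delta Kernel. */
final class CommitSchemaSketch {

    /**
     * Prepends the "version" and "timestamp" columns to the requested read schema.
     * The input map must preserve insertion order (for example a LinkedHashMap).
     */
    static LinkedHashMap<String, String> withCommitColumns(Map<String, String> readSchema) {
        LinkedHashMap<String, String> out = new LinkedHashMap<>();
        out.put("version", "long");   // first column
        out.put("timestamp", "long"); // second column
        out.putAll(readSchema);       // requested columns keep their relative order
        return out;
    }
}
```

A caller that reads batches positionally should therefore expect the requested columns to start at index 2.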
-
listDeltaLogFilesAsIter
public static CloseableIterator<FileStatus> listDeltaLogFilesAsIter(Engine engine, Set<FileNames.DeltaLogFileType> fileTypes, Path tablePath, long startVersion, Optional<Long> endVersionOpt, boolean mustBeRecreatable)
Returns a CloseableIterator of files of type $fileTypes in the _delta_log directory of the given $tablePath, in increasing order from $startVersion to the optional $endVersion.
Throws:
TableNotFoundException - if the table or its _delta_log does not exist
KernelException - if mustBeRecreatable is true, endVersionOpt is present, and the _delta_log history has been truncated so that we cannot load the desired end version
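Because this method returns a closeable iterator rather than a list, callers are expected to consume it lazily and close it when done. The sketch below illustrates that pattern with standard Java only. LogListingSketch and ClosingIterator are hypothetical stand-ins (not the Kernel CloseableIterator), the elements are bare version numbers rather than FileStatus objects, and the fileTypes and mustBeRecreatable parameters are not modelled.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;

/** Hypothetical illustration of lazy, bounded, closeable listing; not part of Delta Kernel. */
final class LogListingSketch {

    /** Stand-in for a closeable iterator whose close() throws no checked exception. */
    interface ClosingIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close();
    }

    /** Lazily yields versions from startVersion up to the optional end version, inclusive. */
    static ClosingIterator<Long> versionsInRange(
            List<Long> sortedVersions, long startVersion, Optional<Long> endVersionOpt) {
        Iterator<Long> source = sortedVersions.iterator();
        return new ClosingIterator<Long>() {
            private Long next;      // buffered element, or null if none is buffered
            private boolean closed; // set by close() or once the end bound is passed

            @Override
            public boolean hasNext() {
                while (!closed && next == null && source.hasNext()) {
                    long v = source.next();
                    if (v < startVersion) {
                        continue; // before the requested range
                    }
                    if (endVersionOpt.isPresent() && v > endVersionOpt.get()) {
                        closed = true; // input is sorted, so nothing later can match
                        break;
                    }
                    next = v;
                }
                return next != null;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Long v = next;
                next = null;
                return v;
            }

            @Override
            public void close() {
                closed = true;
                next = null;
            }
        };
    }
}
```

Wrapping the call in try-with-resources ensures the iterator is closed even if the consumer stops early or throws.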
-