Package io.delta.kernel
Interface Scan
- All Known Implementing Classes:
ScanImpl
Represents a scan of a Delta table.
- Since:
- 3.0.0
-
Method Summary
Method: Description
- getRemainingFilter: Get the remaining filter that is not guaranteed to be satisfied for the data Delta Kernel returns.
- getScanFiles(Engine engine): Get an iterator of data files to scan.
- getScanState(Engine engine): Get the scan state associated with the current scan.
- transformPhysicalData(Engine engine, Row scanState, Row scanFile, CloseableIterator<ColumnarBatch> physicalDataIter): Transform the physical data read from the table data file into the logical data that is expected out of the Delta table.
-
Method Details
-
getScanFiles
Get an iterator of data files to scan.
- Parameters:
engine - Engine instance to use in Delta Kernel.
- Returns:
- iterator of FilteredColumnarBatches where each selected row in the batch corresponds to one scan file. The schema of each row is defined as follows:
  - name: add, type: struct, description: Represents the `AddFile` DeltaLog action.
    - name: path, type: string, description: Location of the file. The path is a URI as specified by RFC 2396 URI Generic Syntax, which needs to be decoded to get the data file path.
    - name: partitionValues, type: map(string, string), description: A map from partition column to value for this logical file.
    - name: size, type: long, description: Size of the file.
    - name: modificationTime, type: long, description: The time this logical file was created, as milliseconds since the epoch.
    - name: dataChange, type: boolean, description: When false, the logical file must already be present in the table, or the records in the added file must be contained in one or more remove actions in the same version.
    - name: deletionVector, type: struct, description: Either null (or absent in JSON) when no DV is associated with this data file, or a struct (described below) that contains necessary information about the DV that is part of this logical file. For a description of each member of `deletionVector`, see the Protocol.
      - name: storageType, type: string
      - name: pathOrInlineDv, type: string, description: The path is a URI as specified by RFC 2396 URI Generic Syntax, which needs to be decoded to get the data file path.
      - name: offset, type: long
      - name: sizeInBytes, type: long
      - name: cardinality, type: long
    - name: tags, type: map(string, string), description: Map containing metadata about the scan file.
  - name: tableRoot, type: string, description: Absolute path of the table location. The path is a URI as specified by RFC 2396 URI Generic Syntax, which needs to be decoded to get the data file path. NOTE: this is temporary and will be removed in the future.
-
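The path fields in the scan file row are URI-encoded strings, and modificationTime is given as milliseconds since the epoch. The following is a minimal sketch of how a connector might interpret those two values using only the JDK. The class `ScanFilePaths` and its methods are hypothetical helpers for illustration and are not part of Delta Kernel; the sketch also assumes the value parses as a `java.net.URI`, and for an absolute URI it returns only the decoded path component (scheme and authority are dropped).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.time.Instant;

/** Hypothetical helper for illustration only; not part of Delta Kernel. */
final class ScanFilePaths {
    private ScanFilePaths() {}

    /**
     * Decodes a percent-encoded URI string (such as the value of the
     * path or tableRoot field) into its plain path component.
     */
    static String decodePath(String encodedUri) {
        try {
            // URI.getPath() returns the path with escaped octets decoded.
            return new URI(encodedUri).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + encodedUri, e);
        }
    }

    /** Converts a modificationTime value (milliseconds since the epoch) to an Instant. */
    static Instant modificationInstant(long modificationTimeMillis) {
        return Instant.ofEpochMilli(modificationTimeMillis);
    }
}
```

For example, `ScanFilePaths.decodePath("year%3D2024/part-00000.snappy.parquet")` yields `year=2024/part-00000.snappy.parquet`.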
- See Also:
-
getRemainingFilter
Get the remaining filter that is not guaranteed to be satisfied for the data Delta Kernel returns. This filter is used by Delta Kernel to do data skipping when possible.
- Returns:
- the remaining filter as a Predicate.
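Because the remaining filter is not guaranteed to hold for the returned data, a connector would typically evaluate it again on the rows it reads. The sketch below illustrates that idea with plain JDK types. It uses `java.util.function.Predicate` as a stand-in, which is not the Kernel `Predicate` class, and `RemainingFilterSketch` is a hypothetical name introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration only; uses JDK stand-ins, not Delta Kernel types. */
final class RemainingFilterSketch {
    private RemainingFilterSketch() {}

    /**
     * Re-applies a residual filter to rows that were already read.
     * Data skipping may have pruned whole files, but any row in the
     * files that were kept can still fail the filter.
     */
    static <T> List<T> applyRemainingFilter(List<T> rows, Predicate<? super T> remainingFilter) {
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            if (remainingFilter.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```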
-
getScanState
Get the scan state associated with the current scan. This state is common across all files in the scan to be read.
-
transformPhysicalData
static CloseableIterator<FilteredColumnarBatch> transformPhysicalData(Engine engine, Row scanState, Row scanFile, CloseableIterator<ColumnarBatch> physicalDataIter) throws IOException
Transform the physical data read from the table data file into the logical data that is expected out of the Delta table.
- Parameters:
engine - Connector-provided Engine implementation.
scanState - Scan state returned by getScanState(Engine).
scanFile - Scan file from which the physical data in physicalDataIter is read.
physicalDataIter - Iterator of ColumnarBatches containing the physical data read from the scanFile.
- Returns:
- Data read from the input scan files as an iterator of FilteredColumnarBatches. Each FilteredColumnarBatch instance contains the data read and an optional selection vector that indicates data rows as valid or invalid. It is the responsibility of the caller to close this iterator.
- Throws:
IOException - when an error occurs while reading the data.
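The optional selection vector described above can be pictured as one boolean flag per row. The following sketch shows how a consumer might honor it, using plain JDK types rather than the Kernel classes. `SelectionVectorSketch` is a hypothetical name, and two points are assumptions made for this example: an absent selection vector means every row is valid, and a present vector has exactly one entry per row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration only; uses JDK stand-ins, not Delta Kernel types. */
final class SelectionVectorSketch {
    private SelectionVectorSketch() {}

    /**
     * Returns only the rows marked valid by the selection vector.
     * Assumption: when no selection vector is present, all rows are valid.
     */
    static <T> List<T> selectedRows(List<T> rows, Optional<boolean[]> selectionVector) {
        if (!selectionVector.isPresent()) {
            return new ArrayList<>(rows);
        }
        boolean[] selected = selectionVector.get();
        if (selected.length != rows.size()) {
            // Assumption: the vector carries exactly one flag per row.
            throw new IllegalArgumentException("Selection vector length does not match row count");
        }
        List<T> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (selected[i]) {
                result.add(rows.get(i));
            }
        }
        return result;
    }
}
```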
-