Package io.delta.kernel.internal
Class InternalScanFileUtils
Object
io.delta.kernel.internal.InternalScanFileUtils
Utilities to extract information out of the scan file rows returned by Scan.getScanFiles(Engine).

Field Summary
Fields

static final int ADD_FILE_ORDINAL

static final Column ADD_FILE_PARTITION_COL_REF
    Column expression referring to the `partitionValues` in scan `add` file.

static final int ADD_FILE_STATS_ORDINAL

static final StructType SCAN_FILE_SCHEMA
    Schema of the returned scan files.

static final StructType SCAN_FILE_SCHEMA_WITH_STATS
    Schema of the returned scan files when ScanImpl.getScanFiles(Engine, boolean) is called with includeStats=true.

static StructField TABLE_ROOT_STRUCT_FIELD

Method Summary

static Row generateScanFileRow(FileStatus fileStatus)
    Create a scan file row conforming to the schema SCAN_FILE_SCHEMA for given file status.

static FileStatus getAddFileStatus(Row scanFileInfo)

getBaseRowId(Row scanFile)

getDefaultRowCommitVersion(Row scanFile)

static DeletionVectorDescriptor getDeletionVectorDescriptorFromRow(Row scanFile)
    Create a DeletionVectorDescriptor from add entry in the given scan file row.

getPartitionValues(Row scanFileInfo)
    Get the partition columns and values belonging to the AddFile from given scan file row.

static Column getPartitionValuesParsedRefInAddFile(String partitionColName)
    Get a references column for given partition column name in partitionValues_parsed column in scan file row.

Field Details

ADD_FILE_PARTITION_COL_REF
Column expression referring to the `partitionValues` in scan `add` file.

TABLE_ROOT_STRUCT_FIELD

SCAN_FILE_SCHEMA
Schema of the returned scan files. May have an additional column "add.stats" at the end of the "add" columns that is not represented in the schema here. This column is conditionally read when a valid data skipping filter can be generated.

SCAN_FILE_SCHEMA_WITH_STATS
Schema of the returned scan files when ScanImpl.getScanFiles(Engine, boolean) is called with includeStats=true.

ADD_FILE_ORDINAL
public static final int ADD_FILE_ORDINAL

ADD_FILE_STATS_ORDINAL
public static final int ADD_FILE_STATS_ORDINAL

Method Details

getAddFileStatus
Get the FileStatus of the AddFile from the given scan file Row. The FileStatus contains metadata about the file.
Parameters:
    scanFileInfo - Row representing one scan file.
Returns:
    a FileStatus object created from the given scan file row.

getPartitionValues
Get the partition columns and values belonging to the AddFile from the given scan file row.
Parameters:
    scanFileInfo - Row representing one scan file.
Returns:
    Map of partition column name to partition column value.
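
The two accessors above both read from the `add` entry of a scan file row. As a rough illustration of that idea only, the sketch below models a scan file row with plain Java classes. Every type in it (`ScanFileSketch`, `AddEntry`, `ScanFileRow`, `FileStatusSketch`) is a hypothetical stand-in, not a Kernel class, and the assumption that the `add` path is relative to the table root is made for the example rather than taken from this page.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a scan file row; none of these types are Kernel classes.
class ScanFileSketch {

    /** Stand-in for the "add" struct inside a scan file row. */
    static final class AddEntry {
        final String path;                         // assumed relative to the table root
        final long size;
        final long modificationTime;
        final Map<String, String> partitionValues; // partition column name -> value

        AddEntry(String path, long size, long modificationTime,
                 Map<String, String> partitionValues) {
            this.path = path;
            this.size = size;
            this.modificationTime = modificationTime;
            this.partitionValues = partitionValues;
        }
    }

    /** Stand-in for one scan file row: an "add" entry plus the table root. */
    static final class ScanFileRow {
        final AddEntry add;
        final String tableRoot;

        ScanFileRow(AddEntry add, String tableRoot) {
            this.add = add;
            this.tableRoot = tableRoot;
        }
    }

    /** Stand-in for the file metadata returned to the caller. */
    static final class FileStatusSketch {
        final String path;
        final long size;
        final long modificationTime;

        FileStatusSketch(String path, long size, long modificationTime) {
            this.path = path;
            this.size = size;
            this.modificationTime = modificationTime;
        }
    }

    /** Builds file metadata by joining the table root and the add entry's path. */
    static FileStatusSketch getAddFileStatus(ScanFileRow row) {
        String root = row.tableRoot.endsWith("/") ? row.tableRoot : row.tableRoot + "/";
        return new FileStatusSketch(root + row.add.path, row.add.size,
                                    row.add.modificationTime);
    }

    /** Returns a read-only copy of the partition column name to value map. */
    static Map<String, String> getPartitionValues(ScanFileRow row) {
        return Collections.unmodifiableMap(new LinkedHashMap<>(row.add.partitionValues));
    }

    public static void main(String[] args) {
        Map<String, String> partitions = new LinkedHashMap<>();
        partitions.put("date", "2024-01-01");
        ScanFileRow row = new ScanFileRow(
            new AddEntry("date=2024-01-01/part-0000.parquet", 1024L, 1700000000000L,
                         partitions),
            "/data/events");

        FileStatusSketch status = getAddFileStatus(row);
        System.out.println(status.path + " (" + status.size + " bytes)");
        System.out.println(getPartitionValues(row));
    }
}
```

In real code the row would come from iterating the result of Scan.getScanFiles(Engine) and would be passed to InternalScanFileUtils directly; the stand-ins here only make the data flow visible in a standalone snippet.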

generateScanFileRow
Create a scan file row conforming to the schema SCAN_FILE_SCHEMA for the given file status. This is used when creating the scan file row for reading commit or checkpoint files.
Parameters:
    fileStatus
Returns:

getDeletionVectorDescriptorFromRow
Create a DeletionVectorDescriptor from the add entry in the given scan file row.
Parameters:
    scanFile - Row representing one scan file.
Returns:

getPartitionValuesParsedRefInAddFile
Get a column reference for the given partition column name in the partitionValues_parsed column of the scan file row.
Parameters:
    partitionColName - Partition column name
Returns:
    Column reference

getBaseRowId

getDefaultRowCommitVersion