Class InternalScanFileUtils

Object
io.delta.kernel.internal.InternalScanFileUtils

public class InternalScanFileUtils extends Object
Utilities to extract information out of the scan file rows returned by Scan.getScanFiles(Engine).
  • Field Details

    • ADD_FILE_PARTITION_COL_REF

      public static final Column ADD_FILE_PARTITION_COL_REF
      Column expression referring to the `partitionValues` field in the `add` entry of a scan file.
    • TABLE_ROOT_STRUCT_FIELD

      public static StructField TABLE_ROOT_STRUCT_FIELD
    • SCAN_FILE_SCHEMA

      public static final StructType SCAN_FILE_SCHEMA
      Schema of the returned scan files. May have an additional column "add.stats" at the end of the "add" columns that is not represented in the schema here. This column is conditionally read when a valid data skipping filter can be generated.
    • SCAN_FILE_SCHEMA_WITH_STATS

      public static final StructType SCAN_FILE_SCHEMA_WITH_STATS
      Schema of the returned scan files when ScanImpl.getScanFiles(Engine, boolean) is called with includeStats=true.
    • ADD_FILE_ORDINAL

      public static final int ADD_FILE_ORDINAL
    • ADD_FILE_STATS_ORDINAL

      public static final int ADD_FILE_STATS_ORDINAL
  • Method Details

    • getAddFileStatus

      public static FileStatus getAddFileStatus(Row scanFileInfo)
      Get the FileStatus of the AddFile from the given scan file Row. The FileStatus contains metadata about the file.
      Parameters:
      scanFileInfo - Row representing one scan file.
      Returns:
      a FileStatus object created from the given scan file row.
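      As a rough illustration of how the returned metadata might be consumed, the sketch below totals the sizes of a few files. It does not use the Kernel classes: FileInfo is a hypothetical stand-in for FileStatus (assumed here to carry a path, a size in bytes, and a modification time), and the sample paths and sizes are made up. In real code each value would come from getAddFileStatus(scanFileInfo).

```java
import java.util.Arrays;
import java.util.List;

public class ScanFileSizeSketch {
    /** Hypothetical stand-in for the Kernel's FileStatus; not the real class. */
    static final class FileInfo {
        final String path;
        final long size;             // size in bytes
        final long modificationTime; // epoch milliseconds

        FileInfo(String path, long size, long modificationTime) {
            this.path = path;
            this.size = size;
            this.modificationTime = modificationTime;
        }
    }

    /** Sums the sizes of the given files, e.g. to estimate how many bytes a scan would read. */
    static long totalBytes(List<FileInfo> files) {
        long total = 0L;
        for (FileInfo f : files) {
            total += f.size;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up values standing in for what getAddFileStatus would return per scan file row.
        List<FileInfo> files = Arrays.asList(
            new FileInfo("part-00000.parquet", 1024L, 1700000000000L),
            new FileInfo("part-00001.parquet", 2048L, 1700000001000L));
        System.out.println("Total bytes to read: " + totalBytes(files));
    }
}
```

      Keeping the aggregation separate from how the file metadata is obtained means the same logic works whether the values come from scan file rows or from test fixtures.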
    • getPartitionValues

      public static Map<String,String> getPartitionValues(Row scanFileInfo)
      Get the partition columns and values belonging to the AddFile from the given scan file row.
      Parameters:
      scanFileInfo - Row representing one scan file.
      Returns:
      Map of partition column name to partition column value.
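      Because the returned map holds values as strings, callers typically need to convert them before use. The following is a minimal sketch of such a conversion, not part of this class: the helper names (lookup, dateValue, intValue) and the column names are hypothetical, a hand-built map stands in for the result of getPartitionValues(scanFileInfo), and it assumes dates are serialized as yyyy-MM-dd and that a null or missing entry means a null partition value.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class PartitionValuesSketch {
    /** Looks up a partition value; a missing key or a null value yields an empty Optional. */
    static Optional<String> lookup(Map<String, String> partitionValues, String column) {
        return Optional.ofNullable(partitionValues.get(column));
    }

    /** Parses a partition value as a date, assuming the ISO yyyy-MM-dd form. */
    static Optional<LocalDate> dateValue(Map<String, String> partitionValues, String column) {
        return lookup(partitionValues, column).map(LocalDate::parse);
    }

    /** Parses a partition value as an integer. */
    static Optional<Integer> intValue(Map<String, String> partitionValues, String column) {
        return lookup(partitionValues, column).map(Integer::valueOf);
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the map returned by getPartitionValues.
        Map<String, String> partitionValues = new HashMap<>();
        partitionValues.put("event_date", "2024-01-15");
        partitionValues.put("region_id", "7");
        partitionValues.put("country", null); // assumed encoding of a null partition value

        System.out.println(dateValue(partitionValues, "event_date"));
        System.out.println(intValue(partitionValues, "region_id"));
        System.out.println(lookup(partitionValues, "country").isPresent());
    }
}
```

      Returning Optional from the helpers keeps null partition values explicit at the call site instead of surfacing later as a NullPointerException.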
    • generateScanFileRow

      public static Row generateScanFileRow(FileStatus fileStatus)
      Create a scan file row conforming to the schema SCAN_FILE_SCHEMA for the given file status. This is used when creating the scan file row for reading commit or checkpoint files.
      Parameters:
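      fileStatus - FileStatus of the file for which to create the scan file row.
      Returns:
      a scan file Row conforming to SCAN_FILE_SCHEMA.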
    • getDeletionVectorDescriptorFromRow

      public static DeletionVectorDescriptor getDeletionVectorDescriptorFromRow(Row scanFile)
      Create a DeletionVectorDescriptor from the `add` entry in the given scan file row.
      Parameters:
      scanFile - Row representing one scan file.
      Returns:
    • getPartitionValuesParsedRefInAddFile

      public static Column getPartitionValuesParsedRefInAddFile(String partitionColName)
      Get a column reference for the given partition column name within the partitionValues_parsed column of the scan file row.
      Parameters:
      partitionColName - Partition column name
      Returns:
      Column reference
    • getBaseRowId

      public static Optional<Long> getBaseRowId(Row scanFile)
    • getDefaultRowCommitVersion

      public static Optional<Long> getDefaultRowCommitVersion(Row scanFile)