Class ScanBuilderImpl

Object
io.delta.kernel.internal.ScanBuilderImpl
All Implemented Interfaces:
ScanBuilder

public class ScanBuilderImpl extends Object implements ScanBuilder
Implementation of ScanBuilder.
  • Constructor Details

  • Method Details

    • withFilter

      public ScanBuilder withFilter(Predicate predicate)
      Description copied from interface: ScanBuilder
      Apply the given filter expression to prune any files that cannot contain data satisfying the filter.

      Kernel uses the scan file partition values (for partitioned tables) and file-level column statistics (min, max, null count, etc.) in the Delta metadata for filtering. Sometimes this metadata is not enough to determine with certainty that a scan file contains no data satisfying the filter.

      For example, suppose the filter is a = 2. In file A, column a has a min value of -40 and a max value of 200. In file B, column a has a min value of 78 and a max value of 323. File B can be ruled out because it cannot contain rows where a = 2, but file A cannot be ruled out because it may contain rows where a = 2.

      Because filtering is best effort, the Scan object may return scan files (through Scan.getScanFiles(Engine)) that contain rows not satisfying the filter. It is the caller's responsibility to apply the remaining filter returned by Scan.getRemainingFilter() to the data read from those scan files, so that all rows not satisfying the filter are removed.

      Specified by:
      withFilter in interface ScanBuilder
      Parameters:
      predicate - a Predicate to prune the metadata or data.
      Returns:
      A ScanBuilder with filter applied.
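
The two-stage behavior described for withFilter (best-effort file pruning from min/max statistics, followed by a caller-side remaining filter on the rows) can be sketched in plain Java. This is an illustration only: MinMaxPruning, FileStats, mayContain, prune, and applyRemainingFilter are hypothetical names invented for this sketch and are not part of the Delta Kernel API. The numbers mirror the file A / file B example above, and the row values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; none of these types belong to Delta Kernel. */
public class MinMaxPruning {

    /** Hypothetical per-file statistics for one integer column, plus its rows. */
    static final class FileStats {
        final String name;
        final long min;
        final long max;
        final long[] rows; // made-up column values, used only by the row filter

        FileStats(String name, long min, long max, long... rows) {
            this.name = name;
            this.min = min;
            this.max = max;
            this.rows = rows;
        }
    }

    /** True if the statistics cannot rule out a row where the column equals value. */
    static boolean mayContain(FileStats file, long value) {
        return file.min <= value && value <= file.max;
    }

    /** Stage 1: drop files whose [min, max] range excludes the value. */
    static List<FileStats> prune(List<FileStats> files, long value) {
        List<FileStats> kept = new ArrayList<>();
        for (FileStats file : files) {
            if (mayContain(file, value)) {
                kept.add(file);
            }
        }
        return kept;
    }

    /** Stage 2: the caller filters the rows actually read from the kept files. */
    static List<Long> applyRemainingFilter(List<FileStats> files, long value) {
        List<Long> matches = new ArrayList<>();
        for (FileStats file : files) {
            for (long row : file.rows) {
                if (row == value) {
                    matches.add(row);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // File A spans -40..200 but happens to hold no row equal to 2.
        FileStats a = new FileStats("A", -40, 200, -40, 17, 200);
        FileStats b = new FileStats("B", 78, 323, 78, 323);

        List<FileStats> kept = prune(Arrays.asList(a, b), 2);
        System.out.println("files kept: " + kept.size());
        System.out.println("rows matching a = 2: " + applyRemainingFilter(kept, 2));
    }
}
```

In this sketch file A survives pruning even though none of its rows equal 2, which is why the row-level filter is still needed after the scan files are returned.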
    • withReadSchema

      public ScanBuilder withReadSchema(StructType readSchema)
      Description copied from interface: ScanBuilder
      Apply the given readSchema. If the builder already has a projection applied, calling this again replaces the existing projection.
      Specified by:
      withReadSchema in interface ScanBuilder
      Parameters:
      readSchema - Subset of columns to read from the Delta table.
      Returns:
      A ScanBuilder with projection pruning.
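
The "calling this again replaces the existing projection" rule can be illustrated with a toy builder. This is a sketch under stated assumptions, not the Kernel implementation: ProjectionReplaceSketch and SketchBuilder are hypothetical, the schema is reduced to a list of column names instead of a StructType, and returning a new builder on each call is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of "last projection wins"; not Delta Kernel code. */
public class ProjectionReplaceSketch {

    /** Toy builder that tracks only the projected column names. */
    static final class SketchBuilder {
        private final List<String> columns;

        SketchBuilder(List<String> columns) {
            this.columns = Collections.unmodifiableList(columns);
        }

        /** Replaces any earlier projection; it does not intersect or append. */
        SketchBuilder withReadSchema(String... readColumns) {
            return new SketchBuilder(Arrays.asList(readColumns));
        }

        List<String> columns() {
            return columns;
        }
    }

    public static void main(String[] args) {
        SketchBuilder builder = new SketchBuilder(Arrays.asList("id", "name", "ts"))
                .withReadSchema("id", "name")
                .withReadSchema("ts"); // replaces the [id, name] projection

        System.out.println(builder.columns());
    }
}
```

After the second call only ts is projected; the earlier id and name selection is discarded and not merged.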
    • build

      public Scan build()
      Specified by:
      build in interface ScanBuilder
      Returns:
      The built Scan instance.