the list of root table paths to scan (some of which might be filtered out later)
a set of options to control discovery
an optional partition schema that will be used to provide types for the discovered partitions
Returns the list of files that will be read when scanning this relation.
Returns all valid files grouped into partitions when the data is partitioned. If the data is unpartitioned, this will return a single partition with no partition values.
The filters used to prune which partitions are returned. These filters must only refer to partition columns and this method will only return files where these predicates are guaranteed to evaluate to true. Thus, these filters will not need to be evaluated again on the returned data.
Filters that can be applied on non-partitioned columns. The implementation does not need to guarantee these filters are applied, i.e. the execution engine will ensure these filters are still applied on the returned files.
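The split between the two kinds of filters can be illustrated with a minimal sketch. It assumes nothing about the real implementation: `DataFile`, `Partition`, `PartitionFilter`, and `PartitionPruningSketch` are hypothetical stand-ins, and partition values are kept as plain strings.

```scala
// Hypothetical stand-ins for illustration; these are not Spark's classes.
final case class DataFile(path: String, sizeInBytes: Long)
final case class Partition(values: Map[String, String], files: Seq[DataFile])

object PartitionPruningSketch {
  // A partition filter sees only partition column values, never row data.
  type PartitionFilter = Map[String, String] => Boolean

  // Keep only the partitions for which every partition filter holds.
  // Data filters are deliberately absent: in this sketch they are left to
  // the caller, which must still apply them to the rows it reads.
  def listFiles(all: Seq[Partition], partitionFilters: Seq[PartitionFilter]): Seq[Partition] =
    all.filter(p => partitionFilters.forall(f => f(p.values)))

  // Unpartitioned data is modeled as one partition with no partition values.
  def unpartitioned(files: Seq[DataFile]): Seq[Partition] =
    Seq(Partition(Map.empty, files))
}
```

Because pruning here is exact, a caller of this sketch would not need to re-check the partition predicates on the returned files, which mirrors the guarantee described above.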
List leaf files of given paths. This method will submit a Spark job to do parallel listing whenever there is a path having more files than the parallel partition discovery threshold.
This is publicly visible for testing.
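The idea of switching from serial to parallel listing can be sketched on the local file system. This is only an illustration under stated assumptions: it uses `java.nio.file` and Scala `Future`s instead of a Spark job, it simplifies the trigger to the number of input paths, and `LeafFileListingSketch`, `listLeafFiles`, `listAll`, and `threshold` are hypothetical names.

```scala
import java.nio.file.{Files, Path}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.jdk.CollectionConverters._

object LeafFileListingSketch {
  // Recursively collect the regular (leaf) files under `root`.
  def listLeafFiles(root: Path): Seq[Path] = {
    val stream = Files.walk(root)
    try stream.iterator().asScala.filter(p => Files.isRegularFile(p)).toVector
    finally stream.close()
  }

  // Assumption of this sketch: list serially while the number of input paths
  // stays within `threshold`, and concurrently once it is exceeded.
  def listAll(roots: Seq[Path], threshold: Int): Seq[Path] =
    if (roots.size <= threshold) roots.flatMap(listLeafFiles)
    else {
      val futures = roots.map(r => Future(listLeafFiles(r)))
      Await.result(Future.sequence(futures), Duration.Inf).flatten
    }
}
```

Both branches return the same files for the same input; only the way the work is scheduled differs.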
Returns an optional metadata operation time, in nanoseconds, for listing files.
We do file listing during query optimization (in order to get the proper statistics), and we want to account for file listing time in physical execution (as metrics). To do that, some implementations save the file listing time, and physical execution calls this method to update the metrics.
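A minimal sketch of this pattern follows: the listing time is recorded when the listing runs and read later through an accessor. `ListingTimeSketch`, `TimedFileIndex`, and its `list` parameter are hypothetical; only the idea of an optional time in nanoseconds comes from the description above.

```scala
object ListingTimeSketch {
  // `list` stands in for whatever performs the actual file listing.
  final class TimedFileIndex(list: () => Seq[String]) {
    // None until a listing has actually run.
    private var lastListingNs: Option[Long] = None

    // Run the listing and remember how long it took, in nanoseconds.
    def allFiles(): Seq[String] = {
      val start = System.nanoTime()
      val files = list()
      lastListingNs = Some(System.nanoTime() - start)
      files
    }

    // Read later (for example by an execution node) to update its metrics.
    def metadataOpsTimeNs: Option[Long] = lastListingNs
  }
}
```

The `Option` makes the "not measured" case explicit, which fits implementations that never record a listing time.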
Returns the specification of the partitions inferred from the data.
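One way to picture inferring partitions from the data is to parse directory names, assuming a Hive-style `column=value` layout under the root path. The sketch below makes that assumption explicit, keeps every value as a string (a user-supplied partition schema would be what provides real types), and uses the hypothetical names `PartitionSpecSketch`, `partitionValues`, and `inferPartitions`.

```scala
object PartitionSpecSketch {
  // Parse `column=value` directory segments between `root` and the file name.
  def partitionValues(root: String, file: String): Seq[(String, String)] = {
    val relative = file.stripPrefix(root).stripPrefix("/")
    val dirs = relative.split("/").dropRight(1) // drop the file name itself
    dirs.toSeq.flatMap { segment =>
      segment.split("=", 2) match {
        case Array(column, value) if column.nonEmpty => Some(column -> value)
        case _ => None // not a partition directory; ignored in this sketch
      }
    }
  }

  // Group files by the partition values parsed from their paths.
  def inferPartitions(root: String, files: Seq[String]): Map[Seq[(String, String)], Seq[String]] =
    files.groupBy(f => partitionValues(root, f))
}
```

A file directly under the root yields no partition values, which corresponds to the unpartitioned case.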
Refresh any cached file listings
Returns the list of root input paths from which the catalog will get files. There may be a single root path from which partitions are discovered, or individual partitions may be specified by each path.
Sum of table file sizes, in bytes
A FileIndex that generates the list of files to process by recursively listing all the files present in paths.