a list of paths to scan
a set of options to control discovery
an optional partition schema that will be used to provide types for the discovered partitions
if true, return an empty file list when a FileNotFoundException is encountered during file listing. Note that this is a hack for SPARK-16313. We should get rid of this flag in the future.
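The effect of the last flag can be sketched as follows. This is a minimal illustration under stated assumptions, not Spark's implementation: `listOrEmpty`, `ignoreFileNotFound`, and the `lister` function are hypothetical names introduced only for this example.

```scala
import java.io.FileNotFoundException

object IgnoreMissingSketch {
  // Hypothetical helper: run `lister` on a path and, when the flag is set,
  // treat a missing path as "no files" instead of propagating the failure.
  def listOrEmpty(path: String, ignoreFileNotFound: Boolean)(
      lister: String => Seq[String]): Seq[String] =
    try lister(path)
    catch {
      // Only swallowed when the flag is true; otherwise the exception escapes.
      case _: FileNotFoundException if ignoreFileNotFound => Seq.empty
    }
}
```

With the flag set to false, the same call rethrows the exception, which is presumably the behavior wanted once the flag is removed.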
Returns all the valid files.
Returns all valid files grouped into partitions when the data is partitioned. If the data is unpartitioned, this will return a single partition with no partition values.
The filters used to prune which partitions are returned. These filters must
only refer to partition columns and this method will only return files
where these predicates are guaranteed to evaluate to true. Thus, these
filters will not need to be evaluated again on the returned data.
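The pruning contract described above can be illustrated with a small self-contained model. It is only a sketch: `Partition`, `PartitionFilter`, and this `listFiles` signature are hypothetical stand-ins, and the real method works on Spark expressions rather than plain Scala functions.

```scala
object PartitionPruningSketch {
  // Hypothetical model: a partition is its column values plus its files.
  // Unpartitioned data would be a single Partition with an empty `values` map.
  final case class Partition(values: Map[String, String], files: Seq[String])

  // In this sketch a filter is a predicate over partition column values only.
  type PartitionFilter = Map[String, String] => Boolean

  // Keep only the partitions for which every filter evaluates to true, so the
  // caller does not need to re-apply the filters to the returned data.
  def listFiles(all: Seq[Partition], filters: Seq[PartitionFilter]): Seq[Partition] =
    all.filter(p => filters.forall(f => f(p.values)))
}
```

Passing an empty filter list returns every partition, since `forall` over an empty sequence is true.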
List leaf files of given paths. This method will submit a Spark job to do parallel listing whenever there is a path having more files than the parallel partition discovery threshold.
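The serial-versus-parallel choice can be sketched on a local filesystem as below. Everything here is an assumption made for illustration: `listLeafFiles`, `listAll`, and `threshold` are hypothetical names, local `Future`s stand in for the Spark job, and the sketch switches on the number of input paths, which may not match the exact criterion the method uses.

```scala
import java.io.File
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object LeafFileListingSketch {
  // Recursively collect the regular files under `root` (serial listing).
  def listLeafFiles(root: File): Seq[File] =
    if (root.isFile) Seq(root)
    else
      // listFiles() returns null for a missing path or on an I/O error.
      Option(root.listFiles()).map(_.toSeq).getOrElse(Seq.empty).flatMap(listLeafFiles)

  // Hypothetical dispatcher: list serially for small inputs, otherwise list
  // each input path concurrently and concatenate the results.
  def listAll(paths: Seq[File], threshold: Int)(implicit ec: ExecutionContext): Seq[File] =
    if (paths.size <= threshold) paths.flatMap(listLeafFiles)
    else
      Await.result(Future.traverse(paths)(p => Future(listLeafFiles(p))), Duration.Inf).flatten
}
```

Both branches return the same files for the same input; only the scheduling differs, which is why the threshold can be tuned without changing results.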
This is publicly visible for testing.
Returns the specification of the partitions inferred from the data.
a list of paths to scan
Refresh the file listing
A FileCatalog that generates the list of files to process by recursively listing all the files present in paths.