org.apache.spark.sql.execution.datasources

ListingFileCatalog

class ListingFileCatalog extends PartitioningAwareFileCatalog

A FileCatalog that generates the list of files to process by recursively listing all the files present in paths.

Linear Supertypes

PartitioningAwareFileCatalog, Logging, FileCatalog, AnyRef, Any

Instance Constructors

  1. new ListingFileCatalog(sparkSession: SparkSession, paths: Seq[Path], parameters: Map[String, String], partitionSchema: Option[StructType], ignoreFileNotFound: Boolean = false)

    paths

    a list of paths to scan

    parameters

    a set of options to control discovery

    partitionSchema

    an optional partition schema that will be used to provide types for the discovered partitions

    ignoreFileNotFound

    if true, return an empty file list when a FileNotFoundException is encountered while listing files. Note that this is a workaround for SPARK-16313 and this flag should be removed in the future.
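
    For illustration, a minimal sketch of constructing this catalog directly. This is an internal class in the execution.datasources package, so accessibility from user code is assumed; the session, input path, and partition column here are placeholders, not values from this page.

      import org.apache.hadoop.fs.Path
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.execution.datasources.ListingFileCatalog
      import org.apache.spark.sql.types.{StringType, StructField, StructType}

      // Placeholder session and input directory.
      val spark = SparkSession.builder().appName("listing-file-catalog-example").getOrCreate()
      val paths = Seq(new Path("hdfs:///data/events"))

      // Optionally pin the partition column types instead of letting them be inferred
      // from directory names such as .../date=2016-07-01/ (hypothetical layout).
      val partitionSchema = Some(StructType(StructField("date", StringType) :: Nil))

      val catalog = new ListingFileCatalog(
        spark,
        paths,
        parameters = Map.empty[String, String],
        partitionSchema = partitionSchema)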

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def allFiles(): Seq[FileStatus]

    Returns all the valid files.

    Definition Classes
    PartitioningAwareFileCatalog → FileCatalog
  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(other: Any): Boolean

    Definition Classes
    ListingFileCatalog → AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. val hadoopConf: Configuration

    Attributes
    protected
    Definition Classes
    PartitioningAwareFileCatalog
  14. def hashCode(): Int

    Definition Classes
    ListingFileCatalog → AnyRef → Any
  15. def inferPartitioning(): PartitionSpec

    Attributes
    protected
    Definition Classes
    PartitioningAwareFileCatalog
  16. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  17. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  18. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  19. def leafDirToChildrenFiles: Map[Path, Array[FileStatus]]

    Attributes
    protected
    Definition Classes
    ListingFileCatalog → PartitioningAwareFileCatalog
  20. def leafFiles: LinkedHashMap[Path, FileStatus]

    Attributes
    protected
    Definition Classes
    ListingFileCatalog → PartitioningAwareFileCatalog
  21. def listFiles(filters: Seq[Expression]): Seq[Partition]

    Returns all valid files grouped into partitions when the data is partitioned. If the data is unpartitioned, this will return a single partition with no partition values. (A usage sketch appears after this member list.)

    filters

    The filters used to prune which partitions are returned. These filters must only refer to partition columns and this method will only return files where these predicates are guaranteed to evaluate to true. Thus, these filters will not need to be evaluated again on the returned data.

    Definition Classes
    PartitioningAwareFileCatalog → FileCatalog
  22. def listLeafFiles(paths: Seq[Path]): LinkedHashSet[FileStatus]

    List leaf files of the given paths. This method will submit a Spark job to do parallel listing whenever there is a path having more files than the parallel partition discovery threshold.

    This is publicly visible for testing.

  23. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  24. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  26. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  29. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  30. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  31. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  32. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  33. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  34. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  35. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  36. final def notify(): Unit

    Definition Classes
    AnyRef
  37. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  38. def partitionSpec(): PartitionSpec

    Returns the specification of the partitions inferred from the data.

    Definition Classes
    ListingFileCatalog → FileCatalog
  39. val paths: Seq[Path]

    a list of paths to scan

    Definition Classes
    ListingFileCatalog → FileCatalog
  40. def refresh(): Unit

    Refresh the file listing

    Definition Classes
    ListingFileCatalog → FileCatalog
  41. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  42. def toString(): String

    Definition Classes
    AnyRef → Any
  43. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  44. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  45. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
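
As referenced from listFiles above, a hedged usage sketch of the listing members. It assumes the catalog value from the constructor example and a hypothetical partition column named date; it is an illustration, not a prescribed workflow.

    import org.apache.spark.sql.functions.col

    // Prune using a predicate over the partition column only; the returned files all
    // come from partitions where the predicate is guaranteed to hold.
    val prunedPartitions = catalog.listFiles(Seq((col("date") === "2016-07-01").expr))

    // Every valid leaf file, with no partition pruning.
    val everyFile = catalog.allFiles()

    // Re-scan the paths after new files have been written.
    catalog.refresh()

    // Listing switches to a parallel Spark job once a path contains more files than
    // the spark.sql.sources.parallelPartitionDiscovery.threshold SQL option.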

Inherited from Logging

Inherited from FileCatalog

Inherited from AnyRef

Inherited from Any
