Class org.apache.spark.sql.execution.datasources.DataSource

case class DataSource(sparkSession: SparkSession, className: String, paths: Seq[String] = Nil, userSpecifiedSchema: Option[StructType] = None, partitionColumns: Seq[String] = Seq.empty, bucketSpec: Option[BucketSpec] = None, options: Map[String, String] = Map.empty) extends Logging with Product with Serializable

The main class responsible for representing a pluggable Data Source in Spark SQL. In addition to acting as the canonical set of parameters that can describe a Data Source, this class is used to resolve a description to a concrete implementation that can be used in a query plan (either batch or streaming) or to write out data using an external library.

From an end user's perspective, a DataSource description can be created explicitly using org.apache.spark.sql.DataFrameReader or CREATE TABLE USING DDL. Additionally, this class is used when resolving a description from a metastore to a concrete implementation.

Many of the arguments to this class are optional, though depending on the specific API being used these optional arguments might be filled in during resolution using either inference or external metadata. For example, when reading a partitioned table from a file system, partition columns will be inferred from the directory layout even if they are not specified.
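
For illustration, the two end-user entry points mentioned above look like the following (a minimal sketch; the format name, paths, and table name are invented for illustration):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("example").getOrCreate()

  // DataFrameReader: the format, schema, and options supplied here map onto
  // the className, userSpecifiedSchema, and options of a DataSource description.
  val df = spark.read
    .format("parquet")
    .load("/data/events")  // becomes `paths`

  // Equivalent DDL; the USING clause supplies the className.
  spark.sql("CREATE TABLE events USING parquet OPTIONS (path '/data/events')")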

paths

A list of file system paths that hold data. These will be globbed and converted to qualified paths before use. This option only works when reading from a FileFormat.

userSpecifiedSchema

An optional specification of the schema of the data. When present, we skip attempting to infer the schema.

partitionColumns

A list of column names that the relation is partitioned by. When this list is empty, the relation is unpartitioned.

bucketSpec

An optional specification for bucketing (hash-partitioning) of the data.
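
Taken together, a direct construction of a description might look like this (a hedged sketch: DataSource lives in an internal package, and the schema, paths, and column names below are invented for illustration):

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.execution.datasources.DataSource
  import org.apache.spark.sql.types.{StringType, StructField, StructType}

  val spark = SparkSession.builder().appName("example").getOrCreate()

  val description = DataSource(
    sparkSession = spark,
    className = "csv",                        // resolved to a concrete implementation
    paths = Seq("/data/logs/*.csv"),          // globbed and qualified before use
    userSpecifiedSchema = Some(StructType(Seq(
      StructField("id", StringType),
      StructField("country", StringType)))),  // skips schema inference
    partitionColumns = Seq("country"),
    bucketSpec = None,                        // no bucketing
    options = Map("header" -> "true"))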

Linear Supertypes
Serializable, Serializable, Product, Equals, Logging, AnyRef, Any

Instance Constructors

  1. new DataSource(sparkSession: SparkSession, className: String, paths: Seq[String] = Nil, userSpecifiedSchema: Option[StructType] = None, partitionColumns: Seq[String] = Seq.empty, bucketSpec: Option[BucketSpec] = None, options: Map[String, String] = Map.empty)

Type Members

  1. case class SourceInfo(name: String, schema: StructType, partitionColumns: Seq[String]) extends Product with Serializable

Value Members

  1. final def !=(arg0: Any): Boolean
     Definition Classes: AnyRef → Any
  2. final def ##(): Int
     Definition Classes: AnyRef → Any
  3. final def ==(arg0: Any): Boolean
     Definition Classes: AnyRef → Any
  4. final def asInstanceOf[T0]: T0
     Definition Classes: Any
  5. val bucketSpec: Option[BucketSpec]
     An optional specification for bucketing (hash-partitioning) of the data.
  6. val className: String
     The fully qualified class name or short alias (for example, "parquet") of the data source implementation.
  7. def clone(): AnyRef
     Attributes: protected[java.lang]
     Definition Classes: AnyRef
     Annotations: @throws( ... )
  8. def createSink(outputMode: OutputMode): Sink
     Returns a sink that can be used to continually write data.
  9. def createSource(metadataPath: String): Source
     Returns a source that can be used to continually read data.
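
     From the end-user side, these hooks are exercised through the structured streaming entry points, which resolve a DataSource internally (a sketch; the formats and paths are illustrative):

       import org.apache.spark.sql.SparkSession

       val spark = SparkSession.builder().appName("example").getOrCreate()

       // readStream resolves a DataSource and obtains a Source from it.
       val stream = spark.readStream
         .format("text")
         .load("/data/incoming")

       // writeStream resolves a DataSource and obtains a Sink from it.
       val query = stream.writeStream
         .format("parquet")
         .option("path", "/data/out")
         .option("checkpointLocation", "/data/checkpoints")
         .start()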

  10. final def eq(arg0: AnyRef): Boolean
      Definition Classes: AnyRef
  11. def finalize(): Unit
      Attributes: protected[java.lang]
      Definition Classes: AnyRef
      Annotations: @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]
      Definition Classes: AnyRef → Any
  13. def hasMetadata(path: Seq[String]): Boolean
      Returns true if there is a single path that has a metadata log indicating which files should be read.

  14. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
      Attributes: protected
      Definition Classes: Logging
  15. final def isDebugEnabled: Boolean
      Attributes: protected
      Definition Classes: Logging
  16. final def isInfoEnabled: Boolean
      Attributes: protected
      Definition Classes: Logging
  17. final def isInstanceOf[T0]: Boolean
      Definition Classes: Any
  18. final def isTraceEnabled: Boolean
      Attributes: protected
      Definition Classes: Logging
  19. def log: Logger
      Attributes: protected
      Definition Classes: Logging
  20. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
      Attributes: protected
      Definition Classes: Logging
  21. def logDebug(msg: ⇒ String): Unit
      Attributes: protected
      Definition Classes: Logging
  22. def logError(msg: ⇒ String, throwable: Throwable): Unit
      Attributes: protected
      Definition Classes: Logging
  23. def logError(msg: ⇒ String): Unit
      Attributes: protected
      Definition Classes: Logging
  24. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
      Attributes: protected
      Definition Classes: Logging
  25. def logInfo(msg: ⇒ String): Unit
      Attributes: protected
      Definition Classes: Logging
  26. def logName: String
      Attributes: protected
      Definition Classes: Logging
  27. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
      Attributes: protected
      Definition Classes: Logging
  28. def logTrace(msg: ⇒ String): Unit
      Attributes: protected
      Definition Classes: Logging
  29. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
      Attributes: protected
      Definition Classes: Logging
  30. def logWarning(msg: ⇒ String): Unit
      Attributes: protected
      Definition Classes: Logging
  31. final def ne(arg0: AnyRef): Boolean
      Definition Classes: AnyRef
  32. final def notify(): Unit
      Definition Classes: AnyRef
  33. final def notifyAll(): Unit
      Definition Classes: AnyRef
  34. val options: Map[String, String]
      A map of options used to configure the data source (for example, a path or format-specific settings).
  35. val partitionColumns: Seq[String]
      A list of column names that the relation is partitioned by. When this list is empty, the relation is unpartitioned.
  36. val paths: Seq[String]
      A list of file system paths that hold data. These will be globbed and converted to qualified paths before use. This option only works when reading from a FileFormat.
  37. lazy val providingClass: Class[_]
      The concrete data source class that className resolves to.
  38. def resolveRelation(checkPathExist: Boolean = true): BaseRelation
      Creates a resolved BaseRelation that can be used to read data from, or write data into, this DataSource.
      checkPathExist: a flag that indicates whether to check that the paths exist. It is set to false when creating an empty table (whose path does not yet exist).
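
      A hedged sketch of resolving a description and reading through the resulting relation (the path is illustrative; baseRelationToDataFrame is the public bridge from a BaseRelation back to a DataFrame):

        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.execution.datasources.DataSource

        val spark = SparkSession.builder().appName("example").getOrCreate()

        val description = DataSource(
          sparkSession = spark,
          className = "json",
          paths = Seq("/data/people.json"))

        // Resolve the description to a concrete BaseRelation; the schema and
        // partition columns are inferred here if they were not specified.
        val relation = description.resolveRelation()

        val df = spark.baseRelationToDataFrame(relation)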

  39. lazy val sourceInfo: SourceInfo
  40. val sparkSession: SparkSession
  41. final def synchronized[T0](arg0: ⇒ T0): T0
      Definition Classes: AnyRef
  42. val userSpecifiedSchema: Option[StructType]
      An optional specification of the schema of the data. When present, we skip attempting to infer the schema.
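
      For example, supplying a schema through the public reader API populates this field and skips the inference pass (a sketch; the schema and path are illustrative):

        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

        val spark = SparkSession.builder().appName("example").getOrCreate()

        val schema = StructType(Seq(
          StructField("id", LongType),
          StructField("name", StringType)))

        // No inference pass over the data is needed.
        val df = spark.read.schema(schema).json("/data/people.json")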

  43. final def wait(): Unit
      Definition Classes: AnyRef
      Annotations: @throws( ... )
  44. final def wait(arg0: Long, arg1: Int): Unit
      Definition Classes: AnyRef
      Annotations: @throws( ... )
  45. final def wait(arg0: Long): Unit
      Definition Classes: AnyRef
      Annotations: @throws( ... )
  46. def write(mode: SaveMode, data: DataFrame): BaseRelation
      Writes the given DataFrame out to this DataSource.
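
      A hedged sketch of writing through a description (an internal API; the output path is illustrative, and file-based sources take it from options):

        import org.apache.spark.sql.{SaveMode, SparkSession}
        import org.apache.spark.sql.execution.datasources.DataSource

        val spark = SparkSession.builder().appName("example").getOrCreate()
        val df = spark.range(100).toDF("id")

        val description = DataSource(
          sparkSession = spark,
          className = "parquet",
          options = Map("path" -> "/data/out"))

        // Returns a BaseRelation describing the data that was just written.
        val relation = description.write(SaveMode.Overwrite, df)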
