
case class DataSource(sparkSession: SparkSession, className: String, paths: Seq[String] = Nil, userSpecifiedSchema: Option[StructType] = None, partitionColumns: Seq[String] = Seq.empty, bucketSpec: Option[BucketSpec] = None, options: Map[String, String] = Map.empty, catalogTable: Option[CatalogTable] = None) extends SessionStateHelper with Logging with Product with Serializable

The main class responsible for representing a pluggable Data Source in Spark SQL. In addition to acting as the canonical set of parameters that can describe a Data Source, this class is used to resolve a description to a concrete implementation that can be used in a query plan (either batch or streaming) or to write out data using an external library.

From an end user's perspective, a DataSource description can be created explicitly using org.apache.spark.sql.DataFrameReader or CREATE TABLE USING DDL. Additionally, this class is used when resolving a description from a metastore to a concrete implementation.

Many of the arguments to this class are optional, though depending on the specific API being used these optional arguments might be filled in during resolution using either inference or external metadata. For example, when reading a partitioned table from a file system, partition columns will be inferred from the directory layout even if they are not specified.
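
For example, the same description can be expressed through the end-user API or built directly (a minimal sketch, assuming a SparkSession named spark and hypothetical paths; in recent Spark versions the class lives under org.apache.spark.sql.execution.datasources):

  // End-user API: Spark builds and resolves a DataSource description internally.
  val df = spark.read
    .format("parquet")
    .option("mergeSchema", "true")
    .load("/data/events")

  // Equivalent DDL form:
  //   CREATE TABLE events USING parquet OPTIONS (path '/data/events')

  // Direct construction (internal API, shown only for illustration):
  import org.apache.spark.sql.execution.datasources.DataSource
  val ds = DataSource(
    sparkSession = spark,
    className = "parquet",
    paths = Seq("/data/events"),
    options = Map("mergeSchema" -> "true"))
  val relation = ds.resolveRelation()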

paths

A list of file system paths that hold data. These paths are globbed (when the "globPaths" option is true) and qualified before use; this only applies when reading from a FileFormat. The paths are expected to be Hadoop Path strings.

userSpecifiedSchema

An optional specification of the schema of the data. When present, schema inference is skipped.

partitionColumns

A list of column names that the relation is partitioned by. This list is generally empty during the read path, unless this DataSource is managed by Hive, in which case resolveRelation calls getOrInferFileFormatSchema for file-based DataSources to infer the partitioning. In other cases, an empty list means the table is unpartitioned.

bucketSpec

An optional specification for bucketing (hash-partitioning) of the data.

catalogTable

Optional catalog table reference that can be used to push down operations over the datasource to the catalog service.
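
Taken together, these parameters map onto the end-user reader and writer API roughly as follows (a hedged sketch; the schema, paths, and table name are hypothetical):

  import org.apache.spark.sql.types._

  val schema = StructType(Seq(
    StructField("id", LongType),
    StructField("country", StringType),
    StructField("amount", DoubleType)))

  // userSpecifiedSchema: supplying a schema skips inference.
  // paths: a glob pattern, expanded when globbing is enabled.
  val sales = spark.read
    .schema(schema)
    .option("mergeSchema", "false")   // -> options
    .parquet("/data/sales/year=*")

  // partitionColumns and bucketSpec on the write path.
  sales.write
    .partitionBy("country")           // -> partitionColumns
    .bucketBy(8, "id")                // -> bucketSpec
    .sortBy("id")
    .saveAsTable("sales_bucketed")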

Linear Supertypes
Serializable, Product, Equals, Logging, SessionStateHelper, AnyRef, Any

Instance Constructors

  1. new DataSource(sparkSession: SparkSession, className: String, paths: Seq[String] = Nil, userSpecifiedSchema: Option[StructType] = None, partitionColumns: Seq[String] = Seq.empty, bucketSpec: Option[BucketSpec] = None, options: Map[String, String] = Map.empty, catalogTable: Option[CatalogTable] = None)


Type Members

  1. implicit class LogStringContext extends AnyRef
    Definition Classes
    Logging
  2. case class SourceInfo(name: String, schema: StructType, partitionColumns: Seq[String]) extends Product with Serializable

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def MDC(key: LogKey, value: Any): MDC
    Attributes
    protected
    Definition Classes
    Logging
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. val bucketSpec: Option[BucketSpec]
  7. val catalogTable: Option[CatalogTable]
  8. val className: String
  9. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  10. def createSink(outputMode: OutputMode): Sink

    Returns a sink that can be used to continually write data.
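
    A minimal sketch of an end-user streaming write whose file sink is typically resolved through this method (checkpoint and output paths are hypothetical):

      // The built-in "rate" source feeds a Parquet file sink.
      val query = spark.readStream
        .format("rate")
        .load()
        .writeStream
        .format("parquet")
        .option("checkpointLocation", "/tmp/chk/rate")
        .option("path", "/tmp/out/rate")
        .outputMode("append")
        .start()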

  11. def createSource(metadataPath: String): Source

    Returns a source that can be used to continually read data.
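
    A minimal sketch of an end-user streaming read whose file source is typically resolved through this method (the schema and input path are hypothetical; file stream sources require a user-specified schema):

      import org.apache.spark.sql.types._

      val eventSchema = StructType(Seq(
        StructField("ts", TimestampType),
        StructField("value", StringType)))

      val events = spark.readStream
        .format("json")
        .schema(eventSchema)
        .load("/data/incoming")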

  12. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  14. def getHadoopConf(sparkSession: SparkSession): Configuration
    Definition Classes
    SessionStateHelper
  15. def getHadoopConf(sparkSession: SparkSession, options: Map[String, String]): Configuration
    Definition Classes
    SessionStateHelper
  16. def getSparkConf(sparkSession: SparkSession): SparkConf
    Definition Classes
    SessionStateHelper
  17. def getSqlConf(sparkSession: SparkSession): SQLConf
    Definition Classes
    SessionStateHelper
  18. def globPaths: Boolean

    Whether or not paths should be globbed before being used to access files.

  19. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  20. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  21. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  22. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  23. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  24. def logBasedOnLevel(level: Level)(f: => MessageWithContext): Unit
    Attributes
    protected
    Definition Classes
    Logging
  25. def logDebug(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  26. def logDebug(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  27. def logDebug(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  28. def logDebug(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  29. def logError(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. def logError(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  31. def logError(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  32. def logError(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  33. def logInfo(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  34. def logInfo(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  35. def logInfo(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  36. def logInfo(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  37. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  38. def logTrace(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  39. def logTrace(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  40. def logTrace(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  41. def logTrace(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  42. def logWarning(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  43. def logWarning(entry: LogEntry, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  44. def logWarning(entry: LogEntry): Unit
    Attributes
    protected
    Definition Classes
    Logging
  45. def logWarning(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  46. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  47. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  48. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  49. val options: Map[String, String]
  50. val partitionColumns: Seq[String]
  51. val paths: Seq[String]
  52. def planForWriting(mode: SaveMode, data: LogicalPlan): LogicalPlan

    Returns a logical plan to write the given LogicalPlan out to this DataSource.
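
    A hedged sketch of calling this internal API, reusing the hypothetical ds from the overview sketch and any DataFrame df:

      import org.apache.spark.sql.SaveMode

      // Build a logical write plan for the analyzed plan of df.
      val writePlan = ds.planForWriting(SaveMode.Overwrite, df.queryExecution.analyzed)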

  53. def productElementNames: Iterator[String]
    Definition Classes
    Product
  54. lazy val providingClass: Class[_]
  55. def resolveRelation(checkFilesExist: Boolean = true, readOnly: Boolean = false): BaseRelation

    Create a resolved BaseRelation that can be used to read data from or write data into this DataSource.

    checkFilesExist

    Whether to confirm that the files exist when generating a non-streaming file-based data source. Structured Streaming jobs already verify file existence when listing files, and when generating incremental batches each batch is treated as a non-streaming file-based data source; since the files are known to exist, there is no need to check them again.
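
    For example (a minimal sketch, reusing the hypothetical ds from the overview sketch):

      // Resolve the description to a concrete BaseRelation and inspect its schema.
      val relation = ds.resolveRelation(checkFilesExist = true)
      println(relation.schema.treeString)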

  56. def sessionState(sparkSession: SparkSession): SessionState
    Attributes
    protected
    Definition Classes
    SessionStateHelper
  57. lazy val sourceInfo: SourceInfo
  58. val sparkSession: SparkSession
  59. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  60. val userSpecifiedSchema: Option[StructType]
  61. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  62. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  63. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  64. def withLogContext(context: Map[String, String])(body: => Unit): Unit
    Attributes
    protected
    Definition Classes
    Logging
  65. def writeAndRead(mode: SaveMode, data: LogicalPlan, outputColumnNames: Seq[String]): BaseRelation

    Writes the given LogicalPlan out to this DataSource and returns a BaseRelation for subsequent reads.

    mode

    The save mode for this write.

    data

    The input query plan that produces the data to be written. Note that this plan is analyzed and optimized.

    outputColumnNames

    The original output column names of the input query plan. The optimizer may not preserve the case of the output column names, so this parameter is needed instead of data.output.
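
    A hedged sketch, reusing the hypothetical ds and df from earlier sketches:

      import org.apache.spark.sql.SaveMode

      // Write df out through this DataSource, then get a relation for reading it back.
      val plan = df.queryExecution.analyzed
      val readBack = ds.writeAndRead(SaveMode.Append, plan, plan.output.map(_.name))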

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)
