class SparkSession extends sql.api.SparkSession[Dataset] with Logging
The entry point to programming Spark with the Dataset and DataFrame API.
In environments where this has been created up front (e.g. REPL, notebooks), use the builder to get an existing session:
SparkSession.builder().getOrCreate()
The builder can also be used to create a new session:
SparkSession.builder
  .master("local")
  .appName("Word Count")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
- Self Type
- SparkSession
- Annotations
- @Stable()
Linear Supertypes
- SparkSession
- Logging
- SparkSession
- Closeable
- AutoCloseable
- Serializable
- AnyRef
- Any
Type Members
- implicit class LogStringContext extends AnyRef
- Definition Classes
- Logging
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def addArtifact(source: String, target: String): Unit
Add a single artifact to the session while preserving the directory structure specified by target.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
- def addArtifact(bytes: Array[Byte], target: String): Unit
Add a single in-memory artifact to the session while preserving the directory structure specified by target.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
- def addArtifact(uri: URI): Unit
Add a single artifact to the current session.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
- def addArtifact(path: String): Unit
Add a single artifact to the current session.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
- def addArtifacts(uri: URI*): Unit
Add one or more artifacts to the session. A usage sketch for the addArtifact methods follows this entry.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental() @varargs()
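A minimal usage sketch for the addArtifact/addArtifacts overloads above; it assumes an already-created session named spark, and the file paths are placeholders rather than part of the API:
  import java.net.URI

  // Ship a single local jar to the session (placeholder path).
  spark.addArtifact("/tmp/extra-udfs.jar")
  // Ship several artifacts in one call.
  spark.addArtifacts(new URI("file:/tmp/extra-udfs.jar"), new URI("file:/tmp/lookup-data.csv"))
These methods are marked Experimental, so their behaviour may change between releases.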
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def baseRelationToDataFrame(baseRelation: BaseRelation): DataFrame
Convert a BaseRelation created for external data sources into a DataFrame.
- Since
2.0.0
- lazy val catalog: Catalog
Interface through which the user may create, drop, alter or query underlying databases, tables, functions etc. A usage sketch follows this entry.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @transient()
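A brief sketch of the catalog interface, assuming an existing session named spark:
  // List tables in the current database and check whether a table exists.
  spark.catalog.listTables().show()
  println(spark.catalog.tableExists("people"))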
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- def close(): Unit
Stop the underlying SparkContext.
- Definition Classes
- SparkSession → Closeable → AutoCloseable
- Since
2.1.0
- lazy val conf: RuntimeConfig
Runtime configuration interface for Spark.
This is the interface through which the user can get and set all Spark and Hadoop configurations that are relevant to Spark SQL. When getting the value of a config, this defaults to the value set in the underlying SparkContext, if any. A usage sketch follows this entry.
- Annotations
- @transient()
- Since
2.0.0
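A short sketch of reading and writing runtime configuration through conf, assuming an existing session named spark:
  // Set a SQL configuration at runtime and read it back.
  spark.conf.set("spark.sql.shuffle.partitions", "64")
  println(spark.conf.get("spark.sql.shuffle.partitions"))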
- def createDataFrame(data: List[_], beanClass: Class[_]): DataFrame
Applies a schema to a java.util.List of Java Beans.
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
- Definition Classes
- SparkSession → SparkSession
- def createDataFrame(rdd: JavaRDD[_], beanClass: Class[_]): DataFrame
Applies a schema to an RDD of Java Beans.
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
- Since
2.0.0
- def createDataFrame(rdd: RDD[_], beanClass: Class[_]): DataFrame
Applies a schema to an RDD of Java Beans.
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
- Since
2.0.0
- def createDataFrame(rows: List[Row], schema: StructType): DataFrame
Creates a DataFrame from a java.util.List containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided List matches the provided schema; otherwise there will be a runtime exception.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @DeveloperApi()
- def createDataFrame(rowRDD: JavaRDD[Row], schema: StructType): DataFrame
:: DeveloperApi :: Creates a DataFrame from a JavaRDD containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided RDD matches the provided schema; otherwise there will be a runtime exception.
- Annotations
- @DeveloperApi()
- Since
2.0.0
- def createDataFrame(rowRDD: RDD[Row], schema: StructType): DataFrame
:: DeveloperApi :: Creates a DataFrame from an RDD containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided RDD matches the provided schema; otherwise there will be a runtime exception. Example:
  import org.apache.spark.sql._
  import org.apache.spark.sql.types._

  val sparkSession = new org.apache.spark.sql.SparkSession(sc)

  val schema =
    StructType(
      StructField("name", StringType, false) ::
      StructField("age", IntegerType, true) :: Nil)

  val people =
    sc.textFile("examples/src/main/resources/people.txt").map(
      _.split(",")).map(p => Row(p(0), p(1).trim.toInt))
  val dataFrame = sparkSession.createDataFrame(people, schema)
  dataFrame.printSchema
  // root
  // |-- name: string (nullable = false)
  // |-- age: integer (nullable = true)

  dataFrame.createOrReplaceTempView("people")
  sparkSession.sql("select name from people").collect.foreach(println)
- Annotations
- @DeveloperApi()
- Since
2.0.0
- def createDataFrame[A <: Product](data: Seq[A])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A]): DataFrame
Creates a DataFrame from a local Seq of Product.
- Definition Classes
- SparkSession → SparkSession
- def createDataFrame[A <: Product](rdd: RDD[A])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A]): DataFrame
Creates a DataFrame from an RDD of Product (e.g. case classes, tuples). A usage sketch follows this entry.
- Since
2.0.0
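A minimal sketch of creating DataFrames from Products (case classes), assuming an existing session named spark; Person is an illustrative case class, not part of the API:
  case class Person(name: String, age: Int)

  // From a local Seq of case classes.
  val localDf = spark.createDataFrame(Seq(Person("Alice", 29), Person("Bob", 31)))
  // From an RDD of case classes.
  val rddDf = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Person("Carol", 42))))
  localDf.printSchema()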
- def createDataset[T](data: List[T])(implicit arg0: Encoder[T]): Dataset[T]
Creates a Dataset from a java.util.List of a given type. This method requires an encoder (to convert a JVM object of type T to and from the internal Spark SQL representation) that is generally created automatically through implicits from a SparkSession, or can be created explicitly by calling static methods on Encoders.
- Definition Classes
- SparkSession → SparkSession
- def createDataset[T](data: RDD[T])(implicit arg0: Encoder[T]): Dataset[T]
Creates a Dataset from an RDD of a given type. This method requires an encoder (to convert a JVM object of type T to and from the internal Spark SQL representation) that is generally created automatically through implicits from a SparkSession, or can be created explicitly by calling static methods on Encoders. A usage sketch follows this entry.
- Since
2.0.0
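A minimal sketch for the createDataset overloads, assuming an existing session named spark; the required encoders for common types come from spark.implicits:
  import spark.implicits._

  // Dataset from a local Seq and from an RDD.
  val dsFromSeq = spark.createDataset(Seq(1, 2, 3))
  val dsFromRdd = spark.createDataset(spark.sparkContext.parallelize(Seq("a", "b")))
  dsFromSeq.show()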
- def createDataset[T](data: Seq[T])(implicit arg0: Encoder[T]): Dataset[T]
Creates a Dataset from a local Seq of data of a given type. This method requires an encoder (to convert a JVM object of type T to and from the internal Spark SQL representation) that is generally created automatically through implicits from a SparkSession, or can be created explicitly by calling static methods on Encoders.
- Definition Classes
- SparkSession → SparkSession
- def dataSource: DataSourceRegistration
A collection of methods for registering user-defined data sources.
- Annotations
- @Experimental() @Unstable()
- Since
4.0.0
- lazy val emptyDataFrame: DataFrame
Returns a DataFrame with no rows or columns.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @transient()
- def emptyDataset[T](implicit arg0: Encoder[T]): Dataset[T]
Creates a new Dataset of type T containing zero elements.
- Definition Classes
- SparkSession → SparkSession
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def executeCommand(runner: String, command: String, options: Map[String, String]): DataFrame
Execute an arbitrary string command inside an external execution engine rather than Spark. This could be useful when the user wants to run commands outside of Spark, for example executing a custom DDL/DML command for JDBC, creating an index for ElasticSearch, or creating cores for Solr.
The command will be eagerly executed after this method is called and the returned DataFrame will contain the output of the command (if any).
- runner
The class name of the runner that implements
ExternalCommandRunner
.- command
The target command to be executed
- options
The options for the runner.
- Annotations
- @Unstable()
- Since
3.0.0
- def experimental: ExperimentalMethods
:: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality.
- Annotations
- @Experimental() @Unstable()
- Since
2.0.0
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def listenerManager: ExecutionListenerManager
An interface to register custom org.apache.spark.sql.util.QueryExecutionListeners that listen for execution metrics. A registration sketch follows this entry.
- Since
2.0.0
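A sketch of registering a QueryExecutionListener via listenerManager, assuming an existing session named spark; the println bodies are illustrative:
  import org.apache.spark.sql.execution.QueryExecution
  import org.apache.spark.sql.util.QueryExecutionListener

  spark.listenerManager.register(new QueryExecutionListener {
    // Called when a query completes successfully.
    override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
      println(s"$funcName took ${durationNs / 1e6} ms")
    // Called when a query fails.
    override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
      println(s"$funcName failed: ${exception.getMessage}")
  })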
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def newSession(): SparkSession
Start a new session with isolated SQL configurations and temporary tables; registered functions are isolated, but the underlying SparkContext and cached data are shared.
- Definition Classes
- SparkSession → SparkSession
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- def parseDataType(dataTypeString: String): DataType
Parses the data type in our internal string representation. The data type string should have the same format as the one generated by toString in Scala. It is only used by PySpark.
- Attributes
- protected[sql]
- def range(start: Long, end: Long, step: Long, numPartitions: Int): Dataset[Long]
Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value, with the number of partitions specified.
- Definition Classes
- SparkSession → SparkSession
- def range(start: Long, end: Long, step: Long): Dataset[Long]
Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value.
- Definition Classes
- SparkSession → SparkSession
- def range(start: Long, end: Long): Dataset[Long]
Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with step value 1.
- Definition Classes
- SparkSession → SparkSession
- def range(end: Long): Dataset[Long]
Creates a Dataset with a single LongType column named id, containing elements in a range from 0 to end (exclusive) with step value 1. A usage sketch for the range overloads follows this entry.
- Definition Classes
- SparkSession → SparkSession
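A small sketch of the range overloads, assuming an existing session named spark; each call returns a Dataset with a single LongType column named id:
  spark.range(5).show()                      // 0, 1, 2, 3, 4
  spark.range(0, 10, 2).show()               // 0, 2, 4, 6, 8
  println(spark.range(0, 100, 10, 4).rdd.getNumPartitions)  // 4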
- def read: DataFrameReader
Returns a DataFrameReader that can be used to read non-streaming data in as a DataFrame. A usage sketch follows this entry.
- Definition Classes
- SparkSession → SparkSession
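A short sketch of batch reads through read, assuming an existing session named spark; the input paths are placeholders:
  // Read a CSV file with a header row, and a directory of JSON files.
  val csvDf = spark.read.option("header", "true").csv("/data/people.csv")
  val jsonDf = spark.read.json("/data/events/")
  csvDf.printSchema()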
- def readStream: DataStreamReader
Returns a DataStreamReader that can be used to read streaming data in as a DataFrame.
  sparkSession.readStream.parquet("/path/to/directory/of/parquet/files")
  sparkSession.readStream.schema(schema).json("/path/to/directory/of/json/files")
- Since
2.0.0
- lazy val sessionState: SessionState
State isolated across sessions, including SQL configurations, temporary tables, registered functions, and everything else that accepts an org.apache.spark.sql.internal.SQLConf. If parentSessionState is not null, the SessionState will be a copy of the parent.
This is internal to Spark and there is no guarantee on interface stability.
- Annotations
- @Unstable() @transient()
- Since
2.2.0
- lazy val sharedState: SharedState
State shared across sessions, including the SparkContext, cached data, listener, and a catalog that interacts with external systems.
This is internal to Spark and there is no guarantee on interface stability.
- Annotations
- @Unstable() @transient()
- Since
2.2.0
- val sparkContext: SparkContext
- def sql(sqlText: String): DataFrame
Executes a SQL query using Spark, returning the result as a DataFrame.
- Definition Classes
- SparkSession → SparkSession
- def sql(sqlText: String, args: Map[String, Any]): DataFrame
Executes a SQL query substituting named parameters by the given arguments, returning the result as a DataFrame.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
- def sql(sqlText: String, args: Map[String, Any]): DataFrame
Executes a SQL query substituting named parameters by the given arguments, returning the result as a DataFrame.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
- def sql(sqlText: String, args: Array[_]): DataFrame
Executes a SQL query substituting positional parameters by the given arguments, returning the result as a DataFrame. A usage sketch for the sql overloads follows this entry.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
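A sketch of the sql overloads, assuming an existing session named spark; named parameters are passed as a Map, positional parameters as an Array:
  // Plain query.
  spark.sql("SELECT 1 AS one").show()
  // Named parameter, referenced as :threshold in the statement.
  spark.sql("SELECT * FROM range(10) WHERE id > :threshold", Map("threshold" -> 5)).show()
  // Positional parameter, referenced as ?.
  spark.sql("SELECT * FROM range(10) WHERE id > ?", Array(5)).show()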
- val sqlContext: SQLContext
A wrapped version of this session in the form of a SQLContext, for backward compatibility.
- Since
2.0.0
- def stop(): Unit
- Definition Classes
- SparkSession
- def streams: StreamingQueryManager
Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this session. A usage sketch follows this entry.
- Annotations
- @Unstable()
- Since
2.0.0
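A small sketch of inspecting active streaming queries through streams, assuming an existing session named spark:
  // Print the id and current status of every active streaming query.
  spark.streams.active.foreach(q => println(s"${q.id}: ${q.status}"))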
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def table(tableName: String): DataFrame
Returns the specified table/view as a DataFrame.
- Definition Classes
- SparkSession → SparkSession
- def time[T](f: => T): T
- Definition Classes
- SparkSession
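time evaluates the given block, prints the elapsed wall-clock time to stdout, and returns the block's result; a minimal sketch assuming an existing session named spark:
  // Prints a line such as "Time taken: 123 ms" and yields the count.
  val n = spark.time { spark.range(1000000).count() }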
- def toString(): String
- Definition Classes
- AnyRef → Any
- def udf: UDFRegistration
A collection of methods for registering user-defined functions (UDF). A registration sketch follows this entry.
- Definition Classes
- SparkSession → SparkSession
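A sketch of registering a user-defined function through udf, assuming an existing session named spark:
  // Register a Scala function as a SQL UDF and call it from a query.
  spark.udf.register("plusOne", (x: Int) => x + 1)
  spark.sql("SELECT plusOne(41)").show()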
- def version: String
The version of Spark on which this application is running.
- Definition Classes
- SparkSession → SparkSession
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- def withLogContext(context: HashMap[String, String])(body: => Unit): Unit
- Attributes
- protected
- Definition Classes
- Logging
- object implicits extends SQLImplicits with Serializable
(Scala-specific) Implicit methods available in Scala for converting common Scala objects into DataFrames.
  val sparkSession = SparkSession.builder.getOrCreate()
  import sparkSession.implicits._
- Since
2.0.0
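A brief follow-up sketch of what the implicits enable once imported, assuming the sparkSession value from the example above:
  import sparkSession.implicits._

  // Local collections gain toDS()/toDF() via the imported implicits.
  val ds = Seq(1, 2, 3).toDS()
  val df = Seq(("Alice", 29), ("Bob", 31)).toDF("name", "age")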
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)