class SparkSession extends sql.api.SparkSession[Dataset] with Logging
The entry point to programming Spark with the Dataset and DataFrame API.
In environments where this has been created up front (e.g. REPL, notebooks), use the builder to get an existing session:
SparkSession.builder().getOrCreate()
The builder can also be used to create a new session:
SparkSession.builder
  .master("local")
  .appName("Word Count")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
- Self Type
- SparkSession
- Annotations
- @Stable()
Linear Supertypes
- SparkSession
- Logging
- SparkSession
- Closeable
- AutoCloseable
- Serializable
- AnyRef
- Any
Type Members
- implicit class LogStringContext extends AnyRef
- Definition Classes
- Logging
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def addArtifact(source: String, target: String): Unit
Add a single artifact to the session while preserving the directory structure specified by target.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
- def addArtifact(bytes: Array[Byte], target: String): Unit
Add a single in-memory artifact to the session while preserving the directory structure specified by target.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
- def addArtifact(uri: URI): Unit
Add a single artifact to the current session.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
- def addArtifact(path: String): Unit
Add a single artifact to the current session.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
- def addArtifacts(uri: URI*): Unit
Add one or more artifacts to the session. A usage sketch for the addArtifact methods follows this entry.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental() @varargs()
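A minimal usage sketch for the addArtifact/addArtifacts overloads above; it assumes an already-created session named spark, and the file paths are placeholders rather than part of the API:
  import java.net.URI

  // Ship a single local jar to the session (placeholder path).
  spark.addArtifact("/tmp/extra-udfs.jar")
  // Ship several artifacts in one call.
  spark.addArtifacts(new URI("file:/tmp/extra-udfs.jar"), new URI("file:/tmp/lookup-data.csv"))
These methods are marked Experimental, so their behaviour may change between releases.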
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def baseRelationToDataFrame(baseRelation: BaseRelation): DataFrame
Convert a BaseRelation created for external data sources into a DataFrame.
- Since
2.0.0
- lazy val catalog: Catalog
Interface through which the user may create, drop, alter or query underlying databases, tables, functions etc. A usage sketch follows this entry.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @transient()
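A brief sketch of the catalog interface, assuming an existing session named spark:
  // List tables in the current database and check whether a table exists.
  spark.catalog.listTables().show()
  println(spark.catalog.tableExists("people"))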
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- def close(): Unit
Stop the underlying SparkContext.
- Definition Classes
- SparkSession → Closeable → AutoCloseable
- Since
2.1.0
- lazy val conf: RuntimeConfig
Runtime configuration interface for Spark.
This is the interface through which the user can get and set all Spark and Hadoop configurations that are relevant to Spark SQL. When getting the value of a config, this defaults to the value set in the underlying SparkContext, if any. A usage sketch follows this entry.
- Annotations
- @transient()
- Since
2.0.0
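A short sketch of reading and writing runtime configuration through conf, assuming an existing session named spark:
  // Set a SQL configuration at runtime and read it back.
  spark.conf.set("spark.sql.shuffle.partitions", "64")
  println(spark.conf.get("spark.sql.shuffle.partitions"))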
- def createDataFrame(data: List[_], beanClass: Class[_]): DataFrame
Applies a schema to a java.util.List of Java Beans.
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
- Definition Classes
- SparkSession → SparkSession
- def createDataFrame(rdd: JavaRDD[_], beanClass: Class[_]): DataFrame
Applies a schema to an RDD of Java Beans.
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
- Since
2.0.0
- def createDataFrame(rdd: RDD[_], beanClass: Class[_]): DataFrame
Applies a schema to an RDD of Java Beans.
WARNING: Since there is no guaranteed ordering for fields in a Java Bean, SELECT * queries will return the columns in an undefined order.
- Since
2.0.0
- def createDataFrame(rows: List[Row], schema: StructType): DataFrame
Creates a DataFrame from a java.util.List containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided List matches the provided schema; otherwise there will be a runtime exception.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @DeveloperApi()
- def createDataFrame(rowRDD: JavaRDD[Row], schema: StructType): DataFrame
:: DeveloperApi :: Creates a DataFrame from a JavaRDD containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided RDD matches the provided schema; otherwise there will be a runtime exception.
- Annotations
- @DeveloperApi()
- Since
2.0.0
- def createDataFrame(rowRDD: RDD[Row], schema: StructType): DataFrame
:: DeveloperApi :: Creates a DataFrame from an RDD containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided RDD matches the provided schema; otherwise there will be a runtime exception. Example:
  import org.apache.spark.sql._
  import org.apache.spark.sql.types._

  val sparkSession = new org.apache.spark.sql.SparkSession(sc)

  val schema =
    StructType(
      StructField("name", StringType, false) ::
      StructField("age", IntegerType, true) :: Nil)

  val people =
    sc.textFile("examples/src/main/resources/people.txt").map(
      _.split(",")).map(p => Row(p(0), p(1).trim.toInt))
  val dataFrame = sparkSession.createDataFrame(people, schema)
  dataFrame.printSchema
  // root
  // |-- name: string (nullable = false)
  // |-- age: integer (nullable = true)

  dataFrame.createOrReplaceTempView("people")
  sparkSession.sql("select name from people").collect.foreach(println)
- Annotations
- @DeveloperApi()
- Since
2.0.0
- def createDataFrame[A <: Product](data: Seq[A])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A]): DataFrame
Creates a DataFrame from a local Seq of Product.
- Definition Classes
- SparkSession → SparkSession
- def createDataFrame[A <: Product](rdd: RDD[A])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A]): DataFrame
Creates a DataFrame from an RDD of Product (e.g. case classes, tuples). A usage sketch follows this entry.
- Since
2.0.0
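A minimal sketch of creating DataFrames from Products (case classes), assuming an existing session named spark; Person is an illustrative case class, not part of the API:
  case class Person(name: String, age: Int)

  // From a local Seq of case classes.
  val localDf = spark.createDataFrame(Seq(Person("Alice", 29), Person("Bob", 31)))
  // From an RDD of case classes.
  val rddDf = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Person("Carol", 42))))
  localDf.printSchema()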
- def createDataset[T](data: List[T])(implicit arg0: Encoder[T]): Dataset[T]
Creates a Dataset from a java.util.List of a given type. This method requires an encoder (to convert a JVM object of type T to and from the internal Spark SQL representation) that is generally created automatically through implicits from a SparkSession, or can be created explicitly by calling static methods on Encoders.
- Definition Classes
- SparkSession → SparkSession
- def createDataset[T](data: RDD[T])(implicit arg0: Encoder[T]): Dataset[T]
Creates a Dataset from an RDD of a given type. This method requires an encoder (to convert a JVM object of type T to and from the internal Spark SQL representation) that is generally created automatically through implicits from a SparkSession, or can be created explicitly by calling static methods on Encoders. A usage sketch follows this entry.
- Since
2.0.0
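A minimal sketch for the createDataset overloads, assuming an existing session named spark; the required encoders for common types come from spark.implicits:
  import spark.implicits._

  // Dataset from a local Seq and from an RDD.
  val dsFromSeq = spark.createDataset(Seq(1, 2, 3))
  val dsFromRdd = spark.createDataset(spark.sparkContext.parallelize(Seq("a", "b")))
  dsFromSeq.show()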
- def createDataset[T](data: Seq[T])(implicit arg0: Encoder[T]): Dataset[T]
Creates a Dataset from a local Seq of data of a given type. This method requires an encoder (to convert a JVM object of type T to and from the internal Spark SQL representation) that is generally created automatically through implicits from a SparkSession, or can be created explicitly by calling static methods on Encoders.
- Definition Classes
- SparkSession → SparkSession
- def dataSource: DataSourceRegistration
A collection of methods for registering user-defined data sources.
- Annotations
- @Experimental() @Unstable()
- Since
4.0.0
- lazy val emptyDataFrame: DataFrame
Returns a DataFrame with no rows or columns.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @transient()
- def emptyDataset[T](implicit arg0: Encoder[T]): Dataset[T]
Creates a new Dataset of type T containing zero elements.
- Definition Classes
- SparkSession → SparkSession
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def executeCommand(runner: String, command: String, options: Map[String, String]): DataFrame
Execute an arbitrary string command inside an external execution engine rather than Spark. This could be useful when the user wants to run commands outside of Spark, for example executing a custom DDL/DML command for JDBC, creating an index for ElasticSearch, or creating cores for Solr.
The command will be eagerly executed after this method is called and the returned DataFrame will contain the output of the command (if any).
- runner
The class name of the runner that implements
ExternalCommandRunner
.- command
The target command to be executed
- options
The options for the runner.
- Annotations
- @Unstable()
- Since
3.0.0
- def experimental: ExperimentalMethods
:: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality.
- Annotations
- @Experimental() @Unstable()
- Since
2.0.0
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def listenerManager: ExecutionListenerManager
An interface to register custom org.apache.spark.sql.util.QueryExecutionListeners that listen for execution metrics. A registration sketch follows this entry.
- Since
2.0.0
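A sketch of registering a QueryExecutionListener via listenerManager, assuming an existing session named spark; the println bodies are illustrative:
  import org.apache.spark.sql.execution.QueryExecution
  import org.apache.spark.sql.util.QueryExecutionListener

  spark.listenerManager.register(new QueryExecutionListener {
    // Called when a query completes successfully.
    override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
      println(s"$funcName took ${durationNs / 1e6} ms")
    // Called when a query fails.
    override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
      println(s"$funcName failed: ${exception.getMessage}")
  })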
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(entry: LogEntry): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def newSession(): SparkSession
Start a new session with isolated SQL configurations and temporary tables; registered functions are isolated, but the underlying SparkContext and cached data are shared.
- Definition Classes
- SparkSession → SparkSession
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- def parseDataType(dataTypeString: String): DataType
Parses the data type in our internal string representation. The data type string should have the same format as the one generated by toString in Scala. It is only used by PySpark.
- Attributes
- protected[sql]
- def range(start: Long, end: Long, step: Long, numPartitions: Int): Dataset[Long]
Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value, with the number of partitions specified.
- Definition Classes
- SparkSession → SparkSession
- def range(start: Long, end: Long, step: Long): Dataset[Long]
Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value.
- Definition Classes
- SparkSession → SparkSession
- def range(start: Long, end: Long): Dataset[Long]
Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with step value 1.
- Definition Classes
- SparkSession → SparkSession
- def range(end: Long): Dataset[Long]
Creates a Dataset with a single LongType column named id, containing elements in a range from 0 to end (exclusive) with step value 1. A usage sketch for the range overloads follows this entry.
- Definition Classes
- SparkSession → SparkSession
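A small sketch of the range overloads, assuming an existing session named spark; each call returns a Dataset with a single LongType column named id:
  spark.range(5).show()                      // 0, 1, 2, 3, 4
  spark.range(0, 10, 2).show()               // 0, 2, 4, 6, 8
  println(spark.range(0, 100, 10, 4).rdd.getNumPartitions)  // 4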
- def read: DataFrameReader
Returns a DataFrameReader that can be used to read non-streaming data in as a DataFrame. A usage sketch follows this entry.
- Definition Classes
- SparkSession → SparkSession
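A short sketch of batch reads through read, assuming an existing session named spark; the input paths are placeholders:
  // Read a CSV file with a header row, and a directory of JSON files.
  val csvDf = spark.read.option("header", "true").csv("/data/people.csv")
  val jsonDf = spark.read.json("/data/events/")
  csvDf.printSchema()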
- def readStream: DataStreamReader
Returns a DataStreamReader that can be used to read streaming data in as a DataFrame.
  sparkSession.readStream.parquet("/path/to/directory/of/parquet/files")
  sparkSession.readStream.schema(schema).json("/path/to/directory/of/json/files")
- Since
2.0.0
- lazy val sessionState: SessionState
State isolated across sessions, including SQL configurations, temporary tables, registered functions, and everything else that accepts an org.apache.spark.sql.internal.SQLConf. If parentSessionState is not null, the SessionState will be a copy of the parent.
This is internal to Spark and there is no guarantee on interface stability.
- Annotations
- @Unstable() @transient()
- Since
2.2.0
- lazy val sharedState: SharedState
State shared across sessions, including the SparkContext, cached data, listener, and a catalog that interacts with external systems.
This is internal to Spark and there is no guarantee on interface stability.
- Annotations
- @Unstable() @transient()
- Since
2.2.0
- val sparkContext: SparkContext
- def sql(sqlText: String): DataFrame
Executes a SQL query using Spark, returning the result as a DataFrame.
- Definition Classes
- SparkSession → SparkSession
- def sql(sqlText: String, args: Map[String, Any]): DataFrame
Executes a SQL query substituting named parameters by the given arguments, returning the result as a DataFrame.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
- def sql(sqlText: String, args: Map[String, Any]): DataFrame
Executes a SQL query substituting named parameters by the given arguments, returning the result as a DataFrame.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
- def sql(sqlText: String, args: Array[_]): DataFrame
Executes a SQL query substituting positional parameters by the given arguments, returning the result as a DataFrame. A usage sketch for the sql overloads follows this entry.
- Definition Classes
- SparkSession → SparkSession
- Annotations
- @Experimental()
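A sketch of the sql overloads, assuming an existing session named spark; named parameters are passed as a Map, positional parameters as an Array:
  // Plain query.
  spark.sql("SELECT 1 AS one").show()
  // Named parameter, referenced as :threshold in the statement.
  spark.sql("SELECT * FROM range(10) WHERE id > :threshold", Map("threshold" -> 5)).show()
  // Positional parameter, referenced as ?.
  spark.sql("SELECT * FROM range(10) WHERE id > ?", Array(5)).show()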
- val sqlContext: SQLContext
A wrapped version of this session in the form of a SQLContext, for backward compatibility.
- Since
2.0.0
- def stop(): Unit
- Definition Classes
- SparkSession
- def streams: StreamingQueryManager
Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this session. A usage sketch follows this entry.
- Annotations
- @Unstable()
- Since
2.0.0
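A small sketch of inspecting active streaming queries through streams, assuming an existing session named spark:
  // Print the id and current status of every active streaming query.
  spark.streams.active.foreach(q => println(s"${q.id}: ${q.status}"))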
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def table(tableName: String): DataFrame
Returns the specified table/view as a DataFrame.
- Definition Classes
- SparkSession → SparkSession
- def time[T](f: => T): T
- Definition Classes
- SparkSession
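time evaluates the given block, prints the elapsed wall-clock time to stdout, and returns the block's result; a minimal sketch assuming an existing session named spark:
  // Prints a line such as "Time taken: 123 ms" and yields the count.
  val n = spark.time { spark.range(1000000).count() }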
- def toString(): String
- Definition Classes
- AnyRef → Any
- def udf: UDFRegistration
A collection of methods for registering user-defined functions (UDF). A registration sketch follows this entry.
- Definition Classes
- SparkSession → SparkSession
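A sketch of registering a user-defined function through udf, assuming an existing session named spark:
  // Register a Scala function as a SQL UDF and call it from a query.
  spark.udf.register("plusOne", (x: Int) => x + 1)
  spark.sql("SELECT plusOne(41)").show()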
- def version: String
The version of Spark on which this application is running.
- Definition Classes
- SparkSession → SparkSession
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- def withLogContext(context: HashMap[String, String])(body: => Unit): Unit
- Attributes
- protected
- Definition Classes
- Logging
- object implicits extends SQLImplicits with Serializable
(Scala-specific) Implicit methods available in Scala for converting common Scala objects into DataFrames.
  val sparkSession = SparkSession.builder.getOrCreate()
  import sparkSession.implicits._
- Since
2.0.0
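A brief follow-up sketch of what the implicits enable once imported, assuming the sparkSession value from the example above:
  import sparkSession.implicits._

  // Local collections gain toDS()/toDF() via the imported implicits.
  val ds = Seq(1, 2, 3).toDS()
  val df = Seq(("Alice", 29), ("Bob", 31)).toDF("name", "age")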
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)