package sql
Package Members
- package artifact
- package avro
- package catalyst
- package classic
Allows the execution of relational queries, including those expressed in SQL using Spark.
- package columnar
- package connector
- package execution
The physical execution component of Spark SQL. Note that this is a private package. All classes in this package are considered an internal API to Spark SQL and are subject to change between minor releases.
- package internal
All classes in this package are considered an internal API to Spark and are subject to change between minor releases.
- package jdbc
- package scripting
- package sources
A set of APIs for adding data sources to Spark SQL.
- package streaming
- package util
Type Members
- type DataFrame = Dataset[Row]
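Because DataFrame is only a type alias, any Dataset[Row] can be used where a DataFrame is expected, and vice versa. A minimal sketch, assuming a local SparkSession and a hypothetical people.json input file:

import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// A DataFrame is exactly a Dataset[Row], so this assignment type-checks.
val df: DataFrame = spark.read.json("people.json")  // hypothetical input file
val ds: Dataset[Row] = df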
- class ExperimentalMethods extends AnyRef
:: Experimental :: Holder for experimental methods for the bravest. We make NO guarantee about the stability regarding binary compatibility and source compatibility of methods here.
spark.experimental.extraStrategies += ...
- Annotations
- @Experimental() @Unstable()
- Since
1.3.0
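The snippet above elides the strategy itself. As a minimal sketch only (MyStrategy is hypothetical; a real strategy pattern-matches on logical plan nodes and returns candidate physical plans):

import org.apache.spark.sql.{SparkSession, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// Hypothetical no-op strategy: returning Nil lets the planner fall through
// to the built-in strategies.
object MyStrategy extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.experimental.extraStrategies ++= Seq(MyStrategy)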
- trait ExtendedExplainGenerator extends AnyRef
A trait for a session extension to implement that provides additional explain plan information.
- Annotations
- @DeveloperApi() @Since("4.0.0")
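A minimal sketch of an implementation; the title and generateExtendedInfo(plan) members are assumptions about the trait's shape, so verify them against the 4.0.0 sources:

import org.apache.spark.sql.ExtendedExplainGenerator
import org.apache.spark.sql.execution.SparkPlan

// Assumed member names; check the actual trait definition before relying on them.
class MyExplainGenerator extends ExtendedExplainGenerator {
  override def title: String = "MyEngine"

  // Summarize the physical plan; a real generator would emit engine-specific detail.
  override def generateExtendedInfo(plan: SparkPlan): String =
    plan.collect { case node => node.nodeName }.mkString(", ")
}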
- class SparkSessionExtensions extends AnyRef
:: Experimental :: Holder for injection points to the SparkSession. We make NO guarantee about the stability regarding binary compatibility and source compatibility of methods here.
This currently provides the following extension points:
- Analyzer Rules.
- Check Analysis Rules.
- Cache Plan Normalization Rules.
- Optimizer Rules.
- Pre CBO Rules.
- Planning Strategies.
- Customized Parser.
- (External) Catalog listeners.
- Columnar Rules.
- Adaptive Query Post Planner Strategy Rules.
- Adaptive Query Stage Preparation Rules.
- Adaptive Query Execution Runtime Optimizer Rules.
- Adaptive Query Stage Optimizer Rules.
The extensions can be used by calling withExtensions on the SparkSession.Builder, for example:

SparkSession.builder()
  .master("...")
  .config("...", true)
  .withExtensions { extensions =>
    extensions.injectResolutionRule { session =>
      ...
    }
    extensions.injectParser { (session, parser) =>
      ...
    }
  }
  .getOrCreate()
The extensions can also be used by setting the Spark SQL configuration property spark.sql.extensions. Multiple extensions can be set using a comma-separated list. For example:

SparkSession.builder()
  .master("...")
  .config("spark.sql.extensions", "org.example.MyExtensions,org.example.YourExtensions")
  .getOrCreate()

class MyExtensions extends Function1[SparkSessionExtensions, Unit] {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectResolutionRule { session =>
      ...
    }
    extensions.injectParser { (session, parser) =>
      ...
    }
  }
}

class YourExtensions extends SparkSessionExtensionsProvider {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectResolutionRule { session =>
      ...
    }
    extensions.injectFunction(...)
  }
}
Note that none of the injected builders should assume that the SparkSession is fully initialized, nor should they touch the session's internals (e.g. the SessionState).
- Annotations
- @DeveloperApi() @Experimental() @Unstable()
- trait SparkSessionExtensionsProvider extends (SparkSessionExtensions) => Unit
:: Unstable :: Base trait for implementations used by SparkSessionExtensions.
For example, suppose we have an external function named Age that we want to register as an extension for SparkSession:

package org.apache.spark.examples.extensions

import org.apache.spark.sql.catalyst.expressions.{CurrentDate, Expression, RuntimeReplaceable, SubtractDates}

case class Age(birthday: Expression, child: Expression) extends RuntimeReplaceable {
  def this(birthday: Expression) = this(birthday, SubtractDates(CurrentDate(), birthday))

  override def exprsReplaced: Seq[Expression] = Seq(birthday)
  override protected def withNewChildInternal(newChild: Expression): Expression = copy(newChild)
}
We then need to create an extension that implements SparkSessionExtensionsProvider, for example:

package org.apache.spark.examples.extensions

import org.apache.spark.sql.{SparkSessionExtensions, SparkSessionExtensionsProvider}
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionInfo}

class MyExtensions extends SparkSessionExtensionsProvider {
  override def apply(v1: SparkSessionExtensions): Unit = {
    v1.injectFunction(
      (new FunctionIdentifier("age"),
        new ExpressionInfo(classOf[Age].getName, "age"),
        (children: Seq[Expression]) => new Age(children.head)))
  }
}
Then, we can inject MyExtensions in three ways:
- withExtensions of SparkSession.Builder
- Config - spark.sql.extensions
- java.util.ServiceLoader - Add to src/main/resources/META-INF/services/org.apache.spark.sql.SparkSessionExtensionsProvider (see the example below)
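For the java.util.ServiceLoader route, the services file lists the fully qualified class name of each provider, one per line. Assuming the example MyExtensions class above, src/main/resources/META-INF/services/org.apache.spark.sql.SparkSessionExtensionsProvider would contain:

org.apache.spark.examples.extensions.MyExtensions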
- Annotations
- @DeveloperApi() @Unstable() @Since("3.2.0")
- Since
3.2.0
- See also
SparkSession.Builder