package sql
Package Members
- package artifact
- package avro
- package catalyst
- package classic
Allows the execution of relational queries, including those expressed in SQL using Spark.
- package columnar
- package connector
- package execution
The physical execution component of Spark SQL.
The physical execution component of Spark SQL. Note that this is a private package. All classes in catalyst are considered an internal API to Spark SQL and are subject to change between minor releases.
- package internal
All classes in this package are considered an internal API to Spark and are subject to change between minor releases.
- package jdbc
- package scripting
- package sources
A set of APIs for adding data sources to Spark SQL.
- package streaming
- package util
Type Members
- type DataFrame = Dataset[Row]
- class ExperimentalMethods extends AnyRef
:: Experimental :: Holder for experimental methods for the bravest.
:: Experimental :: Holder for experimental methods for the bravest. We make NO guarantee about the stability regarding binary compatibility and source compatibility of methods here.
spark.experimental.extraStrategies += ...
- Annotations
- @Experimental() @Unstable()
- Since
1.3.0
- trait ExtendedExplainGenerator extends AnyRef
A trait for a session extension to implement that provides addition explain plan information.
A trait for a session extension to implement that provides addition explain plan information.
- Annotations
- @DeveloperApi() @Since("4.0.0")
- class SparkSessionExtensions extends AnyRef
:: Experimental :: Holder for injection points to the SparkSession.
:: Experimental :: Holder for injection points to the SparkSession. We make NO guarantee about the stability regarding binary compatibility and source compatibility of methods here.
This current provides the following extension points:
- Analyzer Rules.
- Check Analysis Rules.
- Cache Plan Normalization Rules.
- Optimizer Rules.
- Pre CBO Rules.
- Planning Strategies.
- Customized Parser.
- (External) Catalog listeners.
- Columnar Rules.
- Adaptive Query Post Planner Strategy Rules.
- Adaptive Query Stage Preparation Rules.
- Adaptive Query Execution Runtime Optimizer Rules.
- Adaptive Query Stage Optimizer Rules.
The extensions can be used by calling
withExtensionson the SparkSession.Builder, for example:SparkSession.builder() .master("...") .config("...", true) .withExtensions { extensions => extensions.injectResolutionRule { session => ... } extensions.injectParser { (session, parser) => ... } } .getOrCreate()
The extensions can also be used by setting the Spark SQL configuration property
spark.sql.extensions. Multiple extensions can be set using a comma-separated list. For example:SparkSession.builder() .master("...") .config("spark.sql.extensions", "org.example.MyExtensions,org.example.YourExtensions") .getOrCreate() class MyExtensions extends Function1[SparkSessionExtensions, Unit] { override def apply(extensions: SparkSessionExtensions): Unit = { extensions.injectResolutionRule { session => ... } extensions.injectParser { (session, parser) => ... } } } class YourExtensions extends SparkSessionExtensionsProvider { override def apply(extensions: SparkSessionExtensions): Unit = { extensions.injectResolutionRule { session => ... } extensions.injectFunction(...) } }
Note that none of the injected builders should assume that the SparkSession is fully initialized and should not touch the session's internals (e.g. the SessionState).
- Annotations
- @DeveloperApi() @Experimental() @Unstable()
- trait SparkSessionExtensionsProvider extends (SparkSessionExtensions) => Unit
Base trait for implementations used by SparkSessionExtensions
Base trait for implementations used by SparkSessionExtensions
For example, now we have an external function named
Ageto register as an extension for SparkSession:package org.apache.spark.examples.extensions import org.apache.spark.sql.catalyst.expressions.{CurrentDate, Expression, RuntimeReplaceable, SubtractDates} case class Age(birthday: Expression, child: Expression) extends RuntimeReplaceable { def this(birthday: Expression) = this(birthday, SubtractDates(CurrentDate(), birthday)) override def exprsReplaced: Seq[Expression] = Seq(birthday) override protected def withNewChildInternal(newChild: Expression): Expression = copy(newChild) }
We need to create our extension which inherits SparkSessionExtensionsProvider Example:
package org.apache.spark.examples.extensions import org.apache.spark.sql.{SparkSessionExtensions, SparkSessionExtensionsProvider} import org.apache.spark.sql.catalyst.FunctionIdentifier import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionInfo} class MyExtensions extends SparkSessionExtensionsProvider { override def apply(v1: SparkSessionExtensions): Unit = { v1.injectFunction( (new FunctionIdentifier("age"), new ExpressionInfo(classOf[Age].getName, "age"), (children: Seq[Expression]) => new Age(children.head))) } }
Then, we can inject
MyExtensionsin three ways,- withExtensions of SparkSession.Builder
- Config - spark.sql.extensions
- java.util.ServiceLoader - Add to src/main/resources/META-INF/services/org.apache.spark.sql.SparkSessionExtensionsProvider
- Annotations
- @DeveloperApi() @Since("3.2.0")
- Since
3.2.0
- See also
SparkSession.Builder