Packages

package classic

Allows the execution of relational queries, including those expressed in SQL, using Spark.

Linear Supertypes
AnyRef, Any

Type Members

  1. class Catalog extends catalog.Catalog

    Internal implementation of the user-facing Catalog.

  2. trait ClassicConversions extends AnyRef

    Conversions from sql interfaces to the Classic specific implementation.

    This class is mainly used by the implementation. It is also meant to be used by extension developers.

    We provide both a trait and an object. The trait is useful when an extension developer needs these conversions in a project that covers multiple Spark versions: they can create a shim for the conversions, have the Spark 4+ version of the shim implement this trait, and leave the shims for older versions without it.
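
    A minimal sketch of that shim pattern (the shim names here are hypothetical):

    // Shared shim interface, compiled against every supported Spark version.
    trait ConversionShim

    // The Spark 4+ build of the shim mixes in these developer conversions.
    object Spark4ConversionShim extends ConversionShim
      with org.apache.spark.sql.classic.ClassicConversions

    // Builds targeting older Spark versions ship a shim without this trait.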

    Annotations
    @DeveloperApi()
  3. trait ColumnConversions extends AnyRef

    Conversions from a Column to an Expression.

    Annotations
    @DeveloperApi()
  4. type DataFrame = Dataset[Row]
  5. final class DataFrameNaFunctions extends sql.DataFrameNaFunctions

    Functionality for working with missing data in DataFrames.
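
    For example (a short sketch assuming an existing DataFrame df with nullable columns):

    // Access via Dataset.na.
    df.na.drop()                                  // drop rows containing any null or NaN values
    df.na.fill(0, Seq("age"))                     // replace nulls/NaNs in "age" with 0
    df.na.replace("name", Map("" -> "unknown"))   // replace values in the "name" column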

    Annotations
    @Stable()
    Since

    1.3.1

  6. class DataFrameReader extends sql.DataFrameReader

    Interface used to load a Dataset from external storage systems (e.g. file systems, key-value stores, etc). Use SparkSession.read to access this.
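
    For example (assuming an active SparkSession spark; the paths are illustrative):

    val users = spark.read.parquet("/path/to/users")
    val events = spark.read
      .format("json")
      .option("multiLine", "true")
      .load("/path/to/events.json")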

    Annotations
    @Stable()
    Since

    1.4.0

  7. final class DataFrameStatFunctions extends sql.DataFrameStatFunctions

    Statistic functions for DataFrames.
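
    For example (a sketch assuming a DataFrame df with numeric columns "x" and "y"):

    // Access via Dataset.stat.
    val correlation = df.stat.corr("x", "y")        // Pearson correlation
    val counts      = df.stat.crosstab("x", "y")    // contingency table
    val quartiles   = df.stat.approxQuantile("x", Array(0.25, 0.5, 0.75), 0.01)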

    Annotations
    @Stable()
    Since

    1.4.0

  8. final class DataFrameWriter[T] extends sql.DataFrameWriter[T]

    Interface used to write a Dataset to external storage systems (e.g. file systems, key-value stores, etc). Use Dataset.write to access this.
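
    For example (assuming a DataFrame df; the output path is illustrative):

    df.write
      .mode("overwrite")
      .partitionBy("year")
      .parquet("/path/to/output")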

    Annotations
    @Stable()
    Since

    1.4.0

  9. final class DataFrameWriterV2[T] extends sql.DataFrameWriterV2[T]

    Interface used to write a org.apache.spark.sql.classic.Dataset to external storage using the v2 API.
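
    For example (a sketch assuming a DataFrame df and a hypothetical v2 catalog table name):

    import org.apache.spark.sql.functions.col

    // Access via Dataset.writeTo.
    df.writeTo("catalog.db.events")
      .partitionedBy(col("year"))
      .createOrReplace()

    df.writeTo("catalog.db.events").append()    // append to an existing table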

    Annotations
    @Experimental()
    Since

    3.0.0

  10. final class DataStreamReader extends streaming.DataStreamReader

    Interface used to load a streaming Dataset from external storage systems (e.g. file systems, key-value stores, etc). Use SparkSession.readStream to access this.
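
    For example (assuming an active SparkSession spark; the socket source is used purely for illustration):

    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()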

    Annotations
    @Evolving()
    Since

    2.0.0

  11. final class DataStreamWriter[T] extends streaming.DataStreamWriter[T]

    Interface used to write a streaming Dataset to external storage systems (e.g. file systems, key-value stores, etc). Use Dataset.writeStream to access this.
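
    For example (a sketch assuming a streaming Dataset counts; the checkpoint location is illustrative):

    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .option("checkpointLocation", "/path/to/checkpoint")
      .start()

    query.awaitTermination()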

    Annotations
    @Evolving()
    Since

    2.0.0

  12. class Dataset[T] extends sql.Dataset[T]

    A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations. Each Dataset also has an untyped view called a DataFrame, which is a Dataset of Row.

    Operations available on Datasets are divided into transformations and actions. Transformations are the ones that produce new Datasets, and actions are the ones that trigger computation and return results. Example transformations include map, filter, select, and aggregate (groupBy). Example actions include count, show, and writing data out to file systems.

    Datasets are "lazy", i.e. computations are only triggered when an action is invoked. Internally, a Dataset represents a logical plan that describes the computation required to produce the data. When an action is invoked, Spark's query optimizer optimizes the logical plan and generates a physical plan for efficient execution in a parallel and distributed manner. To explore the logical plan as well as optimized physical plan, use the explain function.

    To efficiently support domain-specific objects, an Encoder is required. The encoder maps the domain-specific type T to Spark's internal type system. For example, given a class Person with two fields, name (string) and age (int), an encoder is used to tell Spark to generate code at runtime to serialize the Person object into a binary structure. This binary structure often has a much lower memory footprint and is optimized for efficiency in data processing (e.g. in a columnar format). To understand the internal binary representation for data, use the schema function.

    There are typically two ways to create a Dataset. The most common way is by pointing Spark to some files on storage systems, using the read function available on a SparkSession.

    val people = spark.read.parquet("...").as[Person]  // Scala
    Dataset<Person> people = spark.read().parquet("...").as(Encoders.bean(Person.class)); // Java

    Datasets can also be created through transformations available on existing Datasets. For example, the following creates a new Dataset by applying a filter on the existing one:

    val names = people.map(_.name)  // in Scala; names is a Dataset[String]
    Dataset<String> names = people.map(
      (MapFunction<Person, String>) p -> p.name, Encoders.STRING()); // Java

    Dataset operations can also be untyped, through various domain-specific-language (DSL) functions defined in: Dataset (this class), Column, and functions. These operations are very similar to the operations available in the data frame abstraction in R or Python.

    To select a column from the Dataset, use the apply method in Scala and col in Java.

    val ageCol = people("age")  // in Scala
    Column ageCol = people.col("age"); // in Java

    Note that the Column type can also be manipulated through its various functions.

    // The following creates a new column that increases everybody's age by 10.
    people("age") + 10  // in Scala
    people.col("age").plus(10);  // in Java

    A more concrete example in Scala:

    // To create Dataset[Row] using SparkSession
    val people = spark.read.parquet("...")
    val department = spark.read.parquet("...")
    
    people.filter("age > 30")
      .join(department, people("deptId") === department("id"))
      .groupBy(department("name"), people("gender"))
      .agg(avg(people("salary")), max(people("age")))

    and in Java:

    // To create Dataset<Row> using SparkSession
    Dataset<Row> people = spark.read().parquet("...");
    Dataset<Row> department = spark.read().parquet("...");
    
    people.filter(people.col("age").gt(30))
      .join(department, people.col("deptId").equalTo(department.col("id")))
      .groupBy(department.col("name"), people.col("gender"))
      .agg(avg(people.col("salary")), max(people.col("age")));
    Annotations
    @Stable()
    Since

    1.6.0

  13. class DatasetHolder[U] extends sql.DatasetHolder[U]
  14. class KeyValueGroupedDataset[K, V] extends sql.KeyValueGroupedDataset[K, V]

    A Dataset has been logically grouped by a user-specified grouping key. Users should not construct a KeyValueGroupedDataset directly, but should instead call groupByKey on an existing Dataset.
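
    For example (a sketch assuming a Dataset[Person] named people, with the name/age fields from the Dataset example above, and spark.implicits._ in scope):

    import spark.implicits._

    val byName = people.groupByKey(_.name)       // KeyValueGroupedDataset[String, Person]
    val counts = byName.count()                  // Dataset[(String, Long)]
    val oldest = byName.mapGroups((name, ps) => (name, ps.map(_.age).max))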

    Since

    2.0.0

  15. class MergeIntoWriter[T] extends sql.MergeIntoWriter[T]

    MergeIntoWriter provides methods to define and execute merge actions based on specified conditions.

    T

    the type of data in the Dataset.
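
    A sketch of the builder-style usage of this experimental API; the table names, join condition, and clause combination below are illustrative:

    import org.apache.spark.sql.functions.col

    spark.table("source")                                         // hypothetical source table
      .mergeInto("target", col("source.id") === col("target.id")) // hypothetical target table
      .whenMatched()
      .updateAll()
      .whenNotMatched()
      .insertAll()
      .merge()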

    Annotations
    @Experimental()
    Since

    4.0.0

  16. class RelationalGroupedDataset extends sql.RelationalGroupedDataset

    A set of methods for aggregations on a DataFrame, created by groupBy, cube or rollup (and also pivot).

    The main method is the agg function, which has multiple variants. This class also contains some first-order statistics such as mean and sum for convenience.
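
    For example (assuming a DataFrame df with columns "department", "gender", and "salary"):

    import org.apache.spark.sql.functions.{avg, max}

    df.groupBy("department")
      .agg(avg("salary"), max("salary"))

    df.rollup("department", "gender").count()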

    Annotations
    @Stable()
    Since

    2.0.0

    Note

    This class was named GroupedData in Spark 1.x.

  17. class RichColumn extends AnyRef

    Helper class that adds the expr and named methods to a Column. This can be used to reinstate the pre-Spark 4 Column functionality.
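
    A hypothetical sketch, assuming that SparkSession.toRichColumn (mentioned under the ColumnConversions object below) accepts a Column and returns a RichColumn:

    import org.apache.spark.sql.functions.col

    // `spark` is assumed to be an org.apache.spark.sql.classic.SparkSession.
    val richAge = spark.toRichColumn(col("age"))
    val ageExpr = richAge.expr    // pre-Spark 4 style access to the underlying Expression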

    Annotations
    @DeveloperApi()
  18. class RuntimeConfig extends sql.RuntimeConfig

    Runtime configuration interface for Spark. To access this, use SparkSession.conf.

    Options set here are automatically propagated to the Hadoop configuration during I/O.
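
    For example (assuming an active SparkSession spark):

    spark.conf.set("spark.sql.shuffle.partitions", "64")
    val partitions = spark.conf.get("spark.sql.shuffle.partitions")
    val timeZone   = spark.conf.getOption("spark.sql.session.timeZone")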

    Annotations
    @Stable()
    Since

    2.0.0

  19. class SQLContext extends sql.SQLContext

    The entry point for working with structured data (rows and columns) in Spark 1.x.

    As of Spark 2.0, this is replaced by SparkSession. However, we are keeping the class here for backward compatibility.

    Annotations
    @Stable()
    Since

    1.0.0

  20. abstract class SQLImplicits extends sql.SQLImplicits

    A collection of implicit methods for converting common Scala objects into Datasets.
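
    For example, with the concrete instance exposed as spark.implicits in scope, common Scala collections convert directly to Datasets:

    import spark.implicits._      // `spark` is an active SparkSession

    val ds = Seq(1, 2, 3).toDS()                                   // Dataset[Int]
    val df = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")   // DataFrame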

  21. class SparkSession extends sql.SparkSession with Logging with ColumnConversions

    The entry point to programming Spark with the Dataset and DataFrame API.

    In environments where this has been created up front (e.g. REPL, notebooks), use the builder to get an existing session:

    SparkSession.builder().getOrCreate()

    The builder can also be used to create a new session:

    SparkSession.builder
      .master("local")
      .appName("Word Count")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()
    Annotations
    @Stable()
  22. type Strategy = SparkStrategy

    Converts a logical plan into zero or more SparkPlans. This API is exposed for experimenting with the query planner and is not designed to be stable across Spark releases. Developers writing libraries should instead consider using the stable APIs provided in org.apache.spark.sql.sources.
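
    A minimal sketch of a custom strategy; the pass-through strategy below plans nothing itself, and registration through experimental.extraStrategies is the long-standing hook assumed here:

    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.classic.Strategy
    import org.apache.spark.sql.execution.SparkPlan

    object NoOpStrategy extends Strategy {
      // Returning Nil falls through to Spark's built-in strategies.
      override def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
    }

    // `spark` is assumed to be an active SparkSession.
    spark.experimental.extraStrategies = Seq(NoOpStrategy)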

    Annotations
    @DeveloperApi() @Unstable()
  23. trait StreamingQuery extends streaming.StreamingQuery

    A handle to a query that is executing continuously in the background as new data arrives.

  24. class StreamingQueryManager extends streaming.StreamingQueryManager with Logging

    A class to manage all the StreamingQuery instances active in a SparkSession.
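
    For example (assuming an active SparkSession spark with running streaming queries):

    // Access via SparkSession.streams.
    val active = spark.streams.active       // currently running queries
    spark.streams.get(active.head.id)       // look up a query by its id
    spark.streams.awaitAnyTermination()     // block until any query terminates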

    Annotations
    @Evolving()
    Since

    2.0.0

  25. class TableValuedFunction extends sql.TableValuedFunction
  26. class UDFRegistration extends sql.UDFRegistration with Logging

    Functions for registering user-defined functions.

    Functions for registering user-defined functions. Use SparkSession.udf to access this:

    spark.udf
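
    For example, registering and using a simple UDF (assuming an active SparkSession spark):

    spark.udf.register("plusOne", (x: Int) => x + 1)
    spark.sql("SELECT plusOne(41)").show()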
    Annotations
    @Stable()
    Since

    1.3.0

Value Members

  1. object ClassicConversions extends ClassicConversions
    Annotations
    @DeveloperApi()
  2. object ColumnConversions extends ColumnConversions

    Automatic conversions from a Column to an Expression. This uses the active SparkSession for parsing, and the active SQLConf for fetching configurations.

    This functionality is not part of the ClassicConversions because it is generally better to use SparkSession.toRichColumn(...) or SparkSession.expression(...) directly.
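
    A brief sketch of the direct approach recommended above, assuming that expression accepts a Column and that spark is an org.apache.spark.sql.classic.SparkSession:

    import org.apache.spark.sql.catalyst.expressions.Expression
    import org.apache.spark.sql.functions.col

    // Convert a Column into its underlying catalyst Expression.
    val ageExpr: Expression = spark.expression(col("age"))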

    Annotations
    @DeveloperApi()
  3. object DataStreamWriter
  4. object SQLContext extends SQLContextCompanion with Serializable
  5. object SparkSession extends SparkSessionCompanion with Logging with Serializable
    Annotations
    @Stable()
