Package org.apache.spark.sql

package sql

Allows the execution of relational queries, including those expressed in SQL using Spark.

Linear Supertypes
AnyRef, Any

Type Members

  1. class Column extends Logging

    :: Experimental :: A column that will be computed based on the data in a DataFrame.

    A new column is constructed based on the input columns present in a DataFrame:

    df("columnName")            // On a specific DataFrame.
    col("columnName")           // A generic column not yet associated with a DataFrame.
    col("columnName.field")     // Extracting a struct field
    col("`a.column.with.dots`") // Escape `.` in column names.
    $"columnName"               // Scala short hand for a named column.
    expr("a + 1")               // A column that is constructed from a parsed SQL Expression.
    lit("abc")                  // A column that produces a literal (constant) value.

    Column objects can be composed to form complex expressions:

    $"a" + 1
    $"a" === $"b"
    Annotations
    @Experimental()
    Since

    1.3.0

  2. class ColumnName extends Column

    :: Experimental :: A convenient class used for constructing schema.
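
    For example, once sqlContext.implicits._ is imported, the $ interpolator yields a ColumnName whose typed accessors build StructFields (a minimal sketch; the field names are placeholders):

    import org.apache.spark.sql.types.StructType
    import sqlContext.implicits._   // brings the $"..." ColumnName interpolator into scope

    // Each typed accessor on ColumnName (string, int, ...) returns a StructField
    val schema = StructType(Seq($"name".string, $"age".int))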

    Annotations
    @Experimental()
    Since

    1.3.0

  3. class DataFrame extends Queryable with Serializable

    :: Experimental :: A distributed collection of data organized into named columns.

    A DataFrame is equivalent to a relational table in Spark SQL. The following example creates a DataFrame by pointing Spark SQL to a Parquet data set.

    val people = sqlContext.read.parquet("...")  // in Scala
    DataFrame people = sqlContext.read().parquet("...");  // in Java

    Once created, it can be manipulated using the various domain-specific-language (DSL) functions defined in: DataFrame (this class), Column, and functions.

    To select a column from the DataFrame, use the apply method in Scala and col in Java.

    val ageCol = people("age")  // in Scala
    Column ageCol = people.col("age")  // in Java

    Note that the Column type can also be manipulated through its various functions.

    // The following creates a new column that increases everybody's age by 10.
    people("age") + 10  // in Scala
    people.col("age").plus(10);  // in Java

    A more concrete example in Scala:

    // To create DataFrame using SQLContext
    val people = sqlContext.read.parquet("...")
    val department = sqlContext.read.parquet("...")
    
    people.filter("age > 30")
      .join(department, people("deptId") === department("id"))
      .groupBy(department("name"), "gender")
      .agg(avg(people("salary")), max(people("age")))

    and in Java:

    // To create DataFrame using SQLContext
    DataFrame people = sqlContext.read().parquet("...");
    DataFrame department = sqlContext.read().parquet("...");
    
    people.filter(people.col("age").gt(30))
      .join(department, people.col("deptId").equalTo(department.col("id")))
      .groupBy(department.col("name"), "gender")
      .agg(avg(people.col("salary")), max(people.col("age")));
    Annotations
    @Experimental()
    Since

    1.3.0

  4. case class DataFrameHolder extends Product with Serializable

    A container for a DataFrame, used for implicit conversions.

    To use this, import implicit conversions in SQL:

    import sqlContext.implicits._
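
    The holder itself is rarely used directly; it is what makes conversions such as toDF available (a minimal sketch, assuming a SQLContext named sqlContext):

    import sqlContext.implicits._   // wraps RDDs and local collections in DataFrameHolder

    // localSeqToDataFrameHolder wraps the Seq, and toDF unwraps it into a DataFrame
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")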
    Since

    1.3.0

  5. final class DataFrameNaFunctions extends AnyRef

    :: Experimental :: Functionality for working with missing data in DataFrames.
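
    Accessed through DataFrame.na; for example (a minimal sketch, assuming a DataFrame df with a nullable numeric column "age"):

    val cleaned = df.na.drop()               // drop rows that contain any null values
    val filled  = df.na.fill(0, Seq("age"))  // replace nulls in "age" with 0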

    Annotations
    @Experimental()
    Since

    1.3.1

  6. class DataFrameReader extends Logging

    :: Experimental :: Interface used to load a DataFrame from external storage systems (e.g. file systems, key-value stores, etc). Use SQLContext.read to access this.
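
    For example (a minimal sketch; the paths are placeholders):

    // Generic form: pick a format, set options, then load
    val df1 = sqlContext.read.format("json").load("path/to/people.json")

    // Shortcut methods exist for common formats
    val df2 = sqlContext.read.parquet("path/to/users.parquet")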

    Annotations
    @Experimental()
    Since

    1.4.0

  7. final class DataFrameStatFunctions extends AnyRef

    :: Experimental :: Statistic functions for DataFrames.
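
    Accessed through DataFrame.stat; for example (a minimal sketch, assuming a DataFrame df with the named columns):

    val correlation = df.stat.corr("height", "weight")   // Pearson correlation of two columns
    val crossTab    = df.stat.crosstab("city", "gender") // contingency table as a DataFrame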

    Annotations
    @Experimental()
    Since

    1.4.0

  8. final class DataFrameWriter extends AnyRef

    :: Experimental :: Interface used to write a DataFrame to external storage systems (e.g. file systems, key-value stores, etc). Use DataFrame.write to access this.
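
    For example (a minimal sketch; the paths are placeholders):

    import org.apache.spark.sql.SaveMode

    // Generic form: choose a format, a save mode, and a target path
    df.write.format("parquet").mode(SaveMode.Overwrite).save("path/to/people.parquet")

    // Shortcut methods exist for common formats
    df.write.json("path/to/people.json")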

    Annotations
    @Experimental()
    Since

    1.4.0

  9. class Dataset[T] extends Queryable with Serializable

    :: Experimental :: A Dataset is a strongly typed collection of objects that can be transformed in parallel using functional or relational operations.

    A Dataset differs from an RDD in the following ways:

    • Internally, a Dataset is represented by a Catalyst logical plan and the data is stored in the encoded form. This representation allows for additional logical operations and enables many operations (sorting, shuffling, etc.) to be performed without deserializing to an object.
    • The creation of a Dataset requires the presence of an explicit Encoder that can be used to serialize the object into a binary format. Encoders are also capable of mapping the schema of a given object to the Spark SQL type system. In contrast, RDDs rely on runtime reflection based serialization. Operations that change the type of object stored in the dataset also need an encoder for the new type.

    A Dataset can be thought of as a specialized DataFrame, where the elements map to a specific JVM object type, instead of to a generic Row container. A DataFrame can be transformed into specific Dataset by calling df.as[ElementType]. Similarly you can transform a strongly-typed Dataset to a generic DataFrame by calling ds.toDF().
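
    For example (a minimal sketch, assuming sqlContext.implicits._ is imported so an Encoder for the case class is available):

    case class Person(name: String, age: Long)

    val ds: Dataset[Person] = df.as[Person]   // typed view over a DataFrame with a matching schema
    val adults = ds.filter(_.age >= 18)       // functional operation on typed objects
    val backToDF: DataFrame = adults.toDF()   // back to an untyped DataFrame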

    COMPATIBILITY NOTE: Long term we plan to make DataFrame extend Dataset[Row]. However, making this change to the class hierarchy would break the function signatures for the existing functional operations (map, flatMap, etc). As such, this class should be considered a preview of the final API. Changes will be made to the interface after Spark 1.6.

    Annotations
    @Experimental()
    Since

    1.6.0

  10. case class DatasetHolder[T] extends Product with Serializable

    A container for a Dataset, used for implicit conversions.

    To use this, import implicit conversions in SQL:

    import sqlContext.implicits._
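
    The holder is what makes toDS available on local collections and RDDs (a minimal sketch, assuming a SQLContext named sqlContext):

    import sqlContext.implicits._   // wraps the Seq in a DatasetHolder

    val ds = Seq(1, 2, 3).toDS()    // Dataset[Int]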
    Since

    1.6.0

  11. class ExperimentalMethods extends AnyRef

    :: Experimental :: Holder for experimental methods for the bravest. We make NO guarantee about the binary or source compatibility of the methods here.

    sqlContext.experimental.extraStrategies += ...
    Annotations
    @Experimental()
    Since

    1.3.0

  12. class GroupedData extends AnyRef

    :: Experimental :: A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy.

    The main method is the agg function, which has multiple variants. This class also contains convenience methods for some first-order statistics, such as mean and sum.
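
    For example (a minimal sketch, assuming a DataFrame df with the named columns and import org.apache.spark.sql.functions._):

    // groupBy returns a GroupedData; agg and the convenience methods produce a DataFrame
    df.groupBy("department").agg(avg("salary"), max("age"))
    df.groupBy("department").mean("salary")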

    Annotations
    @Experimental()
    Since

    1.3.0

  13. class GroupedDataset[K, V] extends Serializable

    :: Experimental :: A Dataset that has been logically grouped by a user-specified grouping key. Users should not construct a GroupedDataset directly, but should instead call groupBy on an existing Dataset.
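
    For example (a minimal sketch, assuming a Dataset[Person] named ds with a department field and sqlContext.implicits._ in scope for the result Encoders):

    // groupBy with a function yields a GroupedDataset keyed by the function's result
    val grouped = ds.groupBy(p => p.department)

    // mapGroups receives each key together with an iterator over its values
    val headcount = grouped.mapGroups { (dept, people) => (dept, people.size) }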

    COMPATIBILITY NOTE: Long term we plan to make GroupedDataset extend GroupedData. However, making this change to the class hierarchy would break some function signatures. As such, this class should be considered a preview of the final API. Changes will be made to the interface after Spark 1.6.

    Annotations
    @Experimental()
    Since

    1.6.0

  14. class SQLContext extends Logging with Serializable

    The entry point for working with structured data (rows and columns) in Spark. Allows the creation of DataFrame objects as well as the execution of SQL queries.
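
    For example (a minimal sketch; sc is an existing SparkContext and the path is a placeholder):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    val df = sqlContext.read.json("path/to/people.json")
    df.registerTempTable("people")    // expose the DataFrame to SQL
    val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18")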

    Since

    1.0.0

  15. abstract class SQLImplicits extends AnyRef

    A collection of implicit methods for converting common Scala objects into DataFrames.

    Since

    1.6.0

  16. final class SaveMode extends Enum[SaveMode]

  17. type Strategy = GenericStrategy[SparkPlan]

    Converts a logical plan into zero or more SparkPlans. This API is exposed for experimenting with the query planner and is not designed to be stable across Spark releases. Developers writing libraries should instead consider using the stable APIs provided in org.apache.spark.sql.sources.

    Annotations
    @DeveloperApi()
  18. class TypedColumn[-T, U] extends Column

    A Column where an Encoder has been given for the expected input and return type. To create a TypedColumn, use the as function on a Column.
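
    For example (a minimal sketch, assuming a Dataset ds whose elements have a name field and sqlContext.implicits._ in scope for the Encoders):

    import org.apache.spark.sql.functions._

    // Column.as[U] turns an untyped Column into a TypedColumn
    val nameLength = expr("length(name)").as[Int]
    val lengths: Dataset[Int] = ds.select(nameLength)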

    T

    The input type expected for this expression. Can be Any if the expression is type checked by the analyzer instead of the compiler (i.e. expr("sum(...)")).

    U

    The output type of this column.

    Since

    1.6.0

  19. class UDFRegistration extends Logging

    Functions for registering user-defined functions. Use SQLContext.udf to access this.
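
    For example (a minimal sketch; the table name "people" is a placeholder):

    // Register a Scala function so it can be called from SQL
    sqlContext.udf.register("strLen", (s: String) => s.length)

    sqlContext.sql("SELECT strLen(name) FROM people")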

    Since

    1.3.0

  20. case class UserDefinedFunction(f: AnyRef, dataType: DataType, inputTypes: Seq[DataType] = Nil) extends Product with Serializable

    A user-defined function. To create one, use the udf functions in functions. As an example:

    // Define a UDF that returns true or false based on some numeric score.
    val predict = udf((score: Double) => if (score > 0.5) true else false)

    // Project a new column that applies the prediction UDF to the score column.
    df.select( predict(df("score")) )
    Annotations
    @Experimental()
    Since

    1.3.0

  21. type SchemaRDD = DataFrame

    Type alias for DataFrame. Kept here for backward source compatibility for Scala.

    Annotations
    @deprecated
    Deprecated

    (Since version 1.3.0) use DataFrame

Value Members

  1. object SQLContext extends Serializable

    This SQLContext object contains utility functions to create a singleton SQLContext instance, or to get the created SQLContext instance.

    It also provides utility functions for managing the active SQLContext in multi-session scenarios: setActive sets a SQLContext for the current thread, and getOrCreate will then return that instance instead of the global one.
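
    For example (a minimal sketch; sc is an existing SparkContext):

    // Get, or lazily create, the singleton SQLContext for this SparkContext
    val sqlContext = SQLContext.getOrCreate(sc)

    // Pin a SQLContext to the current thread; getOrCreate then returns it on this thread
    SQLContext.setActive(sqlContext)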

  2. package api

    Contains API classes that are specific to a single language (i.e. Java).

  3. package execution

    The physical execution component of Spark SQL. Note that this is a private package. All classes in catalyst are considered an internal API to Spark SQL and are subject to change between minor releases.

  4. package expressions

  5. object functions extends LegacyFunctions

    :: Experimental :: Functions available for DataFrame.
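
    For example (a minimal sketch, assuming a DataFrame df with a string column "name"):

    import org.apache.spark.sql.functions._

    // Functions build Column expressions that compose with DataFrame operations
    df.select(col("name"), upper(col("name")), lit(1).as("one"))
    df.filter(length(col("name")) > 3)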

    Annotations
    @Experimental()
    Since

    1.3.0

  6. package jdbc

  7. package sources


    A set of APIs for adding data sources to Spark SQL.

  8. package util

