:: Experimental :: A column in a DataFrame.
Since 1.3.0
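For illustration, a minimal sketch (df is a hypothetical DataFrame with an age column): Column expressions compose into new Columns rather than evaluating eagerly.

   val ageCol = df("age")        // select a Column
   val inTenYears = ageCol + 10  // arithmetic yields a new Column
   val isAdult = ageCol >= 18    // comparisons yield a Boolean Column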
:: Experimental :: A convenient class used for constructing schema.
Since 1.3.0
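A brief sketch of the intended use, assuming the $ string interpolator from sqlContext.implicits (the field names here are illustrative):

   import sqlContext.implicits._
   import org.apache.spark.sql.types.StructType

   // $"..." yields a ColumnName; its typed methods produce StructFields.
   val schema = StructType(Seq($"name".string, $"age".int))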
:: Experimental :: A distributed collection of data organized into named columns.
A DataFrame is equivalent to a relational table in Spark SQL. The following example creates a DataFrame by pointing Spark SQL to a Parquet data set.
   val people = sqlContext.read.parquet("...")           // in Scala
   DataFrame people = sqlContext.read().parquet("...");  // in Java
Once created, it can be manipulated using the various domain-specific language (DSL) functions defined in: DataFrame (this class), Column, and functions.
To select a column from the DataFrame, use the apply method in Scala and col in Java.
val ageCol = people("age") // in Scala Column ageCol = people.col("age") // in Java
Note that the Column type can also be manipulated through its various functions.
   // The following creates a new column that increases everybody's age by 10.
   people("age") + 10           // in Scala
   people.col("age").plus(10);  // in Java
A more concrete example in Scala:
   // To create DataFrame using SQLContext
   val people = sqlContext.read.parquet("...")
   val department = sqlContext.read.parquet("...")

   people.filter("age > 30")
     .join(department, people("deptId") === department("id"))
     .groupBy(department("name"), people("gender"))
     .agg(avg(people("salary")), max(people("age")))
and in Java:
   // To create DataFrame using SQLContext
   DataFrame people = sqlContext.read().parquet("...");
   DataFrame department = sqlContext.read().parquet("...");

   people.filter(people.col("age").gt(30))
     .join(department, people.col("deptId").equalTo(department.col("id")))
     .groupBy(department.col("name"), people.col("gender"))
     .agg(avg(people.col("salary")), max(people.col("age")));
Since 1.3.0
:: Experimental :: Functionality for working with missing data in DataFrames.
Since 1.3.1
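Accessed through DataFrame.na. For example (a sketch assuming a DataFrame df; the sentinel mapping is illustrative):

   df.na.drop()                        // drop rows containing any null values
   df.na.fill(0.0)                     // replace nulls in numeric columns with 0
   df.na.replace("age", Map(-1 -> 0))  // map a sentinel value to a real one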
:: Experimental :: Interface used to load a DataFrame from external storage systems (e.g. file systems, key-value stores, etc). Use SQLContext.read to access this.
Since 1.4.0
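For example (the paths here are placeholders):

   val parquetDF = sqlContext.read.parquet("...")  // shorthand for Parquet
   val jsonDF = sqlContext.read.format("json").load("path/to/people.json")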
:: Experimental :: Statistic functions for DataFrames.
Since 1.4.0
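Accessed through DataFrame.stat. For example (a sketch assuming a DataFrame df with the named columns):

   df.stat.corr("height", "weight")   // Pearson correlation coefficient
   df.stat.cov("height", "weight")    // sample covariance
   df.stat.freqItems(Seq("zipcode"))  // approximate frequent items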
:: Experimental :: Interface used to write a DataFrame to external storage systems (e.g. file systems, key-value stores, etc). Use DataFrame.write to access this.
Since 1.4.0
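For example (the output path is a placeholder):

   df.write
     .format("parquet")
     .mode("overwrite")   // behavior when the destination already exists
     .save("path/to/output")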
:: Experimental :: Holder for experimental methods for the bravest. We make NO guarantee about the binary or source compatibility of methods here.
sqlContext.experimental.extraStrategies += ...
Since 1.3.0
:: Experimental :: A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy.
Since 1.3.0
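For example (a sketch assuming a DataFrame df and import org.apache.spark.sql.functions._):

   // Group by department and compute one aggregate per group.
   df.groupBy("department").agg(avg("salary"), max("age"))

   // Equivalent, using a Map from column name to aggregate function name.
   df.groupBy("department").agg(Map("salary" -> "avg", "age" -> "max"))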
The entry point for working with structured data (rows and columns) in Spark. Allows the creation of DataFrame objects as well as the execution of SQL queries.
Since 1.0.0
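A typical lifecycle, sketched (sc is an existing SparkContext; the table name is illustrative):

   val sqlContext = new org.apache.spark.sql.SQLContext(sc)
   val people = sqlContext.read.parquet("...")  // create a DataFrame
   people.registerTempTable("people")           // expose it to SQL
   val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18")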
Converts a logical plan into zero or more SparkPlans. This API is exposed for experimenting with the query planner and is not designed to be stable across Spark releases. Developers writing libraries should instead consider using the stable APIs provided in org.apache.spark.sql.sources.
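As a sketch of this extension point (DoNothingStrategy is a hypothetical name; returning Nil simply defers to the built-in strategies):

   import org.apache.spark.sql.Strategy
   import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
   import org.apache.spark.sql.execution.SparkPlan

   object DoNothingStrategy extends Strategy {
     // A strategy that plans nothing, so the planner falls through.
     def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
   }

   sqlContext.experimental.extraStrategies = Seq(DoNothingStrategy)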
Functions for registering user-defined functions. Use SQLContext.udf to access this.
Since 1.3.0
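For example (assuming a registered temporary table people; the UDF name is illustrative):

   // Register a Scala function as a SQL UDF, then call it from SQL.
   sqlContext.udf.register("strLen", (s: String) => s.length)
   sqlContext.sql("SELECT strLen(name) FROM people")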
A user-defined function. To create one, use the udf functions in functions.
As an example:
   // Define a UDF that returns true or false based on some numeric score.
   val predict = udf((score: Double) => if (score > 0.5) true else false)

   // Project a new column whose value is predicted from the score column.
   df.select( predict(df("score")) )
Since 1.3.0
Type alias for DataFrame. Kept here for backward source compatibility for Scala.
Deprecated (since version 1.3.0): use DataFrame.
This SQLContext object contains utility functions to create a singleton SQLContext instance, or to get the last created SQLContext instance.
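A minimal sketch, assuming the getOrCreate utility (sc is an existing SparkContext):

   // Returns the last created SQLContext, or creates a new one for sc.
   val sqlContext = SQLContext.getOrCreate(sc)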
Contains API classes that are specific to a single language (i.e. Java).
:: DeveloperApi :: An execution engine for relational query plans that runs on top of Spark and returns RDDs.
Note that the operators in this package are created automatically by a query planner using a SQLContext and are not intended to be used directly by end users of Spark SQL. They are documented here in order to make it easier for others to understand the performance characteristics of query plans that are generated by Spark SQL.
:: Experimental :: Functions available for DataFrame.
Since 1.3.0
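For instance (a sketch assuming a DataFrame df with name, age, deptId, and salary columns):

   import org.apache.spark.sql.functions._

   df.select(upper(col("name")), df("age") + lit(1))  // scalar expressions
   df.groupBy("deptId").agg(avg("salary"))            // aggregate functions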
A set of APIs for adding data sources to Spark SQL.
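A minimal sketch of a read-only data source built on these APIs (all class and column names here are illustrative): a RelationProvider constructs a BaseRelation, and the TableScan trait supplies the rows.

   import org.apache.spark.rdd.RDD
   import org.apache.spark.sql.{Row, SQLContext}
   import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
   import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

   class DefaultSource extends RelationProvider {
     override def createRelation(
         sqlContext: SQLContext,
         parameters: Map[String, String]): BaseRelation =
       new OneToTenRelation(sqlContext)
   }

   // A relation that produces the integers 1 to 10 as single-column rows.
   class OneToTenRelation(val sqlContext: SQLContext)
     extends BaseRelation with TableScan {

     override def schema: StructType =
       StructType(StructField("value", IntegerType) :: Nil)

     override def buildScan(): RDD[Row] =
       sqlContext.sparkContext.parallelize(1 to 10).map(Row(_))
   }

Such a source would then be loaded with sqlContext.read.format(...) pointing at the package that contains DefaultSource.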
Allows the execution of relational queries, including those expressed in SQL, using Spark.