RelationalGroupedDataset

Instance Constructors

new RelationalGroupedDataset(underlying: org.apache.spark.sql.RelationalGroupedDataset)

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def agg(expr: Column, exprs: Column*): TryAnalysis[DataFrame]

Compute aggregates by specifying a series of aggregate columns. Note that this function by default retains the grouping columns in its output. To not retain grouping columns, set spark.sql.retainGroupColumns to false.
The available aggregate methods are defined in org.apache.spark.sql.functions.
```
// Selects the age of the oldest employee and the aggregate expense for each department

// Scala:
import org.apache.spark.sql.functions._
df.groupBy("department").agg(max("age"), sum("expense"))

// Java:
import static org.apache.spark.sql.functions.*;
df.groupBy("department").agg(max("age"), sum("expense"));
```
Note that before Spark 1.4, the default behavior was to NOT retain grouping columns. To restore that behavior, set the config variable spark.sql.retainGroupColumns to false.
```
// Scala, 1.3.x:
df.groupBy("department").agg($"department", max("age"), sum("expense"))

// Java, 1.3.x:
df.groupBy("department").agg(col("department"), max("age"), sum("expense"));
```
Since
1.3.0
def agg(exprs: Map[String, String]): TryAnalysis[DataFrame]

Compute aggregates by specifying a map from column name to aggregate methods. The resulting DataFrame will also contain the grouping columns.
The available aggregate methods are avg, max, min, sum, count.
```
// Selects the age of the oldest employee and the aggregate expense for each department
df.groupBy("department").agg(Map(
  "age" -> "max",
  "expense" -> "sum"
))
```
Since
1.3.0
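To make the column-name-to-method mapping concrete, the following is a minimal sketch on plain Scala collections rather than on Spark. The object `AggMapSketch` and its helper `aggregate` are hypothetical names introduced only for illustration, and the `method(column)` output naming is an assumption meant to resemble typical Spark output, not something this page specifies.

```scala
// Illustration only: models agg(Map(column -> method)) for a single group
// using plain Scala collections. Nothing here calls Spark.
object AggMapSketch {
  // The five method names listed above, modelled on Double values.
  val methods: Map[String, Seq[Double] => Double] = Map(
    "avg"   -> ((xs: Seq[Double]) => xs.sum / xs.size),
    "max"   -> ((xs: Seq[Double]) => xs.max),
    "min"   -> ((xs: Seq[Double]) => xs.min),
    "sum"   -> ((xs: Seq[Double]) => xs.sum),
    "count" -> ((xs: Seq[Double]) => xs.size.toDouble)
  )

  // Hypothetical helper: `group` maps a column name to that column's values
  // within one group; `exprs` maps a column name to an aggregate method name.
  def aggregate(
      group: Map[String, Seq[Double]],
      exprs: Map[String, String]
  ): Map[String, Double] =
    exprs.map { case (column, method) =>
      // Assumed output naming: "method(column)".
      s"$method($column)" -> methods(method)(group(column))
    }
}
```

Because the argument is a Map, each column name can appear only once, so this form cannot request two different aggregates of the same column.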
def agg(aggExpr: (String, String), aggExprs: (String, String)*): TryAnalysis[DataFrame]

Compute aggregates by specifying the column names and aggregate methods. The resulting DataFrame will also contain the grouping columns.
The available aggregate methods are avg, max, min, sum, count.
```
// Selects the age of the oldest employee and the aggregate expense for each department
df.groupBy("department").agg(
  "age" -> "max",
  "expense" -> "sum"
)
```
Since
1.3.0
final def asInstanceOf[T0]: T0

Definition Classes
Any
def avg(colNames: String*): TryAnalysis[DataFrame]

Compute the mean value of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When columns are specified, only their mean values are computed.

Since
1.3.0
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def count: DataFrame

Count the number of rows for each group. The resulting DataFrame will also contain the grouping columns.

Since
1.3.0
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def get[U](f: (org.apache.spark.sql.RelationalGroupedDataset) ⇒ U): U

Applies an action to the underlying RelationalGroupedDataset.
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def getWithAnalysis[U](f: (org.apache.spark.sql.RelationalGroupedDataset) ⇒ U): TryAnalysis[U]

Applies an action to the underlying RelationalGroupedDataset. Use it for operations that can fail due to an AnalysisException.
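The idea of returning a failure as a value instead of throwing can be sketched as follows. This page does not show how TryAnalysis is defined, so the sketch models it as an `Either`. `TryAnalysisSketch`, `FakeAnalysisException`, and `TryAnalysisModel` are hypothetical stand-ins that let the example run without Spark; they are not the library's types.

```scala
// Illustration only: a possible shape for "run f, capture analysis failures".
object TryAnalysisSketch {
  // Hypothetical stand-in for Spark's AnalysisException.
  final class FakeAnalysisException(message: String) extends Exception(message)

  // Assumed model of TryAnalysis: Left holds the captured error, Right the value.
  type TryAnalysisModel[A] = Either[FakeAnalysisException, A]

  // Applies `f` to the wrapped value and captures only analysis failures;
  // any other exception still propagates to the caller.
  def getWithAnalysis[T, U](underlying: T)(f: T => U): TryAnalysisModel[U] =
    try Right(f(underlying))
    catch { case e: FakeAnalysisException => Left(e) }
}
```

With this shape, a caller has to handle the failure case explicitly before reaching the result, which is presumably why the fallible members on this page return TryAnalysis.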
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def max(colNames: String*): TryAnalysis[DataFrame]

Compute the max value of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When columns are specified, only their max values are computed.

Since
1.3.0
def mean(colNames: String*): TryAnalysis[DataFrame]

Compute the average value of each numeric column for each group. This is an alias for avg. The resulting DataFrame will also contain the grouping columns. When columns are specified, only their average values are computed.

Since
1.3.0
def min(colNames: String*): TryAnalysis[DataFrame]

Compute the min value of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When columns are specified, only their min values are computed.

Since
1.3.0
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def pivot(pivotColumn: Column, values: Seq[Any]): TryAnalysis[RelationalGroupedDataset]

Pivots a column of the current DataFrame and performs the specified aggregation. This is an overload of the pivot method that takes pivotColumn as a String; here the pivot column is passed as a Column.
```
// Compute the sum of earnings for each year by course with each course as a separate column
// sum takes column names (String*), so the aggregated column is passed by name
df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum("earnings")
```
pivotColumn
the column to pivot.
values
List of values that will be translated to columns in the output DataFrame.

Since
2.4.0
def pivot(pivotColumn: Column): TryAnalysis[RelationalGroupedDataset]

Pivots a column of the current DataFrame and performs the specified aggregation. This is an overload of the pivot method that takes pivotColumn as a String; here the pivot column is passed as a Column.
```
// Or without specifying column values (less efficient)
// sum takes column names (String*), so the aggregated column is passed by name
df.groupBy($"year").pivot($"course").sum("earnings")
```
pivotColumn
the column to pivot.

Since
2.4.0
def pivot(pivotColumn: String, values: Seq[Any]): TryAnalysis[RelationalGroupedDataset]

Pivots a column of the current DataFrame and performs the specified aggregation. There are two versions of the pivot function: one that requires the caller to specify the list of distinct values to pivot on, and one that does not. The latter is more concise but less efficient, because Spark must first compute the list of distinct values internally.
```
// Compute the sum of earnings for each year by course with each course as a separate column
df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings")

// Or without specifying column values (less efficient)
df.groupBy("year").pivot("course").sum("earnings")
```
pivotColumn
Name of the column to pivot.
values
List of values that will be translated to columns in the output DataFrame.

Since
1.6.0
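The difference between the two pivot variants can be illustrated on plain Scala collections. This is a minimal sketch that does not call Spark. `PivotSketch`, `Earning`, `pivotSum`, and `pivotSumAllValues` are hypothetical names; modelling an empty cell as `None` and sorting the discovered values are assumptions made for the illustration.

```scala
// Illustration only: models groupBy("year").pivot("course", values).sum("earnings")
// with plain Scala collections.
object PivotSketch {
  final case class Earning(year: Int, course: String, earnings: Double)

  // One output row per year and one cell per requested course value.
  def pivotSum(
      rows: Seq[Earning],
      values: Seq[String]
  ): Map[Int, Seq[Option[Double]]] =
    rows.groupBy(_.year).map { case (year, group) =>
      val cells = values.map { course =>
        val matching = group.filter(_.course == course)
        // A year with no rows for this course gets an empty cell (None).
        if (matching.isEmpty) None else Some(matching.map(_.earnings).sum)
      }
      year -> cells
    }

  // Without an explicit value list, the distinct values have to be collected
  // first, which costs an extra pass over the data before pivoting.
  def pivotSumAllValues(
      rows: Seq[Earning]
  ): (Seq[String], Map[Int, Seq[Option[Double]]]) = {
    val values = rows.map(_.course).distinct.sorted
    (values, pivotSum(rows, values))
  }
}
```

Passing the values explicitly also fixes the set and order of output columns, regardless of which values happen to occur in the data.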
def pivot(pivotColumn: String): TryAnalysis[RelationalGroupedDataset]

Pivots a column of the current DataFrame and performs the specified aggregation.
There are two versions of the pivot function: one that requires the caller to specify the list of distinct values to pivot on, and one that does not. The latter is more concise but less efficient, because Spark must first compute the list of distinct values internally.
```
// Compute the sum of earnings for each year by course with each course as a separate column
df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings")

// Or without specifying column values (less efficient)
df.groupBy("year").pivot("course").sum("earnings")
```
pivotColumn
Name of the column to pivot.

Since
1.6.0
def sum(colNames: String*): TryAnalysis[DataFrame]

Compute the sum of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When columns are specified, only their sums are computed.

Since
1.3.0
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def transformation(f: (org.apache.spark.sql.RelationalGroupedDataset) ⇒ org.apache.spark.sql.RelationalGroupedDataset): RelationalGroupedDataset

Applies a transformation to the underlying RelationalGroupedDataset.
def transformationWithAnalysis(f: (org.apache.spark.sql.RelationalGroupedDataset) ⇒ org.apache.spark.sql.RelationalGroupedDataset): TryAnalysis[RelationalGroupedDataset]

Applies a transformation to the underlying RelationalGroupedDataset. Use it for transformations that can fail due to an AnalysisException.
val underlying: org.apache.spark.sql.RelationalGroupedDataset
def unpack[U](f: (org.apache.spark.sql.RelationalGroupedDataset) ⇒ org.apache.spark.sql.Dataset[U]): Dataset[U]

Unpacks the underlying RelationalGroupedDataset into a Dataset.
def unpackWithAnalysis[U](f: (org.apache.spark.sql.RelationalGroupedDataset) ⇒ org.apache.spark.sql.Dataset[U]): TryAnalysis[Dataset[U]]

Unpacks the underlying RelationalGroupedDataset into a Dataset. Use it for transformations that can fail due to an AnalysisException.
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

final case class RelationalGroupedDataset(underlying: org.apache.spark.sql.RelationalGroupedDataset) extends Product with Serializable
