class Dataset[T] extends sql.Dataset[T]
A Dataset is a strongly typed collection of domain-specific objects that can be transformed
in parallel using functional or relational operations. Each Dataset also has an untyped view
called a DataFrame, which is a Dataset of Row.
Operations available on Datasets are divided into transformations and actions. Transformations
are the ones that produce new Datasets, and actions are the ones that trigger computation and
return results. Example transformations include map, filter, select, and aggregate (groupBy).
Example actions include count, show, and writing data out to file systems.
Datasets are "lazy", i.e. computations are only triggered when an action is invoked. Internally,
a Dataset represents a logical plan that describes the computation required to produce the data.
When an action is invoked, Spark's query optimizer optimizes the logical plan and generates a
physical plan for efficient execution in a parallel and distributed manner. To explore the
logical plan as well as the optimized physical plan, use the explain function.
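The minimal sketch below illustrates this laziness. It assumes spark-sql 4.x (Scala 2.13) on the classpath and a local master, and the object name `LazyEvaluationSketch` is hypothetical. Building the filter launches no job, explain only prints plans, and the query runs only when count is called.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of the Spark API.
object LazyEvaluationSketch {
  // Counts the even ids in 0 until 10.
  def run(spark: SparkSession): Long = {
    import spark.implicits._
    // Transformation: only extends the logical plan, no job is launched.
    val evens = spark.range(10).filter($"id" % 2 === 0)
    // Prints the parsed, analyzed, optimized and physical plans; still no job.
    evens.explain(true)
    // Action: Spark optimizes the plan and executes it.
    evens.count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("lazy-sketch").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```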
To efficiently support domain-specific objects, an Encoder is required. The encoder maps
the domain-specific type T to Spark's internal type system. For example, given a class Person
with two fields, name (string) and age (int), an encoder is used to tell Spark to generate
code at runtime to serialize the Person object into a binary structure. This binary structure
often has a much lower memory footprint and is optimized for efficiency in data processing
(e.g. in a columnar format). To understand the internal binary representation for data, use the
schema function.
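The following is a sketch of building an Encoder explicitly and inspecting the resulting schema. It assumes spark-sql 4.x and a local master. The `Person` case class mirrors the example above, and the names `Person` and `EncoderSketch` are hypothetical. A case class field of type Int becomes a non-nullable integer column, while String is nullable.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical domain class matching the Person example above.
case class Person(name: String, age: Int)

object EncoderSketch {
  def run(spark: SparkSession): StructType = {
    // Explicit encoder; importing spark.implicits._ would derive an equivalent one.
    val personEncoder: Encoder[Person] = Encoders.product[Person]
    val people = spark.createDataset(Seq(Person("Ann", 31), Person("Bo", 25)))(personEncoder)
    // Prints:
    // root
    //  |-- name: string (nullable = true)
    //  |-- age: integer (nullable = false)
    people.printSchema()
    people.schema
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("encoder-sketch").getOrCreate()
    try run(spark) finally spark.stop()
  }
}
```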
There are typically two ways to create a Dataset. The most common way is by pointing Spark
to some files on storage systems, using the read function available on a SparkSession.
val people = spark.read.parquet("...").as[Person] // Scala
Dataset<Person> people = spark.read().parquet("...").as(Encoders.bean(Person.class)); // Java
Datasets can also be created through transformations available on existing Datasets. For example, the following creates a new Dataset by applying a map to the existing one:
val names = people.map(_.name) // in Scala; names is a Dataset[String]
Dataset<String> names = people.map((MapFunction<Person, String>) p -> p.name, Encoders.STRING()); // Java
Dataset operations can also be untyped, through various domain-specific language (DSL) functions defined in: Dataset (this class), Column, and functions. These operations are very similar to the operations available in the data frame abstraction in R or Python.
To select a column from the Dataset, use the apply method in Scala and col in Java.
val ageCol = people("age") // in Scala
Column ageCol = people.col("age"); // in Java
Note that the Column type can also be manipulated through its various functions.
// The following creates a new column that increases everybody's age by 10.
people("age") + 10 // in Scala
people.col("age").plus(10); // in Java
A more concrete example in Scala:
import org.apache.spark.sql.functions.{avg, max} // used in agg below
// To create Dataset[Row] using SparkSession
val people = spark.read.parquet("...")
val department = spark.read.parquet("...")
people.filter("age > 30")
  .join(department, people("deptId") === department("id"))
  .groupBy(department("name"), people("gender"))
  .agg(avg(people("salary")), max(people("age")))
and in Java:
import static org.apache.spark.sql.functions.*; // avg, max used in agg below
// To create Dataset<Row> using SparkSession
Dataset<Row> people = spark.read().parquet("...");
Dataset<Row> department = spark.read().parquet("...");
people.filter(people.col("age").gt(30))
  .join(department, people.col("deptId").equalTo(department.col("id")))
  .groupBy(department.col("name"), people.col("gender"))
  .agg(avg(people.col("salary")), max(people.col("age")));
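To try the Scala example without Parquet files, the sketch below replaces both reads with small in-memory DataFrames. The rows are invented for illustration. The people column is named `person` here, as an assumption, so that it does not clash with the department `name` column, and the object name is hypothetical. It assumes spark-sql 4.x and a local master.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, max}

// Hypothetical example object; the data below is invented for illustration.
object UntypedAggSketch {
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // In-memory stand-ins for the two Parquet inputs.
    val people = Seq(
      ("Ann", 35, "F", 1, 100.0),
      ("Bob", 42, "M", 1, 80.0),
      ("Cat", 28, "F", 2, 90.0),
      ("Dan", 51, "M", 2, 120.0)
    ).toDF("person", "age", "gender", "deptId", "salary")
    val department = Seq((1, "eng"), (2, "ops")).toDF("id", "name")

    // Same untyped pipeline as above: filter, join, group, aggregate.
    people.filter("age > 30")
      .join(department, people("deptId") === department("id"))
      .groupBy(department("name"), people("gender"))
      .agg(avg(people("salary")), max(people("age")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("agg-sketch").getOrCreate()
    try run(spark).show() finally spark.stop()
  }
}
```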
- Annotations
- @Stable()
- Since
1.6.0
Instance Constructors
- new Dataset(sqlContext: sql.SQLContext, logicalPlan: LogicalPlan, encoder: Encoder[T])
- new Dataset(sparkSession: SparkSession, logicalPlan: LogicalPlan, encoder: Encoder[T])
- new Dataset(sparkSession: SparkSession, logicalPlan: LogicalPlan, encoderGenerator: () => Encoder[T])
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def agg(expr: Column, exprs: Column*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def agg(exprs: Map[String, String]): DataFrame
- Definition Classes
- Dataset → Dataset
- def agg(exprs: Map[String, String]): DataFrame
- Definition Classes
- Dataset → Dataset
- def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame
- Definition Classes
- Dataset → Dataset
- def alias(alias: Symbol): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def alias(alias: String): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def apply(colName: String): Column
- Definition Classes
- Dataset
- def as(alias: Symbol): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def as(alias: String): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def as[U](implicit arg0: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def asTable(): TableArg
Converts the DataFrame into a TableArg object, which can be used as a table argument in a user-defined table function (UDTF).
After obtaining a TableArg from a DataFrame using this method, you can specify partitioning and ordering for the table argument by calling methods such as partitionBy, orderBy, and withSinglePartition on the TableArg instance.
- partitionBy(*cols): Partitions the data based on the specified columns. This method cannot be called after withSinglePartition() has been called.
- orderBy(*cols): Orders the data within partitions based on the specified columns.
- withSinglePartition(): Indicates that the data should be treated as a single partition. This method cannot be called after partitionBy() has been called.
- Since
4.0.0
- def cache(): Dataset.this.type
- Definition Classes
- Dataset → Dataset
- def checkpoint(eager: Boolean): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def checkpoint(): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def checkpoint(eager: Boolean, reliableCheckpoint: Boolean, storageLevel: Option[StorageLevel]): Dataset[T]
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- def coalesce(numPartitions: Int): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def col(colName: String): Column
- Definition Classes
- Dataset → Dataset
- def colRegex(colName: String): Column
- Definition Classes
- Dataset → Dataset
- def collect(): Array[T]
- Definition Classes
- Dataset → Dataset
- def collectAsList(): java.util.List[T]
- Definition Classes
- Dataset → Dataset
- def columns: Array[String]
- Definition Classes
- Dataset
- def count(): Long
- Definition Classes
- Dataset → Dataset
- def createGlobalTempView(viewName: String): Unit
- Definition Classes
- Dataset
- Annotations
- @throws(classOf[org.apache.spark.sql.AnalysisException])
- def createOrReplaceGlobalTempView(viewName: String): Unit
- Definition Classes
- Dataset
- def createOrReplaceTempView(viewName: String): Unit
- Definition Classes
- Dataset
- def createTempView(viewName: String, replace: Boolean, global: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Dataset → Dataset
- def createTempView(viewName: String): Unit
- Definition Classes
- Dataset
- Annotations
- @throws(classOf[org.apache.spark.sql.AnalysisException])
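The view-creation members above can be sketched as follows. It assumes spark-sql 4.x, a local master, and the default global temporary database name `global_temp` (configurable via `spark.sql.globalTempDatabase`). The view names and the object name are hypothetical. A session view is visible only to the SparkSession that created it, while a global view is queried through the `global_temp` database.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; view names are illustrative.
object TempViewSketch {
  // Returns (rows matched through the session view, rows read through the global view).
  def run(spark: SparkSession): (Long, Long) = {
    import spark.implicits._
    val people = Seq(("Ann", 35), ("Bob", 17)).toDF("name", "age")
    // Session-scoped view: visible only to this SparkSession.
    people.createOrReplaceTempView("people_view")
    val adults = spark.sql("SELECT name FROM people_view WHERE age >= 18").count()
    // Global view: registered in the global_temp database.
    people.createOrReplaceGlobalTempView("people_global")
    val all = spark.sql("SELECT * FROM global_temp.people_global").count()
    (adults, all)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("view-sketch").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```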
- def crossJoin(right: sql.Dataset[_]): DataFrame
- Definition Classes
- Dataset → Dataset
- def cube(col1: String, cols: String*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def cube(cols: Column*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def describe(cols: String*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def distinct(): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def drop(col: Column): DataFrame
- Definition Classes
- Dataset → Dataset
- def drop(colName: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def drop(col: Column, cols: Column*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def drop(colNames: String*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def dropDuplicates(col1: String, cols: String*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def dropDuplicates(colNames: Array[String]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def dropDuplicates(colNames: Seq[String]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def dropDuplicates(): Dataset[T]
- Definition Classes
- Dataset → Dataset
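The difference between the no-argument and column-subset overloads of dropDuplicates can be sketched as follows. It assumes spark-sql 4.x and a local master. The data and object name are hypothetical. When a subset is given, Spark keeps one row per key, and which of the duplicate rows survives is not guaranteed.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the events are invented for illustration.
object DropDuplicatesSketch {
  // Returns (rows left after full-row dedup, rows left after dedup on user/action).
  def run(spark: SparkSession): (Long, Long) = {
    import spark.implicits._
    val events = Seq(("u1", "click", 1), ("u1", "click", 2), ("u2", "view", 3))
      .toDF("user", "action", "ts")
    // No arguments: a row is a duplicate only if every column matches.
    val fullRow = events.dropDuplicates().count()
    // Column subset: one row per (user, action); which ts is kept is not guaranteed.
    val byKey = events.dropDuplicates("user", "action").count()
    (fullRow, byKey)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("dedup-sketch").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```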
- def dropDuplicatesWithinWatermark(col1: String, cols: String*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def dropDuplicatesWithinWatermark(colNames: Array[String]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def dropDuplicatesWithinWatermark(colNames: Seq[String]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def dropDuplicatesWithinWatermark(): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def dtypes: Array[(String, String)]
- Definition Classes
- Dataset
- lazy val encoder: Encoder[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @DeveloperApi() @Unstable() @transient()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def except(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def exceptAll(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def exists(): Column
- Definition Classes
- Dataset → Dataset
- def explain(mode: String): Unit
- Definition Classes
- Dataset → Dataset
- def explain(): Unit
- Definition Classes
- Dataset
- def explain(extended: Boolean): Unit
- Definition Classes
- Dataset
- def filter(conditionExpr: String): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def filter(func: FilterFunction[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def filter(func: (T) => Boolean): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def filter(condition: Column): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def first(): T
- Definition Classes
- Dataset
- def flatMap[U](f: FlatMapFunction[T, U], encoder: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- def flatMap[U](func: (T) => IterableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- def foreach(func: ForeachFunction[T]): Unit
- Definition Classes
- Dataset
- def foreach(f: (T) => Unit): Unit
- Definition Classes
- Dataset
- def foreachPartition(func: ForeachPartitionFunction[T]): Unit
- Definition Classes
- Dataset → Dataset
- def foreachPartition(f: (Iterator[T]) => Unit): Unit
- Definition Classes
- Dataset → Dataset
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def groupBy(col1: String, cols: String*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def groupBy(cols: Column*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def groupByKey[K](func: MapFunction[T, K], encoder: Encoder[K]): KeyValueGroupedDataset[K, T]
- Definition Classes
- Dataset → Dataset
- def groupByKey[K](func: (T) => K)(implicit arg0: Encoder[K]): KeyValueGroupedDataset[K, T]
- Definition Classes
- Dataset → Dataset
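Typed grouping with the Scala groupByKey overload can be sketched as follows. It assumes spark-sql 4.x and a local master. The `Sale` case class, the data, and the object name are hypothetical. groupByKey returns a KeyValueGroupedDataset, and mapGroups then receives each key together with an iterator over its values.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type for the example.
case class Sale(region: String, amount: Double)

object GroupByKeySketch {
  // Sums the sale amounts per region.
  def run(spark: SparkSession): Map[String, Double] = {
    import spark.implicits._
    val sales = Seq(Sale("east", 10.0), Sale("west", 5.0), Sale("east", 2.5)).toDS()
    sales
      .groupByKey(_.region) // KeyValueGroupedDataset[String, Sale]
      .mapGroups((region, rows) => (region, rows.map(_.amount).sum))
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("groupbykey-sketch").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```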
- def groupingSets(groupingSets: Seq[Seq[Column]], cols: Column*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def head(n: Int): Array[T]
- Definition Classes
- Dataset → Dataset
- def head(): T
- Definition Classes
- Dataset
- def hint(name: String, parameters: Any*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def inputFiles: Array[String]
- Definition Classes
- Dataset → Dataset
- def intersect(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def intersectAll(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def isEmpty: Boolean
- Definition Classes
- Dataset → Dataset
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isLocal: Boolean
- Definition Classes
- Dataset → Dataset
- def isStreaming: Boolean
- Definition Classes
- Dataset → Dataset
- def javaRDD: JavaRDD[T]
- Definition Classes
- Dataset
- Annotations
- @ClassicOnly()
- def join(right: sql.Dataset[_], joinExprs: Column): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], usingColumns: Array[String], joinType: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], usingColumn: String, joinType: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], usingColumns: Seq[String]): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], usingColumns: Array[String]): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], usingColumn: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], joinExprs: Column, joinType: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_]): DataFrame
- Definition Classes
- Dataset → Dataset
- def joinWith[U](other: sql.Dataset[U], condition: Column): Dataset[(T, U)]
- Definition Classes
- Dataset → Dataset
- def joinWith[U](other: sql.Dataset[U], condition: Column, joinType: String): Dataset[(T, U)]
- Definition Classes
- Dataset → Dataset
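Unlike join, which returns an untyped DataFrame, joinWith keeps both sides as typed objects paired in a tuple. The sketch below assumes spark-sql 4.x and a local master. The `Emp`/`Dept` case classes, the data, and the object name are hypothetical. With an inner join, the employee whose department has no match is dropped.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record types for the example.
case class Emp(name: String, deptId: Int)
case class Dept(id: Int, title: String)

object JoinWithSketch {
  def run(spark: SparkSession): Array[(Emp, Dept)] = {
    import spark.implicits._
    val emps = Seq(Emp("Ann", 1), Emp("Bob", 2), Emp("Cat", 3)).toDS()
    val depts = Seq(Dept(1, "eng"), Dept(2, "ops")).toDS()
    // Each result row is an (Emp, Dept) pair rather than a flattened Row.
    emps.joinWith(depts, emps("deptId") === depts("id"), "inner").collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("joinwith-sketch").getOrCreate()
    try run(spark).foreach(println) finally spark.stop()
  }
}
```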
- def lateralJoin(right: sql.Dataset[_], joinExprs: Column, joinType: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def lateralJoin(right: sql.Dataset[_], joinType: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def lateralJoin(right: sql.Dataset[_], joinExprs: Column): DataFrame
- Definition Classes
- Dataset → Dataset
- def lateralJoin(right: sql.Dataset[_]): DataFrame
- Definition Classes
- Dataset → Dataset
- def limit(n: Int): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def localCheckpoint(eager: Boolean, storageLevel: StorageLevel): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def localCheckpoint(eager: Boolean): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def localCheckpoint(): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def map[U](func: MapFunction[T, U], encoder: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- def map[U](func: (T) => U)(implicit arg0: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- def mapPartitions[U](f: MapPartitionsFunction[T, U], encoder: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- def mapPartitions[U](func: (Iterator[T]) => Iterator[U])(implicit arg0: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- def melt(ids: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def melt(ids: Array[Column], values: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame
- Definition Classes
- Dataset → Dataset
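The four-argument melt overload turns a wide table into a long one. The sketch below assumes spark-sql 4.x and a local master. The column names, data, and object name are hypothetical. The id columns are kept, and each listed value column becomes one row, with the source column's name in the variable column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical example object; the quarterly figures are invented.
object MeltSketch {
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val wide = Seq((1, 10, 20), (2, 30, 40)).toDF("id", "q1", "q2")
    // Keep id; unpivot q1/q2 into (quarter, value) rows.
    wide.melt(Array(col("id")), Array(col("q1"), col("q2")), "quarter", "value")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("melt-sketch").getOrCreate()
    try run(spark).show() finally spark.stop()
  }
}
```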
- def mergeInto(table: String, condition: Column): MergeIntoWriter[T]
- Definition Classes
- Dataset → Dataset
- def metadataColumn(colName: String): Column
- Definition Classes
- Dataset → Dataset
- def na: DataFrameNaFunctions
- Definition Classes
- Dataset → Dataset
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- def observe(observation: Observation, expr: Column, exprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def observe(name: String, expr: Column, exprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def offset(n: Int): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def orderBy(sortExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def orderBy(sortCol: String, sortCols: String*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def persist(newLevel: StorageLevel): Dataset.this.type
- Definition Classes
- Dataset → Dataset
- def persist(): Dataset.this.type
- Definition Classes
- Dataset → Dataset
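The sketch below shows persist and unpersist together with the storageLevel member listed further down. It assumes spark-sql 4.x and a local master, and the object name is hypothetical. It passes an explicit StorageLevel rather than relying on the default. Persisting is lazy, so the data is only materialized by the next action.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical example object.
object PersistSketch {
  // Returns the storage level reported while cached and after unpersisting.
  def run(spark: SparkSession): (StorageLevel, StorageLevel) = {
    val ds = spark.range(100)
    ds.persist(StorageLevel.MEMORY_ONLY) // marks the plan for caching; nothing computed yet
    ds.count()                           // action: fills the cache
    val whileCached = ds.storageLevel
    ds.unpersist(blocking = true)        // waits until the cached blocks are removed
    (whileCached, ds.storageLevel)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("persist-sketch").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```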
- def printSchema(level: Int): Unit
- Definition Classes
- Dataset
- def printSchema(): Unit
- Definition Classes
- Dataset
- val queryExecution: QueryExecution
- Definition Classes
- Dataset → Dataset
- def randomSplit(weights: Array[Double]): Array[sql.Dataset[T]]
- Definition Classes
- Dataset → Dataset
- def randomSplit(weights: Array[Double], seed: Long): Array[sql.Dataset[T]]
- Definition Classes
- Dataset → Dataset
- def randomSplitAsList(weights: Array[Double], seed: Long): java.util.List[sql.Dataset[T]]
- Definition Classes
- Dataset → Dataset
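A typical use of randomSplit is a train/test split. The sketch below assumes spark-sql 4.x and a local master. The weights, seed, and object name are arbitrary choices for illustration. The splits are disjoint and together cover the input, but their exact sizes depend on the seed and only approximate the weights.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object.
object RandomSplitSketch {
  // Returns the sizes of the two splits.
  def run(spark: SparkSession): (Long, Long) = {
    val ds = spark.range(1000)
    // Weights are normalized if they do not sum to 1; a fixed seed makes the split repeatable.
    val Array(train, test) = ds.randomSplit(Array(0.8, 0.2), seed = 42L)
    (train.count(), test.count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("split-sketch").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```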
- lazy val rdd: RDD[T]
- Definition Classes
- Dataset → Dataset
- def reduce(func: (T, T) => T): T
- Definition Classes
- Dataset → Dataset
- def reduce(func: ReduceFunction[T]): T
- Definition Classes
- Dataset
- def repartition(partitionExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def repartition(numPartitions: Int, partitionExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def repartition(numPartitions: Int): Dataset[T]
- Definition Classes
- Dataset → Dataset
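The difference between repartition and coalesce (listed earlier) can be sketched as follows. It assumes spark-sql 4.x and a local master, and the object name is hypothetical. repartition(n) always shuffles into exactly n partitions. coalesce only merges existing partitions, so it can reduce the count without a full shuffle but cannot increase it.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object.
object PartitioningSketch {
  // Returns the partition counts after repartition and after coalesce.
  def run(spark: SparkSession): (Int, Int) = {
    val ds = spark.range(100)
    // Full shuffle into exactly 8 partitions.
    val wide = ds.repartition(8)
    // Merges the 8 partitions down to 2 without a full shuffle.
    val narrow = wide.coalesce(2)
    (wide.rdd.getNumPartitions, narrow.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("partition-sketch").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```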
- def repartitionByExpression(numPartitions: Option[Int], partitionExprs: Seq[Column]): Dataset[T]
- Attributes
- protected
- Definition Classes
- Dataset → Dataset
- def repartitionByRange(partitionExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def repartitionByRange(numPartitions: Int, partitionExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def repartitionByRange(numPartitions: Option[Int], partitionExprs: Seq[Column]): Dataset[T]
- Attributes
- protected
- Definition Classes
- Dataset → Dataset
- def rollup(col1: String, cols: String*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def rollup(cols: Column*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def sameSemantics(other: sql.Dataset[T]): Boolean
- Definition Classes
- Dataset → Dataset
- Annotations
- @DeveloperApi()
- def sample(withReplacement: Boolean, fraction: Double): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def sample(fraction: Double): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def sample(fraction: Double, seed: Long): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def sample(withReplacement: Boolean, fraction: Double, seed: Long): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def scalar(): Column
- Definition Classes
- Dataset → Dataset
- def schema: StructType
- Definition Classes
- Dataset → Dataset
- def select[U1, U2, U3, U4, U5](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2], c3: TypedColumn[T, U3], c4: TypedColumn[T, U4], c5: TypedColumn[T, U5]): Dataset[(U1, U2, U3, U4, U5)]
- Definition Classes
- Dataset → Dataset
- def select[U1, U2, U3, U4](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2], c3: TypedColumn[T, U3], c4: TypedColumn[T, U4]): Dataset[(U1, U2, U3, U4)]
- Definition Classes
- Dataset → Dataset
- def select[U1, U2, U3](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2], c3: TypedColumn[T, U3]): Dataset[(U1, U2, U3)]
- Definition Classes
- Dataset → Dataset
- def select[U1, U2](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2]): Dataset[(U1, U2)]
- Definition Classes
- Dataset → Dataset
- def select(col: String, cols: String*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def select[U1](c1: TypedColumn[T, U1]): Dataset[U1]
- Definition Classes
- Dataset → Dataset
- def select(cols: Column*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def selectExpr(exprs: String*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def selectUntyped(columns: TypedColumn[_, _]*): Dataset[_]
- Attributes
- protected
- Definition Classes
- Dataset → Dataset
- def semanticHash(): Int
- Definition Classes
- Dataset → Dataset
- Annotations
- @DeveloperApi()
- def show(numRows: Int, truncate: Int, vertical: Boolean): Unit
- Definition Classes
- Dataset → Dataset
- def show(numRows: Int, truncate: Boolean): Unit
- Definition Classes
- Dataset → Dataset
- def show(numRows: Int, truncate: Int): Unit
- Definition Classes
- Dataset
- def show(truncate: Boolean): Unit
- Definition Classes
- Dataset
- def show(): Unit
- Definition Classes
- Dataset
- def show(numRows: Int): Unit
- Definition Classes
- Dataset
- def sort(sortExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def sort(sortCol: String, sortCols: String*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def sortInternal(global: Boolean, sortExprs: Seq[Column]): Dataset[T]
- Attributes
- protected
- Definition Classes
- Dataset → Dataset
- def sortWithinPartitions(sortExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def sortWithinPartitions(sortCol: String, sortCols: String*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- lazy val sparkSession: SparkSession
- Definition Classes
- Dataset → Dataset
- Annotations
- @transient()
- lazy val sqlContext: sql.SQLContext
- Annotations
- @transient()
- def stat: DataFrameStatFunctions
- Definition Classes
- Dataset → Dataset
- def storageLevel: StorageLevel
- Definition Classes
- Dataset → Dataset
- def summary(statistics: String*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def tail(n: Int): Array[T]
- Definition Classes
- Dataset → Dataset
- def take(n: Int): Array[T]
- Definition Classes
- Dataset
- def takeAsList(n: Int): java.util.List[T]
- Definition Classes
- Dataset
- def to(schema: StructType): DataFrame
- Definition Classes
- Dataset → Dataset
- def toDF(colNames: String*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def toDF(): DataFrame
- Definition Classes
- Dataset → Dataset
- def toJSON: Dataset[String]
- Definition Classes
- Dataset → Dataset
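`toDF()` converts a typed Dataset to an untyped DataFrame, `toDF(colNames*)` renames every column positionally, and `toJSON` renders each row as one JSON string. A sketch assuming a local classic `SparkSession`; `ToDFDemo` and the column names are illustrative.

```scala
import org.apache.spark.sql.SparkSession

object ToDFDemo {
  def demo(): (Seq[String], Seq[String], Seq[String]) = {
    val spark = SparkSession.builder().master("local[1]").appName("todf-demo").getOrCreate()
    import spark.implicits._

    // A Seq of tuples gets the default column names _1, _2.
    val raw = Seq((1, "a"), (2, "b")).toDF()

    // toDF(colNames*): renames all columns by position; the number of names must match.
    val named = raw.toDF("id", "name")

    // toJSON: one JSON object per row, fields in schema order.
    val json = named.toJSON.collect().toSeq

    (raw.columns.toSeq, named.columns.toSeq, json)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```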
- def toJavaRDD: JavaRDD[T]
- Definition Classes
- Dataset → Dataset
- def toLocalIterator(): Iterator[T]
- Definition Classes
- Dataset → Dataset
- def toString(): String
- Definition Classes
- Dataset → AnyRef → Any
- def transform[U, DSO[_] <: sql.Dataset[_]](t: (Dataset.this.type) => DSO[U]): DSO[U]
- Definition Classes
- Dataset
- def transpose(): DataFrame
- Definition Classes
- Dataset → Dataset
- def transpose(indexColumn: Column): DataFrame
- Definition Classes
- Dataset → Dataset
- def union(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def unionAll(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def unionByName(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def unionByName(other: sql.Dataset[T], allowMissingColumns: Boolean): Dataset[T]
- Definition Classes
- Dataset → Dataset
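`union` (and its alias `unionAll`) combines rows by column position and keeps duplicates, like SQL `UNION ALL`. `unionByName` matches columns by name, and with `allowMissingColumns = true` it fills columns absent on one side with nulls. A sketch assuming a local classic `SparkSession` and Spark 3.1+; `UnionDemo` and the data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object UnionDemo {
  def demo(): (Long, Seq[(Int, String)], Seq[String]) = {
    val spark = SparkSession.builder().master("local[1]").appName("union-demo").getOrCreate()
    import spark.implicits._

    val a = Seq((1, "x")).toDF("id", "tag")
    val b = Seq(("y", 2)).toDF("tag", "id") // same columns, different order
    val c = Seq(3).toDF("id")               // "tag" is missing

    // union keeps duplicates; it does not deduplicate like SQL UNION.
    val dupes = a.union(a).count()

    // unionByName lines up columns by name, so b's reordered columns still match a.
    val byName = a.unionByName(b).as[(Int, String)].collect().toSeq

    // Missing columns are filled with null when allowMissingColumns = true.
    val tags = a.unionByName(c, allowMissingColumns = true).select("tag").as[String].collect().toSeq

    (dupes, byName, tags)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```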
- def unpersist(): Dataset.this.type
- Definition Classes
- Dataset → Dataset
- def unpersist(blocking: Boolean): Dataset.this.type
- Definition Classes
- Dataset → Dataset
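`storageLevel` reports how a Dataset is currently cached, and `unpersist` releases that cache; passing `blocking = true` waits until the cached blocks are removed. A minimal sketch, assuming a local classic `SparkSession`; `CacheDemo` is an illustrative name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheDemo {
  def demo(): (StorageLevel, StorageLevel) = {
    val spark = SparkSession.builder().master("local[1]").appName("cache-demo").getOrCreate()

    val ds = spark.range(1000).persist(StorageLevel.MEMORY_ONLY)
    ds.count() // an action materializes the cached data

    val cached = ds.storageLevel

    // Drop the cached blocks and wait for the removal to finish.
    ds.unpersist(blocking = true)

    (cached, ds.storageLevel)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```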
- def unpivot(ids: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def unpivot(ids: Array[Column], values: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame
- Definition Classes
- Dataset → Dataset
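`unpivot` turns a wide DataFrame into a long one: the `ids` columns are kept, and each column in `values` becomes a separate output row with its name in `variableColumnName` and its value in `valueColumnName`. The value columns must share a type. A sketch assuming a local classic `SparkSession` and Spark 3.4+; `UnpivotDemo` and the quarterly data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object UnpivotDemo {
  def demo(): Seq[(Int, String, Long)] = {
    val spark = SparkSession.builder().master("local[1]").appName("unpivot-demo").getOrCreate()
    import spark.implicits._

    // One row per id, one column per quarter (both value columns are Long).
    val wide = Seq((1, 10L, 20L), (2, 30L, 40L)).toDF("id", "q1", "q2")

    // Output columns: id, quarter, sales.
    wide.unpivot(Array(col("id")), Array(col("q1"), col("q2")), "quarter", "sales")
      .as[(Int, String, Long)]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = println(demo())
}
```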
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- def where(conditionExpr: String): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def where(condition: Column): Dataset[T]
- Definition Classes
- Dataset → Dataset
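Both `where` overloads filter rows: one takes a SQL predicate string, the other a `Column` expression. A sketch assuming a local classic `SparkSession`; `WhereDemo` and the columns `k`/`v` are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object WhereDemo {
  def demo(): (Long, Seq[String]) = {
    val spark = SparkSession.builder().master("local[1]").appName("where-demo").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("b", 5), ("c", 9)).toDF("k", "v")

    // where(conditionExpr: String): a SQL predicate.
    val bigCount = df.where("v > 3").count()

    // where(condition: Column): the predicate built from Column expressions.
    val keys = df.where(col("v") > 3 && col("k") =!= "c").select("k").as[String].collect().toSeq

    (bigCount, keys)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```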
- def withColumn(colName: String, col: Column): DataFrame
- Definition Classes
- Dataset → Dataset
- def withColumnRenamed(existingName: String, newName: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def withColumns(colsMap: Map[String, Column]): DataFrame
- Definition Classes
- Dataset → Dataset
- def withColumns(colsMap: java.util.Map[String, Column]): DataFrame
- Definition Classes
- Dataset → Dataset
- def withColumns(colNames: Seq[String], cols: Seq[Column]): DataFrame
- def withColumnsRenamed(colsMap: Map[String, String]): DataFrame
- Definition Classes
- Dataset → Dataset
- def withColumnsRenamed(colsMap: java.util.Map[String, String]): DataFrame
- Definition Classes
- Dataset → Dataset
- def withColumnsRenamed(colNames: Seq[String], newColNames: Seq[String]): DataFrame
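`withColumn` adds or replaces one column, `withColumns` does the same for several columns in a single projection, and `withColumnsRenamed` renames several existing columns at once. The sketch below uses the Scala `Map` overloads and assumes a local classic `SparkSession` and Spark 3.4+; `WithColumnsDemo` and the column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object WithColumnsDemo {
  def demo(): (Set[String], (Int, Int, Int, Int, Int)) = {
    val spark = SparkSession.builder().master("local[1]").appName("withcolumns-demo").getOrCreate()
    import spark.implicits._

    val df = Seq((1, 2)).toDF("a", "b")

    val out = df
      // One new column.
      .withColumn("sum", col("a") + col("b"))
      // Several new columns in one call (parentheses keep "->" from binding to "b" alone).
      .withColumns(Map("prod" -> col("a") * col("b"), "diff" -> (col("b") - col("a"))))
      // Several renames in one call.
      .withColumnsRenamed(Map("a" -> "left", "b" -> "right"))

    val row = out.select("left", "right", "sum", "prod", "diff").as[(Int, Int, Int, Int, Int)].head()
    (out.columns.toSet, row)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```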
- def withMetadata(columnName: String, metadata: Metadata): DataFrame
- Definition Classes
- Dataset → Dataset
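`withMetadata` attaches a `Metadata` object to an existing column, which then shows up in the schema. A minimal sketch, assuming a local classic `SparkSession` and Spark 3.3+; the key `"comment"` and its text are illustrative, not a Spark convention.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.MetadataBuilder

object MetadataDemo {
  def demo(): String = {
    val spark = SparkSession.builder().master("local[1]").appName("metadata-demo").getOrCreate()
    import spark.implicits._

    val df = Seq(("alice", 25)).toDF("name", "age")

    // Hypothetical metadata entry for the "age" column.
    val md = new MetadataBuilder().putString("comment", "age in years").build()

    // The returned DataFrame carries the metadata on the named column.
    df.withMetadata("age", md).schema("age").metadata.getString("comment")
  }

  def main(args: Array[String]): Unit = println(demo())
}
```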
- def withWatermark(eventTime: String, delayThreshold: String): Dataset[T]
- Definition Classes
- Dataset → Dataset
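`withWatermark` declares an event-time column and a lateness threshold for a streaming Dataset, so that stateful aggregations can discard state for windows that can no longer change. The sketch only builds the query plan (no query is started) and assumes a local classic `SparkSession` and the built-in `rate` source, which emits `timestamp` and `value` columns; `WatermarkDemo` and the intervals are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}

object WatermarkDemo {
  def demo(): (Boolean, Seq[String]) = {
    val spark = SparkSession.builder().master("local[1]").appName("watermark-demo").getOrCreate()

    // Nothing executes here; readStream only defines a streaming source.
    val events = spark.readStream.format("rate").option("rowsPerSecond", "5").load()

    val counts = events
      // Data older than (max event time seen - 10 minutes) may be dropped.
      .withWatermark("timestamp", "10 minutes")
      .groupBy(window(col("timestamp"), "5 minutes"))
      .count()

    (counts.isStreaming, counts.columns.toSeq)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```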
- def write: DataFrameWriter[T]
- Definition Classes
- Dataset → Dataset
- def writeStream: DataStreamWriter[T]
- Definition Classes
- Dataset → Dataset
- def writeTo(table: String): DataFrameWriterV2[T]
- Definition Classes
- Dataset → Dataset
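`write` returns a `DataFrameWriter` that is configured (mode, format, options) before a terminal call such as `parquet(path)` runs the job. A sketch assuming a local classic `SparkSession` and a writable temporary directory; the output path and `WriteDemo` are hypothetical. `writeTo` and `writeStream` are not shown because they need a V2 catalog table or a streaming source respectively.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object WriteDemo {
  def demo(): Long = {
    val spark = SparkSession.builder().master("local[1]").appName("write-demo").getOrCreate()
    import spark.implicits._

    val df = Seq(("alice", 25), ("bob", 30)).toDF("name", "age")

    // Hypothetical output location inside a fresh temporary directory.
    val path = Files.createTempDirectory("ds-write-demo").resolve("people").toString

    // Configure the writer, then trigger the write with a terminal call.
    df.write.mode("overwrite").parquet(path)

    // Read the files back to confirm both rows were written.
    spark.read.parquet(path).count()
  }

  def main(args: Array[String]): Unit = println(demo())
}
```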
Deprecated Value Members
- def explode[A, B](inputColumn: String, outputColumn: String)(f: (A) => IterableOnce[B])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[B]): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @deprecated
- Deprecated
(Since version 2.0.0) use flatMap() or select() with functions.explode() instead
- def explode[A <: Product](input: Column*)(f: (Row) => IterableOnce[A])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A]): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @deprecated
- Deprecated
(Since version 2.0.0) use flatMap() or select() with functions.explode() instead
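Both deprecated `explode` overloads point to the same two replacements: an untyped `select` with `functions.explode`, or a typed `flatMap`. The sketch shows both producing the same rows, assuming a local classic `SparkSession`; `ExplodeDemo` and the columns `k`/`xs` are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object ExplodeDemo {
  def demo(): (Seq[(String, Int)], Seq[(String, Int)]) = {
    val spark = SparkSession.builder().master("local[1]").appName("explode-demo").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", Seq(1, 2)), ("b", Seq(3))).toDF("k", "xs")

    // Untyped replacement: one output row per array element.
    val viaSelect = df.select(col("k"), explode(col("xs")).as("x"))
      .as[(String, Int)].collect().toSeq

    // Typed replacement: flatMap over a Dataset of tuples.
    val viaFlatMap = df.as[(String, Seq[Int])]
      .flatMap { case (k, xs) => xs.map(x => (k, x)) }
      .collect().toSeq

    (viaSelect, viaFlatMap)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```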
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)
- def registerTempTable(tableName: String): Unit
- Definition Classes
- Dataset
- Annotations
- @deprecated
- Deprecated
(Since version 2.0.0) Use createOrReplaceTempView(viewName) instead.
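The replacement for `registerTempTable` is `createOrReplaceTempView`, which registers a session-scoped view that SQL queries can reference by name. A sketch assuming a local classic `SparkSession`; the view name `people` and `TempViewDemo` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object TempViewDemo {
  def demo(): Seq[String] = {
    val spark = SparkSession.builder().master("local[1]").appName("temp-view-demo").getOrCreate()
    import spark.implicits._

    // Register (or replace) a temporary view visible to this SparkSession.
    Seq(("alice", 25), ("bob", 30)).toDF("name", "age").createOrReplaceTempView("people")

    spark.sql("SELECT name FROM people WHERE age > 26").as[String].collect().toSeq
  }

  def main(args: Array[String]): Unit = println(demo())
}
```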