class Dataset[T] extends sql.Dataset[T]
A Dataset is a strongly typed collection of domain-specific objects that can be transformed in
parallel using functional or relational operations. Each Dataset also has an untyped view
called a DataFrame, which is a Dataset of Row.
Operations available on Datasets are divided into transformations and actions. Transformations
are the ones that produce new Datasets, and actions are the ones that trigger computation and
return results. Example transformations include map, filter, select, and aggregate (groupBy).
Example actions include count, show, and writing data out to file systems.
Datasets are "lazy", i.e. computations are only triggered when an action is invoked.
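The split between transformations and actions can be sketched as follows. This is a minimal illustration, not part of the API itself: it assumes a local SparkSession (the `local[*]` master, the app name, and all variable names are hypothetical choices) and is written as a script that can be pasted into spark-shell.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session is used purely for illustration.
val spark = SparkSession.builder().master("local[*]").appName("lazy-sketch").getOrCreate()
import spark.implicits._

val numbers = Seq(1, 2, 3, 4, 5).toDS() // Dataset[Int]

// Transformations: each call returns a new Dataset and only extends the plan.
val evens = numbers.filter(_ % 2 == 0)
val doubled = evens.map(_ * 2)

// Actions: these trigger execution and return results to the driver.
val n = doubled.count()
val values = doubled.collect().sorted

println(s"count = $n, values = ${values.mkString(", ")}") // prints "count = 2, values = 4, 8"
```

Nothing is computed when `evens` and `doubled` are defined; the work happens when `count()` and `collect()` are called.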
Internally, a Dataset represents a logical plan that describes the computation required to
produce the data. When an action is invoked, Spark's query optimizer optimizes the logical plan
and generates a physical plan for efficient execution in a parallel and distributed manner. To
explore the logical plan as well as the optimized physical plan, use the explain function.
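As a rough sketch of how the plans can be inspected, the snippet below calls the three explain overloads listed on this page. The session setup, the sample rows, and the variable names are assumptions made for illustration; the printed plans vary by Spark version, so no output is shown.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session is used purely for illustration.
val spark = SparkSession.builder().master("local[*]").appName("explain-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for a real source.
val people = Seq(("Ann", 34), ("Bob", 28)).toDF("name", "age")
val adults = people.filter($"age" > 30).select("name")

adults.explain()            // physical plan only
adults.explain(true)        // logical plans (parsed, analyzed, optimized) plus the physical plan
adults.explain("formatted") // mode is one of: simple, extended, codegen, cost, formatted
```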
To efficiently support domain-specific objects, an Encoder is required. The encoder maps
the domain-specific type T to Spark's internal type system. For example, given a class Person
with two fields, name (string) and age (int), an encoder is used to tell Spark to
generate code at runtime to serialize the Person object into a binary structure. This binary
structure often has a much lower memory footprint and is optimized for efficient data
processing (e.g. in a columnar format). To understand the internal binary representation for
data, use the schema function.
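The Person example above might look like this in practice. It is a sketch under stated assumptions: the `Person` case class, the sample rows, and the local session are hypothetical, and `Encoders.product` is just one way to obtain an encoder (importing `spark.implicits._` derives one implicitly as well).

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// Hypothetical domain class matching the Person example in the text.
case class Person(name: String, age: Int)

// Assumption: a local session is used purely for illustration.
val spark = SparkSession.builder().master("local[*]").appName("encoder-sketch").getOrCreate()
import spark.implicits._

// An explicit encoder for Person, and the Spark schema it maps the class to.
val personEncoder: Encoder[Person] = Encoders.product[Person]
println(personEncoder.schema.treeString)

// toDS() picks up an implicit Encoder[Person] from spark.implicits._
val people = Seq(Person("Ann", 34), Person("Bob", 28)).toDS()
people.printSchema() // lists the name (string) and age (integer) fields
```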
There are typically two ways to create a Dataset. The most common way is by pointing Spark to
some files on storage systems, using the read function available on a SparkSession.
  val people = spark.read.parquet("...").as[Person]  // Scala
  Dataset<Person> people = spark.read().parquet("...").as(Encoders.bean(Person.class));  // Java
Datasets can also be created through transformations available on existing Datasets. For example, the following creates a new Dataset by applying a map to the existing one:
  val names = people.map(_.name)  // in Scala; names is a Dataset[String]
  Dataset<String> names = people.map((MapFunction<Person, String>) p -> p.name, Encoders.STRING());  // in Java
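Because the snippets above read from an elided path, a self-contained variant may help. The sketch below builds the Dataset from in-memory rows instead; the `Person` class, the sample data, and the local session are assumptions for illustration only.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical domain class; the fields mirror the Person example above.
case class Person(name: String, age: Int)

// Assumption: a local session is used purely for illustration.
val spark = SparkSession.builder().master("local[*]").appName("typed-sketch").getOrCreate()
import spark.implicits._

val people: Dataset[Person] = Seq(Person("Ann", 34), Person("Bob", 28)).toDS()

// Typed transformations: the element type is tracked at compile time.
val names: Dataset[String] = people.map(_.name)
val over30: Dataset[Person] = people.filter(_.age > 30)

names.show()
over30.show()
```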
Dataset operations can also be untyped, through various domain-specific-language (DSL) functions defined in: Dataset (this class), Column, and functions. These operations are very similar to the operations available in the data frame abstraction in R or Python.
To select a column from the Dataset, use the apply method in Scala and col in Java.
  val ageCol = people("age")  // in Scala
  Column ageCol = people.col("age");  // in Java
Note that the Column type can also be manipulated through its various functions.
  // The following creates a new column that increases everybody's age by 10.
  people("age") + 10  // in Scala
  people.col("age").plus(10);  // in Java
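To see a Column expression used inside a query, consider the following sketch. The sample rows, the `ageIn10Years` alias, and the local session are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session is used purely for illustration.
val spark = SparkSession.builder().master("local[*]").appName("column-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val people = Seq(("Ann", 34), ("Bob", 28)).toDF("name", "age")

// apply(colName) returns a Column; col(colName) is the equivalent explicit call.
val ageCol = people("age")

// A Column expression only describes a computation; select puts it into a plan.
val older = people.select(people("name"), (ageCol + 10).as("ageIn10Years"))
older.show()
```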
A more concrete example in Scala:
  import org.apache.spark.sql.functions.{avg, max}

  // To create Dataset[Row] using SparkSession
  val people = spark.read.parquet("...")
  val department = spark.read.parquet("...")

  people.filter("age > 30")
    .join(department, people("deptId") === department("id"))
    .groupBy(department("name"), people("gender"))
    .agg(avg(people("salary")), max(people("age")))
and in Java:
  import static org.apache.spark.sql.functions.avg;
  import static org.apache.spark.sql.functions.max;

  // To create Dataset<Row> using SparkSession
  Dataset<Row> people = spark.read().parquet("...");
  Dataset<Row> department = spark.read().parquet("...");

  people.filter(people.col("age").gt(30))
    .join(department, people.col("deptId").equalTo(department.col("id")))
    .groupBy(department.col("name"), people.col("gender"))
    .agg(avg(people.col("salary")), max(people.col("age")));
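The Scala query above can be tried without any files by substituting small in-memory tables. In this sketch the rows, the department names, and the local session are invented for illustration; only the query shape comes from the example above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, max}

// Assumption: a local session is used purely for illustration.
val spark = SparkSession.builder().master("local[*]").appName("join-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for the two Parquet sources.
val people = Seq(
  (34, "F", 1, 5000.0),
  (28, "M", 1, 4000.0),
  (45, "M", 2, 6000.0),
  (52, "F", 1, 7000.0)
).toDF("age", "gender", "deptId", "salary")
val department = Seq((1, "Eng"), (2, "Ops")).toDF("id", "name")

// Same shape as the example above: filter, join, group, aggregate.
val summary = people.filter("age > 30")
  .join(department, people("deptId") === department("id"))
  .groupBy(department("name"), people("gender"))
  .agg(avg(people("salary")), max(people("age")))

summary.show()
```

With this sample data, three people pass the age filter and fall into two groups, (Eng, F) and (Ops, M).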
- Since 3.4.0
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def agg(expr: Column, exprs: Column*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def agg(exprs: Map[String, String]): DataFrame
- Definition Classes
- Dataset → Dataset
- def agg(exprs: Map[String, String]): DataFrame
- Definition Classes
- Dataset → Dataset
- def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame
- Definition Classes
- Dataset → Dataset
- def alias(alias: Symbol): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def alias(alias: String): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def apply(colName: String): Column
- Definition Classes
- Dataset
- def as(alias: Symbol): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def as(alias: String): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def as[U](implicit arg0: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def cache(): Dataset.this.type
- Definition Classes
- Dataset → Dataset
- def checkpoint(eager: Boolean): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def checkpoint(): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def checkpoint(eager: Boolean, reliableCheckpoint: Boolean, storageLevel: Option[StorageLevel]): Dataset[T]
- Attributes
- protected
- Definition Classes
- Dataset → Dataset
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- def coalesce(numPartitions: Int): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def col(colName: String): Column
- Definition Classes
- Dataset → Dataset
- def colRegex(colName: String): Column
- Definition Classes
- Dataset → Dataset
- def collect(): Array[T]
- Definition Classes
- Dataset → Dataset
- def collectAsList(): List[T]
- Definition Classes
- Dataset → Dataset
- def collectResult(): SparkResult[T]
- def columns: Array[String]
- Definition Classes
- Dataset
- def count(): Long
- Definition Classes
- Dataset → Dataset
- def createGlobalTempView(viewName: String): Unit
- Definition Classes
- Dataset
- Annotations
- @throws(classOf[org.apache.spark.sql.AnalysisException])
- def createOrReplaceGlobalTempView(viewName: String): Unit
- Definition Classes
- Dataset
- def createOrReplaceTempView(viewName: String): Unit
- Definition Classes
- Dataset
- def createTempView(viewName: String, replace: Boolean, global: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Dataset → Dataset
- def createTempView(viewName: String): Unit
- Definition Classes
- Dataset
- Annotations
- @throws(classOf[org.apache.spark.sql.AnalysisException])
- def crossJoin(right: sql.Dataset[_]): DataFrame
- Definition Classes
- Dataset → Dataset
- def cube(col1: String, cols: String*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def cube(cols: Column*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def describe(cols: String*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def distinct(): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def drop(col: Column): DataFrame
- Definition Classes
- Dataset → Dataset
- def drop(colName: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def drop(col: Column, cols: Column*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def drop(colNames: String*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def dropDuplicates(col1: String, cols: String*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def dropDuplicates(colNames: Array[String]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def dropDuplicates(colNames: Seq[String]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def dropDuplicates(): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def dropDuplicatesWithinWatermark(col1: String, cols: String*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def dropDuplicatesWithinWatermark(colNames: Array[String]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def dropDuplicatesWithinWatermark(colNames: Seq[String]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def dropDuplicatesWithinWatermark(): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def dtypes: Array[(String, String)]
- Definition Classes
- Dataset
- val encoder: Encoder[T]
- Definition Classes
- Dataset → Dataset
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def except(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def exceptAll(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def exists(): Column
- Definition Classes
- Dataset → Dataset
- def explain(mode: String): Unit
- Definition Classes
- Dataset → Dataset
- def explain(): Unit
- Definition Classes
- Dataset
- def explain(extended: Boolean): Unit
- Definition Classes
- Dataset
- def filter(conditionExpr: String): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def filter(f: FilterFunction[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def filter(func: (T) => Boolean): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def filter(condition: Column): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def first(): T
- Definition Classes
- Dataset
- def flatMap[U](f: FlatMapFunction[T, U], encoder: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- def flatMap[U](func: (T) => IterableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- def foreach(func: ForeachFunction[T]): Unit
- Definition Classes
- Dataset
- def foreach(f: (T) => Unit): Unit
- Definition Classes
- Dataset
- def foreachPartition(func: ForeachPartitionFunction[T]): Unit
- Definition Classes
- Dataset → Dataset
- def foreachPartition(f: (Iterator[T]) => Unit): Unit
- Definition Classes
- Dataset → Dataset
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def groupBy(col1: String, cols: String*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def groupBy(cols: Column*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def groupByKey[K](func: MapFunction[T, K], encoder: Encoder[K]): KeyValueGroupedDataset[K, T]
- Definition Classes
- Dataset → Dataset
- def groupByKey[K](func: (T) => K)(implicit arg0: Encoder[K]): KeyValueGroupedDataset[K, T]
- Definition Classes
- Dataset → Dataset
- def groupingSets(groupingSets: Seq[Seq[Column]], cols: Column*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def head(n: Int): Array[T]
- Definition Classes
- Dataset → Dataset
- def head(): T
- Definition Classes
- Dataset
- def hint(name: String, parameters: Any*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def inputFiles: Array[String]
- Definition Classes
- Dataset → Dataset
- def intersect(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def intersectAll(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def isEmpty: Boolean
- Definition Classes
- Dataset → Dataset
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isLocal: Boolean
- Definition Classes
- Dataset → Dataset
- def isStreaming: Boolean
- Definition Classes
- Dataset → Dataset
- def javaRDD: JavaRDD[T]
- Definition Classes
- Dataset
- Annotations
- @ClassicOnly()
- def join(right: sql.Dataset[_], joinExprs: Column): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], usingColumns: Array[String], joinType: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], usingColumn: String, joinType: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], usingColumns: Seq[String]): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], usingColumns: Array[String]): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], usingColumn: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], joinExprs: Column, joinType: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def join(right: sql.Dataset[_]): DataFrame
- Definition Classes
- Dataset → Dataset
- def joinWith[U](other: sql.Dataset[U], condition: Column): Dataset[(T, U)]
- Definition Classes
- Dataset → Dataset
- def joinWith[U](other: sql.Dataset[U], condition: Column, joinType: String): Dataset[(T, U)]
- Definition Classes
- Dataset → Dataset
- def lateralJoin(right: sql.Dataset[_], joinExprs: Column, joinType: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def lateralJoin(right: sql.Dataset[_], joinType: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def lateralJoin(right: sql.Dataset[_], joinExprs: Column): DataFrame
- Definition Classes
- Dataset → Dataset
- def lateralJoin(right: sql.Dataset[_]): DataFrame
- Definition Classes
- Dataset → Dataset
- def limit(n: Int): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def localCheckpoint(eager: Boolean, storageLevel: StorageLevel): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def localCheckpoint(eager: Boolean): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def localCheckpoint(): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def map[U](f: MapFunction[T, U], encoder: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- def map[U](f: (T) => U)(implicit arg0: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- def mapPartitions[U](f: MapPartitionsFunction[T, U], encoder: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- def mapPartitions[U](func: (Iterator[T]) => Iterator[U])(implicit arg0: Encoder[U]): Dataset[U]
- Definition Classes
- Dataset → Dataset
- def melt(ids: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def melt(ids: Array[Column], values: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def mergeInto(table: String, condition: Column): MergeIntoWriter[T]
- Definition Classes
- Dataset → Dataset
- def metadataColumn(colName: String): Column
- Definition Classes
- Dataset → Dataset
- def na: DataFrameNaFunctions
- Definition Classes
- Dataset → Dataset
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- def observe(observation: Observation, expr: Column, exprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def observe(name: String, expr: Column, exprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def offset(n: Int): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def orderBy(sortExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def orderBy(sortCol: String, sortCols: String*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def persist(newLevel: StorageLevel): Dataset.this.type
- Definition Classes
- Dataset → Dataset
- def persist(): Dataset.this.type
- Definition Classes
- Dataset → Dataset
- val plan: Plan
- def printSchema(level: Int): Unit
- Definition Classes
- Dataset
- def printSchema(): Unit
- Definition Classes
- Dataset
- def queryExecution: QueryExecution
- Definition Classes
- Dataset → Dataset
- def randomSplit(weights: Array[Double]): Array[sql.Dataset[T]]
- Definition Classes
- Dataset → Dataset
- def randomSplit(weights: Array[Double], seed: Long): Array[sql.Dataset[T]]
- Definition Classes
- Dataset → Dataset
- def randomSplitAsList(weights: Array[Double], seed: Long): List[sql.Dataset[T]]
- Definition Classes
- Dataset → Dataset
- def rdd: RDD[T]
- Definition Classes
- Dataset → Dataset
- def reduce(func: (T, T) => T): T
- Definition Classes
- Dataset → Dataset
- def reduce(func: ReduceFunction[T]): T
- Definition Classes
- Dataset
- def repartition(partitionExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def repartition(numPartitions: Int, partitionExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def repartition(numPartitions: Int): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def repartitionByExpression(numPartitions: Option[Int], partitionExprs: Seq[Column]): Dataset[T]
- Attributes
- protected[this]
- Definition Classes
- Dataset → Dataset
- def repartitionByRange(partitionExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def repartitionByRange(numPartitions: Int, partitionExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def repartitionByRange(numPartitions: Option[Int], partitionExprs: Seq[Column]): Dataset[T]
- Attributes
- protected
- Definition Classes
- Dataset → Dataset
- def rollup(col1: String, cols: String*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def rollup(cols: Column*): RelationalGroupedDataset
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def sameSemantics(other: sql.Dataset[T]): Boolean
- Definition Classes
- Dataset → Dataset
- Annotations
- @DeveloperApi()
- def sample(withReplacement: Boolean, fraction: Double): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def sample(fraction: Double): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def sample(fraction: Double, seed: Long): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def sample(withReplacement: Boolean, fraction: Double, seed: Long): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def scalar(): Column
- Definition Classes
- Dataset → Dataset
- def schema: StructType
- Definition Classes
- Dataset → Dataset
- def select[U1, U2, U3, U4, U5](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2], c3: TypedColumn[T, U3], c4: TypedColumn[T, U4], c5: TypedColumn[T, U5]): Dataset[(U1, U2, U3, U4, U5)]
- Definition Classes
- Dataset → Dataset
- def select[U1, U2, U3, U4](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2], c3: TypedColumn[T, U3], c4: TypedColumn[T, U4]): Dataset[(U1, U2, U3, U4)]
- Definition Classes
- Dataset → Dataset
- def select[U1, U2, U3](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2], c3: TypedColumn[T, U3]): Dataset[(U1, U2, U3)]
- Definition Classes
- Dataset → Dataset
- def select[U1, U2](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2]): Dataset[(U1, U2)]
- Definition Classes
- Dataset → Dataset
- def select(col: String, cols: String*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def select[U1](c1: TypedColumn[T, U1]): Dataset[U1]
- Definition Classes
- Dataset → Dataset
- def select(cols: Column*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def selectExpr(exprs: String*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def selectUntyped(columns: TypedColumn[_, _]*): Dataset[_]
- Attributes
- protected
- Definition Classes
- Dataset → Dataset
- def semanticHash(): Int
- Definition Classes
- Dataset → Dataset
- Annotations
- @DeveloperApi()
- def show(numRows: Int, truncate: Int, vertical: Boolean): Unit
- Definition Classes
- Dataset → Dataset
- def show(numRows: Int, truncate: Boolean): Unit
- Definition Classes
- Dataset → Dataset
- def show(numRows: Int, truncate: Int): Unit
- Definition Classes
- Dataset
- def show(truncate: Boolean): Unit
- Definition Classes
- Dataset
- def show(): Unit
- Definition Classes
- Dataset
- def show(numRows: Int): Unit
- Definition Classes
- Dataset
- def sort(sortExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def sort(sortCol: String, sortCols: String*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def sortInternal(global: Boolean, sortCols: Seq[Column]): Dataset[T]
- Attributes
- protected
- Definition Classes
- Dataset → Dataset
- def sortWithinPartitions(sortExprs: Column*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def sortWithinPartitions(sortCol: String, sortCols: String*): Dataset[T]
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- val sparkSession: SparkSession
- Definition Classes
- Dataset → Dataset
- def stat: DataFrameStatFunctions
- Definition Classes
- Dataset → Dataset
- def storageLevel: StorageLevel
- Definition Classes
- Dataset → Dataset
- def summary(statistics: String*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def tail(n: Int): Array[T]
- Definition Classes
- Dataset → Dataset
- def take(n: Int): Array[T]
- Definition Classes
- Dataset
- def takeAsList(n: Int): List[T]
- Definition Classes
- Dataset
- def to(schema: StructType): DataFrame
- Definition Classes
- Dataset → Dataset
- def toDF(colNames: String*): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @varargs()
- def toDF(): DataFrame
- Definition Classes
- Dataset → Dataset
- def toJSON: Dataset[String]
- Definition Classes
- Dataset → Dataset
- def toJavaRDD: JavaRDD[T]
- Definition Classes
- Dataset → Dataset
- def toLocalIterator(): Iterator[T]
- Definition Classes
- Dataset → Dataset
- def toString(): String
- Definition Classes
- Dataset → AnyRef → Any
- def transform[U, DSO[_] <: sql.Dataset[_]](t: (Dataset.this.type) => DSO[U]): DSO[U]
- Definition Classes
- Dataset
- def transpose(): DataFrame
- Definition Classes
- Dataset → Dataset
- def transpose(indexColumn: Column): DataFrame
- Definition Classes
- Dataset → Dataset
- def union(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def unionAll(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def unionByName(other: sql.Dataset[T]): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def unionByName(other: sql.Dataset[T], allowMissingColumns: Boolean): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def unpersist(): Dataset.this.type
- Definition Classes
- Dataset → Dataset
- def unpersist(blocking: Boolean): Dataset.this.type
- Definition Classes
- Dataset → Dataset
- def unpivot(ids: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def unpivot(ids: Array[Column], values: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame
- Definition Classes
- Dataset → Dataset
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- def where(conditionExpr: String): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def where(condition: Column): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def withColumn(colName: String, col: Column): DataFrame
- Definition Classes
- Dataset → Dataset
- def withColumnRenamed(existingName: String, newName: String): DataFrame
- Definition Classes
- Dataset → Dataset
- def withColumns(colsMap: Map[String, Column]): DataFrame
- Definition Classes
- Dataset → Dataset
- def withColumns(colsMap: java.util.Map[String, Column]): DataFrame
- Definition Classes
- Dataset → Dataset
- def withColumnsRenamed(colsMap: Map[String, String]): DataFrame
- Definition Classes
- Dataset → Dataset
- def withColumnsRenamed(colsMap: java.util.Map[String, String]): DataFrame
- Definition Classes
- Dataset → Dataset
- def withColumnsRenamed(colNames: Seq[String], newColNames: Seq[String]): DataFrame
- Attributes
- protected
- Definition Classes
- Dataset → Dataset
- def withMetadata(columnName: String, metadata: Metadata): DataFrame
- Definition Classes
- Dataset → Dataset
- def withWatermark(eventTime: String, delayThreshold: String): Dataset[T]
- Definition Classes
- Dataset → Dataset
- def write: DataFrameWriter[T]
- Definition Classes
- Dataset → Dataset
- def writeStream: DataStreamWriter[T]
- Definition Classes
- Dataset → Dataset
- def writeTo(table: String): DataFrameWriterV2[T]
- Definition Classes
- Dataset → Dataset
Deprecated Value Members
- def explode[A, B](inputColumn: String, outputColumn: String)(f: (A) => IterableOnce[B])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[B]): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @deprecated
- Deprecated
(Since version 3.5.0) use flatMap() or select() with functions.explode() instead
- def explode[A <: Product](input: Column*)(f: (Row) => IterableOnce[A])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A]): DataFrame
- Definition Classes
- Dataset → Dataset
- Annotations
- @deprecated
- Deprecated
(Since version 3.5.0) use flatMap() or select() with functions.explode() instead
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)
- def registerTempTable(tableName: String): Unit
- Definition Classes
- Dataset
- Annotations
- @deprecated
- Deprecated
(Since version 2.0.0) Use createOrReplaceTempView(viewName) instead.
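The deprecation notes above name the replacements for explode and registerTempTable. A minimal sketch of those replacements follows, assuming a local SparkSession and a hypothetical two-column DataFrame (name, tags); the object name, helper names, sample data, and the view name "people" are illustrative only and are not part of the API.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical helper object; none of these names come from the Spark API.
object DeprecatedReplacements {

  // Replacement for the deprecated Dataset.explode: select() with functions.explode().
  // Produces one output row per element of the "tags" array column.
  def explodeTags(df: DataFrame): DataFrame =
    df.select(col("name"), explode(col("tags")).as("tag"))

  // Alternative replacement: flatMap() on a typed view of the same data.
  def flatMapTags(spark: SparkSession, df: DataFrame): Dataset[(String, String)] = {
    import spark.implicits._
    df.as[(String, Seq[String])].flatMap { case (name, tags) => tags.map(t => (name, t)) }
  }

  // Replacement for the deprecated registerTempTable: createOrReplaceTempView.
  // The view name "people" is an assumption made for this sketch.
  def namesWithSeveralTags(spark: SparkSession, df: DataFrame): Array[String] = {
    df.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE size(tags) > 1").collect().map(_.getString(0))
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("deprecated-replacements")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(("alice", Seq("a", "b")), ("bob", Seq("c"))).toDF("name", "tags")

    explodeTags(df).show()
    flatMapTags(spark, df).show()
    println(namesWithSeveralTags(spark, df).mkString(", "))

    spark.stop()
  }
}
```

The select-based form keeps the work in Column expressions, while the flatMap form goes through the Encoder for the typed tuple; which one fits better depends on whether the surrounding code is already typed.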