class Dataset[T] extends sql.Dataset[T]

A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations. Each Dataset also has an untyped view called a DataFrame, which is a Dataset of Row.
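The relationship between the typed view and its untyped DataFrame view can be sketched as follows. This is a minimal example that assumes a running SparkSession is passed in as `spark` and uses a `Person` case class like the one described below. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object TypedVsUntyped {
  // Hypothetical domain class; Spark derives encoders for case classes.
  case class Person(name: String, age: Int)

  def roundTrip(spark: SparkSession): (DataFrame, Dataset[Person]) = {
    import spark.implicits._ // brings Encoder[Person] and .toDS() into scope

    val typed: Dataset[Person] = Seq(Person("Ann", 34), Person("Bo", 28)).toDS()
    // DataFrame is a type alias for Dataset[Row]: same data, no static element type.
    val untyped: DataFrame = typed.toDF()
    // as[Person] re-attaches the element type using an Encoder.
    val retyped: Dataset[Person] = untyped.as[Person]
    (untyped, retyped)
  }
}
```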

Operations available on Datasets are divided into transformations and actions. Transformations are the ones that produce new Datasets, and actions are the ones that trigger computation and return results. Example transformations include map, filter, select, and aggregate (groupBy). Example actions include count, show, and writing data out to file systems.
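The split between transformations and actions can be illustrated with a small sketch. It assumes a SparkSession is supplied by the caller. The object name and sample numbers are hypothetical, and `filter` is given a Column here to keep the overload selection unambiguous.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.col

object TransformationsAndActions {
  def run(spark: SparkSession): (Long, Array[Int]) = {
    import spark.implicits._
    // A Dataset[Int] created from a local Seq exposes a single column named "value".
    val numbers: Dataset[Int] = Seq(1, 2, 3, 4, 5).toDS()

    // Transformations: each call returns a new Dataset and runs nothing yet.
    val doubledEvens: Dataset[Int] = numbers.filter(col("value") % 2 === 0).map(_ * 2)

    // Actions: these trigger execution and return results to the driver.
    val n = doubledEvens.count()
    val values = doubledEvens.collect().sorted
    (n, values)
  }
}
```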

Datasets are "lazy", i.e. computations are only triggered when an action is invoked. Internally, a Dataset represents a logical plan that describes the computation required to produce the data. When an action is invoked, Spark's query optimizer optimizes the logical plan and generates a physical plan for efficient execution in a parallel and distributed manner. To explore the logical plan as well as the optimized physical plan, use the explain function.
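As a sketch of this behavior, the query below is only planned when it is defined, and `explain("extended")` prints the parsed, analyzed and optimized logical plans together with the physical plan. It assumes a caller-provided SparkSession. Capturing the printed text with `Console.withOut` is an illustrative choice, not part of the Dataset API, and the object name is hypothetical.

```scala
import java.io.ByteArrayOutputStream
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.col

object ExplainDemo {
  // Defining the query records a logical plan; no job runs here.
  def query(spark: SparkSession): Dataset[java.lang.Long] =
    spark.range(0, 1000).filter(col("id") > 990)

  // explain("extended") prints all plans to standard output;
  // the output is redirected into a buffer so it can be inspected as a String.
  def planText(ds: Dataset[_]): String = {
    val buffer = new ByteArrayOutputStream()
    Console.withOut(buffer) { ds.explain("extended") }
    buffer.toString("UTF-8")
  }
}
```

Only a subsequent action such as `count()` actually executes the physical plan.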

To efficiently support domain-specific objects, an Encoder is required. The encoder maps the domain-specific type T to Spark's internal type system. For example, given a class Person with two fields, name (string) and age (int), an encoder tells Spark to generate code at runtime to serialize the Person object into a binary structure. This binary structure often has a much lower memory footprint and is optimized for efficient data processing (e.g. in a columnar format). To understand the internal binary representation for data, use the schema function.
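The Person example above can be sketched with an explicitly derived encoder. This assumes a case class with the two fields described; `Encoders.product` derives the encoder, and its `schema` shows how Person maps onto Spark's internal types. The object name and sample rows are hypothetical.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}
import org.apache.spark.sql.types.StructType

object EncoderDemo {
  case class Person(name: String, age: Int)

  // Encoders.product derives an Encoder for a case class (a Scala Product type).
  val personEncoder: Encoder[Person] = Encoders.product[Person]

  // The encoder's schema: name -> StringType (nullable), age -> IntegerType (not nullable,
  // because a Scala Int cannot be null).
  val personSchema: StructType = personEncoder.schema

  // Passing the encoder explicitly instead of relying on spark.implicits._.
  def people(spark: SparkSession): Dataset[Person] =
    spark.createDataset(Seq(Person("Ann", 34), Person("Bo", 28)))(personEncoder)
}
```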

There are typically two ways to create a Dataset. The most common way is by pointing Spark to some files on storage systems, using the read function available on a SparkSession.

val people = spark.read.parquet("...").as[Person]  // Scala
Dataset<Person> people = spark.read().parquet("...").as(Encoders.bean(Person.class)); // Java
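Because the paths above are elided, the following self-contained sketch writes a few hypothetical rows to a temporary Parquet directory and reads them back as a typed Dataset. The temporary directory and the object name are assumptions made only so the example runs on its own; the SparkSession is supplied by the caller.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{Dataset, SparkSession}

object ReadDemo {
  case class Person(name: String, age: Int)

  def writeAndRead(spark: SparkSession): Dataset[Person] = {
    import spark.implicits._
    // A fresh, not-yet-existing subdirectory, so the default "error if exists" save mode succeeds.
    val dir = Files.createTempDirectory("people").resolve("parquet").toString
    Seq(Person("Ann", 34), Person("Bo", 28)).toDS().write.parquet(dir)
    // read returns a DataFrame; as[Person] turns it into a Dataset[Person].
    spark.read.parquet(dir).as[Person]
  }
}
```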

Datasets can also be created through transformations available on existing Datasets. For example, the following creates a new Dataset by applying a map on the existing one:

val names = people.map(_.name)  // in Scala; names is a Dataset[String]
Dataset<String> names = people.map((MapFunction<Person, String>) p -> p.name, Encoders.STRING()); // in Java; the cast selects the MapFunction overload
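A runnable variant of the Scala line above, assuming a caller-provided SparkSession and hypothetical in-memory rows in place of files. The `Encoder[String]` required by `map` comes from `spark.implicits._`.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object MapDemo {
  case class Person(name: String, age: Int)

  def names(spark: SparkSession): Dataset[String] = {
    import spark.implicits._
    val people: Dataset[Person] = Seq(Person("Ann", 34), Person("Bo", 28)).toDS()
    // A typed transformation: Person => String, producing a new Dataset[String].
    people.map(_.name)
  }
}
```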

Dataset operations can also be untyped, through various domain-specific language (DSL) functions defined in: Dataset (this class), Column, and functions. These operations are very similar to the operations available in the data frame abstraction in R or Python.

To select a column from the Dataset, use the apply method in Scala and col in Java.

val ageCol = people("age")  // in Scala
Column ageCol = people.col("age"); // in Java
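Both forms return the same kind of Column and can be passed to `select`, which yields an untyped DataFrame. A minimal sketch, assuming a caller-provided SparkSession and hypothetical sample rows:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColumnSelectDemo {
  case class Person(name: String, age: Int)

  def nameAndAge(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val people = Seq(Person("Ann", 34), Person("Bo", 28)).toDS()
    // people.col("name") (Java style) and people("age") (Scala apply) are equivalent lookups.
    people.select(people.col("name"), people("age"))
  }
}
```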

Note that the Column type can also be manipulated through its various functions.

// The following creates a new column that increases everybody's age by 10.
people("age") + 10  // in Scala
people.col("age").plus(10);  // in Java
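The expression above only describes a new column; it is evaluated once it is used in a query such as `select`. A sketch under the same assumptions (caller-provided SparkSession, hypothetical rows); the alias `ageIn10Years` is illustrative:

```scala
import org.apache.spark.sql.SparkSession

object ColumnArithmeticDemo {
  case class Person(name: String, age: Int)

  def agesInTenYears(spark: SparkSession): Map[String, Int] = {
    import spark.implicits._
    val people = Seq(Person("Ann", 34), Person("Bo", 28)).toDS()
    // people("age") + 10 builds an expression; .as names the resulting column.
    val shifted = people.select(people("name"), (people("age") + 10).as("ageIn10Years"))
    // Int + Int literal stays IntegerType, so getInt reads the second column.
    shifted.collect().map(r => r.getString(0) -> r.getInt(1)).toMap
  }
}
```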

A more concrete example in Scala:

import org.apache.spark.sql.functions.{avg, max}

// To create Dataset[Row] using SparkSession
val people = spark.read.parquet("...")
val department = spark.read.parquet("...")

people.filter("age > 30")
  .join(department, people("deptId") === department("id"))
  .groupBy(department("name"), people("gender"))
  .agg(avg(people("salary")), max(people("age")))

and in Java:

import static org.apache.spark.sql.functions.*;  // provides avg and max

// To create Dataset<Row> using SparkSession
Dataset<Row> people = spark.read().parquet("...");
Dataset<Row> department = spark.read().parquet("...");

people.filter(people.col("age").gt(30))
  .join(department, people.col("deptId").equalTo(department.col("id")))
  .groupBy(department.col("name"), people.col("gender"))
  .agg(avg(people.col("salary")), max(people.col("age")));
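The same filter, join and aggregation chain can be tried without Parquet files. The sketch below substitutes small hypothetical in-memory DataFrames for the two inputs and assumes a caller-provided SparkSession. The person column is named `personName` only to avoid two columns called `name` after the join.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, max}

object JoinAggDemo {
  def summary(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Hypothetical sample data standing in for the Parquet inputs.
    val people = Seq(
      ("Ann", 41, "F", 1, 100.0),
      ("Bo", 35, "M", 1, 80.0),
      ("Cy", 25, "M", 2, 60.0)
    ).toDF("personName", "age", "gender", "deptId", "salary")
    val department = Seq((1, "Eng"), (2, "Ops")).toDF("id", "name")

    // Result columns: name, gender, avg(salary), max(age).
    people.filter("age > 30")
      .join(department, people("deptId") === department("id"))
      .groupBy(department("name"), people("gender"))
      .agg(avg(people("salary")), max(people("age")))
  }
}
```

Only Ann and Bo pass the `age > 30` filter, and both belong to department 1, so the result has one row per gender within "Eng".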
Since

3.4.0

Linear Supertypes
sql.Dataset[T], Serializable, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def agg(expr: Column, exprs: Column*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  5. def agg(exprs: Map[String, String]): DataFrame

    Definition Classes
    Dataset → Dataset
  6. def agg(exprs: Map[String, String]): DataFrame

    Definition Classes
    Dataset → Dataset
  7. def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame

    Definition Classes
    Dataset → Dataset
  8. def alias(alias: Symbol): Dataset[T]

    Definition Classes
    Dataset → Dataset
  9. def alias(alias: String): Dataset[T]

    Definition Classes
    Dataset → Dataset
  10. def apply(colName: String): Column
    Definition Classes
    Dataset
  11. def as(alias: Symbol): Dataset[T]

    Definition Classes
    Dataset → Dataset
  12. def as(alias: String): Dataset[T]

    Definition Classes
    Dataset → Dataset
  13. def as[U](implicit arg0: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  14. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  15. def cache(): Dataset.this.type

    Definition Classes
    Dataset → Dataset
  16. def checkpoint(eager: Boolean): Dataset[T]

    Definition Classes
    Dataset → Dataset
  17. def checkpoint(): Dataset[T]

    Definition Classes
    Dataset → Dataset
  18. def checkpoint(eager: Boolean, reliableCheckpoint: Boolean, storageLevel: Option[StorageLevel]): Dataset[T]

    Attributes
    protected
    Definition Classes
    Dataset → Dataset
  19. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  20. def coalesce(numPartitions: Int): Dataset[T]

    Definition Classes
    Dataset → Dataset
  21. def col(colName: String): Column

    Definition Classes
    Dataset → Dataset
  22. def colRegex(colName: String): Column

    Definition Classes
    Dataset → Dataset
  23. def collect(): Array[T]

    Definition Classes
    Dataset → Dataset
  24. def collectAsList(): List[T]

    Definition Classes
    Dataset → Dataset
  25. def collectResult(): SparkResult[T]
  26. def columns: Array[String]
    Definition Classes
    Dataset
  27. def count(): Long

    Definition Classes
    Dataset → Dataset
  28. def createGlobalTempView(viewName: String): Unit
    Definition Classes
    Dataset
    Annotations
    @throws(classOf[org.apache.spark.sql.AnalysisException])
  29. def createOrReplaceGlobalTempView(viewName: String): Unit
    Definition Classes
    Dataset
  30. def createOrReplaceTempView(viewName: String): Unit
    Definition Classes
    Dataset
  31. def createTempView(viewName: String, replace: Boolean, global: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Dataset → Dataset
  32. def createTempView(viewName: String): Unit
    Definition Classes
    Dataset
    Annotations
    @throws(classOf[org.apache.spark.sql.AnalysisException])
  33. def crossJoin(right: sql.Dataset[_]): DataFrame

    Definition Classes
    Dataset → Dataset
  34. def cube(col1: String, cols: String*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  35. def cube(cols: Column*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  36. def describe(cols: String*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  37. def distinct(): Dataset[T]

    Definition Classes
    Dataset → Dataset
  38. def drop(col: Column): DataFrame

    Definition Classes
    Dataset → Dataset
  39. def drop(colName: String): DataFrame

    Definition Classes
    Dataset → Dataset
  40. def drop(col: Column, cols: Column*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  41. def drop(colNames: String*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  42. def dropDuplicates(col1: String, cols: String*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  43. def dropDuplicates(colNames: Array[String]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  44. def dropDuplicates(colNames: Seq[String]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  45. def dropDuplicates(): Dataset[T]

    Definition Classes
    Dataset → Dataset
  46. def dropDuplicatesWithinWatermark(col1: String, cols: String*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  47. def dropDuplicatesWithinWatermark(colNames: Array[String]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  48. def dropDuplicatesWithinWatermark(colNames: Seq[String]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  49. def dropDuplicatesWithinWatermark(): Dataset[T]

    Definition Classes
    Dataset → Dataset
  50. def dtypes: Array[(String, String)]
    Definition Classes
    Dataset
  51. val encoder: Encoder[T]
    Definition Classes
    Dataset → Dataset
  52. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  53. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  54. def except(other: sql.Dataset[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  55. def exceptAll(other: sql.Dataset[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  56. def exists(): Column

    Definition Classes
    Dataset → Dataset
  57. def explain(mode: String): Unit

    Definition Classes
    Dataset → Dataset
  58. def explain(): Unit
    Definition Classes
    Dataset
  59. def explain(extended: Boolean): Unit
    Definition Classes
    Dataset
  60. def filter(conditionExpr: String): Dataset[T]

    Definition Classes
    Dataset → Dataset
  61. def filter(f: FilterFunction[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  62. def filter(func: (T) => Boolean): Dataset[T]

    Definition Classes
    Dataset → Dataset
  63. def filter(condition: Column): Dataset[T]

    Definition Classes
    Dataset → Dataset
  64. def first(): T
    Definition Classes
    Dataset
  65. def flatMap[U](f: FlatMapFunction[T, U], encoder: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  66. def flatMap[U](func: (T) => IterableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  67. def foreach(func: ForeachFunction[T]): Unit
    Definition Classes
    Dataset
  68. def foreach(f: (T) => Unit): Unit
    Definition Classes
    Dataset
  69. def foreachPartition(func: ForeachPartitionFunction[T]): Unit

    Definition Classes
    Dataset → Dataset
  70. def foreachPartition(f: (Iterator[T]) => Unit): Unit

    Definition Classes
    Dataset → Dataset
  71. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  72. def groupBy(col1: String, cols: String*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  73. def groupBy(cols: Column*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  74. def groupByKey[K](func: MapFunction[T, K], encoder: Encoder[K]): KeyValueGroupedDataset[K, T]

    Definition Classes
    Dataset → Dataset
  75. def groupByKey[K](func: (T) => K)(implicit arg0: Encoder[K]): KeyValueGroupedDataset[K, T]

    Definition Classes
    Dataset → Dataset
  76. def groupingSets(groupingSets: Seq[Seq[Column]], cols: Column*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  77. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  78. def head(n: Int): Array[T]

    Definition Classes
    Dataset → Dataset
  79. def head(): T
    Definition Classes
    Dataset
  80. def hint(name: String, parameters: Any*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  81. def inputFiles: Array[String]

    Definition Classes
    Dataset → Dataset
  82. def intersect(other: sql.Dataset[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  83. def intersectAll(other: sql.Dataset[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  84. def isEmpty: Boolean

    Definition Classes
    Dataset → Dataset
  85. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  86. def isLocal: Boolean

    Definition Classes
    Dataset → Dataset
  87. def isStreaming: Boolean

    Definition Classes
    Dataset → Dataset
  88. def javaRDD: JavaRDD[T]
    Definition Classes
    Dataset
    Annotations
    @ClassicOnly()
  89. def join(right: sql.Dataset[_], joinExprs: Column): DataFrame

    Definition Classes
    Dataset → Dataset
  90. def join(right: sql.Dataset[_], usingColumns: Array[String], joinType: String): DataFrame

    Definition Classes
    Dataset → Dataset
  91. def join(right: sql.Dataset[_], usingColumn: String, joinType: String): DataFrame

    Definition Classes
    Dataset → Dataset
  92. def join(right: sql.Dataset[_], usingColumns: Seq[String]): DataFrame

    Definition Classes
    Dataset → Dataset
  93. def join(right: sql.Dataset[_], usingColumns: Array[String]): DataFrame

    Definition Classes
    Dataset → Dataset
  94. def join(right: sql.Dataset[_], usingColumn: String): DataFrame

    Definition Classes
    Dataset → Dataset
  95. def join(right: sql.Dataset[_], joinExprs: Column, joinType: String): DataFrame

    Definition Classes
    Dataset → Dataset
  96. def join(right: sql.Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame

    Definition Classes
    Dataset → Dataset
  97. def join(right: sql.Dataset[_]): DataFrame

    Definition Classes
    Dataset → Dataset
  98. def joinWith[U](other: sql.Dataset[U], condition: Column): Dataset[(T, U)]

    Definition Classes
    Dataset → Dataset
  99. def joinWith[U](other: sql.Dataset[U], condition: Column, joinType: String): Dataset[(T, U)]

    Definition Classes
    Dataset → Dataset
  100. def lateralJoin(right: sql.Dataset[_], joinExprs: Column, joinType: String): DataFrame

    Definition Classes
    Dataset → Dataset
  101. def lateralJoin(right: sql.Dataset[_], joinType: String): DataFrame

    Definition Classes
    Dataset → Dataset
  102. def lateralJoin(right: sql.Dataset[_], joinExprs: Column): DataFrame

    Definition Classes
    Dataset → Dataset
  103. def lateralJoin(right: sql.Dataset[_]): DataFrame

    Definition Classes
    Dataset → Dataset
  104. def limit(n: Int): Dataset[T]

    Definition Classes
    Dataset → Dataset
  105. def localCheckpoint(eager: Boolean, storageLevel: StorageLevel): Dataset[T]

    Definition Classes
    Dataset → Dataset
  106. def localCheckpoint(eager: Boolean): Dataset[T]

    Definition Classes
    Dataset → Dataset
  107. def localCheckpoint(): Dataset[T]

    Definition Classes
    Dataset → Dataset
  108. def map[U](f: MapFunction[T, U], encoder: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  109. def map[U](f: (T) => U)(implicit arg0: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  110. def mapPartitions[U](f: MapPartitionsFunction[T, U], encoder: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  111. def mapPartitions[U](func: (Iterator[T]) => Iterator[U])(implicit arg0: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  112. def melt(ids: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame

    Definition Classes
    Dataset → Dataset
  113. def melt(ids: Array[Column], values: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame
    Definition Classes
    Dataset → Dataset
  114. def mergeInto(table: String, condition: Column): MergeIntoWriter[T]

    Definition Classes
    Dataset → Dataset
  115. def metadataColumn(colName: String): Column

    Definition Classes
    Dataset → Dataset
  116. def na: DataFrameNaFunctions

    Definition Classes
    Dataset → Dataset
  117. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  118. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  119. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  120. def observe(observation: Observation, expr: Column, exprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  121. def observe(name: String, expr: Column, exprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  122. def offset(n: Int): Dataset[T]

    Definition Classes
    Dataset → Dataset
  123. def orderBy(sortExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  124. def orderBy(sortCol: String, sortCols: String*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  125. def persist(newLevel: StorageLevel): Dataset.this.type

    Definition Classes
    Dataset → Dataset
  126. def persist(): Dataset.this.type

    Definition Classes
    Dataset → Dataset
  127. val plan: Plan
  128. def printSchema(level: Int): Unit
    Definition Classes
    Dataset
  129. def printSchema(): Unit
    Definition Classes
    Dataset
  130. def queryExecution: QueryExecution
    Definition Classes
    Dataset → Dataset
  131. def randomSplit(weights: Array[Double]): Array[sql.Dataset[T]]

    Definition Classes
    Dataset → Dataset
  132. def randomSplit(weights: Array[Double], seed: Long): Array[sql.Dataset[T]]

    Definition Classes
    Dataset → Dataset
  133. def randomSplitAsList(weights: Array[Double], seed: Long): List[sql.Dataset[T]]

    Definition Classes
    Dataset → Dataset
  134. def rdd: RDD[T]

    Definition Classes
    Dataset → Dataset
  135. def reduce(func: (T, T) => T): T

    Definition Classes
    Dataset → Dataset
  136. def reduce(func: ReduceFunction[T]): T
    Definition Classes
    Dataset
  137. def repartition(partitionExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  138. def repartition(numPartitions: Int, partitionExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  139. def repartition(numPartitions: Int): Dataset[T]

    Definition Classes
    Dataset → Dataset
  140. def repartitionByExpression(numPartitions: Option[Int], partitionExprs: Seq[Column]): Dataset[T]
    Attributes
    protected[this]
    Definition Classes
    Dataset → Dataset
  141. def repartitionByRange(partitionExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  142. def repartitionByRange(numPartitions: Int, partitionExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  143. def repartitionByRange(numPartitions: Option[Int], partitionExprs: Seq[Column]): Dataset[T]
    Attributes
    protected
    Definition Classes
    Dataset → Dataset
  144. def rollup(col1: String, cols: String*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  145. def rollup(cols: Column*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  146. def sameSemantics(other: sql.Dataset[T]): Boolean

    Definition Classes
    Dataset → Dataset
    Annotations
    @DeveloperApi()
  147. def sample(withReplacement: Boolean, fraction: Double): Dataset[T]

    Definition Classes
    Dataset → Dataset
  148. def sample(fraction: Double): Dataset[T]

    Definition Classes
    Dataset → Dataset
  149. def sample(fraction: Double, seed: Long): Dataset[T]

    Definition Classes
    Dataset → Dataset
  150. def sample(withReplacement: Boolean, fraction: Double, seed: Long): Dataset[T]

    Definition Classes
    Dataset → Dataset
  151. def scalar(): Column

    Definition Classes
    Dataset → Dataset
  152. def schema: StructType

    Definition Classes
    Dataset → Dataset
  153. def select[U1, U2, U3, U4, U5](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2], c3: TypedColumn[T, U3], c4: TypedColumn[T, U4], c5: TypedColumn[T, U5]): Dataset[(U1, U2, U3, U4, U5)]

    Definition Classes
    Dataset → Dataset
  154. def select[U1, U2, U3, U4](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2], c3: TypedColumn[T, U3], c4: TypedColumn[T, U4]): Dataset[(U1, U2, U3, U4)]

    Definition Classes
    Dataset → Dataset
  155. def select[U1, U2, U3](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2], c3: TypedColumn[T, U3]): Dataset[(U1, U2, U3)]

    Definition Classes
    Dataset → Dataset
  156. def select[U1, U2](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2]): Dataset[(U1, U2)]

    Definition Classes
    Dataset → Dataset
  157. def select(col: String, cols: String*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  158. def select[U1](c1: TypedColumn[T, U1]): Dataset[U1]

    Definition Classes
    Dataset → Dataset
  159. def select(cols: Column*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  160. def selectExpr(exprs: String*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  161. def selectUntyped(columns: TypedColumn[_, _]*): Dataset[_]

    Attributes
    protected
    Definition Classes
    Dataset → Dataset
  162. def semanticHash(): Int

    Definition Classes
    Dataset → Dataset
    Annotations
    @DeveloperApi()
  163. def show(numRows: Int, truncate: Int, vertical: Boolean): Unit

    Definition Classes
    Dataset → Dataset
  164. def show(numRows: Int, truncate: Boolean): Unit

    Definition Classes
    Dataset → Dataset
  165. def show(numRows: Int, truncate: Int): Unit
    Definition Classes
    Dataset
  166. def show(truncate: Boolean): Unit
    Definition Classes
    Dataset
  167. def show(): Unit
    Definition Classes
    Dataset
  168. def show(numRows: Int): Unit
    Definition Classes
    Dataset
  169. def sort(sortExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  170. def sort(sortCol: String, sortCols: String*): Dataset[T]
    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  171. def sortInternal(global: Boolean, sortCols: Seq[Column]): Dataset[T]
    Attributes
    protected
    Definition Classes
    Dataset → Dataset
  172. def sortWithinPartitions(sortExprs: Column*): Dataset[T]
    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  173. def sortWithinPartitions(sortCol: String, sortCols: String*): Dataset[T]
    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
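sort produces a single total order across the whole Dataset, while sortWithinPartitions only orders the rows inside each partition and gives no ordering between partitions. A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession (names and data are hypothetical); the data is hash-partitioned into two partitions only so that the per-partition result can be inspected with glom.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object and sample data, for illustration only.
object SortExample {
  def run(spark: SparkSession): (Seq[Int], Seq[Seq[Int]]) = {
    import spark.implicits._
    val df = Seq(3, 1, 2, 5, 4).toDF("n")

    // sort: one total order across all partitions (descending here)
    val global = df.sort($"n".desc).as[Int].collect().toSeq

    // sortWithinPartitions: each partition is sorted on its own
    val perPartition = df
      .repartition(2, $"n" % 2) // hash-partition by n % 2 into two partitions
      .sortWithinPartitions("n")
      .as[Int]
      .rdd
      .glom()                   // one array per partition, to inspect each one
      .collect()
      .map(_.toSeq)
      .toSeq
    (global, perPartition)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sort-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```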
  174. val sparkSession: SparkSession
    Definition Classes
    Dataset → Dataset
  175. def stat: DataFrameStatFunctions
    Definition Classes
    Dataset → Dataset
  176. def storageLevel: StorageLevel
    Definition Classes
    Dataset → Dataset
  177. def summary(statistics: String*): DataFrame
    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
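When statistic names are passed, summary computes only those statistics. The result has a leading summary column followed by one column per input column, and every value comes back as a string. A minimal sketch, assuming Spark 4.x, Scala 2.13, a local SparkSession and hypothetical data:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object and sample data, for illustration only.
object SummaryExample {
  def run(spark: SparkSession): Map[String, String] = {
    import spark.implicits._
    val scores = Seq(10, 20, 30, 40).toDF("score")
    // Only the requested statistics are computed; result values are strings.
    scores
      .summary("count", "min", "max")
      .as[(String, String)] // columns: summary, score
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("summary-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```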
  178. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  179. def tail(n: Int): Array[T]
    Definition Classes
    Dataset → Dataset
  180. def take(n: Int): Array[T]
    Definition Classes
    Dataset
  181. def takeAsList(n: Int): List[T]
    Definition Classes
    Dataset
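take(n) and tail(n) return the first and last n rows to the driver as an Array, and takeAsList(n) returns the first n rows as a java.util.List for Java callers. All three collect rows into driver memory, so n should stay small. A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession (the object name is hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import scala.jdk.CollectionConverters._

// Hypothetical example object, for illustration only.
object TakeTailExample {
  def run(spark: SparkSession): (Seq[Long], Seq[Long], Seq[Long]) = {
    val ds = spark.range(1, 6) // Dataset[java.lang.Long] holding 1, 2, 3, 4, 5
    val first = ds.take(2).map(_.longValue).toSeq                   // first n rows
    val last = ds.tail(2).map(_.longValue).toSeq                    // last n rows
    val viaJava = ds.takeAsList(2).asScala.map(_.longValue).toSeq   // java.util.List, converted
    (first, last, viaJava)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("take-tail-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```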
  182. def to(schema: StructType): DataFrame
    Definition Classes
    Dataset → Dataset
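to(schema) reconciles each row with the given schema: columns are matched and reordered by name, and compatible types are cast (for example int to long). A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession; the object, columns and data are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, LongType, StringType, StructField, StructType}

// Hypothetical example object and sample data, for illustration only.
object ToSchemaExample {
  def run(spark: SparkSession): (Seq[String], Seq[DataType]) = {
    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "label") // id is an int column
    // Target lists the columns in another order and asks for a wider type for id.
    val target = StructType(Seq(
      StructField("label", StringType),
      StructField("id", LongType)))
    val reconciled = df.to(target)
    (reconciled.columns.toSeq, reconciled.schema.fields.map(_.dataType).toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("to-schema-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```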
  183. def toDF(colNames: String*): DataFrame
    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
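toDF() converts a typed Dataset to a DataFrame, and toDF(colNames*) also renames all columns by position, which is handy for tuples whose default column names are _1, _2 and so on. The number of names must match the number of columns. A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession (hypothetical names and data):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object and sample data, for illustration only.
object ToDFExample {
  def run(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._
    val raw = Seq(("Ann", 34), ("Bob", 27)).toDS() // tuple fields become columns _1 and _2
    val named = raw.toDF("name", "age")            // rename every column, by position
    (raw.columns.toSeq, named.columns.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("todf-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```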
  184. def toDF(): DataFrame
    Definition Classes
    Dataset → Dataset
  185. def toJSON: Dataset[String]
    Definition Classes
    Dataset → Dataset
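toJSON serializes each row into one JSON string, producing a Dataset[String]. A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession; ToJsonExample and Person are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object and case class, for illustration only.
object ToJsonExample {
  case class Person(name: String, age: Int)

  def run(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    // One JSON document per row, with fields in schema order.
    Seq(Person("Ann", 34)).toDS().toJSON.collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("tojson-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```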
  186. def toJavaRDD: JavaRDD[T]
    Definition Classes
    Dataset → Dataset
  187. def toLocalIterator(): Iterator[T]
    Definition Classes
    Dataset → Dataset
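toLocalIterator returns a java.util.Iterator that fetches one partition at a time, so the driver needs memory for the largest partition rather than for the whole Dataset. A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession (the object name is hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import scala.jdk.CollectionConverters._

// Hypothetical example object, for illustration only.
object LocalIteratorExample {
  def run(spark: SparkSession): List[Long] = {
    // range(start, end, step, numPartitions): four partitions, pulled to the driver in order
    val it = spark.range(0, 8, 1, 4).toLocalIterator() // java.util.Iterator[java.lang.Long]
    it.asScala.map(_.longValue).toList
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("local-iterator-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```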
  188. def toString(): String
    Definition Classes
    Dataset → AnyRef → Any
  189. def transform[U, DSO[_] <: sql.Dataset[_]](t: (Dataset.this.type) => DSO[U]): DSO[U]
    Definition Classes
    Dataset
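transform applies a function that takes and returns a Dataset, which lets custom transformations be chained in the same fluent style as built-in ones. A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession; withBonus and highEarners are hypothetical helpers:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical example object and helpers, for illustration only.
object TransformExample {
  // Reusable steps, each a plain DataFrame => DataFrame function.
  def withBonus(df: DataFrame): DataFrame = df.withColumn("bonus", col("salary") * 0.1)
  def highEarners(df: DataFrame): DataFrame = df.where(col("salary") > 1000)

  def run(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._
    val staff = Seq(("Ann", 1500), ("Bob", 800)).toDF("name", "salary")
    // Chain the custom steps without breaking the method chain.
    val result = staff.transform(withBonus).transform(highEarners)
    (result.columns.toSeq, result.select("name").as[String].collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("transform-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```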
  190. def transpose(): DataFrame
    Definition Classes
    Dataset → Dataset
  191. def transpose(indexColumn: Column): DataFrame
    Definition Classes
    Dataset → Dataset
  192. def union(other: sql.Dataset[T]): Dataset[T]
    Definition Classes
    Dataset → Dataset
  193. def unionAll(other: sql.Dataset[T]): Dataset[T]
    Definition Classes
    Dataset → Dataset
  194. def unionByName(other: sql.Dataset[T]): Dataset[T]
    Definition Classes
    Dataset → Dataset
  195. def unionByName(other: sql.Dataset[T], allowMissingColumns: Boolean): Dataset[T]
    Definition Classes
    Dataset → Dataset
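union (and its alias unionAll) combines rows by column position, while unionByName matches columns by name; with allowMissingColumns = true, columns present on only one side are filled with null. Like UNION ALL in SQL, none of them removes duplicates (follow with distinct() for set semantics). A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession (hypothetical data):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object and sample data, for illustration only.
object UnionExample {
  def run(spark: SparkSession): (Seq[(String, String)], Seq[(String, String, Option[Int])]) = {
    import spark.implicits._
    val a = Seq(("x1", "y1")).toDF("x", "y")
    val b = Seq(("y2", "x2")).toDF("y", "x")         // same columns, other order
    val c = Seq(("x3", "y3", 3)).toDF("x", "y", "z") // has an extra column z

    // union matches by position: b's y values land in column x
    val byPosition = a.union(b).as[(String, String)].collect().toSeq

    // unionByName matches by name; allowMissingColumns = true fills the absent z with null
    val byName = a.unionByName(b)
      .unionByName(c, allowMissingColumns = true)
      .as[(String, String, Option[Int])]
      .collect()
      .toSeq
    (byPosition, byName)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("union-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```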
  196. def unpersist(): Dataset.this.type
    Definition Classes
    Dataset → Dataset
  197. def unpersist(blocking: Boolean): Dataset.this.type
    Definition Classes
    Dataset → Dataset
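storageLevel reports how the Dataset is currently cached, and unpersist removes it from the cache; unpersist(blocking = true) waits until the cached blocks are deleted. A minimal sketch of that lifecycle, assuming Spark 4.x, Scala 2.13 and a local SparkSession (the object name is hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical example object, for illustration only.
object CacheLifecycleExample {
  def run(spark: SparkSession): (StorageLevel, StorageLevel) = {
    val df = spark.range(1000).toDF("id")
    df.persist(StorageLevel.MEMORY_ONLY) // mark for caching; nothing is computed yet
    df.count()                           // the first action materializes the cache
    val whileCached = df.storageLevel    // reports the level passed to persist
    df.unpersist(blocking = true)        // drop cached blocks and wait for removal
    (whileCached, df.storageLevel)       // StorageLevel.NONE once no longer cached
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cache-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```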
  198. def unpivot(ids: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame
    Definition Classes
    Dataset → Dataset
  199. def unpivot(ids: Array[Column], values: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame
    Definition Classes
    Dataset → Dataset
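unpivot turns a wide table into a long one: the id columns are kept, and each value column becomes a row holding the column name (variableColumnName) and its value (valueColumnName). The value columns must share a common type, and the three-argument overload unpivots every non-id column. A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession; the table and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical example object and sample data, for illustration only.
object UnpivotExample {
  def run(spark: SparkSession): Seq[(Int, String, Long)] = {
    import spark.implicits._
    // Wide table: one row per id, one column per quarter.
    val wide = Seq((1, 11L, 12L), (2, 21L, 22L)).toDF("id", "q1", "q2")
    wide
      .unpivot(Array(col("id")), Array(col("q1"), col("q2")), "quarter", "amount")
      .as[(Int, String, Long)] // columns: id, quarter (former column name), amount
      .orderBy("id", "quarter")
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("unpivot-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```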
  200. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  201. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  202. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  203. def where(conditionExpr: String): Dataset[T]
    Definition Classes
    Dataset → Dataset
  204. def where(condition: Column): Dataset[T]
    Definition Classes
    Dataset → Dataset
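where is an alias for filter and accepts either a SQL predicate string or a Column. A minimal sketch showing both forms, assuming Spark 4.x, Scala 2.13, a local SparkSession and hypothetical data:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object and sample data, for illustration only.
object WhereExample {
  def run(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._
    val people = Seq(("Ann", 34), ("Bob", 27), ("Cy", 41)).toDF("name", "age")
    // where(conditionExpr: String): the predicate is parsed as SQL
    val viaSql = people.where("age > 30").select("name").as[String].collect().toSeq
    // where(condition: Column): the same predicate as a Column expression
    val viaColumn = people.where($"age" > 30).select("name").as[String].collect().toSeq
    (viaSql, viaColumn)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("where-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```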
  205. def withColumn(colName: String, col: Column): DataFrame
    Definition Classes
    Dataset → Dataset
  206. def withColumnRenamed(existingName: String, newName: String): DataFrame
    Definition Classes
    Dataset → Dataset
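withColumn adds a column, or replaces an existing column with the same name, and withColumnRenamed renames a column (a no-op if the name does not exist). Each withColumn call adds a projection, so calling it many times, for instance in a loop, can build very large plans; select or withColumns with several columns at once avoids that. A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession (hypothetical names and data):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, upper}

// Hypothetical example object and sample data, for illustration only.
object WithColumnExample {
  def run(spark: SparkSession): (Seq[String], Int) = {
    import spark.implicits._
    val df = Seq(("ann", 34)).toDF("name", "age")
      .withColumn("name_upper", upper(col("name"))) // new name: appended as a new column
      .withColumn("age", col("age") + 1)            // existing name: replaced in place
      .withColumnRenamed("name", "first_name")      // rename an existing column
    (df.columns.toSeq, df.select("age").as[Int].head())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("with-column-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```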
  207. def withColumns(colsMap: Map[String, Column]): DataFrame
    Definition Classes
    Dataset → Dataset
  208. def withColumns(colsMap: java.util.Map[String, Column]): DataFrame
    Definition Classes
    Dataset → Dataset
  209. def withColumnsRenamed(colsMap: Map[String, String]): DataFrame
    Definition Classes
    Dataset → Dataset
  210. def withColumnsRenamed(colsMap: java.util.Map[String, String]): DataFrame
    Definition Classes
    Dataset → Dataset
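withColumns and withColumnsRenamed apply several additions or renames in a single projection; Java callers can pass a java.util.Map instead of a Scala Map. A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession (hypothetical column names and data):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical example object and sample data, for illustration only.
object WithColumnsExample {
  def run(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    Seq((1, 2)).toDF("a", "b")
      // add (or replace) several columns at once
      .withColumns(Map("sum" -> (col("a") + col("b")), "one" -> lit(1)))
      // rename several columns at once (existing name -> new name)
      .withColumnsRenamed(Map("a" -> "left", "b" -> "right"))
      .columns
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("with-columns-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```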
  211. def withColumnsRenamed(colNames: Seq[String], newColNames: Seq[String]): DataFrame
    Attributes
    protected
    Definition Classes
    Dataset → Dataset
  212. def withMetadata(columnName: String, metadata: Metadata): DataFrame
    Definition Classes
    Dataset → Dataset
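withMetadata replaces the metadata attached to an existing column, which can then be read back from the schema. A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession; the metadata key comment and its value are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.MetadataBuilder

// Hypothetical example object, metadata key and data, for illustration only.
object WithMetadataExample {
  def run(spark: SparkSession): String = {
    import spark.implicits._
    val meta = new MetadataBuilder().putString("comment", "age in whole years").build()
    val df = Seq(("Ann", 34)).toDF("name", "age").withMetadata("age", meta)
    // Metadata travels with the schema field.
    df.schema("age").metadata.getString("comment")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("with-metadata-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```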
  213. def withWatermark(eventTime: String, delayThreshold: String): Dataset[T]
    Definition Classes
    Dataset → Dataset
  214. def write: DataFrameWriter[T]
    Definition Classes
    Dataset → Dataset
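write returns a DataFrameWriter for batch output (writeStream is the Structured Streaming counterpart, and writeTo targets v2 catalog tables). A minimal sketch that writes Parquet to a temporary directory and reads it back, assuming Spark 4.x, Scala 2.13 and a local SparkSession; the path and data are hypothetical:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical example object, output path and data, for illustration only.
object WriteExample {
  def run(spark: SparkSession): Long = {
    import spark.implicits._
    val path = Files.createTempDirectory("dataset-write-example").resolve("people").toString
    Seq(("Ann", 34), ("Bob", 27)).toDF("name", "age")
      .write
      .mode("overwrite") // replace any existing data at path
      .parquet(path)     // writing is an action: the job runs here
    spark.read.parquet(path).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("write-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```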
  215. def writeStream: DataStreamWriter[T]
    Definition Classes
    Dataset → Dataset
  216. def writeTo(table: String): DataFrameWriterV2[T]
    Definition Classes
    Dataset → Dataset

Deprecated Value Members

  1. def explode[A, B](inputColumn: String, outputColumn: String)(f: (A) => IterableOnce[B])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[B]): DataFrame
    Definition Classes
    Dataset → Dataset
    Annotations
    @deprecated
    Deprecated

    (Since version 3.5.0) use flatMap() or select() with functions.explode() instead

  2. def explode[A <: Product](input: Column*)(f: (Row) => IterableOnce[A])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A]): DataFrame
    Definition Classes
    Dataset → Dataset
    Annotations
    @deprecated
    Deprecated

    (Since version 3.5.0) use flatMap() or select() with functions.explode() instead
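Both deprecated explode overloads name their replacements: select() with functions.explode(), or flatMap() on a typed Dataset. A minimal sketch of each, assuming Spark 4.x, Scala 2.13 and a local SparkSession (data and names are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, split}

// Hypothetical example object and sample data, for illustration only.
object ExplodeReplacementExample {
  def run(spark: SparkSession): (Seq[(Int, String)], Seq[(Int, String)]) = {
    import spark.implicits._
    val df = Seq((1, "a b"), (2, "c")).toDF("id", "words")

    // select() + functions.explode(): one output row per element of the split array
    val viaSelect = df
      .select(col("id"), explode(split(col("words"), " ")).as("word"))
      .as[(Int, String)]
      .collect()
      .toSeq

    // flatMap() on the typed view: the same expansion in plain Scala
    val viaFlatMap = df
      .as[(Int, String)]
      .flatMap { (row: (Int, String)) =>
        val (id, words) = row
        words.split(" ").toSeq.map(word => (id, word))
      }
      .collect()
      .toSeq
    (viaSelect, viaFlatMap)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explode-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```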

  3. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)

  4. def registerTempTable(tableName: String): Unit
    Definition Classes
    Dataset
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.0) Use createOrReplaceTempView(viewName) instead.
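registerTempTable is replaced by createOrReplaceTempView; the view's lifetime is tied to the SparkSession that created it, and it can be queried with spark.sql. A minimal sketch, assuming Spark 4.x, Scala 2.13 and a local SparkSession; the view name people and the data are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object, view name and data, for illustration only.
object TempViewExample {
  def run(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    // Session-scoped view; replaces any existing temp view with the same name.
    Seq(("Ann", 34), ("Bob", 27)).toDF("name", "age").createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 30").as[String].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("temp-view-example").getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```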

Inherited from sql.Dataset[T]

Inherited from Serializable

Inherited from AnyRef

Inherited from Any