class Dataset[T] extends sql.api.Dataset[T, Dataset]

A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations. Each Dataset also has an untyped view called a DataFrame, which is a Dataset of Row.

Operations available on Datasets are divided into transformations and actions. Transformations produce new Datasets, and actions trigger computation and return results. Example transformations include map, filter, select, and aggregate (groupBy). Example actions include count, show, and writing data out to file systems.

Datasets are "lazy", i.e. computations are only triggered when an action is invoked. Internally, a Dataset represents a logical plan that describes the computation required to produce the data. When an action is invoked, Spark's query optimizer optimizes the logical plan and generates a physical plan for efficient execution in a parallel and distributed manner. To explore the logical plan as well as the optimized physical plan, use the explain function.
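
The laziness described above can be seen in a small sketch. It assumes a local SparkSession and a hypothetical Person case class with made-up rows, and is written to be pasted into spark-shell or run as a Scala script (the case class has to be defined at the top level for its encoder to be derived).

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical domain type, used only for illustration.
case class Person(name: String, age: Int)

val spark = SparkSession.builder().master("local[*]").appName("lazy-sketch").getOrCreate()
import spark.implicits._

val people = Seq(Person("Ann", 35), Person("Bob", 28)).toDS()

// Transformations: each call returns a new Dataset and only extends the logical plan.
val adultNames = people.filter(_.age > 30).map(_.name)

// Prints the logical plans and the physical plan without running the query.
adultNames.explain(true)

// Action: triggers execution and returns the results to the driver.
val result = adultNames.collect()
```

Until collect() is called, no data is processed; filter and map only describe the computation.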

To efficiently support domain-specific objects, an Encoder is required. The encoder maps the domain-specific type T to Spark's internal type system. For example, given a class Person with two fields, name (string) and age (int), an encoder is used to tell Spark to generate code at runtime to serialize the Person object into a binary structure. This binary structure often has a much lower memory footprint and is optimized for efficient data processing (e.g. in a columnar format). To understand the internal binary representation for data, use the schema function.
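
As a minimal sketch of the Person example, the encoder can be created explicitly and its schema inspected. The Person case class here is hypothetical; in typical Scala code the encoder is supplied implicitly through spark.implicits._ rather than built by hand.

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// Hypothetical domain type matching the description above.
case class Person(name: String, age: Int)

// Derive an encoder for the case class explicitly.
val personEncoder: Encoder[Person] = Encoders.product[Person]

// The schema lists the Spark SQL column names and types that Person maps to.
val personSchema = personEncoder.schema
println(personSchema.treeString)
```

Calling schema on a Dataset[Person] should report the same fields.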

There are typically two ways to create a Dataset. The most common way is by pointing Spark to some files on storage systems, using the read function available on a SparkSession.

val people = spark.read.parquet("...").as[Person]  // Scala
Dataset<Person> people = spark.read().parquet("...").as(Encoders.bean(Person.class)); // Java

Datasets can also be created through transformations available on existing Datasets. For example, the following creates a new Dataset by applying a map to the existing one:

val names = people.map(_.name)  // in Scala; names is a Dataset[String]
Dataset<String> names = people.map(
  (MapFunction<Person, String>) p -> p.name, Encoders.STRING()); // Java

Dataset operations can also be untyped, through various domain-specific-language (DSL) functions defined in: Dataset (this class), Column, and functions. These operations are very similar to the operations available in the data frame abstraction in R or Python.

To select a column from the Dataset, use the apply method in Scala and col in Java.

val ageCol = people("age")  // in Scala
Column ageCol = people.col("age"); // in Java

Note that the Column type can also be manipulated through its various functions.

// The following creates a new column that increases everybody's age by 10.
people("age") + 10  // in Scala
people.col("age").plus(10);  // in Java
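
A Column expression such as the one above does nothing until it is used in a query. The following sketch, assuming a local SparkSession and made-up in-memory rows in place of the parquet source, uses the expression in a projection; the alias ageIn10Years is a name chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("column-sketch").getOrCreate()
import spark.implicits._

// Hypothetical in-memory data standing in for the parquet source.
val people = Seq(("Ann", 35), ("Bob", 28)).toDF("name", "age")

// A Column is an expression; nothing is computed when it is created.
val agePlusTen = people("age") + 10

// Use the expression in a projection and name the derived column.
val older = people.select(people("name"), agePlusTen.as("ageIn10Years"))

// Convert to a typed Dataset of tuples (columns are mapped by position) and collect.
val ages = older.as[(String, Int)].collect()
```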

A more concrete example in Scala:

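// avg and max are defined in org.apache.spark.sql.functions
import org.apache.spark.sql.functions.{avg, max}

// To create Dataset[Row] using SparkSession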
val people = spark.read.parquet("...")
val department = spark.read.parquet("...")

people.filter("age > 30")
  .join(department, people("deptId") === department("id"))
  .groupBy(department("name"), people("gender"))
  .agg(avg(people("salary")), max(people("age")))

and in Java:

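// avg and max are static methods of org.apache.spark.sql.functions
import static org.apache.spark.sql.functions.*;

// To create Dataset<Row> using SparkSession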
Dataset<Row> people = spark.read().parquet("...");
Dataset<Row> department = spark.read().parquet("...");

people.filter(people.col("age").gt(30))
  .join(department, people.col("deptId").equalTo(department.col("id")))
  .groupBy(department.col("name"), people.col("gender"))
  .agg(avg(people.col("salary")), max(people.col("age")));
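
The Scala example above reads from elided parquet paths. A runnable variant, sketched here with made-up in-memory rows and a local SparkSession, keeps the same column names so the query itself is unchanged.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, max}

val spark = SparkSession.builder().master("local[*]").appName("join-agg-sketch").getOrCreate()
import spark.implicits._

// Hypothetical rows; only the column names follow the example above.
val people = Seq(
  ("Ann", 35, "F", 1, 5000.0),
  ("Bob", 42, "M", 1, 4000.0),
  ("Cid", 25, "M", 2, 3000.0)
).toDF("name", "age", "gender", "deptId", "salary")

val department = Seq((1, "Engineering"), (2, "Sales")).toDF("id", "name")

// Same query as above: filter, join, group, then aggregate.
val stats = people.filter("age > 30")
  .join(department, people("deptId") === department("id"))
  .groupBy(department("name"), people("gender"))
  .agg(avg(people("salary")), max(people("age")))

// Action: runs the query and prints the grouped rows.
stats.show()
```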
Annotations
@Stable()
Since

1.6.0

Linear Supertypes
api.Dataset[T, Dataset], Serializable, AnyRef, Any

Instance Constructors

  1. new Dataset(sqlContext: SQLContext, logicalPlan: LogicalPlan, encoder: Encoder[T])
  2. new Dataset(sparkSession: SparkSession, logicalPlan: LogicalPlan, encoder: Encoder[T])

Type Members

  1. type RGD = RelationalGroupedDataset
    Definition Classes
    Dataset → Dataset

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def agg(expr: Column, exprs: Column*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  5. def agg(exprs: Map[String, String]): DataFrame

    Definition Classes
    Dataset → Dataset
  6. def agg(exprs: Map[String, String]): DataFrame

    Definition Classes
    Dataset → Dataset
  7. def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame

    Definition Classes
    Dataset → Dataset
  8. def alias(alias: Symbol): Dataset[T]

    Definition Classes
    Dataset → Dataset
  9. def alias(alias: String): Dataset[T]

    Definition Classes
    Dataset → Dataset
  10. def apply(colName: String): Column
    Definition Classes
    Dataset
  11. def as(alias: Symbol): Dataset[T]

    Definition Classes
    Dataset → Dataset
  12. def as(alias: String): Dataset[T]

    Definition Classes
    Dataset → Dataset
  13. def as[U](implicit arg0: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  14. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  15. def cache(): Dataset.this.type

    Definition Classes
    Dataset → Dataset
  16. def checkpoint(eager: Boolean): Dataset[T]

    Definition Classes
    Dataset → Dataset
  17. def checkpoint(): Dataset[T]

    Definition Classes
    Dataset → Dataset
  18. def checkpoint(eager: Boolean, reliableCheckpoint: Boolean): Dataset[T]

    Attributes
    protected[sql]
    Definition Classes
    Dataset → Dataset
  19. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  20. def coalesce(numPartitions: Int): Dataset[T]

    Definition Classes
    Dataset → Dataset
  21. def col(colName: String): Column

    Definition Classes
    Dataset → Dataset
  22. def colRegex(colName: String): Column

    Definition Classes
    Dataset → Dataset
  23. def collect(): Array[T]

    Definition Classes
    Dataset → Dataset
  24. def collectAsList(): List[T]

    Definition Classes
    Dataset → Dataset
  25. def columns: Array[String]
    Definition Classes
    Dataset
  26. def count(): Long

    Definition Classes
    Dataset → Dataset
  27. def createGlobalTempView(viewName: String): Unit
    Definition Classes
    Dataset
    Annotations
    @throws(scala.this.throws.<init>$default$1[org.apache.spark.sql.AnalysisException])
  28. def createOrReplaceGlobalTempView(viewName: String): Unit
    Definition Classes
    Dataset
  29. def createOrReplaceTempView(viewName: String): Unit
    Definition Classes
    Dataset
  30. def createTempView(viewName: String, replace: Boolean, global: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Dataset → Dataset
  31. def createTempView(viewName: String): Unit
    Definition Classes
    Dataset
    Annotations
    @throws(scala.this.throws.<init>$default$1[org.apache.spark.sql.AnalysisException])
  32. def crossJoin(right: Dataset[_]): DataFrame

    Definition Classes
    Dataset → Dataset
  33. def cube(col1: String, cols: String*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  34. def cube(cols: Column*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  35. def describe(cols: String*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  36. def distinct(): Dataset[T]

    Definition Classes
    Dataset → Dataset
  37. def drop(col: Column): DataFrame

    Definition Classes
    Dataset → Dataset
  38. def drop(colName: String): DataFrame

    Definition Classes
    Dataset → Dataset
  39. def drop(col: Column, cols: Column*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  40. def drop(colNames: String*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  41. def dropDuplicates(col1: String, cols: String*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  42. def dropDuplicates(colNames: Array[String]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  43. def dropDuplicates(colNames: Seq[String]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  44. def dropDuplicates(): Dataset[T]

    Definition Classes
    Dataset → Dataset
  45. def dropDuplicatesWithinWatermark(col1: String, cols: String*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  46. def dropDuplicatesWithinWatermark(colNames: Array[String]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  47. def dropDuplicatesWithinWatermark(colNames: Seq[String]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  48. def dropDuplicatesWithinWatermark(): Dataset[T]

    Definition Classes
    Dataset → Dataset
  49. def dtypes: Array[(String, String)]
    Definition Classes
    Dataset
  50. val encoder: Encoder[T]
    Definition Classes
    Dataset → Dataset
  51. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  52. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  53. def except(other: Dataset[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  54. def exceptAll(other: Dataset[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  55. def explain(mode: String): Unit

    Definition Classes
    Dataset → Dataset
  56. def explain(): Unit
    Definition Classes
    Dataset
  57. def explain(extended: Boolean): Unit
    Definition Classes
    Dataset
  58. def filter(conditionExpr: String): Dataset[T]

    Definition Classes
    Dataset → Dataset
  59. def filter(func: FilterFunction[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  60. def filter(func: (T) => Boolean): Dataset[T]

    Definition Classes
    Dataset → Dataset
  61. def filter(condition: Column): Dataset[T]

    Definition Classes
    Dataset → Dataset
  62. def first(): T
    Definition Classes
    Dataset
  63. def flatMap[U](f: FlatMapFunction[T, U], encoder: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  64. def flatMap[U](func: (T) => IterableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  65. def foreach(func: ForeachFunction[T]): Unit
    Definition Classes
    Dataset
  66. def foreach(f: (T) => Unit): Unit
    Definition Classes
    Dataset
  67. def foreachPartition(func: ForeachPartitionFunction[T]): Unit

    Definition Classes
    Dataset → Dataset
  68. def foreachPartition(f: (Iterator[T]) => Unit): Unit

    Definition Classes
    Dataset → Dataset
  69. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  70. def groupBy(col1: String, cols: String*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  71. def groupBy(cols: Column*): RelationalGroupedDataset

    Groups the Dataset using the specified columns, so we can run aggregation on them. See RelationalGroupedDataset for all the available aggregate functions.

    // Compute the average for all numeric columns grouped by department.
    ds.groupBy($"department").avg()
    
    // Compute the max age and average salary, grouped by department and gender.
    ds.groupBy($"department", $"gender").agg(Map(
      "salary" -> "avg",
      "age" -> "max"
    ))
    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
    Since

    2.0.0

  72. def groupByKey[K](func: MapFunction[T, K], encoder: Encoder[K]): KeyValueGroupedDataset[K, T]

    (Java-specific) Returns a KeyValueGroupedDataset where the data is grouped by the given key func.

    Since

    2.0.0

  73. def groupByKey[K](func: (T) => K)(implicit arg0: Encoder[K]): KeyValueGroupedDataset[K, T]

    (Scala-specific) Returns a KeyValueGroupedDataset where the data is grouped by the given key func.

    Since

    2.0.0

  74. def groupingSets(groupingSets: Seq[Seq[Column]], cols: Column*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  75. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  76. def head(n: Int): Array[T]

    Definition Classes
    Dataset → Dataset
  77. def head(): T
    Definition Classes
    Dataset
  78. def hint(name: String, parameters: Any*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  79. def inputFiles: Array[String]

    Definition Classes
    Dataset → Dataset
  80. def intersect(other: Dataset[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  81. def intersectAll(other: Dataset[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  82. def isEmpty: Boolean

    Definition Classes
    Dataset → Dataset
  83. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  84. def isLocal: Boolean

    Definition Classes
    Dataset → Dataset
  85. def isStreaming: Boolean

    Definition Classes
    Dataset → Dataset
  86. def javaRDD: JavaRDD[T]

    Returns the content of the Dataset as a JavaRDD of Ts.

    Since

    1.6.0

  87. def join(right: Dataset[_], joinExprs: Column): DataFrame

    Definition Classes
    Dataset → Dataset
  88. def join(right: Dataset[_], usingColumns: Array[String], joinType: String): DataFrame

    Definition Classes
    Dataset → Dataset
  89. def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame

    Definition Classes
    Dataset → Dataset
  90. def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame

    Definition Classes
    Dataset → Dataset
  91. def join(right: Dataset[_], usingColumns: Array[String]): DataFrame

    Definition Classes
    Dataset → Dataset
  92. def join(right: Dataset[_], usingColumn: String): DataFrame

    Definition Classes
    Dataset → Dataset
  93. def join(right: Dataset[_], joinExprs: Column, joinType: String): DataFrame

    Definition Classes
    Dataset → Dataset
  94. def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame

    Definition Classes
    Dataset → Dataset
  95. def join(right: Dataset[_]): DataFrame

    Definition Classes
    Dataset → Dataset
  96. def joinWith[U](other: Dataset[U], condition: Column): Dataset[(T, U)]

    Definition Classes
    Dataset → Dataset
  97. def joinWith[U](other: Dataset[U], condition: Column, joinType: String): Dataset[(T, U)]

    Definition Classes
    Dataset → Dataset
  98. def limit(n: Int): Dataset[T]

    Definition Classes
    Dataset → Dataset
  99. def localCheckpoint(eager: Boolean): Dataset[T]

    Definition Classes
    Dataset → Dataset
  100. def localCheckpoint(): Dataset[T]

    Definition Classes
    Dataset → Dataset
  101. def map[U](func: MapFunction[T, U], encoder: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  102. def map[U](func: (T) => U)(implicit arg0: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  103. def mapPartitions[U](f: MapPartitionsFunction[T, U], encoder: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  104. def mapPartitions[U](func: (Iterator[T]) => Iterator[U])(implicit arg0: Encoder[U]): Dataset[U]

    Definition Classes
    Dataset → Dataset
  105. def melt(ids: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame

    Definition Classes
    Dataset → Dataset
  106. def melt(ids: Array[Column], values: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame
    Definition Classes
    Dataset → Dataset
  107. def mergeInto(table: String, condition: Column): MergeIntoWriter[T]

    Merges a set of updates, insertions, and deletions based on a source table into a target table.

    Scala Examples:

    spark.table("source")
      .mergeInto("target", $"source.id" === $"target.id")
      .whenMatched($"salary" === 100)
      .delete()
      .whenNotMatched()
      .insertAll()
      .whenNotMatchedBySource($"salary" === 100)
      .update(Map(
        "salary" -> lit(200)
      ))
      .merge()
    Definition Classes
    Dataset → Dataset
    Since

    4.0.0

  108. def metadataColumn(colName: String): Column

    Definition Classes
    Dataset → Dataset
  109. def na: DataFrameNaFunctions

    Definition Classes
    Dataset → Dataset
  110. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  111. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  112. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  113. def observe(observation: Observation, expr: Column, exprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  114. def observe(name: String, expr: Column, exprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  115. def offset(n: Int): Dataset[T]

    Definition Classes
    Dataset → Dataset
  116. def orderBy(sortExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  117. def orderBy(sortCol: String, sortCols: String*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  118. def persist(newLevel: StorageLevel): Dataset.this.type

    Definition Classes
    Dataset → Dataset
  119. def persist(): Dataset.this.type

    Definition Classes
    Dataset → Dataset
  120. def printSchema(level: Int): Unit
    Definition Classes
    Dataset
  121. def printSchema(): Unit
    Definition Classes
    Dataset
  122. val queryExecution: QueryExecution
  123. def randomSplit(weights: Array[Double]): Array[Dataset[T]]

    Definition Classes
    Dataset → Dataset
  124. def randomSplit(weights: Array[Double], seed: Long): Array[Dataset[T]]

    Definition Classes
    Dataset → Dataset
  125. def randomSplitAsList(weights: Array[Double], seed: Long): List[Dataset[T]]

    Definition Classes
    Dataset → Dataset
  126. lazy val rdd: RDD[T]

    Represents the content of the Dataset as an RDD of T.

    Since

    1.6.0

  127. def reduce(func: (T, T) => T): T

    Definition Classes
    Dataset → Dataset
  128. def reduce(func: ReduceFunction[T]): T
    Definition Classes
    Dataset
  129. def repartition(partitionExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  130. def repartition(numPartitions: Int, partitionExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  131. def repartition(numPartitions: Int): Dataset[T]

    Definition Classes
    Dataset → Dataset
  132. def repartitionByExpression(numPartitions: Option[Int], partitionExprs: Seq[Column]): Dataset[T]
    Attributes
    protected
    Definition Classes
    Dataset → Dataset
  133. def repartitionByRange(partitionExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  134. def repartitionByRange(numPartitions: Int, partitionExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  135. def repartitionByRange(numPartitions: Option[Int], partitionExprs: Seq[Column]): Dataset[T]
    Attributes
    protected
    Definition Classes
    Dataset → Dataset
  136. def rollup(col1: String, cols: String*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  137. def rollup(cols: Column*): RelationalGroupedDataset

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  138. def sameSemantics(other: Dataset[T]): Boolean

    Definition Classes
    Dataset → Dataset
    Annotations
    @DeveloperApi()
  139. def sample(withReplacement: Boolean, fraction: Double): Dataset[T]

    Definition Classes
    Dataset → Dataset
  140. def sample(fraction: Double): Dataset[T]

    Definition Classes
    Dataset → Dataset
  141. def sample(fraction: Double, seed: Long): Dataset[T]

    Definition Classes
    Dataset → Dataset
  142. def sample(withReplacement: Boolean, fraction: Double, seed: Long): Dataset[T]

    Definition Classes
    Dataset → Dataset
  143. def schema: StructType

    Definition Classes
    Dataset → Dataset
  144. def select[U1, U2, U3, U4, U5](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2], c3: TypedColumn[T, U3], c4: TypedColumn[T, U4], c5: TypedColumn[T, U5]): Dataset[(U1, U2, U3, U4, U5)]

    Definition Classes
    Dataset → Dataset
  145. def select[U1, U2, U3, U4](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2], c3: TypedColumn[T, U3], c4: TypedColumn[T, U4]): Dataset[(U1, U2, U3, U4)]

    Definition Classes
    Dataset → Dataset
  146. def select[U1, U2, U3](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2], c3: TypedColumn[T, U3]): Dataset[(U1, U2, U3)]

    Definition Classes
    Dataset → Dataset
  147. def select[U1, U2](c1: TypedColumn[T, U1], c2: TypedColumn[T, U2]): Dataset[(U1, U2)]

    Definition Classes
    Dataset → Dataset
  148. def select(col: String, cols: String*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  149. def select[U1](c1: TypedColumn[T, U1]): Dataset[U1]

    Definition Classes
    Dataset → Dataset
  150. def select(cols: Column*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  151. def selectExpr(exprs: String*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  152. def selectUntyped(columns: TypedColumn[_, _]*): Dataset[_]

    Attributes
    protected
    Definition Classes
    Dataset → Dataset
  153. def semanticHash(): Int

    Definition Classes
    Dataset → Dataset
    Annotations
    @DeveloperApi()
  154. def show(numRows: Int, truncate: Int, vertical: Boolean): Unit

    Definition Classes
    Dataset → Dataset
  155. def show(numRows: Int, truncate: Boolean): Unit

    Definition Classes
    Dataset → Dataset
  156. def show(numRows: Int, truncate: Int): Unit
    Definition Classes
    Dataset
  157. def show(truncate: Boolean): Unit
    Definition Classes
    Dataset
  158. def show(): Unit
    Definition Classes
    Dataset
  159. def show(numRows: Int): Unit
    Definition Classes
    Dataset
  160. def sort(sortExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  161. def sort(sortCol: String, sortCols: String*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  162. def sortInternal(global: Boolean, sortExprs: Seq[Column]): Dataset[T]
    Attributes
    protected
    Definition Classes
    Dataset → Dataset
  163. def sortWithinPartitions(sortExprs: Column*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  164. def sortWithinPartitions(sortCol: String, sortCols: String*): Dataset[T]

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  165. lazy val sparkSession: SparkSession
    Definition Classes
    Dataset → Dataset
    Annotations
    @transient()
  166. lazy val sqlContext: SQLContext
    Annotations
    @transient()
  167. def stat: DataFrameStatFunctions

    Definition Classes
    Dataset → Dataset
  168. def storageLevel: StorageLevel

    Definition Classes
    Dataset → Dataset
  169. def summary(statistics: String*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  170. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  171. def tail(n: Int): Array[T]

    Definition Classes
    Dataset → Dataset
  172. def take(n: Int): Array[T]
    Definition Classes
    Dataset
  173. def takeAsList(n: Int): List[T]
    Definition Classes
    Dataset
  174. def to(schema: StructType): DataFrame

    Definition Classes
    Dataset → Dataset
  175. def toDF(colNames: String*): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @varargs()
  176. def toDF(): DataFrame

    Definition Classes
    Dataset → Dataset
  177. def toJSON: Dataset[String]

    Definition Classes
    Dataset → Dataset
  178. def toJavaRDD: JavaRDD[T]

    Returns the content of the Dataset as a JavaRDD of Ts.

    Since

    1.6.0

  179. def toLocalIterator(): Iterator[T]

    Definition Classes
    Dataset → Dataset
  180. def toString(): String
    Definition Classes
    Dataset → AnyRef → Any
  181. def transform[U](t: (Dataset[T]) => Dataset[U]): Dataset[U]
    Definition Classes
    Dataset
  182. def transpose(): DataFrame

    Definition Classes
    Dataset → Dataset
  183. def transpose(indexColumn: Column): DataFrame

    Definition Classes
    Dataset → Dataset
  184. def union(other: Dataset[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  185. def unionAll(other: Dataset[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  186. def unionByName(other: Dataset[T]): Dataset[T]

    Definition Classes
    Dataset → Dataset
  187. def unionByName(other: Dataset[T], allowMissingColumns: Boolean): Dataset[T]

    Definition Classes
    Dataset → Dataset
  188. def unpersist(): Dataset.this.type

    Definition Classes
    Dataset → Dataset
  189. def unpersist(blocking: Boolean): Dataset.this.type

    Definition Classes
    Dataset → Dataset
  190. def unpivot(ids: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame

    Definition Classes
    Dataset → Dataset
  191. def unpivot(ids: Array[Column], values: Array[Column], variableColumnName: String, valueColumnName: String): DataFrame

    Definition Classes
    Dataset → Dataset
  192. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  193. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  194. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  195. def where(conditionExpr: String): Dataset[T]

    Definition Classes
    Dataset → Dataset
  196. def where(condition: Column): Dataset[T]

    Definition Classes
    Dataset → Dataset
  197. def withColumn(colName: String, col: Column): DataFrame

    Definition Classes
    Dataset → Dataset
  198. def withColumnRenamed(existingName: String, newName: String): DataFrame

    Definition Classes
    Dataset → Dataset
  199. def withColumns(colsMap: Map[String, Column]): DataFrame

    Definition Classes
    Dataset → Dataset
  200. def withColumns(colsMap: java.util.Map[String, Column]): DataFrame

    Definition Classes
    Dataset → Dataset
  201. def withColumns(colNames: Seq[String], cols: Seq[Column]): DataFrame

    Attributes
    protected[spark]
    Definition Classes
    Dataset → Dataset
  202. def withColumnsRenamed(colsMap: Map[String, String]): DataFrame

    Definition Classes
    Dataset → Dataset
  203. def withColumnsRenamed(colsMap: java.util.Map[String, String]): DataFrame

    Definition Classes
    Dataset → Dataset
  204. def withColumnsRenamed(colNames: Seq[String], newColNames: Seq[String]): DataFrame
    Attributes
    protected[spark]
    Definition Classes
    Dataset → Dataset
  205. def withMetadata(columnName: String, metadata: Metadata): DataFrame

    Definition Classes
    Dataset → Dataset
  206. def withWatermark(eventTime: String, delayThreshold: String): Dataset[T]

    Definition Classes
    Dataset → Dataset
  207. def write: DataFrameWriter[T]

    Definition Classes
    Dataset → Dataset
  208. def writeStream: DataStreamWriter[T]

    Interface for saving the content of the streaming Dataset out into external storage.

    Since

    2.0.0

  209. def writeTo(table: String): DataFrameWriterV2[T]

    Definition Classes
    Dataset → Dataset

Deprecated Value Members

  1. def explode[A, B](inputColumn: String, outputColumn: String)(f: (A) => IterableOnce[B])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[B]): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.0) use flatMap() or select() with functions.explode() instead

  2. def explode[A <: Product](input: Column*)(f: (Row) => IterableOnce[A])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[A]): DataFrame

    Definition Classes
    Dataset → Dataset
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.0) use flatMap() or select() with functions.explode() instead

  3. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)

  4. def registerTempTable(tableName: String): Unit
    Definition Classes
    Dataset
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.0) Use createOrReplaceTempView(viewName) instead.
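
The deprecation notes above name their replacements; the following is a minimal sketch of what a migration might look like, not the library's prescribed approach. The object name, the `books` data, its column names, and the local master setting are all hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, split}

object DeprecatedMigrationSketch {

  // Returns (words via select + functions.explode, words via flatMap, row count via SQL).
  def run(spark: SparkSession): (Seq[String], Seq[String], Long) = {
    import spark.implicits._

    // Hypothetical input: one row per book, with a space-separated "words" column.
    val books = Seq(("b1", "spark scala"), ("b2", "dataset")).toDF("id", "words")

    // Replacement for the deprecated Dataset.explode: select() with functions.explode().
    val viaSelect = books
      .select(col("id"), explode(split(col("words"), " ")).as("word"))
      .select("word").as[String]
      .collect().sorted.toSeq

    // Typed alternative: flatMap() over a Dataset of tuples.
    val viaFlatMap = books.as[(String, String)]
      .flatMap { case (_, words) => words.split(" ").toSeq }
      .collect().sorted.toSeq

    // Replacement for the deprecated registerTempTable: createOrReplaceTempView.
    books.createOrReplaceTempView("books")
    val rowCount = spark.sql("SELECT count(*) FROM books").head().getLong(0)

    (viaSelect, viaFlatMap, rowCount)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for this sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("deprecated-migration-sketch")
      .getOrCreate()
    try println(run(spark))
    finally spark.stop()
  }
}
```

Both replacements for explode produce one output element per word; the select() form keeps the result untyped (a DataFrame) until `as[String]` is applied, while the flatMap() form stays typed throughout.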

Inherited from api.Dataset[T, Dataset]

Inherited from Serializable

Inherited from AnyRef

Inherited from Any
