trait TypedDatasetForwarded[T] extends AnyRef
This trait implements the TypedDataset methods that have the same signature as their Dataset equivalents. Each method simply forwards the call to the underlying Dataset.
Documentation marked "apache/spark" is thanks to the apache/spark contributors at https://github.com/apache/spark, licensed under Apache v2.0, available at http://www.apache.org/licenses/LICENSE-2.0.
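The short usage sketches added throughout this page share the following minimal setup. It is illustrative only: Person, spark, sqlContext, and ds are hypothetical names, not part of this trait, and the implicits frameless expects can differ slightly between versions.

    import org.apache.spark.sql.{SparkSession, SQLContext}
    import frameless.TypedDataset
    import frameless.syntax._  // provides .run() on frameless jobs

    case class Person(name: String, age: Int)

    val spark: SparkSession =
      SparkSession.builder().master("local[*]").appName("forwarded-sketches").getOrCreate()
    // Some frameless versions resolve an implicit SparkSession here instead of SQLContext.
    implicit val sqlContext: SQLContext = spark.sqlContext

    val ds: TypedDataset[Person] =
      TypedDataset.create(Seq(Person("Ada", 36), Person("Grace", 45), Person("Ada", 36)))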
- Self Type
- TypedDataset[T]
- Inherited
- TypedDatasetForwarded
- AnyRef
- Any
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def cache(): TypedDataset[T]
Persist this TypedDataset with the default storage level (MEMORY_AND_DISK).
apache/spark
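A minimal sketch, reusing the hypothetical ds from the setup above:

    val cached: TypedDataset[Person] = ds.cache()
    cached.collect().run()  // the first action materializes and caches the rows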
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def coalesce(numPartitions: Int): TypedDataset[T]
Returns a new TypedDataset that has exactly numPartitions partitions. Similar to coalesce defined on an RDD, this operation results in a narrow dependency: e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle; instead each of the 100 new partitions will claim 10 of the current partitions.
apache/spark
- def columns: Array[String]
Returns an Array that contains all column names in this TypedDataset.
- def distinct: TypedDataset[T]
Returns a new TypedDataset that contains only the unique elements of this TypedDataset.
Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
apache/spark
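For example, with the hypothetical ds above, which contains a duplicate row:

    val deduped: TypedDataset[Person] = ds.distinct
    deduped.collect().run()  // two rows remain; the duplicate Person("Ada", 36) is dropped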
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def except(other: TypedDataset[T]): TypedDataset[T]
Returns a new Dataset containing rows in this Dataset but not in another Dataset. This is equivalent to EXCEPT in SQL.
Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
apache/spark
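A sketch using the hypothetical Person dataset from the setup above:

    val toRemove: TypedDataset[Person] = TypedDataset.create(Seq(Person("Ada", 36)))
    val remaining: TypedDataset[Person] = ds.except(toRemove)  // rows of ds absent from toRemove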
- def explain(extended: Boolean = false): Unit
Prints the plans (logical and physical) to the console for debugging purposes.
apache/spark
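For instance, with the hypothetical ds from the setup above:

    ds.explain()                 // physical plan only
    ds.explain(extended = true)  // parsed, analyzed, optimized, and physical plans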
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def inputFiles: Array[String]
Returns a best-effort snapshot of the files that compose this TypedDataset. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.
apache/spark
- def intersect(other: TypedDataset[T]): TypedDataset[T]
Returns a new TypedDataset that contains only the elements of this TypedDataset that are also present in other.
Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
apache/spark
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isLocal: Boolean
Returns true if the collect and take methods can be run locally (without any Spark executors).
apache/spark
- def isStreaming: Boolean
Returns true if this TypedDataset contains one or more sources that continuously return data as it arrives. A TypedDataset that reads data from a streaming source must be executed as a StreamingQuery using the start() method in DataStreamWriter. Methods that return a single answer, e.g. count() or collect(), will throw an AnalysisException when there is a streaming source present.
apache/spark
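A hedged sketch of how the flag is typically used to pick a sink; the format and output path are illustrative:

    if (ds.isStreaming)
      ds.writeStream.format("console").start()          // must run as a StreamingQuery
    else
      ds.write.mode("overwrite").parquet("/tmp/people") // ordinary batch write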
- def limit(n: Int): TypedDataset[T]
Returns a new Dataset by taking the first n rows. The difference between this function and head is that head is an action and returns an array (by triggering query execution) while limit returns a new Dataset.
apache/spark
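A sketch with the hypothetical ds from the setup above:

    val firstTwo: TypedDataset[Person] = ds.limit(2)  // lazy: no Spark job runs yet
    firstTwo.collect().run()                          // triggers execution, returns at most 2 rows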
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def persist(newLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK): TypedDataset[T]
Persist this TypedDataset with the given storage level.
- newLevel
One of: MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY, MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
apache/spark
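For example, to keep the hypothetical ds on heap only:

    import org.apache.spark.storage.StorageLevel
    val inMemory: TypedDataset[Person] = ds.persist(StorageLevel.MEMORY_ONLY)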
- def printSchema(): Unit
Prints the schema of the underlying Dataset to the console in a nice tree format.
apache/spark
- def queryExecution: QueryExecution
Returns a QueryExecution from this TypedDataset.
It is the primary workflow for executing relational queries using Spark, designed to allow easy access to the intermediate phases of query execution for developers.
apache/spark
- def randomSplit(weights: Array[Double], seed: Long): Array[TypedDataset[T]]
Randomly splits this TypedDataset with the provided weights. The weights will be normalized if they don't sum to 1.
apache/spark
- def randomSplit(weights: Array[Double]): Array[TypedDataset[T]]
Randomly splits this TypedDataset with the provided weights. The weights will be normalized if they don't sum to 1.
apache/spark
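A sketch with the hypothetical ds from the setup above; Array(8.0, 2.0) would behave identically, since the weights are normalized:

    val Array(train, test) = ds.randomSplit(Array(0.8, 0.2), seed = 42L)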
- def randomSplitAsList(weights: Array[Double], seed: Long): List[TypedDataset[T]]
Returns a Java list that contains the randomly split TypedDatasets with the provided weights. The weights will be normalized if they don't sum to 1.
apache/spark
- def rdd: RDD[T]
Converts this TypedDataset to an RDD.
apache/spark
- def repartition(numPartitions: Int): TypedDataset[T]
Returns a new TypedDataset that has exactly numPartitions partitions.
apache/spark
- def sample(withReplacement: Boolean, fraction: Double, seed: Long = Random.nextLong()): TypedDataset[T]
Returns a new TypedDataset by sampling a fraction of records.
apache/spark
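For instance, to draw roughly ten percent of the hypothetical ds without replacement:

    val tenPercent: TypedDataset[Person] = ds.sample(withReplacement = false, fraction = 0.1, seed = 7L)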
- def schema: StructType
Returns the schema of this Dataset.
apache/spark
- def sparkSession: SparkSession
Returns a SparkSession from this TypedDataset.
- def sqlContext: SQLContext
Returns a SQLContext from this TypedDataset.
- def storageLevel(): StorageLevel
Get the TypedDataset's current storage level, or StorageLevel.NONE if not persisted.
apache/spark
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toDF(): DataFrame
Converts this strongly typed collection of data to a generic DataFrame. In contrast to the strongly typed objects that Dataset operations work on, a DataFrame returns generic Row objects that allow fields to be accessed by ordinal or name.
apache/spark
- def toJSON: TypedDataset[String]
Returns the content of the TypedDataset as a Dataset of JSON strings.
apache/spark
- def toString(): String
- Definition Classes
- TypedDatasetForwarded → AnyRef → Any
- def transform[U](t: (TypedDataset[T]) => TypedDataset[U]): TypedDataset[U]
Concise syntax for chaining custom transformations.
apache/spark
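A sketch of chaining two hypothetical transformations over the ds from the setup above:

    def dedupe(d: TypedDataset[Person]): TypedDataset[Person] = d.distinct
    def firstTwo(d: TypedDataset[Person]): TypedDataset[Person] = d.limit(2)

    val cleaned: TypedDataset[Person] = ds.transform(dedupe).transform(firstTwo)  // reads left to right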
- def unpersist(blocking: Boolean = false): TypedDataset[T]
Mark the TypedDataset as non-persistent, and remove all blocks for it from memory and disk.
- blocking
Whether to block until all blocks are deleted.
apache/spark
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def write: DataFrameWriter[T]
Interface for saving the content of the non-streaming TypedDataset out into external storage.
apache/spark
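A sketch with the hypothetical ds; the output path is illustrative:

    ds.write.mode("overwrite").json("/tmp/people-json")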
- def writeStream: DataStreamWriter[T]
Interface for saving the content of the streaming Dataset out into external storage.
apache/spark
- object deserialized
Methods on TypedDataset[T] that go through a full serialization and deserialization of T, and execute outside of the Catalyst runtime.
The correct way to do a projection on a single column is to use the select method as follows:

    ds: TypedDataset[(String, String, String)] -> ds.select(ds('_2)).run()

Spark provides an alternative way to obtain the same resulting Dataset, using the map method:

    ds: TypedDataset[(String, String, String)] -> ds.deserialized.map(_._2).run()

This second approach is however substantially slower than the first one, and should be avoided when possible. Indeed, under the hood this map will deserialize the entire Tuple3 to a full JVM object, call the apply method of the _._2 closure on it, and serialize the resulting String back to its Catalyst representation.
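A hedged side-by-side sketch with the hypothetical Person dataset from the setup above; both lines extract the name column, but only the first stays inside Catalyst:

    // Column projection inside Catalyst: only `name` is decoded.
    val namesFast: Seq[String] = ds.select(ds('name)).collect().run()

    // Deserializes every full Person object before projecting, then re-encodes.
    val namesSlow: Seq[String] = ds.deserialized.map(_.name).collect().run()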
Deprecated Value Members
- def filter(func: (T) => Boolean): TypedDataset[T]
- Annotations
- @deprecated
- Deprecated
(Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead
- def flatMap[U](func: (T) => TraversableOnce[U])(implicit arg0: TypedEncoder[U]): TypedDataset[U]
- Annotations
- @deprecated
- Deprecated
(Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead
- def map[U](func: (T) => U)(implicit arg0: TypedEncoder[U]): TypedDataset[U]
- Annotations
- @deprecated
- Deprecated
(Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead
- def mapPartitions[U](func: (Iterator[T]) => Iterator[U])(implicit arg0: TypedEncoder[U]): TypedDataset[U]
- Annotations
- @deprecated
- Deprecated
(Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead
- def reduceOption[F[_]](func: (T, T) => T)(implicit arg0: SparkDelay[F]): F[Option[T]]
- Annotations
- @deprecated
- Deprecated
(Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead