Persist this TypedDataset with the default storage level (MEMORY_AND_DISK).
apache/spark
Returns a new TypedDataset that has exactly numPartitions partitions.
Similar to coalesce defined on an RDD, this operation results in a narrow dependency, e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of the current partitions.
apache/spark
Returns an Array that contains all column names in this TypedDataset.
Methods on TypedDataset[T] that go through a full serialization and deserialization of T, and execute outside of the Catalyst runtime.
The correct way to do a projection on a single column is to use the select method as follows:
ds: TypedDataset[(String, String, String)] -> ds.select(ds('_2)).run()
Spark provides an alternative way to obtain the same resulting Dataset, using the map method:
ds: TypedDataset[(String, String, String)] -> ds.deserialized.map(_._2).run()
This second approach is, however, substantially slower than the first one, and should be avoided when possible. Indeed, under the hood this map will deserialize the entire Tuple3 to a full JVM object, call the apply method of the _._2 closure on it, and serialize the resulting String back to its Catalyst representation.
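The contrast above can be sketched end to end. This is a minimal, hedged example: it assumes an implicit SparkSession and the frameless encoders/syntax imports are in scope, as in the frameless quick-start; the data values are made up for illustration.

```scala
import frameless.TypedDataset
import frameless.syntax._

// Assumes: implicit SparkSession in scope, frameless injected encoders available.
val ds: TypedDataset[(String, String, String)] =
  TypedDataset.create(Seq(("a", "b", "c"), ("d", "e", "f")))

// Fast path: stays inside Catalyst; only column _2 is ever decoded.
val viaSelect: Seq[String] = ds.select(ds('_2)).collect().run()

// Slow path: every Tuple3 is fully deserialized to a JVM object first,
// _._2 is applied, and the String is re-encoded.
val viaMap: Seq[String] = ds.deserialized.map(_._2).collect().run()
```

Both produce the same values; only the execution strategy (and its cost) differs.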
Returns a new TypedDataset that contains only the unique elements of this TypedDataset.
Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
apache/spark
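The encoded-equality caveat can be made concrete with a small sketch. The Tag class and its case-insensitive equals are hypothetical, purely for illustration; an implicit SparkSession and frameless encoders are assumed to be in scope.

```scala
import frameless.TypedDataset
import frameless.syntax._

// Hypothetical case class with case-insensitive JVM equality.
case class Tag(name: String) {
  override def equals(other: Any): Boolean = other match {
    case Tag(n) => n.equalsIgnoreCase(name)
    case _      => false
  }
}

val tags = TypedDataset.create(Seq(Tag("spark"), Tag("SPARK")))

// Tag("spark") == Tag("SPARK") on the JVM, but their encoded rows differ,
// so distinct compares the Catalyst encodings and keeps both elements.
val uniques: Seq[Tag] = tags.distinct.collect().run()
```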
Returns a new Dataset containing rows in this Dataset but not in another Dataset. This is equivalent to EXCEPT in SQL.
Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
apache/spark
Prints the plans (logical and physical) to the console for debugging purposes.
apache/spark
Returns a best-effort snapshot of the files that compose this TypedDataset. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.
apache/spark
Returns a new TypedDataset that contains only the elements of this TypedDataset that are also present in other.
Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.
apache/spark
Returns true if the collect and take methods can be run locally (without any Spark executors).
apache/spark
Returns true if this TypedDataset contains one or more sources that continuously return data as it arrives. A TypedDataset that reads data from a streaming source must be executed as a StreamingQuery using the start() method in DataStreamWriter. Methods that return a single answer, e.g. count() or collect(), will throw an AnalysisException when there is a streaming source present.
apache/spark
Returns a new Dataset by taking the first n rows. The difference between this function and head is that head is an action and returns an array (by triggering query execution) while limit returns a new Dataset.
apache/spark
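The transformation-versus-action distinction can be sketched as follows. This is a minimal example assuming an implicit SparkSession and frameless encoders in scope; the data is illustrative.

```scala
import frameless.TypedDataset
import frameless.syntax._

val ds: TypedDataset[Long] = TypedDataset.create((1L to 100L).toSeq)

// limit is a transformation: it returns a new TypedDataset lazily,
// no Spark job runs at this point.
val firstTen: TypedDataset[Long] = ds.limit(10)

// Execution only happens when a job on the limited dataset is run.
val materialized: Seq[Long] = firstTen.collect().run()

// By contrast, take describes an action: running its Job triggers execution
// and yields a local collection directly.
val taken: Seq[Long] = ds.take(10).run()
```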
Persist this TypedDataset with the given storage level.
One of: MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY, MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
apache/spark
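Picking an explicit level instead of the MEMORY_AND_DISK default looks like this. A minimal sketch, assuming an implicit SparkSession and frameless encoders in scope:

```scala
import frameless.TypedDataset
import frameless.syntax._
import org.apache.spark.storage.StorageLevel

val ds: TypedDataset[Int] = TypedDataset.create(Seq(1, 2, 3))

// Cache the serialized form in memory only (spills are recomputed, not spilled to disk).
val cached: TypedDataset[Int] = ds.persist(StorageLevel.MEMORY_ONLY_SER)

// ... run several jobs against `cached` ...

// Release the cached blocks when done.
cached.unpersist()
```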
Prints the schema of the underlying Dataset to the console in a nice tree format.
apache/spark
Returns a QueryExecution from this TypedDataset.
It is the primary workflow for executing relational queries using Spark. Designed to allow easy access to the intermediate phases of query execution for developers.
apache/spark
Randomly splits this TypedDataset with the provided weights. Weights for splits will be normalized if they don't sum to 1.
apache/spark
Randomly splits this TypedDataset with the provided weights. Weights for splits will be normalized if they don't sum to 1.
apache/spark
Returns a Java list that contains randomly split TypedDataset with the provided weights. Weights for splits will be normalized if they don't sum to 1.
apache/spark
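The weight normalization can be sketched with weights that deliberately don't sum to 1. Assumes an implicit SparkSession and frameless encoders in scope; the row count is illustrative.

```scala
import frameless.TypedDataset
import frameless.syntax._

val ds: TypedDataset[Long] = TypedDataset.create((1L to 1000L).toSeq)

// Weights 3.0 and 1.0 sum to 4.0, so they are normalized to 0.75 and 0.25:
// roughly three quarters of the rows land in the first split.
val Array(train, test) = ds.randomSplit(Array(3.0, 1.0), seed = 42L)
```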
Converts this TypedDataset to an RDD.
apache/spark
Returns a new TypedDataset that has exactly numPartitions partitions.
apache/spark
Returns a new TypedDataset by sampling a fraction of records.
apache/spark
Returns the schema of this Dataset.
apache/spark
Returns a SparkSession from this TypedDataset.
Returns a SQLContext from this TypedDataset.
Get the TypedDataset's current storage level, or StorageLevel.NONE if not persisted.
apache/spark
Converts this strongly typed collection of data to a generic DataFrame. In contrast to the strongly typed objects that Dataset operations work on, a DataFrame returns generic Row objects that allow fields to be accessed by ordinal or name.
apache/spark
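Dropping to the untyped world looks like this. A minimal sketch: the Person case class and its values are made up for illustration, and an implicit SparkSession with frameless encoders is assumed.

```scala
import frameless.TypedDataset

// Hypothetical record type for illustration.
case class Person(name: String, age: Int)

val people: TypedDataset[Person] =
  TypedDataset.create(Seq(Person("Ada", 36), Person("Alan", 41)))

// toDF() yields an org.apache.spark.sql.DataFrame of generic Rows.
val df = people.toDF()

df.select("name").show()              // fields addressed by name again
val firstRow = df.head()              // a generic Row
val age = firstRow.getAs[Int]("age")  // ...whose fields are read by name or ordinal
```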
Returns the content of the TypedDataset as a Dataset of JSON strings.
apache/spark
Concise syntax for chaining custom transformations.
apache/spark
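The chaining style this enables can be sketched as follows. The User type and the onlyAdults helper are hypothetical; an implicit SparkSession and frameless encoders are assumed in scope.

```scala
import frameless.TypedDataset
import frameless.syntax._

// Hypothetical record type and pipeline stage for illustration.
case class User(name: String, age: Int)

def onlyAdults(ds: TypedDataset[User]): TypedDataset[User] =
  ds.filter(ds('age) >= 18)

val users: TypedDataset[User] =
  TypedDataset.create(Seq(User("Ada", 36), User("Kid", 9)))

// transform(f) is equivalent to f(users) but reads left-to-right,
// which keeps longer pipelines of reusable stages legible.
val adults: TypedDataset[User] = users.transform(onlyAdults)
```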
Mark the TypedDataset as non-persistent, and remove all blocks for it from memory and disk.
The blocking parameter controls whether to block until all blocks are deleted. apache/spark
Interface for saving the content of the non-streaming TypedDataset out into external storage.
apache/spark
(Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead
This trait implements TypedDataset methods that have the same signature as their Dataset equivalents. Each method simply forwards the call to the underlying Dataset.
Documentation marked "apache/spark" is thanks to apache/spark Contributors at https://github.com/apache/spark, licensed under Apache v2.0 available at http://www.apache.org/licenses/LICENSE-2.0