frameless

TypedDataset

Related Docs: object TypedDataset | package frameless

class TypedDataset[T] extends TypedDatasetForwarded[T]

TypedDataset is a safer interface for working with Dataset.
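
A minimal sketch of constructing one, assuming the create method on the companion object (listed under Related Docs) and the implicit SQLContext that frameless requires in scope:

import frameless.TypedDataset

case class Person(id: Long, name: String, age: Int)

// TypedEncoder[Person] is derived at compile time; construction does not
// compile for types that cannot be encoded. This people dataset is reused
// by the sketches further down this page.
val people: TypedDataset[Person] =
  TypedDataset.create(Seq(Person(1L, "Ana", 30), Person(2L, "Bob", 17)))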

Documentation marked "apache/spark" is thanks to apache/spark Contributors at https://github.com/apache/spark, licensed under Apache v2.0 available at http://www.apache.org/licenses/LICENSE-2.0

Self Type
TypedDataset[T]
Linear Supertypes
TypedDatasetForwarded[T], AnyRef, Any

Instance Constructors

  1. new TypedDataset(dataset: Dataset[T])(implicit encoder: TypedEncoder[T])

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def as[U]()(implicit as: As[T, U]): TypedDataset[U]

    Returns a new TypedDataset where each record has been mapped onto the specified type.
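
    A sketch of a safe cast, assuming As[T, U] is derivable when the two case classes share the same field types in the same order (only the names differ here); User is hypothetical:

    case class User(userId: Long, userName: String, userAge: Int)

    // Compiles because User has the same shape as Person; an incompatible
    // target type is rejected at compile time.
    val users: TypedDataset[User] = people.as[User]()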

  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def cache(): TypedDataset[T]

    Persist this TypedDataset with the default storage level (MEMORY_AND_DISK).

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def coalesce(numPartitions: Int): TypedDataset[T]

    Returns a new TypedDataset that has exactly numPartitions partitions. Similar to coalesce defined on an RDD, this operation results in a narrow dependency; e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle, and instead each of the 100 new partitions will claim 10 of the current partitions.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  9. def col[A](column: Lt[Symbol])(implicit exists: Exists[T, (column)#T, A], encoder: TypedEncoder[A]): TypedColumn[T, A]

    Returns the TypedColumn of type A with the given name.

    tf.col('id)

    It is statically checked that a column with this name exists and has type A.
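
    For instance, with the hypothetical people: TypedDataset[Person] from the sketch at the top of this page:

    // Well-typed: Person has an age field of type Int.
    val age: TypedColumn[Person, Int] = people.col('age)

    // people.col('salary)  // does not compile: Person has no salary column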

  10. object colMany extends SingletonProductArgs
  11. def collect(): Job[Seq[T]]

    Returns a Seq that contains all the elements in this TypedDataset.

    Running this Job requires moving all the data into the application's driver process, and doing so on a very large TypedDataset can crash the driver process with OutOfMemoryError.

    Differs from Dataset#collect by wrapping its result into a Job.
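
    A sketch of forcing the action, assuming Job exposes a run() method that triggers execution:

    // Nothing is submitted to Spark until run() is called on the Job.
    val everyone: Seq[Person] = people.collect().run()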

  12. def count(): Job[Long]

    Returns the number of elements in the TypedDataset.

    Differs from Dataset#count by wrapping its result into a Job.

  13. val dataset: Dataset[T]
  14. def distinct: TypedDataset[T]

    Returns a new TypedDataset that contains only the unique elements of this TypedDataset.

    Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  15. implicit val encoder: TypedEncoder[T]
  16. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  18. def explain(extended: Boolean = false): Unit

    Prints the plans (logical and physical) to the console for debugging purposes.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  19. def filter(column: TypedColumn[T, Boolean]): TypedDataset[T]

    Returns a new frameless.TypedDataset that only contains elements where column is true.

    Differs from TypedDatasetForwarded#filter by taking a TypedColumn[T, Boolean] instead of a T => Boolean. Using a column expression instead of a regular function saves one Spark → Scala deserialization, which leads to better performance.
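
    A sketch contrasting the two variants, reusing the hypothetical people dataset and assuming the === comparison on TypedColumn:

    // Column expression: evaluated by Catalyst, rows are never deserialized.
    val anas  = people.filter(people.col('name) === "Ana")

    // Scala closure: each row is deserialized to a Person before the test.
    val anas2 = people.filter((p: Person) => p.name == "Ana")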

  20. def filter(func: (T) ⇒ Boolean): TypedDataset[T]

    Returns a new TypedDataset that only contains elements where func returns true.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  21. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  22. def firstOption(): Job[Option[T]]

    Optionally returns the first element in this TypedDataset.

    Differs from Dataset#first by wrapping its result into an Option and a Job.

  23. def flatMap[U](func: (T) ⇒ TraversableOnce[U])(implicit arg0: TypedEncoder[U]): TypedDataset[U]

    Returns a new TypedDataset by first applying a function to all elements of this TypedDataset, and then flattening the results.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  24. def foreach(func: (T) ⇒ Unit): Job[Unit]

    Runs func on each element of this TypedDataset.

    Differs from Dataset#foreach by wrapping its result into a Job.

  25. def foreachPartition(func: (Iterator[T]) ⇒ Unit): Job[Unit]

    Runs func on each partition of this TypedDataset.

    Differs from Dataset#foreachPartition by wrapping its result into a Job.

  26. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  27. def groupBy[K1, K2](c1: TypedColumn[T, K1], c2: TypedColumn[T, K2]): GroupedBy2Ops[K1, K2, T]
  28. def groupBy[K1](c1: TypedColumn[T, K1]): GroupedBy1Ops[K1, T]
  29. object groupByMany extends ProductArgs
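
    A hedged sketch of a grouped aggregation, assuming the sum aggregator from frameless.functions.aggregate and an agg method on the returned GroupedBy1Ops:

    import frameless.functions.aggregate._

    case class Purchase(user: String, amount: Double)

    val purchases: TypedDataset[Purchase] =
      TypedDataset.create(Seq(Purchase("ana", 10.0), Purchase("ana", 5.0)))

    // Total amount per user, typed as (String, Double).
    val totals: TypedDataset[(String, Double)] =
      purchases.groupBy(purchases.col('user)).agg(sum(purchases.col('amount)))
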
  30. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  31. def intersect(other: TypedDataset[T]): TypedDataset[T]

    Returns a new TypedDataset that contains only the elements of this TypedDataset that are also present in other.

    Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  32. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  33. def join[A, B](right: TypedDataset[A], leftCol: TypedColumn[T, B], rightCol: TypedColumn[A, B]): TypedDataset[(T, A)]
  34. def joinLeft[A, B](right: TypedDataset[A], leftCol: TypedColumn[T, B], rightCol: TypedColumn[A, B])(implicit arg0: TypedEncoder[A], e: TypedEncoder[(T, Option[A])]): TypedDataset[(T, Option[A])]
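
    A sketch of both joins on a key column, reusing the hypothetical people dataset; Address is made up for the example:

    case class Address(personId: Long, city: String)

    val addresses: TypedDataset[Address] =
      TypedDataset.create(Seq(Address(1L, "Lisbon")))

    // Inner join: only people with a matching address are kept.
    val withAddress: TypedDataset[(Person, Address)] =
      people.join(addresses, people.col('id), addresses.col('personId))

    // Left join: every person is kept, with None when nothing matches.
    val maybeAddress: TypedDataset[(Person, Option[Address])] =
      people.joinLeft(addresses, people.col('id), addresses.col('personId))
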
  35. def map[U](func: (T) ⇒ U)(implicit arg0: TypedEncoder[U]): TypedDataset[U]

    Returns a new TypedDataset that contains the result of applying func to each element.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  36. def mapPartitions[U](func: (Iterator[T]) ⇒ Iterator[U])(implicit arg0: TypedEncoder[U]): TypedDataset[U]

    Returns a new TypedDataset that contains the result of applying func to each partition.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  37. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  38. final def notify(): Unit

    Definition Classes
    AnyRef
  39. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  40. def persist(newLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK): TypedDataset[T]

    Persist this TypedDataset with the given storage level.

    newLevel
    One of: MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY, MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  41. def printSchema(): Unit

    Prints the schema of the underlying Dataset to the console in a nice tree format.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  42. def rdd: RDD[T]

    Converts this TypedDataset to an RDD.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  43. def reduceOption(func: (T, T) ⇒ T): Job[Option[T]]

    Optionally reduces the elements of this TypedDataset using the specified binary function. The given func must be commutative and associative or the result may be non-deterministic.

    Differs from Dataset#reduce by wrapping its result into an Option and a Job.
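
    A sketch, reusing the hypothetical people dataset and assuming Job exposes run():

    // None when the dataset is empty. Picking the older of two people is
    // commutative and associative, so the result is deterministic up to ties.
    val oldest: Option[Person] =
      people.reduceOption((a, b) => if (a.age >= b.age) a else b).run()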

  44. def repartition(numPartitions: Int): TypedDataset[T]

    Returns a new TypedDataset that has exactly numPartitions partitions.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  45. def sample(withReplacement: Boolean, fraction: Double, seed: Long = Random.nextLong): TypedDataset[T]

    Returns a new TypedDataset by sampling a fraction of records.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  46. def select[A, B, C](ca: TypedColumn[T, A], cb: TypedColumn[T, B], cc: TypedColumn[T, C])(implicit arg0: TypedEncoder[A], arg1: TypedEncoder[B], arg2: TypedEncoder[C]): TypedDataset[(A, B, C)]
  47. def select[A, B](ca: TypedColumn[T, A], cb: TypedColumn[T, B])(implicit arg0: TypedEncoder[A], arg1: TypedEncoder[B]): TypedDataset[(A, B)]
  48. def select[A](ca: TypedColumn[T, A])(implicit arg0: TypedEncoder[A]): TypedDataset[A]
  49. object selectMany extends ProductArgs
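
    A sketch of a typed projection, reusing the hypothetical people dataset:

    // One tuple slot per selected column, checked at compile time.
    val namesAndAges: TypedDataset[(String, Int)] =
      people.select(people.col('name), people.col('age))
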
  50. def show(numRows: Int = 20, truncate: Boolean = true): Job[Unit]

    Displays the content of this TypedDataset in a tabular form. Strings longer than 20 characters will be truncated, and all cells will be aligned right. For example:

    year  month AVG('Adj Close) MAX('Adj Close)
    1980  12    0.503218        0.595103
    1981  01    0.523289        0.570307
    1982  02    0.436504        0.475256
    1983  03    0.410516        0.442194
    1984  04    0.450090        0.483521

    numRows
    Number of rows to show

    truncate
    Whether to truncate long strings. If true, strings longer than 20 characters will be truncated and all cells will be aligned right.

    Differs from Dataset#show by wrapping its result into a Job.

    apache/spark

  51. def subtract(other: TypedDataset[T]): TypedDataset[T]

    Returns a new TypedDataset where any elements present in other have been removed.

    Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  52. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  53. def take(num: Int): Job[Seq[T]]

    Returns the first num elements of this TypedDataset as a Seq.

    Running take requires moving data into the application's driver process, and doing so with a very large num can crash the driver process with OutOfMemoryError.

    Differs from Dataset#take by wrapping its result into a Job.

    apache/spark

  54. def toDF(): DataFrame

    Converts this strongly typed collection of data to a generic DataFrame. In contrast to the strongly typed objects that Dataset operations work on, a DataFrame returns generic Row objects that allow fields to be accessed by ordinal or name.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  55. def toString(): String

    Definition Classes
    TypedDatasetForwarded → AnyRef → Any
  56. def transform[U](t: (TypedDataset[T]) ⇒ TypedDataset[U]): TypedDataset[U]

    Concise syntax for chaining custom transformations.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
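
    A sketch of chaining, reusing the hypothetical people dataset; withRepartition is a made-up helper:

    def withRepartition(n: Int)(ds: TypedDataset[Person]): TypedDataset[Person] =
      ds.repartition(n)

    // Reads left-to-right instead of nesting function applications.
    val pipeline: TypedDataset[Person] =
      people.transform(withRepartition(8)).transform(_.distinct)
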
  57. def union(other: TypedDataset[T]): TypedDataset[T]

    Returns a new TypedDataset that contains the elements of both this and the other TypedDataset combined.

    Note that this function is not a typical set union operation, in that it does not eliminate duplicate items. As such, it is analogous to UNION ALL in SQL.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  58. def unpersist(blocking: Boolean = false): TypedDataset[T]

    Mark the TypedDataset as non-persistent, and remove all blocks for it from memory and disk.

    blocking
    Whether to block until all blocks are deleted.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  59. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  60. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  61. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from TypedDatasetForwarded[T]

Inherited from AnyRef

Inherited from Any
