frameless

TypedDataset

Related Docs: object TypedDataset | package frameless

class TypedDataset[T] extends TypedDatasetForwarded[T]

TypedDataset is a safer interface for working with Dataset.
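
A minimal sketch of constructing one, assuming the create method on the companion object (listed under Related Docs) and the implicit SQLContext that frameless requires in scope:

import frameless.TypedDataset

case class Person(id: Long, name: String, age: Int)

// TypedEncoder[Person] is derived at compile time; construction does not
// compile for types that cannot be encoded. This people dataset is reused
// by the sketches further down this page.
val people: TypedDataset[Person] =
  TypedDataset.create(Seq(Person(1L, "Ana", 30), Person(2L, "Bob", 17)))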

Documentation marked "apache/spark" is thanks to apache/spark Contributors at https://github.com/apache/spark, licensed under Apache v2.0 available at http://www.apache.org/licenses/LICENSE-2.0

Self Type
TypedDataset[T]
Linear Supertypes
TypedDatasetForwarded[T], AnyRef, Any

Instance Constructors

  1. new TypedDataset(dataset: Dataset[T])(implicit encoder: TypedEncoder[T])

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def as[U]()(implicit as: As[T, U]): TypedDataset[U]

    Returns a new TypedDataset where each record has been mapped onto the specified type.
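
    A sketch of a safe cast, assuming As[T, U] is derivable when the two case classes share the same field types in the same order (only the names differ here); User is hypothetical:

    case class User(userId: Long, userName: String, userAge: Int)

    // Compiles because User has the same shape as Person; an incompatible
    // target type is rejected at compile time.
    val users: TypedDataset[User] = people.as[User]()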

  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def cache(): TypedDataset[T]

    Persist this TypedDataset with the default storage level (MEMORY_AND_DISK).

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def coalesce(numPartitions: Int): TypedDataset[T]

    Returns a new TypedDataset that has exactly numPartitions partitions. Similar to coalesce defined on an RDD, this operation results in a narrow dependency; e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle, and instead each of the 100 new partitions will claim 10 of the current partitions.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  9. def col[A](column: Lt[Symbol])(implicit exists: Exists[T, (column)#T, A], encoder: TypedEncoder[A]): TypedColumn[T, A]

    Returns the TypedColumn of type A with the given name.

    tf.col('id)

    It is statically checked that a column with this name exists and has type A.
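
    For instance, with the hypothetical people: TypedDataset[Person] from the sketch at the top of this page:

    // Well-typed: Person has an age field of type Int.
    val age: TypedColumn[Person, Int] = people.col('age)

    // people.col('salary)  // does not compile: Person has no salary column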

  10. object colMany extends SingletonProductArgs
  11. def collect(): Job[Seq[T]]

    Returns a Seq that contains all the elements in this TypedDataset.

    Running this Job requires moving all the data into the application's driver process, and doing so on a very large TypedDataset can crash the driver process with OutOfMemoryError.

    Differs from Dataset#collect by wrapping its result into a Job.
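
    A sketch of forcing the action, assuming Job exposes a run() method that triggers execution:

    // Nothing is submitted to Spark until run() is called on the Job.
    val everyone: Seq[Person] = people.collect().run()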

  12. def count(): Job[Long]

    Returns the number of elements in the TypedDataset.

    Differs from Dataset#count by wrapping its result into a Job.

  13. val dataset: Dataset[T]
  14. def distinct: TypedDataset[T]

    Returns a new TypedDataset that contains only the unique elements of this TypedDataset.

    Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  15. implicit val encoder: TypedEncoder[T]
  16. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  18. def explain(extended: Boolean = false): Unit

    Prints the plans (logical and physical) to the console for debugging purposes.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  19. def filter(column: TypedColumn[T, Boolean]): TypedDataset[T]

    Returns a new frameless.TypedDataset that only contains elements where column is true.

    Differs from TypedDatasetForwarded#filter by taking a TypedColumn[T, Boolean] instead of a T => Boolean. Using a column expression instead of a regular function saves one Spark → Scala deserialization, which leads to better performance.
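
    A sketch contrasting the two variants, reusing the hypothetical people dataset and assuming the === comparison on TypedColumn:

    // Column expression: evaluated by Catalyst, rows are never deserialized.
    val anas  = people.filter(people.col('name) === "Ana")

    // Scala closure: each row is deserialized to a Person before the test.
    val anas2 = people.filter((p: Person) => p.name == "Ana")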

  20. def filter(func: (T) ⇒ Boolean): TypedDataset[T]

    Returns a new TypedDataset that only contains elements where func returns true.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  21. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  22. def firstOption(): Job[Option[T]]

    Optionally returns the first element in this TypedDataset.

    Differs from Dataset#first by wrapping its result into an Option and a Job.

  23. def flatMap[U](func: (T) ⇒ TraversableOnce[U])(implicit arg0: TypedEncoder[U]): TypedDataset[U]

    Returns a new TypedDataset by first applying a function to all elements of this TypedDataset, and then flattening the results.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  24. def foreach(func: (T) ⇒ Unit): Job[Unit]

    Runs func on each element of this TypedDataset.

    Differs from Dataset#foreach by wrapping its result into a Job.

  25. def foreachPartition(func: (Iterator[T]) ⇒ Unit): Job[Unit]

    Runs func on each partition of this TypedDataset.

    Differs from Dataset#foreachPartition by wrapping its result into a Job.

  26. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  27. def groupBy[K1, K2](c1: TypedColumn[T, K1], c2: TypedColumn[T, K2]): GroupedBy2Ops[K1, K2, T]
  28. def groupBy[K1](c1: TypedColumn[T, K1]): GroupedBy1Ops[K1, T]
  29. object groupByMany extends ProductArgs
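
    A hedged sketch of a grouped aggregation, assuming the sum aggregator from frameless.functions.aggregate and an agg method on the returned GroupedBy1Ops:

    import frameless.functions.aggregate._

    case class Purchase(user: String, amount: Double)

    val purchases: TypedDataset[Purchase] =
      TypedDataset.create(Seq(Purchase("ana", 10.0), Purchase("ana", 5.0)))

    // Total amount per user, typed as (String, Double).
    val totals: TypedDataset[(String, Double)] =
      purchases.groupBy(purchases.col('user)).agg(sum(purchases.col('amount)))
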
  30. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  31. def intersect(other: TypedDataset[T]): TypedDataset[T]

    Returns a new TypedDataset that contains only the elements of this TypedDataset that are also present in other.

    Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  32. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  33. def join[A, B](right: TypedDataset[A], leftCol: TypedColumn[T, B], rightCol: TypedColumn[A, B]): TypedDataset[(T, A)]
  34. def joinLeft[A, B](right: TypedDataset[A], leftCol: TypedColumn[T, B], rightCol: TypedColumn[A, B])(implicit arg0: TypedEncoder[A], e: TypedEncoder[(T, Option[A])]): TypedDataset[(T, Option[A])]
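
    A sketch of both joins on a key column, reusing the hypothetical people dataset; Address is made up for the example:

    case class Address(personId: Long, city: String)

    val addresses: TypedDataset[Address] =
      TypedDataset.create(Seq(Address(1L, "Lisbon")))

    // Inner join: only people with a matching address are kept.
    val withAddress: TypedDataset[(Person, Address)] =
      people.join(addresses, people.col('id), addresses.col('personId))

    // Left join: every person is kept, with None when nothing matches.
    val maybeAddress: TypedDataset[(Person, Option[Address])] =
      people.joinLeft(addresses, people.col('id), addresses.col('personId))
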
  35. def map[U](func: (T) ⇒ U)(implicit arg0: TypedEncoder[U]): TypedDataset[U]

    Returns a new TypedDataset that contains the result of applying func to each element.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  36. def mapPartitions[U](func: (Iterator[T]) ⇒ Iterator[U])(implicit arg0: TypedEncoder[U]): TypedDataset[U]

    Returns a new TypedDataset that contains the result of applying func to each partition.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  37. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  38. final def notify(): Unit

    Definition Classes
    AnyRef
  39. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  40. def persist(newLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK): TypedDataset[T]

    Persist this TypedDataset with the given storage level.

    newLevel
    One of: MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY, MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  41. def printSchema(): Unit

    Prints the schema of the underlying Dataset to the console in a nice tree format.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  42. def rdd: RDD[T]

    Converts this TypedDataset to an RDD.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  43. def reduceOption(func: (T, T) ⇒ T): Job[Option[T]]

    Optionally reduces the elements of this TypedDataset using the specified binary function. The given func must be commutative and associative or the result may be non-deterministic.

    Differs from Dataset#reduce by wrapping its result into an Option and a Job.
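
    A sketch, reusing the hypothetical people dataset and assuming Job exposes run():

    // None when the dataset is empty. Picking the older of two people is
    // commutative and associative, so the result is deterministic up to ties.
    val oldest: Option[Person] =
      people.reduceOption((a, b) => if (a.age >= b.age) a else b).run()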

  44. def repartition(numPartitions: Int): TypedDataset[T]

    Returns a new TypedDataset that has exactly numPartitions partitions.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  45. def sample(withReplacement: Boolean, fraction: Double, seed: Long = Random.nextLong): TypedDataset[T]

    Returns a new TypedDataset by sampling a fraction of records.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  46. def select[A, B, C](ca: TypedColumn[T, A], cb: TypedColumn[T, B], cc: TypedColumn[T, C])(implicit arg0: TypedEncoder[A], arg1: TypedEncoder[B], arg2: TypedEncoder[C]): TypedDataset[(A, B, C)]
  47. def select[A, B](ca: TypedColumn[T, A], cb: TypedColumn[T, B])(implicit arg0: TypedEncoder[A], arg1: TypedEncoder[B]): TypedDataset[(A, B)]
  48. def select[A](ca: TypedColumn[T, A])(implicit arg0: TypedEncoder[A]): TypedDataset[A]
  49. object selectMany extends ProductArgs
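
    A sketch of a typed projection, reusing the hypothetical people dataset:

    // One tuple slot per selected column, checked at compile time.
    val namesAndAges: TypedDataset[(String, Int)] =
      people.select(people.col('name), people.col('age))
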
  50. def show(numRows: Int = 20, truncate: Boolean = true): Job[Unit]

    Displays the content of this TypedDataset in a tabular form. Strings longer than 20 characters will be truncated, and all cells will be aligned right. For example:

    year  month AVG('Adj Close) MAX('Adj Close)
    1980  12    0.503218        0.595103
    1981  01    0.523289        0.570307
    1982  02    0.436504        0.475256
    1983  03    0.410516        0.442194
    1984  04    0.450090        0.483521

    numRows
    Number of rows to show

    truncate
    Whether to truncate long strings. If true, strings longer than 20 characters will be truncated and all cells will be aligned right.

    Differs from Dataset#show by wrapping its result into a Job.

    apache/spark

  51. def subtract(other: TypedDataset[T]): TypedDataset[T]

    Returns a new TypedDataset where any elements present in other have been removed.

    Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  52. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  53. def take(num: Int): Job[Seq[T]]

    Returns the first num elements of this TypedDataset as a Seq.

    Running take requires moving data into the application's driver process, and doing so with a very large num can crash the driver process with OutOfMemoryError.

    Differs from Dataset#take by wrapping its result into a Job.

    apache/spark

  54. def toDF(): DataFrame

    Converts this strongly typed collection of data to a generic DataFrame. In contrast to the strongly typed objects that Dataset operations work on, a DataFrame returns generic Row objects that allow fields to be accessed by ordinal or name.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  55. def toString(): String

    Definition Classes
    TypedDatasetForwarded → AnyRef → Any
  56. def transform[U](t: (TypedDataset[T]) ⇒ TypedDataset[U]): TypedDataset[U]

    Concise syntax for chaining custom transformations.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
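
    A sketch of chaining, reusing the hypothetical people dataset; withRepartition is a made-up helper:

    def withRepartition(n: Int)(ds: TypedDataset[Person]): TypedDataset[Person] =
      ds.repartition(n)

    // Reads left-to-right instead of nesting function applications.
    val pipeline: TypedDataset[Person] =
      people.transform(withRepartition(8)).transform(_.distinct)
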
  57. def union(other: TypedDataset[T]): TypedDataset[T]

    Returns a new TypedDataset that contains the elements of both this and the other TypedDataset combined.

    Note that this function is not a typical set union operation, in that it does not eliminate duplicate items. As such, it is analogous to UNION ALL in SQL.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  58. def unpersist(blocking: Boolean = false): TypedDataset[T]

    Mark the TypedDataset as non-persistent, and remove all blocks for it from memory and disk.

    blocking
    Whether to block until all blocks are deleted.

    apache/spark

    Definition Classes
    TypedDatasetForwarded
  59. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  60. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  61. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from TypedDatasetForwarded[T]

Inherited from AnyRef

Inherited from Any
