frameless

TypedDatasetForwarded

trait TypedDatasetForwarded[T] extends AnyRef

This trait implements TypedDataset methods that have the same signature as their Dataset equivalents. Each method simply forwards the call to the underlying Dataset.

Documentation marked "apache/spark" is thanks to the apache/spark Contributors at https://github.com/apache/spark, licensed under Apache v2.0, available at http://www.apache.org/licenses/LICENSE-2.0.

Self Type
TypedDataset[T]
Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def cache(): TypedDataset[T]

    Persist this TypedDataset with the default storage level (MEMORY_AND_DISK).

    apache/spark
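
    A minimal usage sketch; ds is an assumed, pre-existing TypedDataset:

      val cached = ds.cache()
      // subsequent actions on `cached` reuse the MEMORY_AND_DISK-persisted data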

  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  7. def coalesce(numPartitions: Int): TypedDataset[T]

    Returns a new TypedDataset that has exactly numPartitions partitions. Similar to coalesce defined on an RDD, this operation results in a narrow dependency: if you go from 1000 partitions to 100 partitions, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of the current partitions.

    apache/spark
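
    For illustration (ds is an assumed TypedDataset with many partitions):

      // shrink to 100 partitions via a narrow dependency, i.e. without a shuffle
      val fewer = ds.coalesce(100)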

  8. def columns: Array[String]

    Returns an Array that contains all column names in this TypedDataset.

  9. def distinct: TypedDataset[T]

    Returns a new TypedDataset that contains only the unique elements of this TypedDataset.

    Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.

    apache/spark

  10. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  11. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  12. def except(other: TypedDataset[T]): TypedDataset[T]

    Returns a new Dataset containing rows in this Dataset but not in another Dataset. This is equivalent to EXCEPT in SQL.

    Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.

    apache/spark
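
    A sketch, assuming all and excluded are TypedDatasets of the same element type:

      // rows present in `all` but absent from `excluded` (SQL EXCEPT semantics)
      val remaining = all.except(excluded)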

  13. def explain(extended: Boolean = false): Unit

    Prints the plans (logical and physical) to the console for debugging purposes.

    apache/spark
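
    For illustration (ds is an assumed TypedDataset):

      ds.explain()                 // prints the physical plan
      ds.explain(extended = true)  // prints the logical and physical plans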

  14. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  15. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  17. def inputFiles: Array[String]

    Returns a best-effort snapshot of the files that compose this TypedDataset. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.

    apache/spark

  18. def intersect(other: TypedDataset[T]): TypedDataset[T]

    Returns a new TypedDataset that contains only the elements of this TypedDataset that are also present in other.

    Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.

    apache/spark

  19. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  20. def isLocal: Boolean

    Returns true if the collect and take methods can be run locally (without any Spark executors).

    apache/spark

  21. def isStreaming: Boolean

    Returns true if this TypedDataset contains one or more sources that continuously return data as it arrives. A TypedDataset that reads data from a streaming source must be executed as a StreamingQuery using the start() method in DataStreamWriter. Methods that return a single answer, e.g. count() or collect(), will throw an AnalysisException when there is a streaming source present.

    apache/spark

  22. def limit(n: Int): TypedDataset[T]

    Returns a new Dataset by taking the first n rows. The difference between this function and head is that head is an action and returns an array (by triggering query execution) while limit returns a new Dataset.

    apache/spark
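
    For illustration (ds is an assumed TypedDataset):

      // a new TypedDataset of at most 5 rows; unlike head, no job is triggered yet
      val firstFive = ds.limit(5)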

  23. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  24. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  25. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  26. def persist(newLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK): TypedDataset[T]

    Persist this TypedDataset with the given storage level.

    newLevel
    One of: MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY, MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.

    apache/spark
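
    A sketch, assuming an existing TypedDataset named ds:

      import org.apache.spark.storage.StorageLevel

      val inMemoryOnly = ds.persist(StorageLevel.MEMORY_ONLY)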

  27. def printSchema(): Unit

    Prints the schema of the underlying Dataset to the console in a nice tree format.

    apache/spark

  28. def queryExecution: QueryExecution

    Returns a QueryExecution from this TypedDataset.

    It is the primary workflow for executing relational queries using Spark, designed to allow developers easy access to the intermediate phases of query execution.

    apache/spark

  29. def randomSplit(weights: Array[Double], seed: Long): Array[TypedDataset[T]]

    Randomly splits this TypedDataset with the provided weights. Weights for splits will be normalized if they don't sum to 1.

    apache/spark
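
    For illustration (ds is an assumed TypedDataset):

      // reproducible 80/20 split thanks to the fixed seed
      val Array(train, test) = ds.randomSplit(Array(0.8, 0.2), seed = 42L)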

  30. def randomSplit(weights: Array[Double]): Array[TypedDataset[T]]

    Randomly splits this TypedDataset with the provided weights. Weights for splits will be normalized if they don't sum to 1.

    apache/spark

  31. def randomSplitAsList(weights: Array[Double], seed: Long): List[TypedDataset[T]]

    Returns a Java list that contains the randomly split TypedDatasets with the provided weights. Weights for splits will be normalized if they don't sum to 1.

    apache/spark

  32. def rdd: RDD[T]

    Converts this TypedDataset to an RDD.

    apache/spark

  33. def repartition(numPartitions: Int): TypedDataset[T]

    Returns a new TypedDataset that has exactly numPartitions partitions.

    apache/spark

  34. def sample(withReplacement: Boolean, fraction: Double, seed: Long = Random.nextLong()): TypedDataset[T]

    Returns a new TypedDataset by sampling a fraction of records.

    apache/spark
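
    For illustration (ds is an assumed TypedDataset):

      // roughly 10% of the rows, sampled without replacement
      val tenPercent = ds.sample(withReplacement = false, fraction = 0.1)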

  35. def schema: StructType

    Returns the schema of this Dataset.

    apache/spark

  36. def sparkSession: SparkSession

    Returns a SparkSession from this TypedDataset.

  37. def sqlContext: SQLContext

    Returns a SQLContext from this TypedDataset.

  38. def storageLevel(): StorageLevel

    Get the TypedDataset's current storage level, or StorageLevel.NONE if not persisted.

    apache/spark

  39. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  40. def toDF(): DataFrame

    Converts this strongly typed collection of data to a generic DataFrame. In contrast to the strongly typed objects that Dataset operations work on, a DataFrame returns generic Row objects that allow fields to be accessed by ordinal or name.

    apache/spark
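
    A sketch, assuming ds is a TypedDataset whose schema has a column named "name":

      val df = ds.toDF()
      val names = df.select("name")   // fields are now untyped Row columns, checked only at runtime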

  41. def toJSON: TypedDataset[String]

    Returns the content of the TypedDataset as a Dataset of JSON strings.

    apache/spark

  42. def toString(): String
    Definition Classes
    TypedDatasetForwarded → AnyRef → Any
  43. def transform[U](t: (TypedDataset[T]) => TypedDataset[U]): TypedDataset[U]

    Concise syntax for chaining custom transformations.

    apache/spark
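
    A sketch of chaining two hypothetical transformations; people is an assumed TypedDataset[Person], and onlyAdults and anonymize are assumed functions of type TypedDataset[Person] => TypedDataset[Person]:

      val result = people
        .transform(onlyAdults)
        .transform(anonymize)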

  44. def unpersist(blocking: Boolean = false): TypedDataset[T]

    Mark the TypedDataset as non-persistent, and remove all blocks for it from memory and disk.

    blocking
    Whether to block until all blocks are deleted.

    apache/spark
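
    For illustration (cached is an assumed, previously persisted TypedDataset):

      cached.unpersist()                  // asynchronous by default
      cached.unpersist(blocking = true)   // block until all blocks are deleted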

  45. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  46. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  47. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  48. def write: DataFrameWriter[T]

    Interface for saving the content of the non-streaming TypedDataset out into external storage.

    apache/spark
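
    A sketch (ds is an assumed TypedDataset; the output path is illustrative):

      ds.write.mode("overwrite").parquet("/tmp/typed-dataset-output")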

  49. def writeStream: DataStreamWriter[T]

    Interface for saving the content of the streaming Dataset out into external storage.

    apache/spark
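
    A sketch, assuming ds is backed by a streaming source:

      val query = ds.writeStream.format("console").start()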

  50. object deserialized

    Methods on TypedDataset[T] that go through a full serialization and deserialization of T, and execute outside of the Catalyst runtime.

    Example:
    1. The correct way to do a projection on a single column is to use the select method as follows:

      ds: TypedDataset[(String, String, String)] -> ds.select(ds('_2)).run()

      Spark provides an alternative way to obtain the same resulting Dataset, using the map method:

      ds: TypedDataset[(String, String, String)] -> ds.deserialized.map(_._2).run()

      This second approach, however, is substantially slower than the first one and should be avoided when possible. Indeed, under the hood this map will deserialize the entire Tuple3 to a full JVM object, call the apply method of the _._2 closure on it, and serialize the resulting String back to its Catalyst representation.

Deprecated Value Members

  1. def filter(func: (T) => Boolean): TypedDataset[T]
    Annotations
    @deprecated
    Deprecated

    (Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead

  2. def flatMap[U](func: (T) => TraversableOnce[U])(implicit arg0: TypedEncoder[U]): TypedDataset[U]
    Annotations
    @deprecated
    Deprecated

    (Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead

  3. def map[U](func: (T) => U)(implicit arg0: TypedEncoder[U]): TypedDataset[U]
    Annotations
    @deprecated
    Deprecated

    (Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead

  4. def mapPartitions[U](func: (Iterator[T]) => Iterator[U])(implicit arg0: TypedEncoder[U]): TypedDataset[U]
    Annotations
    @deprecated
    Deprecated

    (Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead

  5. def reduceOption[F[_]](func: (T, T) => T)(implicit arg0: SparkDelay[F]): F[Option[T]]
    Annotations
    @deprecated
    Deprecated

    (Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead
