Trait frameless.TypedDatasetForwarded

trait TypedDatasetForwarded[T] extends AnyRef

This trait implements TypedDataset methods that have the same signature as their Dataset equivalents. Each method simply forwards the call to the underlying Dataset.

Documentation marked "apache/spark" is thanks to apache/spark Contributors at https://github.com/apache/spark, licensed under Apache v2.0 available at http://www.apache.org/licenses/LICENSE-2.0
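
Below is a minimal, illustrative sketch of how a forwarded method is used. The SparkSession setup and the Person case class are assumptions made only for this example and are not part of the trait.

  import org.apache.spark.sql.SparkSession
  import frameless.TypedDataset

  case class Person(name: String, age: Int)   // hypothetical example type

  // frameless expects an implicit SparkSession when creating a TypedDataset
  implicit val spark: SparkSession =
    SparkSession.builder().master("local[*]").appName("example").getOrCreate()

  val ds: TypedDataset[Person] =
    TypedDataset.create(Seq(Person("Ana", 30), Person("Bo", 25)))

  // repartition has the same signature as Dataset.repartition and forwards to it
  val repartitioned: TypedDataset[Person] = ds.repartition(8)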

Self Type
TypedDataset[T]
Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def cache(): TypedDataset[T]

    Persist this TypedDataset with the default storage level (MEMORY_AND_DISK).

    apache/spark
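
    A minimal sketch (assuming ds is a TypedDataset[Person] as in the introductory example above):

      // marks the underlying Dataset for MEMORY_AND_DISK caching; later actions can reuse it
      val cached: TypedDataset[Person] = ds.cache()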

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. def coalesce(numPartitions: Int): TypedDataset[T]

    Returns a new TypedDataset that has exactly numPartitions partitions. Similar to coalesce defined on an RDD, this operation results in a narrow dependency, e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle; instead each of the 100 new partitions will claim 10 of the current partitions.

    apache/spark
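
    For example (illustrative; assumes ds currently has more than 100 partitions):

      // narrow dependency: shrinking the partition count does not trigger a shuffle
      val compacted: TypedDataset[Person] = ds.coalesce(100)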

  8. def columns: Array[String]


    Returns an Array that contains all column names in this TypedDataset.

  9. object deserialized

    Methods on TypedDataset[T] that go through a full serialization and deserialization of T, and execute outside of the Catalyst runtime.

    Example:
    1. The correct way to do a projection on a single column is to use the select method as follows:

      ds: TypedDataset[(String, String, String)] -> ds.select(ds('_2)).run()

      Spark provides an alternative way to obtain the same resulting Dataset, using the map method:

      ds: TypedDataset[(String, String, String)] -> ds.deserialized.map(_._2).run()

      This second approach is however substantially slower than the first one, and should be avoided when possible. Indeed, under the hood this map will deserialize the entire Tuple3 to a full JVM object, call the apply method of the _._2 closure on it, and serialize the resulting String back to its Catalyst representation.

  10. def distinct: TypedDataset[T]

    Returns a new TypedDataset that contains only the unique elements of this TypedDataset.

    Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.

    apache/spark
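
    A minimal sketch (assuming ds may contain duplicate rows):

      // uniqueness is decided on the encoded representation, not Person.equals
      val unique: TypedDataset[Person] = ds.distinct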

  11. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  13. def except(other: TypedDataset[T]): TypedDataset[T]

    Returns a new Dataset containing rows in this Dataset but not in another Dataset. This is equivalent to EXCEPT in SQL.

    Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.

    apache/spark
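
    For example (illustrative; current and previous are hypothetical TypedDataset[Person] values):

      // rows present in current but absent from previous, i.e. EXCEPT in SQL
      val added: TypedDataset[Person] = current.except(previous)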

  14. def explain(extended: Boolean = false): Unit

    Prints the plans (logical and physical) to the console for debugging purposes.

    apache/spark

  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  18. def inputFiles: Array[String]

    Returns a best-effort snapshot of the files that compose this TypedDataset. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.

    apache/spark

  19. def intersect(other: TypedDataset[T]): TypedDataset[T]

    Returns a new TypedDataset that contains only the elements of this TypedDataset that are also present in other.

    Note that equality checking is performed directly on the encoded representation of the data and thus is not affected by a custom equals function defined on T.

    apache/spark

  20. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  21. def isLocal: Boolean

    Returns true if the collect and take methods can be run locally (without any Spark executors).

    apache/spark

  22. def isStreaming: Boolean

    Returns true if this TypedDataset contains one or more sources that continuously return data as it arrives. A TypedDataset that reads data from a streaming source must be executed as a StreamingQuery using the start() method in DataStreamWriter. Methods that return a single answer, e.g. count() or collect(), will throw an AnalysisException when there is a streaming source present.

    apache/spark

  23. def limit(n: Int): TypedDataset[T]

    Returns a new Dataset by taking the first n rows. The difference between this function and head is that head is an action and returns an array (by triggering query execution) while limit returns a new Dataset.

    apache/spark
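
    A minimal sketch:

      // builds a new TypedDataset lazily; no job is triggered until an action runs
      val firstTen: TypedDataset[Person] = ds.limit(10)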

  24. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  25. final def notify(): Unit

    Definition Classes
    AnyRef
  26. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  27. def persist(newLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK): TypedDataset[T]

    Persist this TypedDataset with the given storage level.

    newLevel

    One of: MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY, MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.

    apache/spark
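
    A minimal sketch (StorageLevel comes from Spark core):

      import org.apache.spark.storage.StorageLevel

      val onDisk: TypedDataset[Person] = ds.persist(StorageLevel.DISK_ONLY)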

  28. def printSchema(): Unit

    Prints the schema of the underlying Dataset to the console in a nice tree format.

    apache/spark

  29. def queryExecution: QueryExecution

    Returns a QueryExecution from this TypedDataset.

    It is the primary workflow for executing relational queries using Spark. Designed to allow easy access to the intermediate phases of query execution for developers.

    apache/spark

  30. def randomSplit(weights: Array[Double], seed: Long): Array[TypedDataset[T]]

    Randomly splits this TypedDataset with the provided weights. Weights for splits will be normalized if they don't sum to 1.

    apache/spark
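
    For example (illustrative):

      // 80/20 split with a fixed seed; weights that do not sum to 1 would be normalized
      val Array(train, test) = ds.randomSplit(Array(0.8, 0.2), seed = 42L)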

  31. def randomSplit(weights: Array[Double]): Array[TypedDataset[T]]

    Randomly splits this TypedDataset with the provided weights. Weights for splits will be normalized if they don't sum to 1.

    apache/spark

  32. def randomSplitAsList(weights: Array[Double], seed: Long): List[TypedDataset[T]]

    Returns a Java list that contains randomly split TypedDataset with the provided weights. Weights for splits will be normalized if they don't sum to 1.

    apache/spark

  33. def rdd: RDD[T]

    Converts this TypedDataset to an RDD.

    apache/spark

  34. def repartition(numPartitions: Int): TypedDataset[T]

    Returns a new TypedDataset that has exactly numPartitions partitions.

    apache/spark

  35. def sample(withReplacement: Boolean, fraction: Double, seed: Long = Random.nextLong): TypedDataset[T]

    Returns a new TypedDataset by sampling a fraction of records.

    apache/spark
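
    A minimal sketch:

      // roughly 10% of the records, sampled without replacement
      val tenPercent: TypedDataset[Person] = ds.sample(withReplacement = false, fraction = 0.1)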

  36. def schema: StructType

    Returns the schema of this Dataset.

    apache/spark

  37. def sparkSession: SparkSession


    Returns a SparkSession from this TypedDataset.

  38. def sqlContext: SQLContext


    Returns a SQLContext from this TypedDataset.

  39. def storageLevel(): StorageLevel

    Get the TypedDataset's current storage level, or StorageLevel.NONE if not persisted.

    apache/spark

  40. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  41. def toDF(): DataFrame

    Converts this strongly typed collection of data to a generic DataFrame. In contrast to the strongly typed objects that Dataset operations work on, a DataFrame returns generic Row objects that allow fields to be accessed by ordinal or name.

    apache/spark
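
    A minimal sketch:

      // drop down to the untyped API; fields are accessed by name on generic Row objects
      val df: org.apache.spark.sql.DataFrame = ds.toDF()
      df.select("name").show()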

  42. def toJSON: TypedDataset[String]

    Returns the content of the TypedDataset as a Dataset of JSON strings.

    apache/spark

  43. def toString(): String

    Definition Classes
    TypedDatasetForwarded → AnyRef → Any
  44. def transform[U](t: (TypedDataset[T]) ⇒ TypedDataset[U]): TypedDataset[U]

    Concise syntax for chaining custom transformations.

    apache/spark
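
    For example (illustrative; dedupe and compact are hypothetical helpers built only from methods documented on this page):

      def dedupe(in: TypedDataset[Person]): TypedDataset[Person] = in.distinct
      def compact(in: TypedDataset[Person]): TypedDataset[Person] = in.coalesce(4)

      val cleaned: TypedDataset[Person] = ds.transform(dedupe).transform(compact)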

  45. def unpersist(blocking: Boolean = false): TypedDataset[T]

    Mark the TypedDataset as non-persistent, and remove all blocks for it from memory and disk.

    blocking

    Whether to block until all blocks are deleted.

    apache/spark
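
    A minimal sketch:

      // returns immediately by default; pass blocking = true to wait for deletion
      val uncached: TypedDataset[Person] = ds.unpersist()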

  46. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  47. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  48. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  49. def write: DataFrameWriter[T]

    Interface for saving the content of the non-streaming TypedDataset out into external storage.

    apache/spark
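
    A minimal sketch (the output path is hypothetical):

      ds.write.mode("overwrite").parquet("/tmp/people")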

Deprecated Value Members

  1. def filter(func: (T) ⇒ Boolean): TypedDataset[T]

    Annotations
    @deprecated
    Deprecated

    (Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead

  2. def flatMap[U](func: (T) ⇒ TraversableOnce[U])(implicit arg0: TypedEncoder[U]): TypedDataset[U]

    Annotations
    @deprecated
    Deprecated

    (Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead

  3. def map[U](func: (T) ⇒ U)(implicit arg0: TypedEncoder[U]): TypedDataset[U]

    Annotations
    @deprecated
    Deprecated

    (Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead

  4. def mapPartitions[U](func: (Iterator[T]) ⇒ Iterator[U])(implicit arg0: TypedEncoder[U]): TypedDataset[U]

    Annotations
    @deprecated
    Deprecated

    (Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead

  5. def reduceOption[F[_]](func: (T, T) ⇒ T)(implicit arg0: SparkDelay[F]): F[Option[T]]

    Annotations
    @deprecated
    Deprecated

    (Since version 0.4.0) deserialized methods have moved to a separate section to highlight their runtime overhead
