Class zio.spark.sql.KeyValueGroupedDataset

final case class KeyValueGroupedDataset[K, V](underlying: org.apache.spark.sql.KeyValueGroupedDataset[K, V]) extends Product with Serializable

Self Type
  KeyValueGroupedDataset[K, V]

Linear Supertypes
  Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new KeyValueGroupedDataset(underlying: org.apache.spark.sql.KeyValueGroupedDataset[K, V])

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def agg[U1, U2, U3, U4](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3], col4: TypedColumn[V, U4]): TryAnalysis[Dataset[(K, U1, U2, U3, U4)]]

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Since

    1.6.0

  5. def agg[U1, U2, U3](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2], col3: TypedColumn[V, U3]): TryAnalysis[Dataset[(K, U1, U2, U3)]]

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Since

    1.6.0

  6. def agg[U1, U2](col1: TypedColumn[V, U1], col2: TypedColumn[V, U2]): TryAnalysis[Dataset[(K, U1, U2)]]

    Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.

    Since

    1.6.0

  7. def agg[U1](col1: TypedColumn[V, U1]): TryAnalysis[Dataset[(K, U1)]]

    Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.

    Since

    1.6.0
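
    For example, a minimal sketch of the two-column overload (ds, its key field, and the amount column are hypothetical; Encoders are assumed to come from spark implicits):

    import org.apache.spark.sql.functions.{avg, sum}

    // Two typed aggregations per key; the result is wrapped in TryAnalysis
    // because column resolution can fail with an AnalysisException.
    val stats = ds.groupByKey(_.key).agg(sum("amount").as[Long], avg("amount").as[Double])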

  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. def cogroup[U, R](other: KeyValueGroupedDataset[K, U])(f: (K, Iterator[V], Iterator[U]) ⇒ TraversableOnce[R])(implicit arg0: Encoder[R]): Dataset[R]

    (Scala-specific) Applies the given function to each cogroup. For each unique key, the function is passed the grouping key and two iterators containing all elements for that key from this Dataset and from other. The function can return an iterator containing elements of an arbitrary type, which will be returned as a new Dataset.

    Since

    1.6.0
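
    For example, a minimal sketch (clicks, views, and userId are hypothetical; the tuple Encoder is assumed to come from spark implicits):

    // Join per-user click and view events by cogrouping on userId.
    val clicksByUser = clicks.groupByKey(_.userId)
    val viewsByUser  = views.groupByKey(_.userId)
    val summary = clicksByUser.cogroup(viewsByUser) { (id, cs, vs) =>
      Iterator((id, cs.size, vs.size))
    }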

  11. def count: Dataset[(K, Long)]

    Returns a Dataset that contains a tuple with each key and the number of items present for that key.

    Since

    1.6.0
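
    For example, a minimal sketch (words is a hypothetical Dataset of String):

    // Classic word count: one (word, frequency) tuple per distinct word.
    val frequencies = words.groupByKey(w => w).count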

  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. def flatMapGroups[U](f: (K, Iterator[V]) ⇒ TraversableOnce[U])(implicit arg0: Encoder[U]): Dataset[U]

    (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.

    This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.

    Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.

    Since

    1.6.0
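
    For example, a minimal sketch (readings, sensorId, and value are hypothetical):

    // Emit only the readings above their group's mean; a group may
    // contribute zero or many output rows.
    val outliers = readings.groupByKey(_.sensorId).flatMapGroups { (id, it) =>
      val xs = it.toList // materializes the group; only safe for small groups
      val mean = xs.map(_.value).sum / xs.size
      xs.filter(_.value > mean).iterator
    }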

  15. def flatMapGroupsWithState[S, U](outputMode: OutputMode, timeoutConf: GroupStateTimeout)(func: (K, Iterator[V], GroupState[S]) ⇒ Iterator[U])(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]

    ::Experimental:: (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    outputMode

    The output mode of the function.

    timeoutConf

    Timeout configuration for groups that do not receive data for a while. See Encoder for more details on what types are encodable to Spark SQL.

    func

    Function to be called on every group.

    Since

    2.2.0
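
    For example, a minimal sketch of a streaming running count (events, Event, and userId are hypothetical; Encoders are assumed to come from spark implicits):

    import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

    // Keep a running count per user and emit the updated total each trigger.
    val totals = events.groupByKey(_.userId).flatMapGroupsWithState(
      OutputMode.Update(), GroupStateTimeout.NoTimeout) {
      (userId: String, it: Iterator[Event], state: GroupState[Long]) =>
        val total = state.getOption.getOrElse(0L) + it.size
        state.update(total)
        Iterator(userId -> total)
    }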

  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  18. def keyAs[L](implicit arg0: Encoder[L]): KeyValueGroupedDataset[L, V]

    Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type. The mapping of key columns to the new type follows the same rules as the as method on Dataset.

    Since

    1.6.0
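
    For example, a minimal sketch (purchases and its Int-typed storeId field are hypothetical; the Int-to-Long upcast is one of the conversions as permits):

    // Widen an Int key to Long without regrouping.
    val widened = purchases.groupByKey(_.storeId).keyAs[Long]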

  19. def keys: Dataset[K]

    Returns a Dataset that contains each unique key. This is equivalent to mapping over the Dataset to extract the keys and then running a distinct operation on those.

    Since

    1.6.0
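
    For example, a minimal sketch (users and its country field are hypothetical):

    // The distinct grouping keys, as a Dataset of String.
    val countries = users.groupByKey(_.country).keys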

  20. def mapGroups[U](f: (K, Iterator[V]) ⇒ U)(implicit arg0: Encoder[U]): Dataset[U]

    (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.

    This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use the reduce function or an org.apache.spark.sql.expressions.Aggregator.

    Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.

    Since

    1.6.0
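
    For example, a minimal sketch (orders, customerId, and amount are hypothetical):

    // Collapse each customer's orders into exactly one row: (id, total).
    val totals = orders.groupByKey(_.customerId).mapGroups { (id, it) =>
      (id, it.map(_.amount).sum)
    }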

  21. def mapGroupsWithState[S, U](timeoutConf: GroupStateTimeout)(func: (K, Iterator[V], GroupState[S]) ⇒ U)(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]

    ::Experimental:: (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See org.apache.spark.sql.streaming.GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    timeoutConf

    Timeout configuration for groups that do not receive data for a while. See Encoder for more details on what types are encodable to Spark SQL.

    func

    Function to be called on every group.

    Since

    2.2.0

  22. def mapGroupsWithState[S, U](func: (K, Iterator[V], GroupState[S]) ⇒ U)(implicit arg0: Encoder[S], arg1: Encoder[U]): Dataset[U]

    ::Experimental:: (Scala-specific) Applies the given function to each group of data, while maintaining a user-defined per-group state. The result Dataset will represent the objects returned by the function. For a static batch Dataset, the function will be invoked once per group. For a streaming Dataset, the function will be invoked for each group repeatedly in every trigger, and updates to each group's state will be saved across invocations. See org.apache.spark.sql.streaming.GroupState for more details.

    S

    The type of the user-defined state. Must be encodable to Spark SQL types.

    U

    The type of the output objects. Must be encodable to Spark SQL types.

    func

    Function to be called on every group. See Encoder for more details on what types are encodable to Spark SQL.

    Since

    2.2.0
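
    For example, a minimal sketch using the overload without a timeout configuration (events, Event, and userId are hypothetical; Encoders are assumed to come from spark implicits):

    import org.apache.spark.sql.streaming.GroupState

    // Like flatMapGroupsWithState, but exactly one output row per group.
    val totals = events.groupByKey(_.userId).mapGroupsWithState {
      (userId: String, it: Iterator[Event], state: GroupState[Long]) =>
        val total = state.getOption.getOrElse(0L) + it.size
        state.update(total)
        userId -> total
    }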

  23. def mapValues[W](func: (V) ⇒ W)(implicit arg0: Encoder[W]): KeyValueGroupedDataset[K, W]

    Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.

    // Create values grouped by key from a Dataset[(K, V)]
    ds.groupByKey(_._1).mapValues(_._2) // Scala
    Since

    2.1.0

  24. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  25. final def notify(): Unit

    Definition Classes
    AnyRef
  26. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  27. def reduceGroups(f: (V, V) ⇒ V): Dataset[(K, V)]

    (Scala-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.

    Since

    1.6.0
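
    For example, a minimal sketch (events and its ts field are hypothetical; the reduction is associative and commutative up to ties):

    // Keep the most recent event per key.
    val latest = events.groupByKey(_.key).reduceGroups { (a, b) =>
      if (a.ts >= b.ts) a else b
    }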

  28. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  29. def transformation[KNew, VNew](f: (org.apache.spark.sql.KeyValueGroupedDataset[K, V]) ⇒ org.apache.spark.sql.KeyValueGroupedDataset[KNew, VNew]): KeyValueGroupedDataset[KNew, VNew]

    Applies a transformation to the underlying KeyValueGroupedDataset.
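
    For example, a minimal sketch (grouped is a hypothetical wrapper instance; the Encoder is assumed to come from spark implicits):

    // Re-key through the raw Spark API and get the zio-spark wrapper back.
    val rekeyed = grouped.transformation(_.keyAs[String])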

  30. val underlying: org.apache.spark.sql.KeyValueGroupedDataset[K, V]

  31. def unpack[U](f: (org.apache.spark.sql.KeyValueGroupedDataset[K, V]) ⇒ org.apache.spark.sql.Dataset[U]): Dataset[U]

    Unpacks the underlying KeyValueGroupedDataset into a Dataset.
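
    For example, a minimal sketch (grouped is a hypothetical wrapper instance):

    // Call a raw Spark operation returning a Spark Dataset; the result is
    // rewrapped as a zio-spark Dataset.
    val counted = grouped.unpack(_.count())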

  32. def unpackWithAnalysis[U](f: (org.apache.spark.sql.KeyValueGroupedDataset[K, V]) ⇒ org.apache.spark.sql.Dataset[U]): TryAnalysis[Dataset[U]]

    Unpacks the underlying KeyValueGroupedDataset into a Dataset; it is used for transformations that can fail with an AnalysisException.
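
    For example, a minimal sketch (grouped and the value column are hypothetical; the Encoder is assumed to come from spark implicits):

    import org.apache.spark.sql.functions.sum

    // Aggregating by a named column can fail at analysis time, so the
    // result is wrapped in TryAnalysis.
    val totals = grouped.unpackWithAnalysis(_.agg(sum("value").as[Long]))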

  33. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  34. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from Serializable

Inherited from Product

Inherited from Equals

Inherited from AnyRef

Inherited from Any
