Packages

  • package org.apache.spark.sql
  • object functions

    Commonly used functions available for DataFrame operations. Using the functions defined here provides a little more compile-time safety, making sure the function exists.

    You can call the functions defined here in two ways: _FUNC_(...) and functions.expr("_FUNC_(...)").

    As an example, regr_count is a function defined here. You can use regr_count(col("yCol"), col("xCol")) to invoke it; this way, the programming language's compiler ensures that regr_count exists and is called with the proper arguments. You can also invoke the same function with expr("regr_count(yCol, xCol)"); in this case, Spark itself verifies that regr_count exists when it analyzes the query.

    You can find the entire list of functions in the SQL API documentation for your Spark version; see also the latest list.

    These function APIs usually provide methods with Column signatures only, because a Column argument can accept not only a Column but also other types, such as a native string. The other variants currently exist for historical reasons.

    Definition Classes
    sql
    Annotations
    @Stable()
    Since

    1.3.0
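    The two invocation styles described above can be sketched as follows; the data and column names (yCol, xCol) are hypothetical:

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, expr, regr_count}

    val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
    import spark.implicits._

    // Hypothetical data: pairs of (y, x) observations.
    val df = Seq((1.0, 2.0), (3.0, 4.0), (5.0, 6.0)).toDF("yCol", "xCol")

    // Typed API: the Scala compiler checks that regr_count exists
    // and is called with the proper arguments.
    val typed = df.select(regr_count(col("yCol"), col("xCol")))

    // SQL-expression API: Spark validates regr_count during query analysis.
    val viaExpr = df.select(expr("regr_count(yCol, xCol)"))
    ```

    Both queries count the non-null (y, x) pairs and return the same result.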

  • partitioning

object partitioning

Linear Supertypes
AnyRef, Any

Value Members

  1. def bucket(numBuckets: Int, e: Column): Column

    (Scala-specific) A transform for any type that partitions by a hash of the input column.

    Since

    4.0.0

  2. def bucket(numBuckets: Column, e: Column): Column

    (Scala-specific) A transform for any type that partitions by a hash of the input column.

    Since

    4.0.0
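    A minimal sketch of the two bucket overloads; building the transform Columns needs no running session, while actually creating a partitioned table (commented below, with a hypothetical table name) requires a catalog that supports transform-based partitioning:

    ```scala
    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.{col, lit}
    import org.apache.spark.sql.functions.partitioning

    // Int overload: hash the input column into 16 buckets.
    val byCount: Column = partitioning.bucket(16, col("userId"))

    // Column overload: the bucket count supplied as a literal Column.
    val byLit: Column = partitioning.bucket(lit(16), col("userId"))

    // Both are meant for the DataFrameWriterV2 API, e.g.:
    //   df.writeTo("demo.users_bucketed")   // hypothetical table name
    //     .partitionedBy(byCount)
    //     .create()
    ```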

  3. def days(e: Column): Column

    (Scala-specific) A transform for timestamps and dates to partition data into days.

    Since

    4.0.0

  4. def hours(e: Column): Column

    (Scala-specific) A transform for timestamps to partition data into hours.

    Since

    4.0.0

  5. def months(e: Column): Column

    (Scala-specific) A transform for timestamps and dates to partition data into months.

    Since

    4.0.0

  6. def years(e: Column): Column

    (Scala-specific) A transform for timestamps and dates to partition data into years.

    Since

    4.0.0
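    The four time-based transforms above can be sketched together; the column name "ts" is hypothetical, and as with bucket, writing a table with these transforms requires a catalog that supports transform-based partitioning:

    ```scala
    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.functions.partitioning._

    // Each call yields a partitioning Column over a timestamp/date column.
    val byHour: Column  = hours(col("ts"))   // timestamps only
    val byDay: Column   = days(col("ts"))
    val byMonth: Column = months(col("ts"))
    val byYear: Column  = years(col("ts"))

    // Used with the DataFrameWriterV2 API, e.g.:
    //   events.writeTo("demo.events_by_day")   // hypothetical table name
    //     .partitionedBy(byDay)
    //     .create()
    ```

    Choosing among hours, days, months, and years trades partition count against partition size for time-series data.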