Spark Project SQL 3.0.1 API - org.apache.spark.sql.execution.stat.StatFunctions

final def !=(arg0: Any): Boolean

Definition Classes: AnyRef → Any

final def ##(): Int

Definition Classes: AnyRef → Any

final def ==(arg0: Any): Boolean

Definition Classes: AnyRef → Any

final def asInstanceOf[T0]: T0

Definition Classes: Any

def calculateCov(df: DataFrame, cols: Seq[String]): Double

Calculate the covariance of two numerical columns of a DataFrame.

df: The DataFrame
cols: the column names
returns: the covariance of the two columns.

def clone(): AnyRef

Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( ... ) @native()

def crossTabulate(df: DataFrame, col1: String, col2: String): DataFrame

Generate a table of frequencies for the elements of two columns.

final def eq(arg0: AnyRef): Boolean

Definition Classes: AnyRef

def equals(arg0: Any): Boolean

Definition Classes: AnyRef → Any

def finalize(): Unit

Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( classOf[java.lang.Throwable] )

final def getClass(): Class[_]

Definition Classes: AnyRef → Any
Annotations: @native()

def hashCode(): Int

Definition Classes: AnyRef → Any
Annotations: @native()

def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

Attributes: protected
Definition Classes: Logging

def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes: protected
Definition Classes: Logging

final def isInstanceOf[T0]: Boolean

Definition Classes: Any

def isTraceEnabled(): Boolean

Attributes: protected
Definition Classes: Logging

def log: Logger

Attributes: protected
Definition Classes: Logging

def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logDebug(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logError(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logInfo(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logName: String

Attributes: protected
Definition Classes: Logging

def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logTrace(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logWarning(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def multipleApproxQuantiles(df: DataFrame, cols: Seq[String], probabilities: Seq[Double], relativeError: Double): Seq[Seq[Double]]

Calculates the approximate quantiles of multiple numerical columns of a DataFrame in one pass.

The result of this algorithm has the following deterministic bound: If the DataFrame has N elements and if we request the quantile at probability p up to error err, then the algorithm will return a sample x from the DataFrame so that the *exact* rank of x is close to (p * N). More precisely,

floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).

This method implements a variation of the Greenwald-Khanna algorithm (with some speed optimizations). The algorithm was first present in Space-efficient Online Computation of Quantile Summaries by Greenwald and Khanna.

df: the dataframe
cols: numerical columns of the dataframe
probabilities: a list of quantile probabilities Each number must belong to [0, 1]. For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
relativeError: The relative target precision to achieve (greater than or equal 0). If set to zero, the exact quantiles are computed, which could be very expensive. Note that values greater than 1 are accepted but give the same result as 1.
returns: for each column, returns the requested approximations

Note: null and NaN values will be ignored in numerical columns before calculation. For a column only containing null or NaN values, an empty array is returned.

final def ne(arg0: AnyRef): Boolean

Definition Classes: AnyRef

final def notify(): Unit

Definition Classes: AnyRef
Annotations: @native()

final def notifyAll(): Unit

Definition Classes: AnyRef
Annotations: @native()

def pearsonCorrelation(df: DataFrame, cols: Seq[String]): Double

Calculate the Pearson Correlation Coefficient for the given columns

def summary(ds: Dataset[_], statistics: Seq[String]): DataFrame

Calculate selected summary statistics for a dataset

final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes: AnyRef

def toString(): String

Definition Classes: AnyRef → Any

final def wait(): Unit

Definition Classes: AnyRef
Annotations: @throws( ... )

final def wait(arg0: Long, arg1: Int): Unit

Definition Classes: AnyRef
Annotations: @throws( ... )

final def wait(arg0: Long): Unit

Definition Classes: AnyRef
Annotations: @throws( ... ) @native()

Packages

StatFunctions

object StatFunctions extends Logging

Value Members

Inherited from Logging

Inherited from AnyRef

Inherited from Any

Ungrouped