Package

org.isarnproject.sketches

udaf

package udaf

Package-wide methods, implicits, and definitions for sketching UDAFs.

Type Members

  1. case class TDigestArrayReduceUDAF(deltaV: Double, maxDiscreteV: Int) extends UserDefinedAggregateFunction with Product with Serializable

    A UDAF for aggregating (reducing) a column of t-digest vectors. Expected to be created using tdigestArrayReduceUDAF.

    deltaV

    The delta value to be used by the TDigest objects

    maxDiscreteV

    The maxDiscrete value to be used by the TDigest objects

  2. case class TDigestArrayUDAF[N](deltaV: Double, maxDiscreteV: Int)(implicit num: Numeric[N], dataTpe: TDigestUDAFDataType[N]) extends TDigestMultiUDAF with Product with Serializable

    A UDAF for sketching a column of numeric ArrayData with an array of TDigest objects. Expected to be created using tdigestArrayUDAF.

    N

    The expected numeric type of the data (Double, Int, etc.)

    deltaV

    The delta value to be used by the TDigest objects

    maxDiscreteV

    The maxDiscrete value to be used by the TDigest objects

  3. case class TDigestMLLibVecUDAF(deltaV: Double, maxDiscreteV: Int) extends TDigestMultiUDAF with Product with Serializable

    A UDAF for sketching a column of MLLib Vectors with an array of TDigest objects. Expected to be created using tdigestMLLibVecUDAF.

    deltaV

    The delta value to be used by the TDigest objects

    maxDiscreteV

    The maxDiscrete value to be used by the TDigest objects

  4. case class TDigestMLVecUDAF(deltaV: Double, maxDiscreteV: Int) extends TDigestMultiUDAF with Product with Serializable

    A UDAF for sketching a column of ML Vectors with an array of TDigest objects. Expected to be created using tdigestMLVecUDAF.

    deltaV

    The delta value to be used by the TDigest objects

    maxDiscreteV

    The maxDiscrete value to be used by the TDigest objects

  5. abstract class TDigestMultiUDAF extends UserDefinedAggregateFunction

    A base class that defines the common functionality for array sketching UDAFs

  6. case class TDigestReduceUDAF(deltaV: Double, maxDiscreteV: Int) extends UserDefinedAggregateFunction with Product with Serializable

    A UDAF for aggregating (reducing) a column of t-digests. Expected to be created using tdigestReduceUDAF.

    deltaV

    The delta value to be used by the TDigest objects

    maxDiscreteV

    The maxDiscrete value to be used by the TDigest objects

  7. case class TDigestUDAF[N](deltaV: Double, maxDiscreteV: Int)(implicit num: Numeric[N], dataTpe: TDigestUDAFDataType[N]) extends UserDefinedAggregateFunction with Product with Serializable

    A UDAF for sketching numeric data with a TDigest. Expected to be created using tdigestUDAF.

    N

    The expected numeric type of the data (Double, Int, etc.)

    deltaV

    The delta value to be used by the TDigest object

    maxDiscreteV

    The maxDiscrete value to be used by the TDigest object

  8. case class TDigestUDAFDataType[N](tpe: DataType) extends Product with Serializable

    For declaring implicit values that map numeric types to corresponding DataType values
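
As a rough illustration of how these implicit values come into play: the factory methods listed under Value Members (for example tdigestUDAF[N]) take an implicit TDigestUDAFDataType[N], so the compiler selects the instance that matches N. The sketch below assumes a release of isarn-sketches-spark that exposes this UDAF-based API, with Spark SQL on the classpath; it is written spark-shell style and the val names are illustrative only.

```scala
import org.apache.spark.sql.types.DataType
import org.isarnproject.sketches.udaf._

// Summon the package-provided instances; each one carries the Spark SQL
// DataType associated with its numeric type.
val doubleTpe: DataType = implicitly[TDigestUDAFDataType[Double]].tpe
val intTpe: DataType = implicitly[TDigestUDAFDataType[Int]].tpe

// The same implicit lookup happens when a UDAF is requested for a type N.
val intUdaf = tdigestUDAF[Int]

println(s"Double -> $doubleTpe, Int -> $intTpe")
```

Requesting a UDAF for a type with no TDigestUDAFDataType instance in scope should fail at compile time rather than at query time, which is the usual benefit of this pattern.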

Value Members

  1. implicit def implicitTDigestArraySQLToTDigestArray(tdasql: TDigestArraySQL): Array[TDigest]

    Implicitly unpacks a TDigestArraySQL to extract its Array[TDigest] payload.

  2. implicit def implicitTDigestSQLToTDigest(tdsql: TDigestSQL): TDigest

    Implicitly unpacks a TDigestSQL to extract its TDigest payload.

  3. object pythonBindings

  4. implicit val tDigestUDAFDataTypeByte: TDigestUDAFDataType[Byte]

  5. implicit val tDigestUDAFDataTypeDouble: TDigestUDAFDataType[Double]

  6. implicit val tDigestUDAFDataTypeFloat: TDigestUDAFDataType[Float]

  7. implicit val tDigestUDAFDataTypeInt: TDigestUDAFDataType[Int]

  8. implicit val tDigestUDAFDataTypeLong: TDigestUDAFDataType[Long]

  9. implicit val tDigestUDAFDataTypeShort: TDigestUDAFDataType[Short]

  10. def tdigestArrayReduceUDAF: TDigestArrayReduceUDAF

    Obtain a UDAF for aggregating (reducing) a column of t-digest vectors

    returns

    A UDAF that can be applied to a column of t-digest vectors

    Example:
      scala> import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
      // column of t-digest arrays (might also be the result of aggregating from groupBy)
      scala> tds.show()
      +--------------------+
      |            tdarrays|
      +--------------------+
      |TDigestArraySQL([...|
      |TDigestArraySQL([...|
      |TDigestArraySQL([...|
      +--------------------+
      // apply tdigestArrayReduceUDAF to reduce the t-digest arrays to a single array
      scala> val td = tds.agg(tdigestArrayReduceUDAF($"tdarrays").alias("tdarray"))
      td: org.apache.spark.sql.DataFrame = [tdarray: tdigestarray]
      scala> td.show()
      +---------------------+
      |              tdarray|
      +---------------------+
      | TDigestArraySQL([...|
      +---------------------+
  11. def tdigestArrayUDAF[N](implicit num: Numeric[N], dataType: TDigestUDAFDataType[N]): TDigestArrayUDAF[N]

    Obtain a UDAF for sketching a numeric array-data Dataset column, using a t-digest for each element.

    N

    The numeric type of the array-data column (Double, Int, etc.)

    returns

    A UDAF that can be applied to a Dataset array-data column

    Example:
      import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
      // create a UDAF for a t-digest array, adding custom settings for delta and maxDiscrete
      val udafTD = tdigestArrayUDAF[Double].delta(0.1).maxDiscrete(25)
      // apply the UDAF to get an array of t-digests, one for each element of the array data
      val agg = data.agg(udafTD($"NumericArrayColumn"))
      // extract the t-digest array from the single result row
      val tdArray = agg.first.getAs[TDigestArraySQL](0).tdigests
  12. def tdigestMLLibVecUDAF: TDigestMLLibVecUDAF

    Obtain a UDAF for sketching an MLLib Vector Dataset column, using a t-digest for each element in the vector

    returns

    A UDAF that can be applied to a MLLib Vector column

    Example:
      import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
      // create a UDAF for a t-digest array, adding custom settings for delta and maxDiscrete
      // (tdigestMLLibVecUDAF takes no type parameter)
      val udafTD = tdigestMLLibVecUDAF.delta(0.1).maxDiscrete(25)
      // apply the UDAF to get an array of t-digests, one for each element of the vector
      val agg = data.agg(udafTD($"MLLibVecColumn"))
      // extract the t-digest array from the single result row
      val tdArray = agg.first.getAs[TDigestArraySQL](0).tdigests
  13. def tdigestMLVecUDAF: TDigestMLVecUDAF

    Obtain a UDAF for sketching an ML Vector Dataset column, using a t-digest for each element in the vector

    returns

    A UDAF that can be applied to a ML Vector column

    Example:
      import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
      // create a UDAF for a t-digest array, adding custom settings for delta and maxDiscrete
      // (tdigestMLVecUDAF takes no type parameter)
      val udafTD = tdigestMLVecUDAF.delta(0.1).maxDiscrete(25)
      // apply the UDAF to get an array of t-digests, one for each element of the vector
      val agg = data.agg(udafTD($"MLVecColumn"))
      // extract the t-digest array from the single result row
      val tdArray = agg.first.getAs[TDigestArraySQL](0).tdigests
  14. def tdigestReduceUDAF: TDigestReduceUDAF

    Obtain a UDAF for aggregating (reducing) a column (or grouping) of t-digests

    returns

    A UDAF that can be applied to a column or grouping of t-digests

    Example:
      scala> import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
      // column of t-digests (might also be the result of aggregating from groupBy)
      scala> tds.show()
      +--------------------+
      |            tdigests|
      +--------------------+
      |TDigestSQL(TDiges...|
      |TDigestSQL(TDiges...|
      |TDigestSQL(TDiges...|
      +--------------------+
      // apply tdigestReduceUDAF to reduce the t-digests to a single combined t-digest
      scala> val td = tds.agg(tdigestReduceUDAF($"tdigests").alias("tdigest"))
      td: org.apache.spark.sql.DataFrame = [tdigest: tdigest]
      scala> td.show()
      +--------------------+
      |             tdigest|
      +--------------------+
      |TDigestSQL(TDiges...|
      +--------------------+
  15. def tdigestUDAF[N](implicit num: Numeric[N], dataType: TDigestUDAFDataType[N]): TDigestUDAF[N]

    Obtain a UDAF for sketching a single numeric Dataset column using a t-digest

    N

    The numeric type of the column (Double, Int, etc.)

    returns

    A UDAF that can be applied to a Dataset column

    Example:
      import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
      // create a UDAF for a t-digest, adding custom settings for delta and maxDiscrete
      val udafTD = tdigestUDAF[Double].delta(0.1).maxDiscrete(25)
      // apply the UDAF to get a t-digest for a data column
      val agg = data.agg(udafTD($"NumericColumn"))
      // extract the t-digest from the single result row
      val td = agg.first.getAs[TDigestSQL](0).tdigest
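
The members above can be combined into one workflow: sketch a column per group, reduce the per-group t-digests to a single one, and rely on the implicit TDigestSQL-to-TDigest conversion when reading the result. The following is a minimal sketch under stated assumptions, not a definitive recipe: it assumes a release of isarn-sketches-spark that exposes this UDAF-based API, a local SparkSession, and made-up data with hypothetical column names "key" and "x".

```scala
import org.apache.spark.sql.SparkSession
import org.isarnproject.sketches.TDigest
import org.isarnproject.sketches.udaf._
import org.apache.spark.isarnproject.sketches.udt._

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("tdigest-udaf-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical toy data: a grouping key and a numeric value.
val data = Seq(("a", 1.0), ("a", 2.0), ("b", 3.0), ("b", 4.0), ("b", 5.0))
  .toDF("key", "x")

// Bind the UDAF to a val first: writing tdigestUDAF[Double]($"x") would pass
// the column as the implicit parameter list instead of applying the UDAF.
val udafTD = tdigestUDAF[Double]

// One t-digest per group.
val perGroup = data.groupBy($"key").agg(udafTD($"x").alias("tdigests"))

// Merge the per-group t-digests into a single combined t-digest.
val merged = perGroup.agg(tdigestReduceUDAF($"tdigests").alias("tdigest"))

// The expected type TDigest triggers implicitTDigestSQLToTDigest,
// so no explicit .tdigest call is needed here.
val td: TDigest = merged.first.getAs[TDigestSQL](0)

// Query the combined sketch.
val median = td.cdfInverse(0.5)
val fracBelow3 = td.cdf(3.0)
println(s"approximate median = $median, approximate P(x <= 3.0) = $fracBelow3")

spark.stop()
```

Reducing per-group sketches this way avoids a second pass over the raw data, which is the main reason to keep the intermediate t-digest column around.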
