org.isarnproject.sketches

udaf

package udaf

Package-wide methods, implicits, and definitions for sketching UDAFs.

Linear Supertypes
AnyRef, Any

Type Members

  1. case class TDigestArrayReduceUDAF(deltaV: Double, maxDiscreteV: Int) extends UserDefinedAggregateFunction with Product with Serializable

    A UDAF for aggregating (reducing) a column of t-digest vectors.

  2. case class TDigestArrayUDAF[N](deltaV: Double, maxDiscreteV: Int)(implicit num: Numeric[N], dataTpe: TDigestUDAFDataType[N]) extends TDigestMultiUDAF with Product with Serializable

    A UDAF for sketching a column of numeric ArrayData with an array of TDigest objects.

  3. case class TDigestMLLibVecUDAF(deltaV: Double, maxDiscreteV: Int) extends TDigestMultiUDAF with Product with Serializable

    A UDAF for sketching a column of MLLib Vectors with an array of TDigest objects.

  4. case class TDigestMLVecUDAF(deltaV: Double, maxDiscreteV: Int) extends TDigestMultiUDAF with Product with Serializable

    A UDAF for sketching a column of ML Vectors with an array of TDigest objects.

  5. abstract class TDigestMultiUDAF extends UserDefinedAggregateFunction

    A base class that defines the common functionality for array-sketching UDAFs.

  6. case class TDigestReduceUDAF(deltaV: Double, maxDiscreteV: Int) extends UserDefinedAggregateFunction with Product with Serializable

    A UDAF for aggregating (reducing) a column of t-digests.

  7. case class TDigestUDAF[N](deltaV: Double, maxDiscreteV: Int)(implicit num: Numeric[N], dataTpe: TDigestUDAFDataType[N]) extends UserDefinedAggregateFunction with Product with Serializable

    A UDAF for sketching numeric data with a TDigest.

  8. case class TDigestUDAFDataType[N](tpe: DataType) extends Product with Serializable

    For declaring implicit values that map numeric types to their corresponding DataType values.
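The implicit-mapping pattern behind TDigestUDAFDataType can be illustrated without Spark on the classpath. The following is a minimal sketch only: `ColType`, `ColDouble`, `ColInt`, `ColTypeFor` and `inputTypeOf` are hypothetical stand-ins invented for this illustration and are not part of this package or of Spark. It shows how a method with the same implicit-parameter shape as tdigestUDAF[N] picks up a type-specific value at compile time.

```scala
object ColTypeSketch {
  // Hypothetical stand-ins for Spark SQL's DataType hierarchy (not the real classes)
  sealed trait ColType
  case object ColDouble extends ColType
  case object ColInt extends ColType

  // Stand-in with the same shape as TDigestUDAFDataType[N](tpe: DataType)
  final case class ColTypeFor[N](tpe: ColType)

  // One implicit value per supported numeric type
  implicit val colTypeForDouble: ColTypeFor[Double] = ColTypeFor[Double](ColDouble)
  implicit val colTypeForInt: ColTypeFor[Int] = ColTypeFor[Int](ColInt)

  // Same implicit-parameter shape as tdigestUDAF[N]: the compiler supplies
  // both the Numeric[N] instance and the matching ColTypeFor[N] value
  def inputTypeOf[N](implicit num: Numeric[N], dataTpe: ColTypeFor[N]): ColType =
    dataTpe.tpe

  def main(args: Array[String]): Unit = {
    println(inputTypeOf[Double]) // prints ColDouble
    println(inputTypeOf[Int])    // prints ColInt
    // inputTypeOf[BigDecimal] would not compile here:
    // there is no implicit ColTypeFor[BigDecimal] in scope
  }
}
```

Under this pattern, supporting another numeric type amounts to declaring one more implicit value; requesting a type with no such value fails at compile time rather than at run time.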

Value Members

  1. implicit def implicitTDigestArraySQLToTDigestArray(tdasql: TDigestArraySQL): Array[TDigest]

    implicitly unpack a TDigestArraySQL to extract its Array[TDigest] payload

  2. implicit def implicitTDigestSQLToTDigest(tdsql: TDigestSQL): TDigest

    implicitly unpack a TDigestSQL to extract its TDigest payload

  3. object pythonBindings

  4. implicit val tDigestUDAFDataTypeByte: TDigestUDAFDataType[Byte]

  5. implicit val tDigestUDAFDataTypeDouble: TDigestUDAFDataType[Double]

  6. implicit val tDigestUDAFDataTypeFloat: TDigestUDAFDataType[Float]

  7. implicit val tDigestUDAFDataTypeInt: TDigestUDAFDataType[Int]

  8. implicit val tDigestUDAFDataTypeLong: TDigestUDAFDataType[Long]

  9. implicit val tDigestUDAFDataTypeShort: TDigestUDAFDataType[Short]

  10. def tdigestArrayReduceUDAF: TDigestArrayReduceUDAF

    Obtain a UDAF for aggregating (reducing) a column of t-digest vectors

    returns

    A UDAF that can be applied to a column of t-digest vectors

    Example:
      scala> import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
      // column of t-digest arrays (might also be result of aggregating from groupBy)
      scala> tds.show()
      +--------------------+
      |            tdarrays|
      +--------------------+
      |TDigestArraySQL([...|
      |TDigestArraySQL([...|
      |TDigestArraySQL([...|
      +--------------------+
      
      // apply tdigestArrayReduceUDAF to reduce the t-digest arrays to a single array
      scala> val td = tds.agg(tdigestArrayReduceUDAF($"tdarrays").alias("tdarray"))
      td: org.apache.spark.sql.DataFrame = [tdarray: tdigestarray]
      
      scala> td.show()
      +---------------------+
      |              tdarray|
      +---------------------+
      | TDigestArraySQL([...|
      +---------------------+
  11. def tdigestArrayUDAF[N](implicit num: Numeric[N], dataType: TDigestUDAFDataType[N]): TDigestArrayUDAF[N]

    Obtain a UDAF for sketching a numeric array-data Dataset column, using a t-digest for each element.

    N

    The numeric type of the array-data column, e.g. Double or Int.

    returns

    A UDAF that can be applied to a Dataset array-data column

    Example:
      import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
      // create a UDAF for t-digest array, adding custom settings for delta and maxDiscrete
      val udafTD = tdigestArrayUDAF[Double].delta(0.1).maxDiscrete(25)
      // apply the UDAF to get an array of t-digests, one for each element in the array-data
      val agg = data.agg(udafTD($"NumericArrayColumn"))
      // agg is a single-row DataFrame: take its first Row, then extract the t-digest array
      val tdArray = agg.first.getAs[TDigestArraySQL](0).tdigests
  12. def tdigestMLLibVecUDAF: TDigestMLLibVecUDAF

    Obtain a UDAF for sketching an MLLib Vector Dataset column, using a t-digest for each element in the vector

    returns

    A UDAF that can be applied to a MLLib Vector column

    Example:
      import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
      // create a UDAF for t-digest array, adding custom settings for delta and maxDiscrete
      // (tdigestMLLibVecUDAF takes no type parameter)
      val udafTD = tdigestMLLibVecUDAF.delta(0.1).maxDiscrete(25)
      // apply the UDAF to get an array of t-digests, one for each element in the vector
      val agg = data.agg(udafTD($"MLLibVecColumn"))
      // agg is a single-row DataFrame: take its first Row, then extract the t-digest array
      val tdArray = agg.first.getAs[TDigestArraySQL](0).tdigests
  13. def tdigestMLVecUDAF: TDigestMLVecUDAF

    Obtain a UDAF for sketching an ML Vector Dataset column, using a t-digest for each element in the vector

    returns

    A UDAF that can be applied to a ML Vector column

    Example:
      import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
      // create a UDAF for t-digest array, adding custom settings for delta and maxDiscrete
      // (tdigestMLVecUDAF takes no type parameter)
      val udafTD = tdigestMLVecUDAF.delta(0.1).maxDiscrete(25)
      // apply the UDAF to get an array of t-digests, one for each element in the vector
      val agg = data.agg(udafTD($"MLVecColumn"))
      // agg is a single-row DataFrame: take its first Row, then extract the t-digest array
      val tdArray = agg.first.getAs[TDigestArraySQL](0).tdigests
  14. def tdigestReduceUDAF: TDigestReduceUDAF

    Obtain a UDAF for aggregating (reducing) a column (or grouping) of t-digests

    returns

    A UDAF that can be applied to a column or grouping of t-digests

    Example:
      scala> import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
      // column of t-digests (might also be result of aggregating from groupBy)
      scala> tds.show()
      +--------------------+
      |            tdigests|
      +--------------------+
      |TDigestSQL(TDiges...|
      |TDigestSQL(TDiges...|
      |TDigestSQL(TDiges...|
      +--------------------+
      
      // apply tdigestReduceUDAF to reduce the t-digests to a single combined t-digest
      scala> val td = tds.agg(tdigestReduceUDAF($"tdigests").alias("tdigest"))
      td: org.apache.spark.sql.DataFrame = [tdigest: tdigest]
      
      scala> td.show()
      +--------------------+
      |             tdigest|
      +--------------------+
      |TDigestSQL(TDiges...|
      +--------------------+
  15. def tdigestUDAF[N](implicit num: Numeric[N], dataType: TDigestUDAFDataType[N]): TDigestUDAF[N]

    Obtain a UDAF for sketching a single numeric Dataset column using a t-digest

    N

    The numeric type of the column, e.g. Double or Int.

    returns

    A UDAF that can be applied to a Dataset column

    Example:
      import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
      // create a UDAF for a t-digest, adding custom settings for delta and maxDiscrete
      val udafTD = tdigestUDAF[Double].delta(0.1).maxDiscrete(25)
      // apply the UDAF to get a t-digest for a data column
      val agg = data.agg(udafTD($"NumericColumn"))
      // agg is a single-row DataFrame: take its first Row, then extract the t-digest
      val td = agg.first.getAs[TDigestSQL](0).tdigest
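The sketching and reducing UDAFs listed above are meant to compose: sketch each group with tdigestUDAF, then merge the per-group sketches with tdigestReduceUDAF. The following is a minimal sketch of that flow under stated assumptions: the input data, the column names `key` and `x`, and the local SparkSession settings are hypothetical, and the final `cdfInverse` call assumes the quantile method provided by the TDigest class in the isarn-sketches library, which is not documented on this page.

```scala
import org.apache.spark.sql.SparkSession
import org.isarnproject.sketches.udaf._
import org.apache.spark.isarnproject.sketches.udt._

object GroupThenReduceSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("tdigest-group-then-reduce")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: a grouping key and one numeric column
    val data = Seq(("a", 1.0), ("a", 2.0), ("b", 3.0), ("b", 4.0)).toDF("key", "x")

    // Bind the UDAF to a val first: writing tdigestUDAF[Double]($"x") directly
    // would pass the column as the implicit argument list
    val udafTD = tdigestUDAF[Double].delta(0.1).maxDiscrete(25)

    // One t-digest per key
    val perKey = data.groupBy($"key").agg(udafTD($"x").alias("tdigests"))

    // Merge the per-key t-digests into a single combined t-digest
    val merged = perKey.agg(tdigestReduceUDAF($"tdigests").alias("tdigest"))

    // Extract the TDigest payload from the single result row
    val td = merged.first.getAs[TDigestSQL](0).tdigest

    // Assumed TDigest API: approximate median of all values of x
    println(td.cdfInverse(0.5))

    spark.stop()
  }
}
```

Running this requires Spark and the isarn-sketches-spark artifact on the classpath. The same two-stage shape should apply to the array variants, with tdigestArrayUDAF followed by tdigestArrayReduceUDAF.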
