A UDAF for aggregating (reducing) a column of t-digest vectors.
A UDAF for sketching a column of numeric ArrayData with an array of TDigest objects.
A UDAF for sketching a column of MLLib Vectors with an array of TDigest objects.
A UDAF for sketching a column of ML Vectors with an array of TDigest objects.
A base class that defines the common functionality for array sketching UDAFs
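To show what a shared base class for the array-sketching UDAFs might factor out, here is a minimal pure-Scala sketch under stated assumptions. `Summary` is a hypothetical (count, min, max) stand-in for `TDigest`. `ArraySketcher`, `IntArraySketcher` and `DoubleSeqSketcher` are hypothetical names, not this library's classes. Subclasses only say how to read an input as an `Array[Double]`, and the per-element update is written once.

```scala
// Hypothetical stand-in for TDigest: a (count, min, max) summary.
final case class Summary(count: Long, min: Double, max: Double) {
  def update(x: Double): Summary =
    Summary(count + 1, math.min(min, x), math.max(max, x))
}
object Summary {
  val empty: Summary = Summary(0L, Double.PositiveInfinity, Double.NegativeInfinity)
}

// Hypothetical base class: the common per-element sketching logic lives here.
abstract class ArraySketcher[A] {
  // Subclass hook: how to read one input value as an array of doubles.
  def toDoubles(input: A): Array[Double]

  // Shared logic: fold rows into one Summary per element position.
  def sketch(rows: Seq[A]): Vector[Summary] =
    rows.foldLeft(Vector.empty[Summary]) { (acc, row) =>
      val xs = toDoubles(row)
      // The first row fixes the number of element positions.
      val sums = if (acc.isEmpty) Vector.fill(xs.length)(Summary.empty) else acc
      require(xs.length == sums.length, s"expected ${sums.length} elements, got ${xs.length}")
      sums.zip(xs.toVector).map { case (s, x) => s.update(x) }
    }
}

// Concrete variants differ only in how they unpack their input.
object IntArraySketcher extends ArraySketcher[Array[Int]] {
  def toDoubles(input: Array[Int]): Array[Double] = input.map(_.toDouble)
}
object DoubleSeqSketcher extends ArraySketcher[Seq[Double]] {
  def toDoubles(input: Seq[Double]): Array[Double] = input.toArray
}

val perElement = IntArraySketcher.sketch(Seq(Array(1, 10), Array(3, 20), Array(2, 30)))
```

With this split, an array-data variant and a vector variant would differ only in `toDoubles`.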
A UDAF for aggregating (reducing) a column of t-digests.
A UDAF for sketching numeric data with a TDigest.
For declaring implicit values that map numeric types to corresponding DataType values
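Implicit values that map numeric types to `DataType` values form a type class. Here is a minimal sketch of the pattern. It assumes hypothetical stand-ins (`SqlType`, `SqlDouble`, `TypeMapping`, `inputTypeFor`) in place of Spark's `DoubleType`, `IntegerType` and the library's own declarations, so it runs without Spark on the classpath.

```scala
// Hypothetical stand-ins for Spark's DataType objects (DoubleType, FloatType, ...).
sealed trait SqlType
case object SqlDouble extends SqlType
case object SqlFloat extends SqlType
case object SqlLong extends SqlType
case object SqlInt extends SqlType

// Type class: evidence that numeric type N maps to a SQL column type.
trait TypeMapping[N] {
  def sqlType: SqlType
}

object TypeMapping {
  private def of[N](t: SqlType): TypeMapping[N] = new TypeMapping[N] {
    def sqlType: SqlType = t
  }
  // Implicit values in the companion are found automatically for TypeMapping[N].
  implicit val doubleMapping: TypeMapping[Double] = of[Double](SqlDouble)
  implicit val floatMapping: TypeMapping[Float] = of[Float](SqlFloat)
  implicit val longMapping: TypeMapping[Long] = of[Long](SqlLong)
  implicit val intMapping: TypeMapping[Int] = of[Int](SqlInt)
}

// Generic code can recover the column type from the type parameter N alone.
def inputTypeFor[N](implicit tm: TypeMapping[N]): SqlType = tm.sqlType
```

With this pattern, asking for an unmapped type such as `inputTypeFor[String]` fails at compile time instead of at run time.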
Implicitly unpack a TDigestArraySQL to extract its Array[TDigest] payload
Implicitly unpack a TDigestSQL to extract its TDigest payload
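The two unpacking implicits can be illustrated with a small sketch. `Digest`, `DigestSQL`, `DigestArraySQL` and the conversion names are hypothetical placeholders for `TDigest`, `TDigestSQL` and `TDigestArraySQL`. The point is only that a wrapper value can be used where its payload type is expected.

```scala
import scala.language.implicitConversions

// Hypothetical placeholders: Digest ~ TDigest, DigestSQL ~ TDigestSQL,
// DigestArraySQL ~ TDigestArraySQL.
final case class Digest(centroids: Vector[Double]) {
  def size: Int = centroids.length
}
final case class DigestSQL(digest: Digest)
final case class DigestArraySQL(digests: Array[Digest])

object Unpack {
  // Implicitly unpack a DigestSQL to its Digest payload.
  implicit def digestSQLToDigest(w: DigestSQL): Digest = w.digest
  // Implicitly unpack a DigestArraySQL to its Array[Digest] payload.
  implicit def digestArraySQLToArray(w: DigestArraySQL): Array[Digest] = w.digests
}
import Unpack._

val wrapped = DigestSQL(Digest(Vector(1.0, 2.0, 3.0)))
// `size` is not a member of DigestSQL, so the compiler inserts digestSQLToDigest(wrapped).
val n: Int = wrapped.size
// Ascribing the payload type also triggers the conversion.
val digests: Array[Digest] = DigestArraySQL(Array(wrapped.digest, Digest(Vector(4.0))))
```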
Obtain a UDAF for aggregating (reducing) a column of t-digest vectors
A UDAF that can be applied to a column of t-digest vectors
scala> import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
// column of t-digest arrays (might also be the result of aggregating from groupBy)
scala> tds.show()
+--------------------+
|            tdarrays|
+--------------------+
|TDigestArraySQL([...|
|TDigestArraySQL([...|
|TDigestArraySQL([...|
+--------------------+
// apply tdigestArrayReduceUDAF to reduce the t-digest arrays to a single array
scala> val td = tds.agg(tdigestArrayReduceUDAF($"tdarrays").alias("tdarray"))
td: org.apache.spark.sql.DataFrame = [tdarray: tdigestarray]
scala> td.show()
+---------------------+
|              tdarray|
+---------------------+
| TDigestArraySQL([...|
+---------------------+
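Reducing a column of t-digest arrays amounts to an element-wise merge: position i of the result combines position i of every input array. Here is a minimal sketch, assuming a hypothetical mergeable `Summary` (count, sum) in place of `TDigest`, a hypothetical helper `mergeArrays`, and that every array in the column has the same length.

```scala
// Hypothetical stand-in for TDigest: a mergeable (count, sum) summary.
final case class Summary(count: Long, sum: Double) {
  def merge(that: Summary): Summary = Summary(count + that.count, sum + that.sum)
}

// Merge two sketch arrays element-wise; arrays of different lengths are rejected.
def mergeArrays(a: Vector[Summary], b: Vector[Summary]): Vector[Summary] = {
  require(a.length == b.length, s"length mismatch: ${a.length} vs ${b.length}")
  a.zip(b).map { case (x, y) => x.merge(y) }
}

// A "column" of three sketch arrays, each with two element positions.
val column = Seq(
  Vector(Summary(2, 3.0), Summary(2, 30.0)),
  Vector(Summary(1, 4.0), Summary(1, 40.0)),
  Vector(Summary(3, 6.0), Summary(3, 60.0))
)
val reduced = column.reduce(mergeArrays(_, _))
```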
Obtain a UDAF for sketching a numeric array-data Dataset column, using a t-digest for each element.
The numeric type of the array-data column, e.g. Double or Int
A UDAF that can be applied to a Dataset array-data column
import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
import spark.implicits._ // for the $"column" syntax; assumes a SparkSession named spark
// create a UDAF for a t-digest array, adding custom settings for delta and maxDiscrete
val udafTD = tdigestArrayUDAF[Double].delta(0.1).maxDiscrete(25)
// apply the UDAF to get an array of t-digests, one for each element in the array-data
val agg = data.agg(udafTD($"NumericArrayColumn"))
// extract the t-digest array from the single result row
val tdArray = agg.first().getAs[TDigestArraySQL](0).tdigests
Obtain a UDAF for sketching an MLLib Vector Dataset column, using a t-digest for each element in the vector
A UDAF that can be applied to an MLLib Vector column
import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
import spark.implicits._ // for the $"column" syntax; assumes a SparkSession named spark
// create a UDAF for a t-digest array, adding custom settings for delta and maxDiscrete
val udafTD = tdigestMLLibVecUDAF.delta(0.1).maxDiscrete(25)
// apply the UDAF to get an array of t-digests, one for each element in the vector
val agg = data.agg(udafTD($"MLLibVecColumn"))
// extract the t-digest array from the single result row
val tdArray = agg.first().getAs[TDigestArraySQL](0).tdigests
Obtain a UDAF for sketching an ML Vector Dataset column, using a t-digest for each element in the vector
A UDAF that can be applied to an ML Vector column
import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
import spark.implicits._ // for the $"column" syntax; assumes a SparkSession named spark
// create a UDAF for a t-digest array, adding custom settings for delta and maxDiscrete
val udafTD = tdigestMLVecUDAF.delta(0.1).maxDiscrete(25)
// apply the UDAF to get an array of t-digests, one for each element in the vector
val agg = data.agg(udafTD($"MLVecColumn"))
// extract the t-digest array from the single result row
val tdArray = agg.first().getAs[TDigestArraySQL](0).tdigests
Obtain a UDAF for aggregating (reducing) a column (or grouping) of t-digests
A UDAF that can be applied to a column or grouping of t-digests
scala> import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
// column of t-digests (might also be the result of aggregating from groupBy)
scala> tds.show()
+--------------------+
|            tdigests|
+--------------------+
|TDigestSQL(TDiges...|
|TDigestSQL(TDiges...|
|TDigestSQL(TDiges...|
+--------------------+
// apply tdigestReduceUDAF to reduce the t-digests to a single combined t-digest
scala> val td = tds.agg(tdigestReduceUDAF($"tdigests").alias("tdigest"))
td: org.apache.spark.sql.DataFrame = [tdigest: tdigest]
scala> td.show()
+--------------------+
|             tdigest|
+--------------------+
|TDigestSQL(TDiges...|
+--------------------+
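A Spark UDAF reduction like the one above runs in two phases. First, each partition folds its own rows into a partial result (the UDAF's update step). Then the partial results are merged. Here is a plain-Scala simulation of that flow. It assumes a hypothetical (count, min, max) `Summary` as the mergeable stand-in for `TDigest`, and a hand-built `partitions` list in place of a real Spark partitioning.

```scala
// Hypothetical stand-in for TDigest: a mergeable (count, min, max) summary.
final case class Summary(count: Long, min: Double, max: Double) {
  def merge(that: Summary): Summary =
    Summary(count + that.count, math.min(min, that.min), math.max(max, that.max))
}
object Summary {
  // Identity element for merge: plays the role of the UDAF's initial buffer.
  val zero: Summary = Summary(0L, Double.PositiveInfinity, Double.NegativeInfinity)
}

// Simulated partitions, each holding some of the sketches in the column.
val partitions: Seq[Seq[Summary]] = Seq(
  Seq(Summary(5, 1.0, 9.0), Summary(2, 0.5, 4.0)),
  Seq(Summary(4, 2.0, 12.0)),
  Seq.empty // an empty partition contributes only the zero value
)

// Phase 1: each partition folds its rows into a partial aggregate.
val partials = partitions.map(p => p.foldLeft(Summary.zero)(_ merge _))
// Phase 2: the partial aggregates are merged into the final result.
val total = partials.foldLeft(Summary.zero)(_ merge _)
```

For this toy summary, the merge is associative and has an identity (`Summary.zero`), so `total` does not depend on how the rows were split across partitions.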
Obtain a UDAF for sketching a single numeric Dataset column using a t-digest
The numeric type of the column, e.g. Double or Int
A UDAF that can be applied to a Dataset column
import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
import spark.implicits._ // for the $"column" syntax; assumes a SparkSession named spark
// create a UDAF for a t-digest, adding custom settings for delta and maxDiscrete
val udafTD = tdigestUDAF[Double].delta(0.1).maxDiscrete(25)
// apply the UDAF to get a t-digest for a data column
val agg = data.agg(udafTD($"NumericColumn"))
// extract the t-digest from the single result row
val td = agg.first().getAs[TDigestSQL](0).tdigest
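The `.delta(0.1).maxDiscrete(25)` chain reads like a builder. One way to get that style with immutable values is a case class whose setters return modified copies. The sketch below uses hypothetical names (`SketchSettings`, `deltaV`, `maxDiscreteV`) and placeholder default values. It illustrates the pattern only and is not this library's implementation.

```scala
// Hypothetical settings holder: each setter returns a modified copy.
final case class SketchSettings(deltaV: Double, maxDiscreteV: Int) {
  def delta(d: Double): SketchSettings = copy(deltaV = d)
  def maxDiscrete(n: Int): SketchSettings = copy(maxDiscreteV = n)
}

// Placeholder defaults for illustration only, not the library's actual defaults.
val defaults = SketchSettings(deltaV = 0.5, maxDiscreteV = 0)
// Same chaining style as tdigestUDAF[Double].delta(0.1).maxDiscrete(25).
val tuned = defaults.delta(0.1).maxDiscrete(25)
```

Because every call returns a new value, a tuned configuration can be derived from a shared default without mutating it.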
Package-wide methods, implicits, and definitions for sketching UDAFs