A UDAF for aggregating (reducing) a column of t-digest vectors. Expected to be created using tdigestArrayReduceUDAF.
The delta value to be used by the TDigest objects
The maxDiscrete value to be used by the TDigest objects
A UDAF for sketching a column of numeric ArrayData with an array of TDigest objects. Expected to be created using tdigestArrayUDAF.
The expected numeric type of the data, e.g. Double or Int
The delta value to be used by the TDigest objects
The maxDiscrete value to be used by the TDigest objects
A UDAF for sketching a column of MLLib Vectors with an array of TDigest objects. Expected to be created using tdigestMLLibVecUDAF.
The delta value to be used by the TDigest objects
The maxDiscrete value to be used by the TDigest objects
A UDAF for sketching a column of ML Vectors with an array of TDigest objects. Expected to be created using tdigestMLVecUDAF.
The delta value to be used by the TDigest objects
The maxDiscrete value to be used by the TDigest objects
A base class that defines the common functionality for array sketching UDAFs
A UDAF for aggregating (reducing) a column of t-digests. Expected to be created using tdigestReduceUDAF.
The delta value to be used by the TDigest objects
The maxDiscrete value to be used by the TDigest objects
A UDAF for sketching numeric data with a TDigest. Expected to be created using tdigestUDAF.
The expected numeric type of the data, e.g. Double or Int
The delta value to be used by the TDigest object
The maxDiscrete value to be used by the TDigest object
For declaring implicit values that map numeric types to corresponding DataType values
Implicitly unpack a TDigestArraySQL to extract its Array[TDigest] payload
Implicitly unpack a TDigestSQL to extract its TDigest payload
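A minimal sketch of how the TDigestSQL conversion might be used, assuming the implicits come into scope through `import org.isarnproject.sketches.udaf._` and that the payload type is `org.isarnproject.sketches.TDigest` with a `cdf` method. The object name, the sample data and the column name `x` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.isarnproject.sketches.TDigest
import org.isarnproject.sketches.udaf._
import org.apache.spark.isarnproject.sketches.udt._

object ImplicitUnpackSketch {
  // Local session for illustration only
  val spark = SparkSession.builder().master("local[*]").appName("implicit-unpack").getOrCreate()
  import spark.implicits._

  // Hypothetical data: a single Double column named "x"
  val data = Seq(1.0, 2.0, 3.0, 4.0, 5.0).toDF("x")

  // Bind the UDAF to a val before applying it to a column
  val udafTD = tdigestUDAF[Double]
  val wrapped: TDigestSQL = data.agg(udafTD($"x")).first.getAs[TDigestSQL](0)

  // Explicit unpacking through the field used in the documented examples
  val explicitTD: TDigest = wrapped.tdigest

  // Implicit unpacking: the type ascription asks the compiler to apply the
  // conversion (assumed to be provided by the udaf package import)
  val implicitTD: TDigest = wrapped
}
```

Both values should refer to the same sketch, so either form can be passed to code that expects a TDigest.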
Obtain a UDAF for aggregating (reducing) a column of t-digest vectors
A UDAF that can be applied to a column of t-digest vectors
scala> import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
// column of t-digest arrays (might also be result of aggregating from groupBy)
scala> tds.show()
+--------------------+
|            tdarrays|
+--------------------+
|TDigestArraySQL([...|
|TDigestArraySQL([...|
|TDigestArraySQL([...|
+--------------------+
// apply tdigestArrayReduceUDAF to reduce the t-digest arrays to a single array
scala> val td = tds.agg(tdigestArrayReduceUDAF($"tdarrays").alias("tdarray"))
td: org.apache.spark.sql.DataFrame = [tdarray: tdigestarray]
scala> td.show()
+---------------------+
|              tdarray|
+---------------------+
| TDigestArraySQL([...|
+---------------------+
Obtain a UDAF for sketching a numeric array-data Dataset column, using a t-digest for each element.
The numeric type of the array-data column, e.g. Double or Int
A UDAF that can be applied to a Dataset array-data column
import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
// create a UDAF for t-digest array, adding custom settings for delta and maxDiscrete
val udafTD = tdigestArrayUDAF[Double].delta(0.1).maxDiscrete(25)
// apply the UDAF to get an array of t-digests, one for each element in the array-data
val agg = data.agg(udafTD($"NumericArrayColumn"))
// extract the t-digest array from the single result row
val tdArray = agg.first.getAs[TDigestArraySQL](0).tdigests
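As a hedged illustration of what the resulting array can be used for, the sketch below computes an approximate median per array position. It assumes each element of `tdigests` is an `org.isarnproject.sketches.TDigest` exposing `cdfInverse`. The object name, the generated data and the column name `v` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.isarnproject.sketches.udaf._
import org.apache.spark.isarnproject.sketches.udt._

object ArraySketchExample {
  // Local session for illustration only
  val spark = SparkSession.builder().master("local[*]").appName("tdigest-array").getOrCreate()
  import spark.implicits._

  // Hypothetical data: every row holds an array of three Doubles
  val data = (1 to 500).map(i => Array(i.toDouble, i * 2.0, i * 10.0)).toDF("v")

  val udafTD = tdigestArrayUDAF[Double].delta(0.1).maxDiscrete(25)

  // One t-digest per array position
  val tdArray = data.agg(udafTD($"v")).first.getAs[TDigestArraySQL](0).tdigests

  // Approximate median of each array position (cdfInverse is assumed
  // to be available on the TDigest payload)
  val medians: Array[Double] = tdArray.map(_.cdfInverse(0.5))
}
```

The values in `medians` are approximations whose accuracy depends on the chosen delta.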
Obtain a UDAF for sketching an MLLib Vector Dataset column, using a t-digest for each element in the vector
A UDAF that can be applied to an MLLib Vector column
import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
// create a UDAF for t-digest array, adding custom settings for delta and maxDiscrete
val udafTD = tdigestMLLibVecUDAF.delta(0.1).maxDiscrete(25)
// apply the UDAF to get an array of t-digests, one for each element in the vector
val agg = data.agg(udafTD($"MLLibVecColumn"))
// extract the t-digest array from the single result row
val tdArray = agg.first.getAs[TDigestArraySQL](0).tdigests
Obtain a UDAF for sketching an ML Vector Dataset column, using a t-digest for each element in the vector
A UDAF that can be applied to an ML Vector column
import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
// create a UDAF for t-digest array, adding custom settings for delta and maxDiscrete
val udafTD = tdigestMLVecUDAF.delta(0.1).maxDiscrete(25)
// apply the UDAF to get an array of t-digests, one for each element in the vector
val agg = data.agg(udafTD($"MLVecColumn"))
// extract the t-digest array from the single result row
val tdArray = agg.first.getAs[TDigestArraySQL](0).tdigests
Obtain a UDAF for aggregating (reducing) a column (or grouping) of t-digests
A UDAF that can be applied to a column or grouping of t-digests
scala> import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
// column of t-digests (might also be result of aggregating from groupBy)
scala> tds.show()
+--------------------+
|            tdigests|
+--------------------+
|TDigestSQL(TDiges...|
|TDigestSQL(TDiges...|
|TDigestSQL(TDiges...|
+--------------------+
// apply tdigestReduceUDAF to reduce the t-digests to a single combined t-digest
scala> val td = tds.agg(tdigestReduceUDAF($"tdigests").alias("tdigest"))
td: org.apache.spark.sql.DataFrame = [tdigest: tdigest]
scala> td.show()
+--------------------+
|             tdigest|
+--------------------+
|TDigestSQL(TDiges...|
+--------------------+
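The comment about groupBy suggests a two-stage pattern: sketch each group, then merge the per-group sketches. The following is a minimal sketch of that pattern under stated assumptions: the object name, the generated data and the column names `g` and `x` are hypothetical, `cdfInverse` is assumed to exist on the TDigest payload, and the reduce UDAF is used with its default settings.

```scala
import org.apache.spark.sql.SparkSession
import org.isarnproject.sketches.udaf._
import org.apache.spark.isarnproject.sketches.udt._

object TwoStageSketch {
  // Local session for illustration only
  val spark = SparkSession.builder().master("local[*]").appName("tdigest-reduce").getOrCreate()
  import spark.implicits._

  // Hypothetical data: a grouping key "g" (four groups) and a Double column "x"
  val data = (0 until 1000).map(i => (i % 4, i.toDouble)).toDF("g", "x")

  val udafTD = tdigestUDAF[Double].delta(0.1).maxDiscrete(25)

  // Stage 1: one t-digest per group
  val tds = data.groupBy($"g").agg(udafTD($"x").alias("tdigests"))

  // Stage 2: merge the per-group t-digests into one combined t-digest
  val td = tds.agg(tdigestReduceUDAF($"tdigests").alias("tdigest"))
    .first.getAs[TDigestSQL](0).tdigest

  // Approximate median over all groups
  val median: Double = td.cdfInverse(0.5)
}
```

Keeping the intermediate `tds` frame around allows per-group queries as well as the combined one.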
Obtain a UDAF for sketching a single numeric Dataset column using a t-digest
The numeric type of the column, e.g. Double or Int
A UDAF that can be applied to a Dataset column
import org.isarnproject.sketches.udaf._, org.apache.spark.isarnproject.sketches.udt._
// create a UDAF for a t-digest, adding custom settings for delta and maxDiscrete
val udafTD = tdigestUDAF[Double].delta(0.1).maxDiscrete(25)
// apply the UDAF to get a t-digest for a data column
val agg = data.agg(udafTD($"NumericColumn"))
// extract the t-digest from the single result row
val td = agg.first.getAs[TDigestSQL](0).tdigest
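Once extracted, the t-digest can be queried for approximate quantiles and cumulative probabilities. The sketch below assumes the payload is an `org.isarnproject.sketches.TDigest` with `cdf` and `cdfInverse` methods. The object name, the generated data and the column name `x` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.isarnproject.sketches.udaf._
import org.apache.spark.isarnproject.sketches.udt._

object QuantileSketch {
  // Local session for illustration only
  val spark = SparkSession.builder().master("local[*]").appName("tdigest-query").getOrCreate()
  import spark.implicits._

  // Hypothetical data: the values 0.0 to 999.0 in a column named "x"
  val data = spark.range(1000).select($"id".cast("double").alias("x"))

  val udafTD = tdigestUDAF[Double].delta(0.1).maxDiscrete(25)
  val td = data.agg(udafTD($"x")).first.getAs[TDigestSQL](0).tdigest

  // Approximate median of the column
  val median: Double = td.cdfInverse(0.5)

  // Approximate fraction of values at or below 250.0
  val fracBelow: Double = td.cdf(250.0)
}
```

Both results are estimates. A smaller delta generally trades a larger sketch for better accuracy.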
Package-wide methods, implicits and definitions for sketching UDAFs