org.apache.spark.sql.catalyst.util

QuantileSummaries

Related Docs: object QuantileSummaries | package util

class QuantileSummaries extends Serializable

Helper class to compute approximate quantile summaries. This implementation is based on the algorithm proposed in the paper "Space-efficient Online Computation of Quantile Summaries" by Michael Greenwald and Sanjeev Khanna (http://dx.doi.org/10.1145/375663.375670), referred to below as G-K.

To optimize for speed, it maintains an internal buffer of the most recently seen samples and inserts them into the summary only once the buffer crosses a size threshold. Batching insertions this way keeps the per-sample insertion cost near constant, unlike the original algorithm, which updates the summary on every insertion.
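
The buffering idea can be illustrated with a toy sketch. This is not Spark's implementation: `BufferedSortedSamples`, `headSize`, `flush`, `pending`, and `snapshot` are hypothetical names, and the sketch keeps every sample instead of a compressed G-K summary. It only shows why batching helps: a single insert appends to a buffer, and the costlier sort-and-merge step runs once per batch.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy illustration only: all names here are hypothetical, and no G-K
// compression is performed. Every sample is kept.
final class BufferedSortedSamples(headSize: Int) {
  require(headSize > 0, "headSize must be positive")

  // Recently seen samples, not yet merged into the sorted array.
  private val head = ArrayBuffer.empty[Double]
  // Samples merged so far, kept in ascending order.
  private var sorted = Array.empty[Double]

  // Cheap path: append to the buffer; merge only when the buffer is full.
  def insert(x: Double): Unit = {
    head += x
    if (head.size >= headSize) flush()
  }

  // Sorts the buffered batch once and merges it into the sorted array.
  def flush(): Unit = {
    if (head.nonEmpty) {
      val batch = head.toArray
      java.util.Arrays.sort(batch)
      val merged = new Array[Double](sorted.length + batch.length)
      var i = 0
      var j = 0
      var k = 0
      while (i < sorted.length && j < batch.length) {
        if (sorted(i) <= batch(j)) { merged(k) = sorted(i); i += 1 }
        else { merged(k) = batch(j); j += 1 }
        k += 1
      }
      while (i < sorted.length) { merged(k) = sorted(i); i += 1; k += 1 }
      while (j < batch.length) { merged(k) = batch(j); j += 1; k += 1 }
      sorted = merged
      head.clear()
    }
  }

  // Number of samples still waiting in the buffer.
  def pending: Int = head.size

  // Flushes any pending samples and returns a sorted copy of all samples.
  def snapshot(): Array[Double] = { flush(); sorted.clone() }
}
```

With `headSize = 3`, the first two inserts only grow the buffer; the third triggers a single merge.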

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new QuantileSummaries(compressThreshold: Int, relativeError: Double, sampled: Array[Stats] = Array.empty, count: Long = 0L)

    compressThreshold

    the compression threshold. After the internal buffer of statistics crosses this size, it attempts to compress the statistics together.

    relativeError

    the target relative error. It is uniform across the complete range of values.

    sampled

    a buffer of quantile statistics. See the G-K article for more details.

    count

    the count of all the elements *inserted in the sampled buffer* (excluding the head buffer)
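
To make `relativeError` concrete: in the G-K paper, an ε-approximate summary over n observations answers a query for quantile φ with a value whose rank is within εn of ⌈φn⌉. The helper below computes that window of acceptable ranks. `RankWindow` and `rankWindow` are hypothetical names used for illustration and are not part of the Spark API.

```scala
// Hypothetical helper, not part of the Spark API: computes the inclusive
// range of 1-based ranks that an epsilon-approximate summary may return
// for a given target quantile.
object RankWindow {
  def rankWindow(n: Long, quantile: Double, relativeError: Double): (Long, Long) = {
    require(n > 0, "n must be positive")
    require(quantile >= 0.0 && quantile <= 1.0, "quantile must be in [0, 1]")
    require(relativeError >= 0.0, "relativeError must be non-negative")
    // Target rank: ceil(phi * n).
    val target = math.ceil(quantile * n).toLong
    // Allowed rank error: epsilon * n, rounded down to a whole rank.
    val slack = math.floor(relativeError * n).toLong
    // Clamp the window to the valid rank range [1, n].
    (math.max(1L, target - slack), math.min(n, target + slack))
  }
}
```

For example, with n = 1000 and a relative error of 0.01, a median query may return any value whose rank lies between 490 and 510. Because the error is expressed in ranks, the same bound applies to every quantile.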

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def compress(): QuantileSummaries

    Returns a new summary that compresses the summary statistics and the head buffer.

    This implements the COMPRESS function of the G-K algorithm. It does not modify this object.

    returns

    a new summary object with compressed statistics

  7. val compressThreshold: Int

    the compression threshold. After the internal buffer of statistics crosses this size, it attempts to compress the statistics together.

  8. val count: Long

    the count of all the elements *inserted in the sampled buffer* (excluding the head buffer)

  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. def insert(x: Double): QuantileSummaries

    Returns a summary with the given observation inserted. This method may either modify the current summary in place and return it, or create a new summary from scratch if necessary. In either case, continue with the returned summary.

    x

    the new observation to insert into the summary
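
Because `insert` may return either the same object or a new one, a caller that ignores the return value can lose observations. The sketch below threads the returned summary through a fold. It assumes the Spark catalyst module is on the classpath; `InsertSketch`, `insertAll`, and `build` are hypothetical names, and the parameter values are illustrative, not recommended defaults.

```scala
import org.apache.spark.sql.catalyst.util.QuantileSummaries

// Hypothetical helper object for illustration; not part of Spark.
object InsertSketch {
  // Inserts every value, always continuing from the summary that insert returned.
  def insertAll(initial: QuantileSummaries, values: Iterable[Double]): QuantileSummaries =
    values.foldLeft(initial)((summary, x) => summary.insert(x))

  // Starts from an empty summary. The threshold and error values below are
  // assumptions chosen for the example.
  def build(values: Iterable[Double]): QuantileSummaries =
    insertAll(new QuantileSummaries(compressThreshold = 1000, relativeError = 0.01), values)
}
```

Note that `count` excludes the head buffer, so it may lag behind the number of inserted values until `compress()` is called.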

  15. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  16. def merge(other: QuantileSummaries): QuantileSummaries

    Merges two summaries together. Both summaries are expected to be compressed.

    Returns a new summary.
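
A typical use of `merge` is to summarize chunks of data independently and then combine the results. The sketch below assumes the Spark catalyst module is on the classpath; `MergeSketch`, `summarize`, and `summarizeChunks` are hypothetical names, and the parameter values are illustrative. The final `compress()` call after merging is a precaution taken by this sketch, not a documented requirement.

```scala
import org.apache.spark.sql.catalyst.util.QuantileSummaries

// Hypothetical helper object for illustration; not part of Spark.
object MergeSketch {
  // Illustrative values (assumptions, not recommended defaults).
  private val CompressThreshold = 1000
  private val RelativeError = 0.01

  // Builds a compressed summary for one chunk of data.
  def summarize(chunk: Seq[Double]): QuantileSummaries =
    chunk
      .foldLeft(new QuantileSummaries(CompressThreshold, RelativeError)) {
        (summary, x) => summary.insert(x)
      }
      .compress()

  // Summarizes each chunk independently, then merges the compressed summaries.
  def summarizeChunks(chunks: Seq[Seq[Double]]): QuantileSummaries = {
    require(chunks.nonEmpty, "at least one chunk is required")
    chunks.map(summarize).reduce((a, b) => a.merge(b)).compress()
  }
}
```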

  17. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  18. final def notify(): Unit

    Definition Classes
    AnyRef
  19. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  20. def query(quantile: Double): Option[Double]

    Runs a query for a given quantile. The result follows the approximation guarantees detailed above. The query can only be run on a compressed summary: call compress() before using it.

    quantile

    the target quantile
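
The insert, compress, query sequence can be sketched as follows. The example assumes the Spark catalyst module is on the classpath; `QuerySketch` and `approxQuantiles` are hypothetical names, and the threshold and default error are illustrative values. Since `query` returns an `Option[Double]`, the sketch passes the optional results through and leaves handling of a missing value to the caller.

```scala
import org.apache.spark.sql.catalyst.util.QuantileSummaries

// Hypothetical helper object for illustration; not part of Spark.
object QuerySketch {
  // Returns one approximate value per requested quantile, in the same order.
  def approxQuantiles(
      values: Seq[Double],
      quantiles: Seq[Double],
      relativeError: Double = 0.01): Seq[Option[Double]] = {
    // Illustrative threshold (an assumption, not a recommended default).
    val compressThreshold = 1000
    val filled = values.foldLeft(new QuantileSummaries(compressThreshold, relativeError)) {
      (summary, x) => summary.insert(x)
    }
    // query requires a compressed summary, so compress once before querying.
    val compressed = filled.compress()
    quantiles.map(q => compressed.query(q))
  }
}
```

Compressing once and reusing the compressed summary for all quantiles avoids repeating the compression work per query.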

  21. val relativeError: Double

    the target relative error. It is uniform across the complete range of values.

  22. val sampled: Array[Stats]

    a buffer of quantile statistics. See the G-K article for more details.

  23. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  24. def toString(): String

    Definition Classes
    AnyRef → Any
  25. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
