org.apache.spark.sql.catalyst.expressions.aggregate

HyperLogLogPlusPlus

case class HyperLogLogPlusPlus(child: Expression, relativeSD: Double = 0.05, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0) extends ImperativeAggregate with Product with Serializable

HyperLogLog++ (HLL++) is a state-of-the-art cardinality estimation algorithm. This class implements the dense version of the HLL++ algorithm as an aggregate function.

This implementation is based on the following papers:

HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf

HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40671.pdf

Appendix to HyperLogLog in Practice: Algorithmic Engineering of a State of the Art Cardinality Estimation Algorithm: https://docs.google.com/document/d/1gyjfMHy43U9OWBXxfaeG-3MjGzejW1dlpyMwEYAAWEI/view?fullscreen#

child

the expression to estimate the cardinality of.

relativeSD

the maximum estimation error (relative standard deviation) allowed.

Linear Supertypes

Serializable, Serializable, ImperativeAggregate, AggregateFunction, ImplicitCastInputTypes, ExpectsInputTypes, Expression, TreeNode[Expression], Product, Equals, AnyRef, Any

Instance Constructors

  1. new HyperLogLogPlusPlus(child: Expression, relativeSD: Expression)

  2. new HyperLogLogPlusPlus(child: Expression)

  3. new HyperLogLogPlusPlus(child: Expression, relativeSD: Double = 0.05, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0)

    child

    the expression to estimate the cardinality of.

    relativeSD

    the maximum estimation error (relative standard deviation) allowed.

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. val aggBufferAttributes: Seq[AttributeReference]

    Allocate enough words to store all registers.

    Definition Classes
    HyperLogLogPlusPlus → AggregateFunction
  7. def aggBufferSchema: StructType

    The schema of the aggregation buffer.

    Definition Classes
    HyperLogLogPlusPlus → AggregateFunction
  8. def apply(number: Int): Expression

    Returns the tree node at the specified number. Numbers for each node can be found in the numberedTreeString.

    Definition Classes
    TreeNode
  9. def argString: String

    Returns a string representing the arguments to this node, minus any children.

    Definition Classes
    TreeNode
  10. def asCode: String

    Returns a 'scala code' representation of this TreeNode and its children. Intended for use when debugging where the prettier toString function is obfuscating the actual structure. In the case of 'pure' TreeNodes that only contain primitives and other TreeNodes, the result can be pasted in the REPL to build an equivalent Tree.

    Definition Classes
    TreeNode
  11. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  12. def checkInputDataTypes(): TypeCheckResult

    Checks the input data types and returns TypeCheckResult.success if they are valid, or a TypeCheckResult with an error message if they are not. Note: it is not valid to call this method until childrenResolved == true.

    Definition Classes
    ExpectsInputTypes → Expression
  13. val child: Expression

    the expression to estimate the cardinality of.

  14. def children: Seq[Expression]

    Returns a Seq of the children of this node. Children should not change. Immutability is required for the containsChild optimization.

    Definition Classes
    HyperLogLogPlusPlus → TreeNode
  15. def childrenResolved: Boolean

    Returns true if all the children of this expression have been resolved to a specific schema and false if any still contains any unresolved placeholders.

    Definition Classes
    Expression
  16. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  17. def collect[B](pf: PartialFunction[Expression, B]): Seq[B]

    Returns a Seq containing the result of applying a partial function to all elements in this tree on which the function is defined.

    Definition Classes
    TreeNode
  18. def collectFirst[B](pf: PartialFunction[Expression, B]): Option[B]

    Finds and returns the first TreeNode of the tree for which the given partial function is defined (pre-order), and applies the partial function to it.

    Definition Classes
    TreeNode
  19. lazy val containsChild: Set[TreeNode[_]]

    Definition Classes
    TreeNode
  20. def dataType: DataType

    Returns the DataType of the result of evaluating this expression. It is invalid to query the dataType of an unresolved expression (i.e., when resolved == false).

    Definition Classes
    HyperLogLogPlusPlus → Expression
  21. def defaultResult: Option[Literal]

    Result of the aggregate function when the input is empty. This is currently only used for the proper rewriting of distinct aggregate functions.

    Definition Classes
    AggregateFunction
  22. def deterministic: Boolean

    Returns true when the current expression always returns the same result for fixed inputs from children.

    Note that this means that an expression should be considered non-deterministic if: (1) it relies on some mutable internal state, (2) it relies on some implicit input that is not part of the children expression list, or (3) it has a non-deterministic child or children.

    An example would be SparkPartitionID that relies on the partition id returned by TaskContext. By default leaf expressions are deterministic as Nil.forall(_.deterministic) returns true.

    Definition Classes
    Expression
  23. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  24. def estimateBias(e: Double): Double

    Estimate the bias using the raw estimates with their respective biases from the HLL++ appendix. We currently use KNN interpolation to determine the bias (as suggested in the paper).
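
    The KNN interpolation described above can be sketched as follows. This is a minimal illustration, not this class's implementation: the object name BiasSketch, the explicit table parameters, and the choice of k are assumptions, and a real caller would pass the raw-estimate and bias tables from the HLL++ appendix for the chosen precision.

    ```scala
    object BiasSketch {
      // Averages the biases of the k table entries whose raw estimate is closest to e.
      // rawEstimates and biases are parallel arrays (hypothetical inputs, see lead-in).
      def estimateBias(
          e: Double,
          rawEstimates: Array[Double],
          biases: Array[Double],
          k: Int): Double = {
        require(rawEstimates.length == biases.length, "tables must be parallel")
        require(k > 0 && k <= biases.length, "k must be between 1 and the table size")
        // Indices of the k nearest raw estimates, ordered by distance to e.
        val nearest = rawEstimates.indices
          .sortBy(i => math.abs(rawEstimates(i) - e))
          .take(k)
        nearest.map(i => biases(i)).sum / k
      }
    }
    ```

    Sorting the whole table keeps the sketch short; because the appendix tables are ordered by raw estimate, a windowed search around the insertion point would likely be cheaper.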

  25. def eval(buffer: InternalRow): Any

    Compute the HyperLogLog estimate.

    Variable names in the HLL++ paper match variable names in the code.

    Definition Classes
    HyperLogLogPlusPlus → Expression
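
    As a rough illustration of what an estimate computation looks like, the sketch below applies the raw HyperLogLog formula with the small-range (linear counting) correction from the original HyperLogLog paper. It is a simplification under stated assumptions: registers are held in a plain Array[Int] rather than packed words, the HLL++ bias correction and empirical thresholds are omitted, and the name HllEstimateSketch is hypothetical.

    ```scala
    object HllEstimateSketch {
      // registers(j) holds the largest rank seen for bucket j; 0 means "never updated".
      def estimate(registers: Array[Int]): Double = {
        val m = registers.length
        require(m >= 128, "the alpha formula below assumes m >= 128")
        val alpha = 0.7213 / (1.0 + 1.079 / m)
        var zInverse = 0.0 // sum of 2^(-register) over all registers
        var zeros = 0      // number of registers that were never updated
        registers.foreach { r =>
          zInverse += math.pow(2.0, -r.toDouble)
          if (r == 0) zeros += 1
        }
        val raw = alpha * m * m / zInverse
        // Small-range correction from the original HLL paper (assumption: HLL++
        // replaces this rule with bias correction and tuned thresholds).
        if (zeros > 0 && raw <= 2.5 * m) m * math.log(m.toDouble / zeros) else raw
      }
    }
    ```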
  26. def fastEquals(other: TreeNode[_]): Boolean

    Faster version of equality which short-circuits when two TreeNodes are the same instance. We don't just override Object.equals, as doing so prevents the Scala compiler from generating case class equals methods.

    Definition Classes
    TreeNode
  27. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  28. def find(f: (Expression) ⇒ Boolean): Option[Expression]

    Find the first TreeNode that satisfies the condition specified by f. The condition is recursively applied to this node and all of its children (pre-order).

    Definition Classes
    TreeNode
  29. def flatMap[A](f: (Expression) ⇒ TraversableOnce[A]): Seq[A]

    Returns a Seq by applying a function to all nodes in this tree and using the elements of the resulting collections.

    Definition Classes
    TreeNode
  30. final def foldable: Boolean

    An aggregate function is not foldable.

    Definition Classes
    AggregateFunction → Expression
  31. def foreach(f: (Expression) ⇒ Unit): Unit

    Runs the given function on this node and then recursively on children.

    f

    the function to be applied to each node in the tree.

    Definition Classes
    TreeNode
  32. def foreachUp(f: (Expression) ⇒ Unit): Unit

    Runs the given function recursively on children then on this node.

    f

    the function to be applied to each node in the tree.

    Definition Classes
    TreeNode
  33. def gen(ctx: CodeGenContext): GeneratedExpressionCode

    Returns a GeneratedExpressionCode, which contains Java source code that can be used to generate the result of evaluating the expression on an input row.

    ctx

    a CodeGenContext

    returns

    GeneratedExpressionCode

    Definition Classes
    Expression
  34. def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String

    Returns Java source code that can be compiled to evaluate this expression. The default behavior is to call the eval method of the expression. Concrete expression implementations should override this to do actual code generation.

    ctx

    a CodeGenContext

    ev

    a GeneratedExpressionCode with unique terms.

    returns

    Java source code

    Attributes
    protected
    Definition Classes
    AggregateFunction → Expression
  35. def generateTreeString(depth: Int, lastChildren: Seq[Boolean], builder: StringBuilder): StringBuilder

    Appends the string representation of this node and its children to the given StringBuilder.

    The i-th element in lastChildren indicates whether the ancestor of the current node at depth i + 1 is the last child of its own parent node. The depth of the root node is 0, and lastChildren for the root node should be empty.

    Attributes
    protected
    Definition Classes
    TreeNode
  36. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  37. def getNodeNumbered(number: trees.MutableInt): Expression

    Attributes
    protected
    Definition Classes
    TreeNode
  38. def initialize(buffer: MutableRow): Unit

    Fill all words with zeros.

    Definition Classes
    HyperLogLogPlusPlus → ImperativeAggregate
  39. val inputAggBufferAttributes: Seq[AttributeReference]

    Attributes of fields in input aggregation buffers (immutable aggregation buffers that are merged with mutable aggregation buffers in the merge() function or merge expressions). These attributes are created automatically by cloning the aggBufferAttributes.

    Definition Classes
    HyperLogLogPlusPlus → AggregateFunction
  40. val inputAggBufferOffset: Int

    The offset of this function's start buffer value in the underlying shared input aggregation buffer. An input aggregation buffer is used when we merge two aggregation buffers together in the merge() function and is immutable (we merge an input aggregation buffer and a mutable aggregation buffer and then store the new buffer values in the mutable aggregation buffer).

    An input aggregation buffer may contain extra fields, such as grouping keys, at its start, so mutableAggBufferOffset and inputAggBufferOffset are often different.

    For example, say we have a grouping expression, key, and two aggregate functions, avg(x) and avg(y). In the shared input aggregation buffer, the position of the first buffer value of avg(x) will be 1 and the position of the first buffer value of avg(y) will be 3 (position 0 is used for the value of key):

             avg(x) inputAggBufferOffset = 1
                     |
                     v
            +--------+--------+--------+--------+--------+
            |  key   |  sum1  | count1 |  sum2  | count2 |
            +--------+--------+--------+--------+--------+
                                       ^
                                       |
                        avg(y) inputAggBufferOffset = 3

    Definition Classes
    HyperLogLogPlusPlus → ImperativeAggregate
  41. def inputTypes: Seq[AbstractDataType]

    Expected input types from child expressions. The i-th position in the returned seq indicates the type requirement for the i-th child.

    The possible values at each position are: (1) a specific data type, e.g. LongType or StringType; (2) a non-leaf abstract data type, e.g. NumericType, IntegralType, or FractionalType.

    Definition Classes
    HyperLogLogPlusPlus → ExpectsInputTypes
  42. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  43. def jsonFields: List[(String, JValue)]

    Attributes
    protected
    Definition Classes
    TreeNode
  44. def makeCopy(newArgs: Array[AnyRef]): Expression

    Creates a copy of this type of tree node after a transformation. Must be overridden by child classes that have constructor arguments that are not present in the productIterator.

    newArgs

    the new product arguments.

    Definition Classes
    TreeNode
  45. def map[A](f: (Expression) ⇒ A): Seq[A]

    Returns a Seq containing the result of applying the given function to each node in this tree in a preorder traversal.

    f

    the function to be applied.

    Definition Classes
    TreeNode
  46. def mapChildren(f: (Expression) ⇒ Expression): Expression

    Returns a copy of this node where f has been applied to all of the node's children.

    Definition Classes
    TreeNode
  47. def merge(buffer1: MutableRow, buffer2: InternalRow): Unit

    Merge the HLL buffers by iterating through the registers in both buffers and selecting the maximum number of leading zeros for each register.

    Definition Classes
    HyperLogLogPlusPlus → ImperativeAggregate
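
    The register-wise maximum described above can be sketched as follows. This is an illustration only: it uses one Int per register instead of the packed words stored in the aggregation buffer, and HllMergeSketch is a hypothetical name. As in merge(), the first buffer is mutated and the second is only read.

    ```scala
    object HllMergeSketch {
      // After the call, buffer1(i) == max(buffer1(i), buffer2(i)) for every register i.
      def merge(buffer1: Array[Int], buffer2: Array[Int]): Unit = {
        require(buffer1.length == buffer2.length,
          "buffers must have the same number of registers")
        var i = 0
        while (i < buffer1.length) {
          if (buffer2(i) > buffer1(i)) buffer1(i) = buffer2(i)
          i += 1
        }
      }
    }
    ```

    Because taking a maximum is commutative, associative, and idempotent, partial buffers can be merged in any order and the same partial result can be merged more than once without changing the outcome.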
  48. val mutableAggBufferOffset: Int

    The offset of this function's first buffer value in the underlying shared mutable aggregation buffer.

    For example, we have two aggregate functions avg(x) and avg(y), which share the same aggregation buffer. In this shared buffer, the position of the first buffer value of avg(x) will be 0 and the position of the first buffer value of avg(y) will be 2:

    avg(x) mutableAggBufferOffset = 0
            |
            v
            +--------+--------+--------+--------+
            |  sum1  | count1 |  sum2  | count2 |
            +--------+--------+--------+--------+
                              ^
                              |
               avg(y) mutableAggBufferOffset = 2

    Definition Classes
    HyperLogLogPlusPlus → ImperativeAggregate
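
    The avg(x)/avg(y) layout above can be mimicked with a toy aggregate that addresses its slots relative to an offset. AvgSketch is a hypothetical stand-in written for this illustration, and a plain Array[Double] takes the place of the shared mutable aggregation buffer.

    ```scala
    // Toy aggregate that owns two consecutive slots (sum, count) of a shared buffer.
    final case class AvgSketch(mutableAggBufferOffset: Int) {
      def update(buffer: Array[Double], input: Double): Unit = {
        buffer(mutableAggBufferOffset) += input   // sum slot
        buffer(mutableAggBufferOffset + 1) += 1.0 // count slot
      }

      def eval(buffer: Array[Double]): Double =
        buffer(mutableAggBufferOffset) / buffer(mutableAggBufferOffset + 1)
    }
    ```

    With a four-slot buffer, AvgSketch(0) plays the role of avg(x) and AvgSketch(2) the role of avg(y); each instance reads and writes only its own slots.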
  49. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  50. def nodeName: String

    Returns the name of this type of TreeNode. Defaults to the class name.

    Definition Classes
    TreeNode
  51. final def notify(): Unit

    Definition Classes
    AnyRef
  52. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  53. def nullable: Boolean

    Definition Classes
    HyperLogLogPlusPlus → Expression
  54. def numberedTreeString: String

    Returns a string representation of the nodes in this tree, where each operator is numbered. The numbers can be used with apply to easily access specific subtrees.

    Definition Classes
    TreeNode
  55. val origin: Origin

    Definition Classes
    TreeNode
  56. def otherCopyArgs: Seq[AnyRef]

    Args to the constructor that should be copied, but not transformed. These are appended to the transformed args automatically by makeCopy.

    Attributes
    protected
    Definition Classes
    TreeNode
  57. def prettyJson: String

    Definition Classes
    TreeNode
  58. def prettyName: String

    Returns a user-facing string representation of this expression's name. This should usually match the name of the function in SQL.

    Definition Classes
    Expression
  59. def prettyString: String

    Returns a user-facing string representation of this expression, i.e., one that does not include developer-centric debugging information such as the expression id.

    Definition Classes
    Expression
  60. def references: AttributeSet

    Definition Classes
    Expression
  61. val relativeSD: Double

    the maximum estimation error (relative standard deviation) allowed.

  62. lazy val resolved: Boolean

    Returns true if this expression and all its children have been resolved to a specific schema and input data types checking passed, and false if it still contains any unresolved placeholders or has data types mismatch. Implementations of expressions should override this if the resolution of this type of expression involves more than just the resolution of its children and type checking.

    Definition Classes
    Expression
  63. def semanticEquals(other: Expression): Boolean

    Returns true when two expressions will always compute the same result, even if they differ cosmetically (e.g., the capitalization of names in attributes may be different).

    Definition Classes
    Expression
  64. def semanticHash(): Int

    Returns the hash for this expression. Expressions that compute the same result, even if they differ cosmetically should return the same hash.

    Definition Classes
    Expression
  65. def simpleString: String

    String representation of this node without any children.

    Definition Classes
    Expression → TreeNode
  66. def stringArgs: Iterator[Any]

    The arguments that should be included in the arg string. Defaults to the productIterator.

    Attributes
    protected
    Definition Classes
    TreeNode
  67. def supportsPartial: Boolean

    Indicates if this function supports partial aggregation. Currently Hive UDAF is the only one that doesn't support partial aggregation.

    Definition Classes
    AggregateFunction
  68. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  69. def toAggregateExpression(isDistinct: Boolean): AggregateExpression

    Wraps this AggregateFunction in an AggregateExpression and sets the isDistinct field of the AggregateExpression to the given value. AggregateExpression is the container of an AggregateFunction, the aggregation mode, and the flag indicating whether this aggregation is a distinct aggregation. An AggregateFunction should not be used without being wrapped in an AggregateExpression.

    Definition Classes
    AggregateFunction
  70. def toAggregateExpression(): AggregateExpression

    Wraps this AggregateFunction in an AggregateExpression. AggregateExpression is the container of an AggregateFunction, the aggregation mode, and the flag indicating whether this aggregation is a distinct aggregation. An AggregateFunction should not be used without being wrapped in an AggregateExpression.

    Definition Classes
    AggregateFunction
  71. def toCommentSafeString: String

    Returns the string representation of this expression that is safe to be put in code comments of generated code.

    Attributes
    protected
    Definition Classes
    Expression
  72. def toJSON: String

    Definition Classes
    TreeNode
  73. def toString(): String

    Definition Classes
    Expression → TreeNode → AnyRef → Any
  74. def transform(rule: PartialFunction[Expression, Expression]): Expression

    Returns a copy of this node where rule has been recursively applied to the tree. When rule does not apply to a given node it is left unchanged. Users should not expect a specific directionality. If a specific directionality is needed, transformDown or transformUp should be used.

    rule

    the function used to transform this node's children

    Definition Classes
    TreeNode
  75. def transformChildren(rule: PartialFunction[Expression, Expression], nextOperation: (Expression, PartialFunction[Expression, Expression]) ⇒ Expression): Expression

    Returns a copy of this node where rule has been recursively applied to all the children of this node. When rule does not apply to a given node it is left unchanged.

    rule

    the function used to transform this node's children

    Attributes
    protected
    Definition Classes
    TreeNode
  76. def transformDown(rule: PartialFunction[Expression, Expression]): Expression

    Returns a copy of this node where rule has been recursively applied to it and all of its children (pre-order). When rule does not apply to a given node it is left unchanged.

    rule

    the function used to transform this node's children

    Definition Classes
    TreeNode
  77. def transformUp(rule: PartialFunction[Expression, Expression]): Expression

    Returns a copy of this node where rule has been recursively applied first to all of its children and then itself (post-order). When rule does not apply to a given node, it is left unchanged.

    rule

    the function used to transform this node's children

    Definition Classes
    TreeNode
  78. def treeString: String

    Returns a string representation of the nodes in this tree.

    Definition Classes
    TreeNode
  79. def trueRsd: Double

    The rsd of HLL++ is always equal to or better than the rsd requested. This method returns the rsd this instance actually guarantees.

    returns

    the actual rsd.
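
    One way to see why the actual rsd is never worse than the requested one: the number of registers is a power of two, so the precision has to be rounded up. The sketch below illustrates that rounding. The constants 1.106 and 1.04 and the names RsdSketch, precision, and registers are assumptions made for this example, not values confirmed by this page.

    ```scala
    object RsdSketch {
      // Assumption: p is the smallest integer with 1.106 / sqrt(2^p) <= relativeSD.
      def precision(relativeSD: Double): Int =
        math.ceil(2.0 * math.log(1.106 / relativeSD) / math.log(2.0)).toInt

      // Number of registers m = 2^p.
      def registers(p: Int): Int = 1 << p

      // Assumption: the guaranteed rsd follows the standard HLL bound 1.04 / sqrt(m).
      def trueRsd(m: Int): Double = 1.04 / math.sqrt(m.toDouble)
    }
    ```

    Under these assumptions the default relativeSD of 0.05 gives p = 9 and m = 512 registers, for an actual rsd of about 0.046.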

  80. def update(buffer: MutableRow, input: InternalRow): Unit

    Update the HLL++ buffer.

    Variable names in the HLL++ paper match variable names in the code.

    Definition Classes
    HyperLogLogPlusPlus → ImperativeAggregate
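
    A dense HLL update can be sketched as follows, using the paper's variable names (x for the hash, idx for the register index, w for the remaining bits, pw for the rank). This is a simplified illustration under stated assumptions: registers are a plain Array[Int] rather than packed words, the precision p = 9 is fixed, and hash64 is a stand-in SplitMix64-style mixer over Long inputs, not the hash function this class uses.

    ```scala
    object HllUpdateSketch {
      val p = 9      // precision: number of index bits (assumed for the example)
      val m = 1 << p // number of registers

      // Stand-in 64-bit mixer (SplitMix64 finalizer); an assumption for this sketch.
      def hash64(value: Long): Long = {
        var z = value + 0x9E3779B97F4A7C15L
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL
        z ^ (z >>> 31)
      }

      def update(registers: Array[Int], value: Long): Unit = {
        val x = hash64(value)
        val idx = (x >>> (64 - p)).toInt   // first p bits select the register
        val w = (x << p) | (1L << (p - 1)) // remaining bits, padded so pw <= 64 - p + 1
        val pw = java.lang.Long.numberOfLeadingZeros(w) + 1
        if (pw > registers(idx)) registers(idx) = pw
      }
    }
    ```

    Updating with the same value twice leaves the registers unchanged, which is why duplicates do not affect the estimate.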
  81. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  82. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  83. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  84. def withNewChildren(newChildren: Seq[Expression]): Expression

    Returns a copy of this node with the children replaced. TODO: Validate somewhere (in debug mode?) that children are ordered correctly.

    Definition Classes
    TreeNode
  85. def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): ImperativeAggregate

    Returns a copy of this ImperativeAggregate with an updated inputAggBufferOffset. This new copy's attributes may have different ids than the original.

    Definition Classes
    HyperLogLogPlusPlus → ImperativeAggregate
  86. def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ImperativeAggregate

    Returns a copy of this ImperativeAggregate with an updated mutableAggBufferOffset. This new copy's attributes may have different ids than the original.

    Definition Classes
    HyperLogLogPlusPlus → ImperativeAggregate
