ai.chronon.aggregator.row

object StatsGenerator

Module managing the FeatureStats schema, the aggregations to be used by type, and aggregator construction.

Stats aggregation has an offline/batch component and an online component. The metrics defined for stats depend on the schema of the join: its data types and column names. On the online side this information comes from the JoinCodec/valueSchema; on the offline side it comes directly from the output table. To keep the two schemas consistent, the metrics in the schema are sorted by name (one column can have multiple metrics).
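The sorting step can be sketched as follows (a minimal illustration of the idea; `Metric` and `statsSchema` are hypothetical names, not Chronon's actual API):

```scala
// Hypothetical sketch: deriving a deterministic stats schema by sorting
// metric names, so the online (JoinCodec) and offline (output table)
// schemas agree on column order.
case class Metric(column: String, suffix: String) {
  // Full metric name, e.g. "price_null_count"
  def name: String = if (suffix.isEmpty) column else s"${column}_$suffix"
}

def statsSchema(metrics: Seq[Metric]): Seq[String] =
  metrics.map(_.name).sorted // sort by name for a consistent column order

val metrics = Seq(
  Metric("price", "percentile"),
  Metric("price", "null_count"),
  Metric("user_id", "null_count"))
```

Because the order is derived from the sorted names alone, both sides produce the same schema regardless of the order in which metrics were declared.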

Linear Supertypes
AnyRef, Any

Type Members

  1. case class MetricTransform(name: String, expression: InputTransform, operation: Operation, suffix: String = "", argMap: Map[String, String] = null) extends Product with Serializable

    MetricTransform represents a single statistic built on top of an input column.
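A simplified stand-in shows the shape of such a transform (the real class uses Chronon's InputTransform and Operation types; the minimal versions below are assumptions for illustration only):

```scala
// Simplified stand-ins for Chronon's InputTransform and Operation types.
object InputTransform extends Enumeration {
  val IsNull, Raw, One = Value
}
sealed trait Operation
case object Count extends Operation
case object ApproxPercentile extends Operation

// Mirrors the documented signature: a single statistic on one input column.
case class MetricTransform(
    name: String,
    expression: InputTransform.Value,
    operation: Operation,
    suffix: String = "",
    argMap: Map[String, String] = null)

// A null-count metric on column "price": count rows where the input is null.
val nullCount = MetricTransform("price", InputTransform.IsNull, Count, suffix = "null")
```

The suffix distinguishes the several metrics that can be built on the same column (e.g. a null count vs. a percentile sketch on "price").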

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def SeriesFinalizer(key: String, value: AnyRef): AnyRef

    Post-processing for IRs when generating a time series of stats. In the case of percentiles, for example, we reduce to 5 values in order to generate candlesticks.
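Reducing a fine-grained percentile array to the 5 points of a candlestick (min, 25th, median, 75th, max) might look like this (a sketch only; the actual reduction in SeriesFinalizer may differ):

```scala
// Sketch: reduce a sorted array of percentile values to 5 candlestick points
// (min, 25th percentile, median, 75th percentile, max).
def toCandlestick(percentiles: Array[Double]): Array[Double] = {
  require(percentiles.nonEmpty, "need at least one percentile value")
  val n = percentiles.length
  // Index into the sorted values at quantile q, clamped to the last element.
  def at(q: Double): Double =
    percentiles(math.min(n - 1, (q * (n - 1)).round.toInt))
  Array(at(0.0), at(0.25), at(0.5), at(0.75), at(1.0))
}

val ps = (0 to 100).map(_.toDouble).toArray // 101 evenly spaced values
// toCandlestick(ps) == Array(0.0, 25.0, 50.0, 75.0, 100.0)
```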

  5. def anyTransforms(column: String): Seq[MetricTransform]

    Stats applied to any column

  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def buildAggPart(m: MetricTransform): AggregationPart
  8. def buildAggregator(metrics: Seq[MetricTransform], selectedSchema: StructType): RowAggregator

    Builds the RowAggregator used to compute stats on a dataframe, based on the given metrics.

  9. def buildMetrics(fields: Seq[(String, DataType)]): Seq[MetricTransform]

    Given the schema of the data, defines the metrics to be aggregated.

  10. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  11. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  13. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  14. val finalizedPercentilesMerged: Array[Double]
  15. val finalizedPercentilesSeries: Array[Double]
  16. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  17. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  18. val ignoreColumns: Seq[String]
  19. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  20. def lInfKllSketch(sketch1: AnyRef, sketch2: AnyRef, bins: Int = 128): AnyRef
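    The L-infinity distance between two distributions — the largest gap between their CDFs, evaluated on a fixed number of bins — can be illustrated with plain sorted samples standing in for the KLL sketches (a sketch only; the real method operates on sketch objects, not raw arrays):

```scala
// Sketch: L-infinity distance between the empirical CDFs of two samples,
// evaluated at `bins` + 1 equally spaced points over their combined range.
// Plain arrays stand in for KLL sketches here.
def lInfDistance(a: Array[Double], b: Array[Double], bins: Int = 128): Double = {
  val lo = math.min(a.min, b.min)
  val hi = math.max(a.max, b.max)
  // Empirical CDF: fraction of samples <= x.
  def cdf(xs: Array[Double], x: Double): Double = xs.count(_ <= x).toDouble / xs.length
  (0 to bins).map { i =>
    val x = lo + (hi - lo) * i / bins
    math.abs(cdf(a, x) - cdf(b, x))
  }.max
}

val same = Array(1.0, 2.0, 3.0)
// Identical samples have zero distance; disjoint samples approach 1.0.
```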
  21. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  22. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  23. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  24. val nullRateSuffix: String
  25. val nullSuffix: String
  26. def numericTransforms(column: String): Seq[MetricTransform]

    Stats applied to numeric columns

  27. def statsInputSchema(valueSchema: StructType): StructType

    Input schema is the data required to update partial aggregations / stats: given a valueSchema and a metric transform list, it defines the schema expected by the stats aggregator (online and offline).

  28. def statsIrSchema(valueSchema: StructType): StructType

    A valueSchema (for a join) and a metric list uniquely define the IR schema used for the statistics. In order to support custom storage for statistic percentiles, this method would need to be modified. IR schemas are used to decode streaming partial aggregations as well as KvStore partial stats.

  29. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  30. def toString(): String
    Definition Classes
    AnyRef → Any
  31. val totalColumn: String
  32. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  33. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  34. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  35. object InputTransform extends Enumeration

    InputTransform acts as a signal of how to process the metric.

    IsNull: check whether the input is null.

    Raw: operate on the input column directly.

    One: lit(true) in Spark; used for row counts, which are leveraged to obtain null rate values.
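    The three cases can be sketched by mapping each transform to the SQL expression it implies (a hypothetical illustration; expression strings stand in for Spark Column objects):

```scala
// Sketch: mapping each InputTransform case to the expression it implies.
object InputTransform extends Enumeration {
  val IsNull, Raw, One = Value
}

def toExpression(column: String, transform: InputTransform.Value): String =
  transform match {
    case InputTransform.IsNull => s"$column IS NULL" // feeds null counts
    case InputTransform.Raw    => column             // operate on the column directly
    case InputTransform.One    => "true"             // lit(true): row count for null rates
  }
```

Dividing the aggregated IsNull count by the One (total row) count yields the null rate for a column.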
