T - the numeric type, e.g. Integer, Double
@Deprecated @PublicEvolving public class NumericColumnSummary<T> extends ColumnSummary implements Serializable
A value is considered "missing" if it is null, NaN, or Infinity. Missing values are ignored in some calculations, such as mean, variance, and standardDeviation.
Uses the Kahan summation algorithm to avoid numeric instability when computing variance. The algorithm is described in: "Scalable and Numerically Stable Descriptive Statistics in SystemML", Tian et al, International Conference on Data Engineering 2012.
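To illustrate the compensated summation idea named above, here is a minimal Java sketch. The class `KahanSumSketch` and its methods are hypothetical names chosen for this example; they are not part of Flink's API, and Flink's own aggregators may differ in detail.

```java
import java.util.Arrays;

/** Illustrative only: a hypothetical helper, not a Flink class. */
final class KahanSumSketch {
    private double sum = 0.0;          // running total
    private double compensation = 0.0; // rounding error carried over from earlier additions

    /** Adds one value, feeding the previously lost low-order bits back in. */
    void add(double value) {
        double corrected = value - compensation;
        double newSum = sum + corrected;
        // (newSum - sum) is what was actually added; subtracting 'corrected'
        // leaves the rounding error introduced by this addition.
        compensation = (newSum - sum) - corrected;
        sum = newSum;
    }

    double value() {
        return sum;
    }

    /** Compensated sum of an array. */
    static double kahanSum(double[] values) {
        KahanSumSketch acc = new KahanSumSketch();
        for (double v : values) {
            acc.add(v);
        }
        return acc.value();
    }

    /** Plain left-to-right sum, for comparison. */
    static double naiveSum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] values = new double[1_000_000];
        Arrays.fill(values, 0.1);
        System.out.println("naive: " + naiveSum(values));
        System.out.println("kahan: " + kahanSum(values));
    }
}
```

With many small addends, the plain sum accumulates rounding error on every step, while the compensated sum stays close to the exact result.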
| Constructor and Description |
|---|
| NumericColumnSummary(long nonMissingCount, long nullCount, long nanCount, long infinityCount, T min, T max, T sum, Double mean, Double variance, Double standardDeviation) Deprecated. |
| Modifier and Type | Method and Description |
|---|---|
| long | getInfinityCount() Deprecated. Number of values that are positive or negative infinity. |
| T | getMax() Deprecated. |
| Double | getMean() Deprecated. Null, NaN, and Infinite values are ignored in this calculation. |
| T | getMin() Deprecated. |
| long | getMissingCount() Deprecated. The number of "missing" values where "missing" is defined as null, NaN, or Infinity. |
| long | getNanCount() Deprecated. Number of values that are NaN. |
| long | getNonMissingCount() Deprecated. The number of values that are not null, NaN, or Infinity. |
| long | getNonNullCount() Deprecated. The number of non-null values in this column. |
| long | getNullCount() Deprecated. The number of null values in this column. |
| Double | getStandardDeviation() Deprecated. Standard Deviation is a measure of variation in a set of numbers. |
| T | getSum() Deprecated. |
| Double | getVariance() Deprecated. Variance is a measure of how far a set of numbers are spread out. |
| String | toString() Deprecated. |
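Methods inherited from class ColumnSummary: containsNonNull, containsNull, getTotalCount
public long getMissingCount()
The number of "missing" values where "missing" is defined as null, NaN, or Infinity. These values are ignored in some calculations like mean, variance, and standardDeviation.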
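The way the counters relate to each other can be sketched as follows. This is a hypothetical stand-alone class (`MissingValueCounts` is not a Flink type) that sorts `Double` values into the four categories described in this summary and derives the missing and non-null counts from them, assuming each value falls into exactly one category.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical counter, not a Flink class. */
final class MissingValueCounts {
    long nonMissingCount; // ordinary finite values
    long nullCount;       // null references
    long nanCount;        // NaN values
    long infinityCount;   // positive or negative infinity

    /** Classifies one value into exactly one of the four counters. */
    void add(Double value) {
        if (value == null) {
            nullCount++;
        } else if (value.isNaN()) {
            nanCount++;
        } else if (value.isInfinite()) {
            infinityCount++;
        } else {
            nonMissingCount++;
        }
    }

    /** "Missing" means null, NaN, or Infinity. */
    long missingCount() {
        return nullCount + nanCount + infinityCount;
    }

    /** Everything that is not null, including NaN and Infinity. */
    long nonNullCount() {
        return nonMissingCount + nanCount + infinityCount;
    }

    public static void main(String[] args) {
        List<Double> column = Arrays.asList(
                1.0, null, Double.NaN, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, 2.5);
        MissingValueCounts counts = new MissingValueCounts();
        for (Double v : column) {
            counts.add(v);
        }
        System.out.println("missing: " + counts.missingCount());
        System.out.println("non-null: " + counts.nonNullCount());
    }
}
```

For integral types such as Short, Integer, and Long, the NaN and infinity branches can never be taken, which is why those counts are always zero for them.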
public long getNonMissingCount()
public long getNonNullCount()
getNonNullCount in class ColumnSummary
public long getNullCount()
getNullCount in class ColumnSummary
public long getNanCount()
(always zero for types like Short, Integer, Long)
public long getInfinityCount()
(always zero for types like Short, Integer, Long)
public T getMin()
public T getMax()
public T getSum()
public Double getMean()
public Double getVariance()
Null, NaN, and Infinite values are ignored in this calculation.
public Double getStandardDeviation()
Null, NaN, and Infinite values are ignored in this calculation.
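A minimal sketch of how mean, variance, and standard deviation might be computed while skipping missing values is shown below. Several details are assumptions made for illustration and are not stated on this page: the class name `MomentsSketch`, the use of the sample variance (dividing by n - 1), the simple two-pass approach, and returning null when a statistic cannot be computed.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical helper, not a Flink class. */
final class MomentsSketch {

    /** A value is "missing" if it is null, NaN, or infinite. */
    static boolean isMissing(Double v) {
        return v == null || v.isNaN() || v.isInfinite();
    }

    /** Mean of the non-missing values, or null if there are none. */
    static Double mean(List<Double> values) {
        double sum = 0.0;
        long n = 0;
        for (Double v : values) {
            if (!isMissing(v)) {
                sum += v;
                n++;
            }
        }
        return n == 0 ? null : sum / n;
    }

    /** Sample variance (assumed n - 1 denominator), or null with fewer than two values. */
    static Double variance(List<Double> values) {
        Double mean = mean(values);
        if (mean == null) {
            return null;
        }
        double squaredDeviations = 0.0;
        long n = 0;
        for (Double v : values) {
            if (!isMissing(v)) {
                double d = v - mean;
                squaredDeviations += d * d;
                n++;
            }
        }
        return n < 2 ? null : squaredDeviations / (n - 1);
    }

    /** Standard deviation is the square root of the variance. */
    static Double standardDeviation(List<Double> values) {
        Double variance = variance(values);
        return variance == null ? null : Math.sqrt(variance);
    }

    public static void main(String[] args) {
        List<Double> column = Arrays.asList(
                2.0, null, 4.0, Double.NaN, 4.0, Double.POSITIVE_INFINITY, 4.0, 5.0, 5.0, 7.0, 9.0);
        System.out.println("mean: " + mean(column));
        System.out.println("variance: " + variance(column));
        System.out.println("standardDeviation: " + standardDeviation(column));
    }
}
```

A production implementation would more likely compute these in a single pass with compensated summation rather than the two passes used here.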
Copyright © 2014–2023 The Apache Software Foundation. All rights reserved.