org.apache.spark.sql.catalyst.plans.logical
number of distinct values
minimum value
maximum value
number of nulls
average length of the values. For fixed-length types, this should be a constant.
maximum length of the values. For fixed-length types, this should be a constant.
average length of the values. For fixed-length types, this should be a constant.
number of distinct values
maximum value
maximum length of the values. For fixed-length types, this should be a constant.
minimum value
number of nulls
Returns a map from string to string that can be used to serialize the column stats. The key is the name of the field (e.g. "distinctCount" or "min"), and the value is the string representation for the value. min/max values are converted to the external data type. For example, for DateType we store java.sql.Date, and for TimestampType we store java.sql.Timestamp. The deserialization side is defined in ColumnStat.fromMap.
As part of the protocol, the returned map always contains a key called "version". In the case min/max values are null (None), they won't appear in the map.
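The serialization contract described above can be sketched as follows. This is a minimal illustration, not Spark's implementation: `SketchColumnStat` is a hypothetical stand-in class, the version value "1" is assumed, and only the "version", "distinctCount" and "min" key names come from the description; the remaining key names are assumptions. Conversion of min/max to the external data type is left out.

```scala
// Hypothetical stand-in for illustration only; this is not Spark's ColumnStat.
final case class SketchColumnStat(
    distinctCount: BigInt,
    min: Option[Any],
    max: Option[Any],
    nullCount: BigInt,
    avgLen: Long,
    maxLen: Long) {

  // Serialize each field to its string form. The "version" key is always
  // present; "min" and "max" are emitted only when a value is defined.
  def toMap: Map[String, String] = {
    val required = Map(
      "version" -> "1", // assumed version value
      "distinctCount" -> distinctCount.toString,
      "nullCount" -> nullCount.toString, // assumed key name
      "avgLen" -> avgLen.toString,       // assumed key name
      "maxLen" -> maxLen.toString)       // assumed key name
    val optional = Seq("min" -> min, "max" -> max).collect {
      case (key, Some(value)) => key -> value.toString
    }
    required ++ optional
  }
}
```

A reader of such a map would look up "min" and "max" with `get` rather than `apply`, since either key may be absent.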
Statistics collected for a column.
1. Supported data types are defined in ColumnStat.supportsType.
2. The JVM data type stored in min/max is the internal data type for the corresponding Catalyst data type. For example, the internal type of DateType is Int, and the internal type of TimestampType is Long.
3. There is no guarantee that the statistics collected are accurate. Approximation algorithms (sketches) might have been used, and the data collected can also be stale.
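The difference between the internal values held in min/max and the external types used when serializing can be illustrated with a short sketch. It assumes the usual Catalyst encodings, which the notes above do not spell out: a DateType value is an Int counting days since 1970-01-01, and a TimestampType value is a Long counting microseconds since the epoch. The object and method names are hypothetical helpers, not part of Spark.

```scala
import java.time.{Instant, LocalDate}
import java.time.temporal.ChronoUnit

// Hypothetical helpers for illustration; not part of Spark's API.
object InternalToExternal {
  // Assumed encoding: DateType is stored as days since 1970-01-01.
  def toExternalDate(days: Int): java.sql.Date =
    java.sql.Date.valueOf(LocalDate.ofEpochDay(days.toLong))

  // Assumed encoding: TimestampType is stored as microseconds since the epoch.
  def toExternalTimestamp(micros: Long): java.sql.Timestamp =
    java.sql.Timestamp.from(Instant.EPOCH.plus(micros, ChronoUnit.MICROS))
}
```

For instance, `InternalToExternal.toExternalDate(17532)` yields the date 2018-01-01 under these assumptions.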