org.apache.spark.sql.catalyst.plans.logical.statsEstimation
Returns a percentage of rows meeting a condition in Filter node.
Returns a percentage of rows meeting a condition in Filter node. If it's a single condition, we calculate the percentage directly. If it's a compound condition, it is decomposed into multiple single conditions linked with AND, OR, NOT. For logical AND conditions, we need to update stats after each condition is estimated so that the stats are more accurate for subsequent estimations. This is needed for range conditions such as (c > 40 AND c <= 50). For logical OR and NOT conditions, we do not update stats after a condition estimation.
the compound logical expression
a boolean flag to specify if we need to update ColumnStat of a column for subsequent conditions
an optional double value to show the percentage of rows meeting a given condition. It returns None if the condition is not supported.
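The effect of updating stats between AND-ed conditions can be illustrated with a small, self-contained sketch. It is not Spark's implementation: `Range`, `greaterThan`, and `lessOrEqual` are hypothetical helpers, and the arithmetic assumes values are uniformly distributed between the column's min and max.

```scala
object RangeUpdateSketch {
  // Hypothetical stand-in for a column's min/max statistics.
  final case class Range(min: Double, max: Double)

  // Selectivity of `c > v` under a uniform-distribution assumption,
  // together with the narrowed range to use for later conditions.
  def greaterThan(r: Range, v: Double): (Double, Range) =
    if (v >= r.max) (0.0, r)
    else if (v < r.min) (1.0, r)
    else ((r.max - v) / (r.max - r.min), Range(v, r.max))

  // Selectivity of `c <= v`, again with the narrowed range.
  def lessOrEqual(r: Range, v: Double): (Double, Range) =
    if (v < r.min) (0.0, r)
    else if (v >= r.max) (1.0, r)
    else ((v - r.min) / (r.max - r.min), Range(r.min, v))

  // Column c is assumed to span [0, 100].
  val stats: Range = Range(0.0, 100.0)

  // c > 40 keeps 60% of the rows and narrows the range to [40, 100].
  val (p1, narrowed) = greaterThan(stats, 40.0)

  // c <= 50 evaluated against the narrowed range: 10 / 60.
  val (p2Updated, _) = lessOrEqual(narrowed, 50.0)

  // c <= 50 evaluated against the stale range: 50 / 100.
  val (p2Stale, _) = lessOrEqual(stats, 50.0)

  val withUpdate: Double = p1 * p2Updated   // about 0.1
  val withoutUpdate: Double = p1 * p2Stale  // about 0.3
}
```

Under these assumptions the true fraction of rows in (40, 50] is 10%, which the updated-stats product recovers, while multiplying two estimates taken against the original range overestimates it.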
Returns a percentage of rows meeting a single condition in Filter node.
Returns a percentage of rows meeting a single condition in Filter node. Currently we only support binary predicates where one side is a column, and the other is a literal.
a single logical expression
a boolean flag to specify if we need to update ColumnStat of a column for subsequent conditions
an optional double value to show the percentage of rows meeting a given condition. It returns None if the condition is not supported.
Returns an option of Statistics for a Filter logical plan node.
Returns an option of Statistics for a Filter logical plan node. For a given compound expression condition, this method computes filter selectivity (or the percentage of rows meeting the filter condition), which is used to compute row count, size in bytes, and the updated statistics after a given predicate is applied.
Option[Statistics]. It returns None when no statistics have been collected.
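How a selectivity turns into an output row count and size can be sketched as follows. This is a simplified illustration, not the method itself: `SimpleStats`, `applySelectivity`, and the fixed `avgRowSize` are hypothetical, and rounding the row count up is an assumption of this sketch.

```scala
import scala.math.BigDecimal.RoundingMode

object FilterStatsSketch {
  // Hypothetical, reduced form of plan statistics.
  final case class SimpleStats(rowCount: BigInt, sizeInBytes: BigInt)

  // Returns None when the input has no row count, mirroring the
  // "no statistics collected" case described above.
  def applySelectivity(
      inputRowCount: Option[BigInt],
      selectivity: Double,
      avgRowSize: Int): Option[SimpleStats] =
    inputRowCount.map { rows =>
      // Round up so that a non-zero selectivity never yields zero rows.
      val filtered = (BigDecimal(rows) * BigDecimal(selectivity))
        .setScale(0, RoundingMode.CEILING)
        .toBigInt
      SimpleStats(filtered, filtered * avgRowSize)
    }
}
```

For example, `FilterStatsSketch.applySelectivity(Some(BigInt(1000)), 0.25, 8)` yields 250 rows and 2000 bytes under these assumptions.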
Returns a percentage of rows meeting a binary comparison expression.
a binary comparison operator such as =, <, <=, >, >=
an Attribute (or a column)
a literal value (or constant)
a boolean flag to specify if we need to update ColumnStat of a given column for subsequent conditions
an optional double value to show the percentage of rows meeting a given condition. It returns None if no statistics exist for the given column or if the value is invalid.
Returns a percentage of rows meeting a binary comparison expression.
Returns a percentage of rows meeting a binary comparison expression. This method evaluates expressions for Numeric/Date/Timestamp/Boolean columns.
a binary comparison operator such as =, <, <=, >, >=
an Attribute (or a column)
a literal value (or constant)
a boolean flag to specify if we need to update ColumnStat of a given column for subsequent conditions
an optional double value to show the percentage of rows meeting a given condition
Returns a percentage of rows meeting a binary comparison expression containing two columns.
Returns a percentage of rows meeting a binary comparison expression containing two columns. In SQL queries, we also see predicate expressions involving two columns such as "column-1 (op) column-2" where column-1 and column-2 belong to the same table. Note that if column-1 and column-2 belong to different tables, then it is a join operator's work, NOT a filter operator's work.
a binary comparison operator, including =, <=>, <, <=, >, >=
the left Attribute (or a column)
the right Attribute (or a column)
a boolean flag to specify if we need to update ColumnStat of the given columns for subsequent conditions
an optional double value to show the percentage of rows meeting a given condition
Returns a percentage of rows meeting an equality (=) expression.
Returns a percentage of rows meeting an equality (=) expression. This method evaluates the equality predicate for all data types.
For EqualNullSafe (<=>), if the literal is not null, result will be the same as EqualTo; if the literal is null, the condition will be changed to IsNull after optimization. So we don't need specific logic for EqualNullSafe here.
an Attribute (or a column)
a literal value (or constant)
a boolean flag to specify if we need to update ColumnStat of a given column for subsequent conditions
an optional double value to show the percentage of rows meeting a given condition
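A common way to estimate an equality predicate from basic column statistics is sketched below. It is an illustration under stated assumptions, not this method's actual logic: `ColStats` and `equalitySelectivity` are hypothetical, the column is numeric, no histogram is available, and distinct values are assumed to be equally frequent.

```scala
object EqualitySketch {
  // Hypothetical stand-in for per-column statistics.
  final case class ColStats(min: Double, max: Double, distinctCount: Long)

  // None when the column has no statistics; otherwise 1 / distinctCount
  // if the literal lies inside [min, max], and 0.0 if it lies outside.
  def equalitySelectivity(stats: Option[ColStats], literal: Double): Option[Double] =
    stats.map { s =>
      val inRange = literal >= s.min && literal <= s.max
      if (inRange && s.distinctCount > 0) 1.0 / s.distinctCount else 0.0
    }
}
```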
Returns a percentage of rows meeting "IN" operator expression.
Returns a percentage of rows meeting "IN" operator expression. This method evaluates the equality predicate for all data types.
an Attribute (or a column)
a set of literal values
a boolean flag to specify if we need to update ColumnStat of a given column for subsequent conditions
an optional double value to show the percentage of rows meeting a given condition. It returns None if no statistics exist for the given column.
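An "IN" predicate can be treated as several equality checks at once. The sketch below is one plausible way to do that and is not taken from the implementation: `ColStats` and `inSelectivity` are hypothetical names, the column is numeric, and distinct values are assumed to be equally frequent.

```scala
object InSetSketch {
  // Hypothetical stand-in for per-column statistics.
  final case class ColStats(min: Double, max: Double, distinctCount: Long)

  // Only literals inside [min, max] can match; each is assumed to select
  // 1 / distinctCount of the rows, and the total is capped at 1.0.
  def inSelectivity(stats: Option[ColStats], values: Set[Double]): Option[Double] =
    stats.map { s =>
      val valid = values.count(v => v >= s.min && v <= s.max)
      if (valid == 0 || s.distinctCount <= 0) 0.0
      else math.min(1.0, valid.toDouble / s.distinctCount)
    }
}
```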
Returns a percentage of rows meeting a Literal expression.
Returns a percentage of rows meeting a Literal expression. This method evaluates all the possible literal cases in Filter.
FalseLiteral and TrueLiteral should be eliminated by optimizer, but null literal might be added by optimizer rule NullPropagation. For safety, we handle all the cases here.
a literal value (or constant)
an optional double value to show the percentage of rows meeting a given condition
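The literal cases can be summarized in a short sketch. The `Lit` hierarchy below is a hypothetical stand-in for Catalyst literals, and returning None for any other literal is an assumption of this sketch. The TRUE, FALSE, and NULL results follow from SQL filter semantics, where a row is kept only if the condition evaluates to true.

```scala
object LiteralSketch {
  // Hypothetical, reduced model of literal filter conditions.
  sealed trait Lit
  case object TrueLit extends Lit
  case object FalseLit extends Lit
  case object NullLit extends Lit
  final case class OtherLit(value: Any) extends Lit

  def literalSelectivity(lit: Lit): Option[Double] = lit match {
    case TrueLit            => Some(1.0) // every row passes
    case FalseLit | NullLit => Some(0.0) // no row passes: NULL is not true
    case OtherLit(_)        => None      // treated as unsupported in this sketch
  }
}
```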
Returns a percentage of rows meeting "IS NULL" or "IS NOT NULL" condition.
an Attribute (or a column)
set to true for an "IS NULL" condition, and to false for an "IS NOT NULL" condition
a boolean flag to specify if we need to update ColumnStat of a given column for subsequent conditions
an optional double value to show the percentage of rows meeting a given condition. It returns None if no statistics have been collected for the given column.
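A null check can be estimated directly from a column's null count. The following is a minimal sketch, assuming the null count and the table row count are the only inputs. `nullCheckSelectivity` is a hypothetical helper, not the method documented here.

```scala
object NullCheckSketch {
  // None when no null count was collected for the column.
  def nullCheckSelectivity(
      rowCount: BigInt,
      nullCount: Option[BigInt],
      isNull: Boolean): Option[Double] =
    nullCount.map { n =>
      // Fraction of rows that are null; an empty table is treated as 0.0.
      val nullFraction =
        if (rowCount.signum == 0) 0.0 else n.toDouble / rowCount.toDouble
      if (isNull) nullFraction else 1.0 - nullFraction
    }
}
```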