FilterEstimation

Instance Constructors

new FilterEstimation(plan: Filter, catalystConf: SQLConf)

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def calculateFilterSelectivity(condition: Expression, update: Boolean = true): Option[BigDecimal]

Returns the percentage of rows meeting a condition in a Filter node. A single condition is estimated directly. A compound condition is decomposed into single conditions linked with AND, OR, and NOT. For logical AND conditions, the column statistics are updated after each condition is estimated, so that they are more accurate for the subsequent conditions; this is needed for range conditions such as (c > 40 AND c <= 50). For logical OR and NOT conditions, the statistics are not updated after a condition is estimated.
condition
the compound logical expression
update
a boolean flag to specify if we need to update ColumnStat of a column for subsequent conditions
returns
an optional value (Option[BigDecimal]) giving the percentage of rows meeting the given condition. Returns None if the condition is not supported.
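The decomposition described above can be illustrated with a small, self-contained sketch. It is not Spark's implementation: the `Cond` types and the `leaf` callback are hypothetical, selectivity is modeled as a fraction between 0 and 1, and the combination rules (product for AND, inclusion-exclusion for OR, complement for NOT, all assuming independent predicates) are assumptions chosen for illustration. The statistics update between AND operands is left out.

```scala
object CompoundSelectivitySketch {
  // Hypothetical mini expression tree; Spark uses Catalyst Expression nodes instead.
  sealed trait Cond
  final case class Leaf(name: String) extends Cond
  final case class And(left: Cond, right: Cond) extends Cond
  final case class Or(left: Cond, right: Cond) extends Cond
  final case class Not(child: Cond) extends Cond

  private val One = BigDecimal(1)

  // `leaf` returns the selectivity of a single condition, or None if unsupported.
  def selectivity(cond: Cond, leaf: String => Option[BigDecimal]): Option[BigDecimal] =
    cond match {
      case Leaf(name) => leaf(name)
      // Assumption: an unsupported operand of AND/OR is treated as selectivity 1.
      case And(l, r) =>
        Some(selectivity(l, leaf).getOrElse(One) * selectivity(r, leaf).getOrElse(One))
      case Or(l, r) =>
        val p1 = selectivity(l, leaf).getOrElse(One)
        val p2 = selectivity(r, leaf).getOrElse(One)
        Some(p1 + p2 - p1 * p2)
      // An unsupported operand of NOT stays unsupported.
      case Not(c) => selectivity(c, leaf).map(One - _)
    }
}
```

With leaf selectivities of 0.5 and 0.2, this sketch yields 0.1 for the AND of the two conditions and 0.6 for the OR.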
def calculateSingleCondition(condition: Expression, update: Boolean): Option[BigDecimal]

Returns the percentage of rows meeting a single condition in a Filter node. Currently, only binary predicates where one side is a column and the other is a literal are supported.
condition
a single logical expression
update
a boolean flag to specify if we need to update ColumnStat of a column for subsequent conditions
returns
an optional value (Option[BigDecimal]) giving the percentage of rows meeting the given condition. Returns None if the condition is not supported.
val catalystConf: SQLConf
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def estimate: Option[Statistics]

Returns an option of Statistics for a Filter logical plan node. For a given compound expression condition, this method computes the filter selectivity (the percentage of rows meeting the filter condition), which is used to compute the row count, the size in bytes, and the updated statistics after the predicate is applied.
returns
an Option[Statistics]. Returns None when no statistics have been collected.
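How a selectivity turns into an output row count and size can be sketched as follows. This is a simplified illustration under stated assumptions, not Spark's code: `SimpleStats` and `applySelectivity` are hypothetical names, selectivity is a fraction between 0 and 1, results are rounded up, and the average row width is assumed to be unchanged by the filter.

```scala
object FilterStatsSketch {
  // Hypothetical, simplified stand-in for Spark's Statistics class.
  final case class SimpleStats(rowCount: BigInt, sizeInBytes: BigInt)

  // Round up so that a small non-zero estimate does not collapse to zero.
  private def ceil(d: BigDecimal): BigInt =
    d.setScale(0, BigDecimal.RoundingMode.CEILING).toBigInt

  // Returns None when the child has no statistics, mirroring the Option result of estimate.
  def applySelectivity(child: Option[SimpleStats], selectivity: BigDecimal): Option[SimpleStats] =
    child.map { s =>
      val rows = ceil(BigDecimal(s.rowCount) * selectivity)
      // Assumption: the average row width is the same before and after filtering.
      val avgRowSize =
        if (s.rowCount > 0) BigDecimal(s.sizeInBytes) / BigDecimal(s.rowCount)
        else BigDecimal(0)
      SimpleStats(rows, ceil(BigDecimal(rows) * avgRowSize))
    }
}
```

For a child with 100 rows and 1000 bytes, a selectivity of 0.25 gives 25 rows and 250 bytes in this sketch.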
def evaluateBinary(op: BinaryComparison, attr: Attribute, literal: Literal, update: Boolean): Option[BigDecimal]

Returns the percentage of rows meeting a binary comparison expression.
op
a binary comparison operator such as =, <, <=, >, >=
attr
an Attribute (or a column)
literal
a literal value (or constant)
update
a boolean flag to specify if we need to update ColumnStat of a given column for subsequent conditions
returns
an optional value (Option[BigDecimal]) giving the percentage of rows meeting the given condition. Returns None if no statistics exist for the given column or if the value is invalid.
def evaluateBinaryForNumeric(op: BinaryComparison, attr: Attribute, literal: Literal, update: Boolean): Option[BigDecimal]

Returns the percentage of rows meeting a binary comparison expression. This method evaluates the expression for Numeric/Date/Timestamp/Boolean columns.
op
a binary comparison operator such as =, <, <=, >, >=
attr
an Attribute (or a column)
literal
a literal value (or constant)
update
a boolean flag to specify if we need to update ColumnStat of a given column for subsequent conditions
returns
an optional value (Option[BigDecimal]) giving the percentage of rows meeting the given condition
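The role of the update flag for range conditions such as (c > 40 AND c <= 50) can be made concrete with a small sketch. It is an illustration only: `ColRange`, `greaterThan`, and `lessOrEqual` are hypothetical names, values are assumed to be uniformly distributed between the column's minimum and maximum, and selectivity is a fraction between 0 and 1. Each function returns the selectivity together with the narrowed range that a later condition should use.

```scala
object NumericRangeSketch {
  // Hypothetical stand-in for the min/max part of a column's statistics.
  final case class ColRange(min: BigDecimal, max: BigDecimal)

  // Selectivity of `col > v` plus the narrowed range, assuming uniform values.
  def greaterThan(r: ColRange, v: BigDecimal): (BigDecimal, ColRange) =
    if (v >= r.max) (BigDecimal(0), r)
    else if (v < r.min) (BigDecimal(1), r)
    else ((r.max - v) / (r.max - r.min), ColRange(v, r.max))

  // Selectivity of `col <= v` plus the narrowed range, assuming uniform values.
  def lessOrEqual(r: ColRange, v: BigDecimal): (BigDecimal, ColRange) =
    if (v < r.min) (BigDecimal(0), r)
    else if (v >= r.max) (BigDecimal(1), r)
    else ((v - r.min) / (r.max - r.min), ColRange(r.min, v))

  // c > 40 AND c <= 50 over [0, 100], feeding the narrowed range into the second condition.
  def example(): BigDecimal = {
    val (p1, narrowed) = greaterThan(ColRange(BigDecimal(0), BigDecimal(100)), BigDecimal(40))
    val (p2, _) = lessOrEqual(narrowed, BigDecimal(50))
    p1 * p2
  }
}
```

With the narrowed range, `example()` comes out at approximately 0.1, which matches the share of [0, 100] covered by (40, 50]. Evaluating both conditions against the original range would instead multiply 0.6 by 0.5 and overestimate the result as 0.3.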
def evaluateBinaryForTwoColumns(op: BinaryComparison, attrLeft: Attribute, attrRight: Attribute, update: Boolean): Option[BigDecimal]

Returns the percentage of rows meeting a binary comparison expression containing two columns. In SQL queries, predicate expressions can also involve two columns, such as "column-1 (op) column-2", where column-1 and column-2 belong to the same table. If column-1 and column-2 belong to different tables, the comparison is the work of a join operator, not a filter operator.
op
a binary comparison operator, including =, <=>, <, <=, >, >=
attrLeft
the left Attribute (or a column)
attrRight
the right Attribute (or a column)
update
a boolean flag to specify if we need to update ColumnStat of the given columns for subsequent conditions
returns
an optional value (Option[BigDecimal]) giving the percentage of rows meeting the given condition
def evaluateEquality(attr: Attribute, literal: Literal, update: Boolean): Option[BigDecimal]

Returns the percentage of rows meeting an equality (=) expression. This method evaluates the equality predicate for all data types.
For EqualNullSafe (<=>), if the literal is not null, the result is the same as for EqualTo; if the literal is null, the condition is changed to IsNull during optimization. No specific logic for EqualNullSafe is therefore needed here.
attr
an Attribute (or a column)
literal
a literal value (or constant)
update
a boolean flag to specify if we need to update ColumnStat of a given column for subsequent conditions
returns
an optional value (Option[BigDecimal]) giving the percentage of rows meeting the given condition
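One common way to estimate an equality predicate from column statistics is sketched below. It is an assumption-laden illustration rather than a description of this method's internals: `ColStats` and `equalTo` are hypothetical names, only numeric columns are modeled, every distinct value is assumed to be equally frequent, and selectivity is a fraction between 0 and 1.

```scala
object EqualitySelectivitySketch {
  // Hypothetical, simplified column statistics for a numeric column.
  final case class ColStats(distinctCount: BigInt, min: BigDecimal, max: BigDecimal)

  // Returns None when the column has no statistics.
  def equalTo(stats: Option[ColStats], literal: BigDecimal): Option[BigDecimal] =
    stats.map { s =>
      // A literal outside [min, max] cannot match any row.
      val inRange = literal >= s.min && literal <= s.max
      // Assumption: each distinct value accounts for an equal share of the rows.
      if (inRange && s.distinctCount > 0) BigDecimal(1) / BigDecimal(s.distinctCount)
      else BigDecimal(0)
    }
}
```

For a column with 4 distinct values in [0, 100], a literal of 10 gives 0.25 and a literal of 500 gives 0 in this sketch.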
def evaluateInSet(attr: Attribute, hSet: Set[Any], update: Boolean): Option[BigDecimal]

Returns the percentage of rows meeting an "IN" operator expression. This method evaluates the equality predicate for all data types.
attr
an Attribute (or a column)
hSet
a set of literal values
update
a boolean flag to specify if we need to update ColumnStat of a given column for subsequent conditions
returns
an optional value (Option[BigDecimal]) giving the percentage of rows meeting the given condition. Returns None if no statistics exist for the given column.
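An "IN" predicate can be viewed as several equality tests against the same column, which suggests the following sketch. It is illustrative only: `ColStats` and `inSet` are hypothetical names, only numeric values are modeled, distinct values are assumed to be equally frequent, and selectivity is a fraction between 0 and 1.

```scala
object InSetSelectivitySketch {
  // Hypothetical, simplified column statistics for a numeric column.
  final case class ColStats(distinctCount: BigInt, min: BigDecimal, max: BigDecimal)

  // Returns None when the column has no statistics.
  def inSet(stats: Option[ColStats], values: Set[BigDecimal]): Option[BigDecimal] =
    stats.map { s =>
      // Only values inside [min, max] can match any row.
      val candidates = BigInt(values.count(v => v >= s.min && v <= s.max))
      if (s.distinctCount <= 0) BigDecimal(0)
      // Cap at the distinct count so the result never exceeds 1.
      else BigDecimal(candidates.min(s.distinctCount)) / BigDecimal(s.distinctCount)
    }
}
```

For a column with 10 distinct values in [1, 10], the set {1, 2, 50} keeps two candidates and gives 0.2 in this sketch.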
def evaluateLiteral(literal: Literal): Option[BigDecimal]

Returns the percentage of rows meeting a Literal expression. This method evaluates all the possible literal cases in a Filter.
FalseLiteral and TrueLiteral should be eliminated by the optimizer, but a null literal might be added by the optimizer rule NullPropagation. For safety, all cases are handled here.
literal
a literal value (or constant)
returns
an optional value (Option[BigDecimal]) giving the percentage of rows meeting the given condition
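The literal cases can be pictured with a tiny sketch based on SQL filter semantics, where a row is kept only if the condition evaluates to true. The function name and the use of `Option[Boolean]` to model a nullable boolean literal are assumptions made for illustration, and selectivity is expressed as a fraction between 0 and 1.

```scala
object LiteralSelectivitySketch {
  // None models a SQL NULL literal; Some(b) models TRUE or FALSE.
  def literalSelectivity(value: Option[Boolean]): BigDecimal = value match {
    // A TRUE condition keeps every row.
    case Some(true) => BigDecimal(1)
    // FALSE and NULL conditions keep no rows.
    case Some(false) | None => BigDecimal(0)
  }
}
```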
def evaluateNullCheck(attr: Attribute, isNull: Boolean, update: Boolean): Option[BigDecimal]

Returns the percentage of rows meeting an "IS NULL" or "IS NOT NULL" condition.
attr
an Attribute (or a column)
isNull
true for an "IS NULL" condition; false for an "IS NOT NULL" condition
update
a boolean flag to specify if we need to update ColumnStat of a given column for subsequent conditions
returns
an optional value (Option[BigDecimal]) giving the percentage of rows meeting the given condition. Returns None if no statistics have been collected for the given column.
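A null check can be estimated directly from a column's null count, as the following sketch shows. The function and parameter names are hypothetical, selectivity is a fraction between 0 and 1, and a missing null count stands in for "no statistics collected".

```scala
object NullCheckSelectivitySketch {
  // Returns None when the null count of the column is unknown.
  def nullCheck(rowCount: BigInt, nullCount: Option[BigInt], isNull: Boolean): Option[BigDecimal] =
    nullCount.map { nulls =>
      // Fraction of rows whose column value is null; 0 for an empty table.
      val nullFraction =
        if (rowCount > 0) BigDecimal(nulls) / BigDecimal(rowCount) else BigDecimal(0)
      // IS NOT NULL selects the complement of IS NULL.
      if (isNull) nullFraction else BigDecimal(1) - nullFraction
    }
}
```

With 200 rows and 50 nulls, the sketch gives 0.25 for "IS NULL" and 0.75 for "IS NOT NULL".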
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes
protected
Definition Classes
Logging
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def isTraceEnabled(): Boolean

Attributes
protected
Definition Classes
Logging
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logName: String

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
val plan: Filter
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Related Doc: package statsEstimation

case class FilterEstimation(plan: Filter, catalystConf: SQLConf) extends Logging with Product with Serializable

