StarSchemaDetection

Instance Constructors

new StarSchemaDetection(conf: SQLConf)

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def canEvaluate(expr: Expression, plan: LogicalPlan): Boolean

Returns true if expr can be evaluated using only the output of plan. This method can be used to determine when it is acceptable to move expression evaluation within a query plan.
For example, consider a join between two relations R(a, b) and S(c, d).
- canEvaluate(EqualTo(a,b), R) returns true
- canEvaluate(EqualTo(a,c), R) returns false
- canEvaluate(Literal(1), R) returns true as literals CAN be evaluated on any plan
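The three cases above can be reproduced with a toy model. This is only a sketch, not Spark's classes: the names Expr, Plan, and CanEvaluateSketch are hypothetical, an expression is reduced to the set of attribute names it references, and a plan to the set of attribute names it outputs.

```scala
// Toy model, NOT Spark's Expression/LogicalPlan: all names here are hypothetical.
object CanEvaluateSketch {
  // An expression is modelled only by the attribute names it references.
  final case class Expr(references: Set[String])
  // A plan is modelled only by the attribute names it outputs.
  final case class Plan(output: Set[String])

  // Assumed rule: expr can be evaluated on plan when every attribute it
  // references is produced by plan.
  def canEvaluate(expr: Expr, plan: Plan): Boolean =
    expr.references.subsetOf(plan.output)

  val r: Plan = Plan(Set("a", "b")) // relation R(a, b)
  val s: Plan = Plan(Set("c", "d")) // relation S(c, d)

  val aEqualsB: Expr = Expr(Set("a", "b"))        // models EqualTo(a, b)
  val aEqualsC: Expr = Expr(Set("a", "c"))        // models EqualTo(a, c)
  val literalOne: Expr = Expr(Set.empty[String])  // models Literal(1): no references
}
```

Under this model a literal references no attributes, so the subset check holds for any plan, which matches the third case.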

Attributes
protected
Definition Classes
PredicateHelper
def canEvaluateWithinJoin(expr: Expression): Boolean

Returns true iff expr could be evaluated as a condition within join.

Attributes
protected
Definition Classes
PredicateHelper
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
val conf: SQLConf
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def findStarJoins(input: Seq[LogicalPlan], conditions: Seq[Expression]): Seq[LogicalPlan]

A star schema consists of one or more fact tables referencing a number of dimension tables. In general, star-schema joins are detected using the following conditions:
1. Informational RI (referential integrity) constraints (reliable detection)
   + The dimension table contains a primary key that is being joined to the fact table.
   + The fact table contains foreign keys referencing multiple dimension tables.
2. Cardinality-based heuristics
   + Usually, the table with the highest cardinality is the fact table.
   + The table joined with the largest number of tables is the fact table.
To detect star joins, the algorithm uses a combination of the above two conditions. The fact table is chosen based on the cardinality heuristics, and the dimension tables are chosen based on the RI constraints. A star join will consist of the largest fact table joined with the dimension tables on their primary keys. To detect that a column is a primary key, the algorithm uses table and column statistics.
The algorithm currently returns only the star join with the largest fact table. Placing the largest fact table on the driving arm, to avoid large tables on the inner side of a join, is in general a good heuristic. This restriction will be lifted to detect multiple star joins.
The highlights of the algorithm are the following:
Given a set of joined tables/plans, the algorithm first verifies whether they are eligible for star join detection. An eligible plan is a base table access with valid statistics. A base table access represents Project or Filter operators above a LeafNode. Conservatively, the algorithm only considers base table accesses as part of a star join, since they provide reliable statistics. This restriction can be lifted once CBO is enabled by default.
If some of the plans are not base table accesses, or statistics are not available, the algorithm returns an empty star join plan since, in the absence of statistics, it cannot make good planning decisions. Otherwise, the algorithm finds the table with the largest cardinality (number of rows), which is assumed to be the fact table.
Next, it computes the set of dimension tables for the current fact table. A dimension table is assumed to be in an RI relationship with the fact table. To infer column uniqueness, the algorithm compares the number of distinct values with the total number of rows in the table. If their relative difference is within certain limits (i.e. ndvMaxError * 2, adjusted based on 1TB TPC-DS data), the column is assumed to be unique.
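The steps above can be sketched on simplified statistics. This is a minimal illustration, not the Spark implementation: TableStats, looksUnique, detect, and the ndvMaxError value of 0.05 are assumptions made for the example, and each table is reduced to a row count and the number of distinct values (NDV) of its join column.

```scala
// Hypothetical, simplified model of star join detection; not Spark's code.
object StarDetectionSketch {
  // Statistics for one base table access. None means "statistics unavailable".
  final case class TableStats(name: String, rowCount: Option[Long], joinKeyNdv: Option[Long])

  // Assumed value for illustration; the text only names the limit as ndvMaxError * 2.
  val ndvMaxError: Double = 0.05

  // A join column is treated as unique when the relative difference between
  // its distinct-value count and the row count is within ndvMaxError * 2.
  def looksUnique(rowCount: Long, ndv: Long): Boolean =
    rowCount > 0 &&
      math.abs(1.0 - ndv.toDouble / rowCount.toDouble) <= ndvMaxError * 2

  // Returns (fact table, dimension tables), or None when the input is empty
  // or any table lacks statistics (no detection without statistics).
  def detect(tables: Seq[TableStats]): Option[(TableStats, Seq[TableStats])] =
    if (tables.isEmpty || tables.exists(t => t.rowCount.isEmpty || t.joinKeyNdv.isEmpty)) {
      None
    } else {
      // The table with the largest cardinality is assumed to be the fact table.
      val fact = tables.maxBy(_.rowCount.get)
      // Remaining tables whose join column looks unique are kept as dimensions.
      val dims = tables.filter { t =>
        t != fact && looksUnique(t.rowCount.get, t.joinKeyNdv.get)
      }
      Some((fact, dims))
    }
}
```

For example, a table with 5000 rows and 4800 distinct join-key values has a relative difference of 0.04, which is within the assumed limit of 0.1, so its join column would be treated as a primary key in this sketch.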
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def reorderStarJoins(input: Seq[(LogicalPlan, InnerLike)], conditions: Seq[Expression]): Seq[(LogicalPlan, InnerLike)]

Reorders a star join based on heuristics. It is called from ReorderJoin if CBO is disabled.
1) Finds the star join with the largest fact table.
2) Places the fact table on the driving arm of the left-deep tree. This plan avoids large table access on the inner side, and thus favors hash joins.
3) Applies the most selective dimensions early in the plan to reduce the amount of data flow.
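The ordering described above can be illustrated with a small sketch. It is not the Spark implementation: Table, selectivity, and reorder are hypothetical names, every non-fact table is assumed to be a dimension of the star, and selectivity is assumed to be the fraction of rows that survive a dimension's local predicates (smaller means more selective).

```scala
// Hypothetical model of the heuristic ordering; not Spark's code.
object ReorderSketch {
  // selectivity: assumed fraction of rows kept by local predicates (0.0 to 1.0).
  final case class Table(name: String, rowCount: Long, selectivity: Double)

  // Puts the largest table (assumed fact table) first, on the driving arm,
  // followed by the dimensions from most selective to least selective.
  def reorder(tables: Seq[Table]): Seq[Table] =
    if (tables.isEmpty) tables
    else {
      val fact = tables.maxBy(_.rowCount)
      val dims = tables.filterNot(_ == fact).sortBy(_.selectivity)
      fact +: dims
    }
}
```

Joining the most selective dimensions first shrinks the intermediate result early, so later joins in the left-deep tree process fewer rows.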
def replaceAlias(condition: Expression, aliases: AttributeMap[Expression]): Expression

Attributes
protected
Definition Classes
PredicateHelper
def splitConjunctivePredicates(condition: Expression): Seq[Expression]

Attributes
protected
Definition Classes
PredicateHelper
def splitDisjunctivePredicates(condition: Expression): Seq[Expression]

Attributes
protected
Definition Classes
PredicateHelper
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Related Doc: package optimizer

case class StarSchemaDetection(conf: SQLConf) extends PredicateHelper with Product with Serializable

Inherited from Serializable

Inherited from Product

Inherited from Equals

Inherited from PredicateHelper

Inherited from AnyRef

Inherited from Any
