trait BlockingOperatorWithCodegen extends SparkPlan with CodegenSupport
A special kind of operator that supports whole-stage codegen. "Blocking" means these operators consume all of their input before producing any output. Typical blocking operators are sort and aggregate.
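Because a blocking operator buffers all of its input before emitting anything, it can override the codegen hooks described under needCopyResult, needStopCheck, and limitNotReachedChecks below with trivial implementations. A minimal sketch of what such overrides look like, assuming only what those members' descriptions state:

```scala
trait BlockingOperatorWithCodegen extends CodegenSupport {
  // Results are staged in the operator's own buffer, so rows need not be copied again.
  override def needCopyResult: Boolean = false

  // All input is consumed before any output is produced, so children need no stop check.
  override def needStopCheck: Boolean = false

  // A downstream Limit can never be reached while this operator is still consuming input,
  // so no limit-not-reached checks are propagated to upstream operators.
  override def limitNotReachedChecks: Seq[String] = Nil
}
```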
Inheritance hierarchy:
- BlockingOperatorWithCodegen
- CodegenSupport
- SparkPlan
- Serializable
- Serializable
- Logging
- QueryPlan
- TreeNode
- Product
- Equals
- AnyRef
- Any
Abstract Value Members
- abstract def canEqual(that: Any): Boolean
  - Definition Classes: Equals
- abstract def children: Seq[SparkPlan]
  - Definition Classes: TreeNode
- abstract def inputRDDs(): Seq[RDD[InternalRow]]
  Returns all the RDDs of InternalRow that generate the input rows.
  - Definition Classes: CodegenSupport
  - Note: Right now we support up to two RDDs.
- abstract def output: Seq[Attribute]
  - Definition Classes: QueryPlan
- abstract def productArity: Int
  - Definition Classes: Product
- abstract def productElement(n: Int): Any
  - Definition Classes: Product
Concrete Value Members
- lazy val allAttributes: AttributeSeq
  - Definition Classes: QueryPlan
- def apply(number: Int): TreeNode[_]
  - Definition Classes: TreeNode
- def argString(maxFields: Int): String
  - Definition Classes: TreeNode
- def asCode: String
  - Definition Classes: TreeNode
- final lazy val canonicalized: SparkPlan
  - Definition Classes: QueryPlan
  - Annotations: @transient()
- def clone(): SparkPlan
  - Definition Classes: TreeNode → AnyRef
- def collect[B](pf: PartialFunction[SparkPlan, B]): Seq[B]
  - Definition Classes: TreeNode
- def collectFirst[B](pf: PartialFunction[SparkPlan, B]): Option[B]
  - Definition Classes: TreeNode
- def collectLeaves(): Seq[SparkPlan]
  - Definition Classes: TreeNode
- def collectWithSubqueries[B](f: PartialFunction[SparkPlan, B]): Seq[B]
  - Definition Classes: QueryPlan
- def conf: SQLConf
  - Definition Classes: QueryPlan
- final def consume(ctx: CodegenContext, outputVars: Seq[ExprCode], row: String = null): String
  Consume the generated columns or row from the current SparkPlan and call its parent's doConsume(). Note that outputVars and row can't both be null.
  - Definition Classes: CodegenSupport
- lazy val containsChild: Set[TreeNode[_]]
  - Definition Classes: TreeNode
- def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String
  Generate the Java source code to process the rows from the child SparkPlan. This should only be called from consume(), and should be overridden by subclasses to support codegen.
  Note: the operator should not assume the existence of an outer processing loop that it can escape with "continue;". For example, a filter could generate:
    # code to evaluate the predicate expression, result is isNull1 and value2
    if (!isNull1 && value2) {
      # call consume(), which will call parent.doConsume()
    }
  Note: a plan can consume the rows either as an UnsafeRow (row) or as a list of variables (input). When consuming a list of variables, the code to produce the input has already been generated and CodegenContext.currentVars is already set. When consuming an UnsafeRow, implementations need to put row.code in the generated code and set CodegenContext.INPUT_ROW manually. Some plans may need more tweaks because they have different inputs (join build side, aggregate buffer, etc.) or other special cases.
  - Definition Classes: CodegenSupport
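For illustration, a hedged sketch of what a filter-like doConsume override might look like; the `predicate` field and the "numOutputRows" metric are assumptions for the example, not part of this trait:

```scala
override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
  // Hypothetical: generate code for a predicate over the incoming variables.
  val cond = predicate.genCode(ctx)              // `predicate` is an assumed Expression field
  val numOutput = metricTerm(ctx, "numOutputRows") // assumes a registered SQLMetric
  s"""
     |${cond.code}
     |if (!${cond.isNull} && ${cond.value}) {
     |  $numOutput.add(1);
     |  ${consume(ctx, input)}
     |}
   """.stripMargin
}
```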
- final def execute(): RDD[InternalRow]
  Returns the result of this query as an RDD[InternalRow] by delegating to doExecute after preparations. Concrete implementations of SparkPlan should override doExecute.
  - Definition Classes: SparkPlan
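A hedged usage sketch for the execute* family described here and below, assuming a local SparkSession and that you reach the physical plan through a DataFrame's query execution (SparkPlan is an internal API, so details may differ across Spark versions):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val df = spark.range(0, 1000).selectExpr("id % 10 AS key").groupBy("key").count()

// The physical plan behind the DataFrame.
val plan = df.queryExecution.executedPlan

val rdd   = plan.execute()              // RDD[InternalRow], after prepare()
val first = plan.executeTake(5)         // first 5 rows as Array[InternalRow]
val rows  = plan.executeCollectPublic() // results in the external Row format
```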
- final def executeBroadcast[T](): Broadcast[T]
  Returns the result of this query as a broadcast variable by delegating to doExecuteBroadcast after preparations. Concrete implementations of SparkPlan should override doExecuteBroadcast.
  - Definition Classes: SparkPlan
- def executeCollect(): Array[InternalRow]
  Runs this query returning the result as an array.
  - Definition Classes: SparkPlan
- def executeCollectPublic(): Array[Row]
  Runs this query returning the result as an array, using the external Row format.
  - Definition Classes: SparkPlan
- final def executeColumnar(): RDD[ColumnarBatch]
  Returns the result of this query as an RDD[ColumnarBatch] by delegating to doColumnarExecute after preparations. Concrete implementations of SparkPlan should override doColumnarExecute if supportsColumnar returns true.
  - Definition Classes: SparkPlan
- def executeTail(n: Int): Array[InternalRow]
  Runs this query returning the last n rows as an array. This is modeled after RDD.take but never runs any job locally on the driver.
  - Definition Classes: SparkPlan
- def executeTake(n: Int): Array[InternalRow]
  Runs this query returning the first n rows as an array. This is modeled after RDD.take but never runs any job locally on the driver.
  - Definition Classes: SparkPlan
- def executeToIterator(): Iterator[InternalRow]
  Runs this query returning the result as an iterator of InternalRow.
  - Definition Classes: SparkPlan
  - Note: Triggers multiple jobs (one for each partition).
- final def expressions: Seq[Expression]
  - Definition Classes: QueryPlan
- def fastEquals(other: TreeNode[_]): Boolean
  - Definition Classes: TreeNode
- def find(f: (SparkPlan) ⇒ Boolean): Option[SparkPlan]
  - Definition Classes: TreeNode
- def flatMap[A](f: (SparkPlan) ⇒ TraversableOnce[A]): Seq[A]
  - Definition Classes: TreeNode
- def foreach(f: (SparkPlan) ⇒ Unit): Unit
  - Definition Classes: TreeNode
- def foreachUp(f: (SparkPlan) ⇒ Unit): Unit
  - Definition Classes: TreeNode
- def generateTreeString(depth: Int, lastChildren: Seq[Boolean], append: (String) ⇒ Unit, verbose: Boolean, prefix: String, addSuffix: Boolean, maxFields: Int, printNodeId: Boolean): Unit
  - Definition Classes: TreeNode
- def getTagValue[T](tag: TreeNodeTag[T]): Option[T]
  - Definition Classes: TreeNode
- def hashCode(): Int
  - Definition Classes: TreeNode → AnyRef → Any
- val id: Int
  - Definition Classes: SparkPlan
- def innerChildren: Seq[QueryPlan[_]]
  - Definition Classes: QueryPlan → TreeNode
- def inputSet: AttributeSet
  - Definition Classes: QueryPlan
- def limitNotReachedChecks: Seq[String]
  A sequence of checks which evaluate to true if the downstream Limit operators have not received enough records and reached the limit. If the current node is a data-producing node, it can leverage this information to stop producing data and complete the data flow earlier. Common data-producing nodes are leaf nodes like Range and Scan, and blocking nodes like Sort and Aggregate. These checks should be put into the loop condition of the data-producing loop.
  - Definition Classes: BlockingOperatorWithCodegen → CodegenSupport
- final def limitNotReachedCond: String
  A helper method to generate the data-producing loop condition according to the limit-not-reached checks.
  - Definition Classes: CodegenSupport
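To illustrate where that condition lands in generated code, a hedged sketch of a doProduce-style loop for an iterator-backed source; the iterator variable name and loop body are assumptions for the example:

```scala
// Inside a hypothetical doProduce(ctx: CodegenContext): String of a data-producing node.
// `limitNotReachedCond` is assumed to compose with the rest of the loop condition,
// so it is prepended to the condition of the data-producing loop.
val iter = "inputIterator" // assumed name of the generated iterator variable
s"""
   |while ($limitNotReachedCond $iter.hasNext()) {
   |  InternalRow row = (InternalRow) $iter.next();
   |  ${consume(ctx, null, "row")}
   |  if (shouldStop()) return;
   |}
 """.stripMargin
```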
- def logicalLink: Option[LogicalPlan]
  - returns: the logical plan this plan is linked to.
  - Definition Classes: SparkPlan
- def longMetric(name: String): SQLMetric
  - returns: the SQLMetric for the given name.
  - Definition Classes: SparkPlan
- def makeCopy(newArgs: Array[AnyRef]): SparkPlan
  Overridden makeCopy also propagates the sqlContext to the copied plan.
  - Definition Classes: SparkPlan → TreeNode
- def map[A](f: (SparkPlan) ⇒ A): Seq[A]
  - Definition Classes: TreeNode
- def mapChildren(f: (SparkPlan) ⇒ SparkPlan): SparkPlan
  - Definition Classes: TreeNode
- def mapExpressions(f: (Expression) ⇒ Expression): BlockingOperatorWithCodegen.this.type
  - Definition Classes: QueryPlan
- def metricTerm(ctx: CodegenContext, name: String): String
  Creates a metric using the specified name.
  - returns: the name of the variable representing the metric
  - Definition Classes: CodegenSupport
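A short, hedged sketch of typical use inside a codegen method; the "numOutputRows" key is an assumption and must correspond to an entry in this operator's metrics map:

```scala
// Inside a doConsume/doProduce implementation:
val numOutput = metricTerm(ctx, "numOutputRows") // Java variable name bound to the SQLMetric
s"""
   |$numOutput.add(1);  // increment the metric from the generated Java code
 """.stripMargin
```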
- def metrics: Map[String, SQLMetric]
  - returns: all the metrics of this SparkPlan.
  - Definition Classes: SparkPlan
- final def missingInput: AttributeSet
  - Definition Classes: QueryPlan
- def needCopyResult: Boolean
  Whether or not the result rows of this operator should be copied before being put into a buffer. If any operator inside WholeStageCodegen generates multiple rows from a single row (for example, Join), this should be true. If an operator starts a new pipeline, this should be false.
  - Definition Classes: BlockingOperatorWithCodegen → CodegenSupport
- def needStopCheck: Boolean
  Whether or not the children of this operator should generate a stop check when consuming input rows. This is used to suppress shouldStop() in a loop of WholeStageCodegen.
  This should be false if an operator starts a new pipeline, which means it consumes all rows produced by its children but doesn't output rows to the buffer by calling append(), so the children don't require shouldStop() in their row-producing loop.
  - Definition Classes: BlockingOperatorWithCodegen → CodegenSupport
- def nodeName: String
  - Definition Classes: TreeNode
- def numberedTreeString: String
  - Definition Classes: TreeNode
- val origin: Origin
  - Definition Classes: TreeNode
- def outputOrdering: Seq[SortOrder]
  Specifies how data is ordered in each partition.
  - Definition Classes: SparkPlan
- def outputPartitioning: Partitioning
  Specifies how data is partitioned across different nodes in the cluster.
  - Definition Classes: SparkPlan
- lazy val outputSet: AttributeSet
  - Definition Classes: QueryPlan
  - Annotations: @transient()
- def p(number: Int): SparkPlan
  - Definition Classes: TreeNode
- final def prepare(): Unit
  Prepares this SparkPlan for execution. It's idempotent.
  - Definition Classes: SparkPlan
- def prettyJson: String
  - Definition Classes: TreeNode
- def printSchema(): Unit
  - Definition Classes: QueryPlan
- final def produce(ctx: CodegenContext, parent: CodegenSupport): String
  Returns Java source code to process the rows from the input RDD.
  - Definition Classes: CodegenSupport
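For context, a hedged sketch of how a parent operator typically kicks off code generation for its child; the `child` field is an assumption for the example:

```scala
// Inside a hypothetical doProduce of an operator with a codegen-capable child:
override def doProduce(ctx: CodegenContext): String = {
  // Ask the child to produce rows; the child will call back into this
  // operator's doConsume() via consume() for every row it emits.
  child.asInstanceOf[CodegenSupport].produce(ctx, this)
}
```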
- def producedAttributes: AttributeSet
  - Definition Classes: QueryPlan
- def productIterator: Iterator[Any]
  - Definition Classes: Product
- def productPrefix: String
  - Definition Classes: Product
- lazy val references: AttributeSet
  - Definition Classes: QueryPlan
  - Annotations: @transient()
- def requiredChildDistribution: Seq[Distribution]
  Specifies the data distribution requirements of all the children for this operator. By default it's UnspecifiedDistribution for each child, which means each child can have any distribution.
  If an operator overwrites this method and specifies distribution requirements (excluding UnspecifiedDistribution and BroadcastDistribution) for more than one child, Spark guarantees that the outputs of these children will have the same number of partitions, so that the operator can safely zip partitions of these children's result RDDs. Some operators can leverage this guarantee to satisfy some interesting requirement, e.g., a non-broadcast join can specify HashClusteredDistribution(a,b) for its left child and HashClusteredDistribution(c,d) for its right child; it's then guaranteed that the left and right children are co-partitioned by a,b/c,d, which means tuples with the same values are in partitions with the same index, e.g., (a=1,b=2) and (c=1,d=2) are both in the second partition of the left and right children.
  - Definition Classes: SparkPlan
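A hedged sketch of such an override for a join-like operator; `leftKeys` and `rightKeys` are assumed fields holding the join key expressions:

```scala
override def requiredChildDistribution: Seq[Distribution] = {
  // Require both children to be hash-partitioned by their join keys, so that
  // matching keys land in partitions with the same index (co-partitioning).
  HashClusteredDistribution(leftKeys) :: HashClusteredDistribution(rightKeys) :: Nil
}
```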
- def requiredChildOrdering: Seq[Seq[SortOrder]]
  Specifies, for each child, the required sort order of the input data within each partition.
  - Definition Classes: SparkPlan
- def resetMetrics(): Unit
  Resets all the metrics.
  - Definition Classes: SparkPlan
- final def sameResult(other: SparkPlan): Boolean
  - Definition Classes: QueryPlan
- lazy val schema: StructType
  - Definition Classes: QueryPlan
- def schemaString: String
  - Definition Classes: QueryPlan
- final def semanticHash(): Int
  - Definition Classes: QueryPlan
- def setLogicalLink(logicalPlan: LogicalPlan): Unit
  Sets the logical plan link recursively if unset.
  - Definition Classes: SparkPlan
- def setTagValue[T](tag: TreeNodeTag[T], value: T): Unit
  - Definition Classes: TreeNode
- def shouldStopCheckCode: String
  Helper code providing the default shouldStop() check.
  - Definition Classes: CodegenSupport
- def simpleString(maxFields: Int): String
  - Definition Classes: QueryPlan → TreeNode
- def simpleStringWithNodeId(): String
  - Definition Classes: QueryPlan → TreeNode
- final val sqlContext: SQLContext
  A handle to the SQLContext that was used to create this plan. Since many operators need access to the sqlContext for RDD operations or configuration, this field is automatically populated by the query planning infrastructure.
  - Definition Classes: SparkPlan
- def subqueries: Seq[SparkPlan]
  - Definition Classes: QueryPlan
- def subqueriesAll: Seq[SparkPlan]
  - Definition Classes: QueryPlan
- def supportCodegen: Boolean
  Whether this SparkPlan supports whole-stage codegen or not.
  - Definition Classes: CodegenSupport
- def supportsColumnar: Boolean
  Returns true if this stage of the plan supports columnar execution.
  - Definition Classes: SparkPlan
- def toJSON: String
  - Definition Classes: TreeNode
- def toString(): String
  - Definition Classes: TreeNode → AnyRef → Any
- def transform(rule: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
  - Definition Classes: TreeNode
- def transformAllExpressions(rule: PartialFunction[Expression, Expression]): BlockingOperatorWithCodegen.this.type
  - Definition Classes: QueryPlan
- def transformDown(rule: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
  - Definition Classes: TreeNode
- def transformExpressions(rule: PartialFunction[Expression, Expression]): BlockingOperatorWithCodegen.this.type
  - Definition Classes: QueryPlan
- def transformExpressionsDown(rule: PartialFunction[Expression, Expression]): BlockingOperatorWithCodegen.this.type
  - Definition Classes: QueryPlan
- def transformExpressionsUp(rule: PartialFunction[Expression, Expression]): BlockingOperatorWithCodegen.this.type
  - Definition Classes: QueryPlan
- def transformUp(rule: PartialFunction[SparkPlan, SparkPlan]): SparkPlan
  - Definition Classes: TreeNode
- def treeString(append: (String) ⇒ Unit, verbose: Boolean, addSuffix: Boolean, maxFields: Int, printOperatorId: Boolean): Unit
  - Definition Classes: TreeNode
- final def treeString(verbose: Boolean, addSuffix: Boolean, maxFields: Int, printOperatorId: Boolean): String
  - Definition Classes: TreeNode
- final def treeString: String
  - Definition Classes: TreeNode
- def unsetTagValue[T](tag: TreeNodeTag[T]): Unit
  - Definition Classes: TreeNode
- def usedInputs: AttributeSet
  The subset of inputSet that should be evaluated before this plan. We will use this to insert code that accesses the columns actually used by the current plan before calling doConsume().
  - Definition Classes: CodegenSupport
- def vectorTypes: Option[Seq[String]]
  The exact Java types of the columns that are output in columnar processing mode. This is a performance optimization for code generation and is optional.
  - Definition Classes: SparkPlan
- def verboseString(maxFields: Int): String
  - Definition Classes: QueryPlan → TreeNode
- def verboseStringWithOperatorId(): String
  - Definition Classes: QueryPlan
- def verboseStringWithSuffix(maxFields: Int): String
  - Definition Classes: TreeNode
- def withNewChildren(newChildren: Seq[SparkPlan]): SparkPlan
  - Definition Classes: TreeNode