Spark Project SQL 3.0.1 API - org.apache.spark.sql.execution.SampleExec

final def !=(arg0: Any): Boolean

Definition Classes: AnyRef → Any

final def ##(): Int

Definition Classes: AnyRef → Any

final def ==(arg0: Any): Boolean

Definition Classes: AnyRef → Any

lazy val allAttributes: AttributeSeq

Definition Classes: QueryPlan

def apply(number: Int): TreeNode[_]

Definition Classes: TreeNode

def argString(maxFields: Int): String

Definition Classes: TreeNode

def asCode: String

Definition Classes: TreeNode

final def asInstanceOf[T0]: T0

Definition Classes: Any

def canCheckLimitNotReached: Boolean

Check if the node is supposed to produce limit not reached checks.

Attributes: protected
Definition Classes: CodegenSupport

final lazy val canonicalized: SparkPlan

Definition Classes: QueryPlan
Annotations: @transient()

val child: SparkPlan

Definition Classes: SampleExec → UnaryExecNode

final def children: Seq[SparkPlan]

Definition Classes: UnaryExecNode → TreeNode

def cleanupResources(): Unit

Cleans up the resources used by the physical operator (if any). In general, all the resources should be cleaned up when the task finishes but operators like SortMergeJoinExec and LimitExec may want eager cleanup to free up tight resources (e.g., memory).

Attributes: protected[sql]
Definition Classes: SparkPlan
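The eager-cleanup idea can be modelled with toy operators. An operator holding a large buffer releases it as soon as it is asked to, and cleanup is otherwise forwarded down the tree. `CleanupOp`, `BufferingOp` and `RecordingLeaf` are hypothetical names, not Spark classes. The "forward to children" default is an assumption of this sketch.

```scala
// Hypothetical toy operators: only the cleanup contract is modelled here.
abstract class CleanupOp {
  def children: Seq[CleanupOp]

  // Default behaviour (assumed for this sketch): delegate to the children
  // so that cleanup reaches every node of the tree.
  def cleanupResources(): Unit = children.foreach(_.cleanupResources())
}

// An operator holding a memory buffer that it frees eagerly instead of
// waiting for the task to finish.
final class BufferingOp(val children: Seq[CleanupOp]) extends CleanupOp {
  var buffer: Option[Array[Long]] = Some(new Array[Long](1024))

  override def cleanupResources(): Unit = {
    buffer = None            // release the tight resource now
    super.cleanupResources() // then let the children clean up too
  }
}

// A leaf that only records that it was asked to clean up.
final class RecordingLeaf extends CleanupOp {
  val children: Seq[CleanupOp] = Nil
  var cleaned = false
  override def cleanupResources(): Unit = { cleaned = true }
}
```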

def clone(): SparkPlan

Definition Classes: TreeNode → AnyRef

def collect[B](pf: PartialFunction[SparkPlan, B]): Seq[B]

Definition Classes: TreeNode

def collectFirst[B](pf: PartialFunction[SparkPlan, B]): Option[B]

Definition Classes: TreeNode

def collectLeaves(): Seq[SparkPlan]

Definition Classes: TreeNode

def collectWithSubqueries[B](f: PartialFunction[SparkPlan, B]): Seq[B]

Definition Classes: QueryPlan

def conf: SQLConf

Definition Classes: QueryPlan

final def consume(ctx: CodegenContext, outputVars: Seq[ExprCode], row: String = null): String

Consumes the generated columns or row from the current SparkPlan and calls its parent's doConsume().

Note that outputVars and row can't both be null.

Definition Classes: CodegenSupport

lazy val containsChild: Set[TreeNode[_]]

Definition Classes: TreeNode

def copyTagsFrom(other: SparkPlan): Unit

Attributes: protected
Definition Classes: TreeNode

def doCanonicalize(): SparkPlan

Attributes: protected
Definition Classes: QueryPlan

def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String

Generate the Java source code to process the rows from child SparkPlan. This should only be called from consume.

This should be overridden by subclasses to support codegen.

Note: The operator should not assume the existence of an outer processing loop, which it can jump from with "continue;"!

For example, Filter could generate this:

    # code to evaluate the predicate expression, result is isNull1 and value2
    if (!isNull1 && value2) {
      # call consume(), which will call parent.doConsume()
    }

Note: A plan can consume either the rows as UnsafeRow (row) or a list of variables (input). When consuming as a list of variables, the code to produce the input is already generated and CodegenContext.currentVars is already set. When consuming as UnsafeRow, implementations need to put row.code in the generated code and set CodegenContext.INPUT_ROW manually. Some plans may need more tweaks because they have different inputs (join build side, aggregate buffer, etc.) or other special cases.

Definition Classes: SampleExec → CodegenSupport
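The Filter pattern above can be sketched as a plain Scala function that assembles the generated Java snippet as a string. This is a simplified model, not Spark's FilterExec. `ExprSnippet` (a stand-in for ExprCode) and `filterConsume` are hypothetical names. The `consumeCode` argument stands in for the code that consume() would return.

```scala
object FilterCodegenSketch {
  // Hypothetical stand-in for ExprCode: the code computing a value, plus the
  // names of the generated isNull/value variables.
  final case class ExprSnippet(code: String, isNull: String, value: String)

  // Emit the predicate evaluation followed by a guarded call into the parent.
  // No "continue;" is emitted, so the snippet does not assume an outer loop.
  def filterConsume(predicate: ExprSnippet, consumeCode: String): String =
    s"""${predicate.code}
       |if (!${predicate.isNull} && ${predicate.value}) {
       |  $consumeCode
       |}""".stripMargin
}
```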

def doExecute(): RDD[InternalRow]

Produces the result of the query as an RDD[InternalRow].

Overridden by concrete implementations of SparkPlan.

Attributes: protected
Definition Classes: SampleExec → SparkPlan

def doExecuteBroadcast[T](): Broadcast[T]

Produces the result of the query as a broadcast variable.

Overridden by concrete implementations of SparkPlan.

Attributes: protected[sql]
Definition Classes: SparkPlan

def doExecuteColumnar(): RDD[ColumnarBatch]

Produces the result of the query as an RDD[ColumnarBatch] if supportsColumnar returns true. By convention, the executor that creates a ColumnarBatch is responsible for closing it when it is no longer needed. This allows input formats to reuse batches if needed.

Attributes: protected
Definition Classes: SparkPlan

def doPrepare(): Unit

Overridden by concrete implementations of SparkPlan. It is guaranteed to run before any execute of SparkPlan. This is helpful if we want to set up some state before executing the query, e.g., BroadcastHashJoin uses it to broadcast asynchronously.

Attributes: protected
Definition Classes: SparkPlan
Note: the prepare method has already walked down the tree, so the implementation doesn't have to call its children's prepare methods. This will only be called once, guarded by synchronizing on this.
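The contract above can be modelled in plain Scala: prepare the children first, then run doPrepare at most once under a lock on the node. `ToyPlan`, `ToyNode` and `doPrepareCalls` are hypothetical names used for illustration. They are not SparkPlan members.

```scala
// Hypothetical toy plan node mirroring the prepare()/doPrepare() contract.
abstract class ToyPlan {
  def children: Seq[ToyPlan]

  private var prepared = false
  var doPrepareCalls = 0 // counts doPrepare() invocations, for illustration

  // Subclasses would set up state here (e.g. start an asynchronous broadcast).
  protected def doPrepare(): Unit = doPrepareCalls += 1

  // Walk the tree first, so doPrepare() never has to recurse itself, then
  // run doPrepare() at most once, guarded by a lock on this node.
  final def prepare(): Unit = {
    children.foreach(_.prepare())
    synchronized {
      if (!prepared) {
        doPrepare()
        prepared = true
      }
    }
  }
}

final class ToyNode(val children: Seq[ToyPlan]) extends ToyPlan
```

Calling `prepare()` twice on the root leaves every node's `doPrepareCalls` at 1, which is the idempotence that prepare() promises.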

def doProduce(ctx: CodegenContext): String

Generates the Java source code to process the rows; should be overridden by subclasses to support codegen.

doProduce() usually generates the framework. For example, aggregation could generate this:

    if (!initialized) {
      # create a hash map, then build the aggregation hash map
      # call child.produce()
      initialized = true;
    }
    while (hashmap.hasNext()) {
      row = hashmap.next();
      # build the aggregation results
      # create variables for results
      # call consume(), which will call parent.doConsume()
      if (shouldStop()) return;
    }

Attributes: protected
Definition Classes: SampleExec → CodegenSupport
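To make the control flow of that generated aggregation code concrete, here is a plain-Scala analogue that runs the same steps directly instead of emitting Java source. It assumes a hypothetical sum-by-key aggregation over `(String, Int)` rows. `AggProducer`, `consume` and `shouldStop` are illustrative stand-ins for the generated code, the parent's doConsume(), and the stop check.

```scala
import scala.collection.mutable

// Hypothetical runtime analogue of a hash aggregate's produce loop.
final class AggProducer(childRows: () => Iterator[(String, Int)]) {
  private var initialized = false
  private var hashMapIter: Iterator[(String, Int)] = Iterator.empty

  // Emits aggregated rows; returns early when shouldStop() is true, so a
  // later call resumes where the previous one left off.
  def produce(consume: ((String, Int)) => Unit, shouldStop: () => Boolean): Unit = {
    if (!initialized) {
      // Build the aggregation hash map by draining the child ("child.produce()").
      val hashMap = mutable.LinkedHashMap.empty[String, Int]
      childRows().foreach { case (k, v) => hashMap(k) = hashMap.getOrElse(k, 0) + v }
      hashMapIter = hashMap.iterator
      initialized = true
    }
    while (hashMapIter.hasNext) {
      val row = hashMapIter.next()
      consume(row) // stands in for the parent's doConsume()
      if (shouldStop()) return
    }
  }
}
```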

final def eq(arg0: AnyRef): Boolean

Definition Classes: AnyRef

def evaluateNondeterministicVariables(attributes: Seq[Attribute], variables: Seq[ExprCode], expressions: Seq[NamedExpression]): String

Returns source code to evaluate the variables for non-deterministic expressions, and clears the code of the evaluated variables to prevent them from being evaluated twice.

Attributes: protected
Definition Classes: CodegenSupport

def evaluateRequiredVariables(attributes: Seq[Attribute], variables: Seq[ExprCode], required: AttributeSet): String

Returns source code to evaluate the variables for required attributes, and clears the code of the evaluated variables to prevent them from being evaluated twice.

Attributes: protected
Definition Classes: CodegenSupport

def evaluateVariables(variables: Seq[ExprCode]): String

Returns source code to evaluate all the variables, and clears their code to prevent them from being evaluated twice.

Attributes: protected
Definition Classes: CodegenSupport
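The evaluate-once behaviour shared by these helpers can be modelled with a mutable code field that is emitted once and then cleared. `Var` is a hypothetical stand-in for ExprCode. `evaluateVariables` and `evaluateRequired` here are simplified sketches, not Spark's implementations.

```scala
object EvaluateOnceSketch {
  // Hypothetical stand-in for ExprCode: `code` computes the variable `value`.
  final class Var(var code: String, val value: String)

  // Emit the pending code of every variable, then clear it so a second call
  // (or a later consumer) does not evaluate the same variable again.
  def evaluateVariables(variables: Seq[Var]): String = {
    val evaluated = variables.filter(_.code.nonEmpty).map(_.code).mkString("\n")
    variables.foreach(v => v.code = "")
    evaluated
  }

  // Evaluate only the variables whose attribute name is in `required`.
  def evaluateRequired(names: Seq[String], variables: Seq[Var], required: Set[String]): String =
    evaluateVariables(names.zip(variables).collect { case (n, v) if required(n) => v })
}
```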

final def execute(): RDD[InternalRow]

Returns the result of this query as an RDD[InternalRow] by delegating to doExecute after preparations.

Concrete implementations of SparkPlan should override doExecute.

Definition Classes: SparkPlan

final def executeBroadcast[T](): Broadcast[T]

Returns the result of this query as a broadcast variable by delegating to doExecuteBroadcast after preparations.

Concrete implementations of SparkPlan should override doExecuteBroadcast.

Definition Classes: SparkPlan

def executeCollect(): Array[InternalRow]

Runs this query returning the result as an array.

Definition Classes: SparkPlan

def executeCollectPublic(): Array[Row]

Runs this query returning the result as an array, using external Row format.

Definition Classes: SparkPlan

final def executeColumnar(): RDD[ColumnarBatch]

Returns the result of this query as an RDD[ColumnarBatch] by delegating to doExecuteColumnar after preparations.

Concrete implementations of SparkPlan should override doExecuteColumnar if supportsColumnar returns true.

Definition Classes: SparkPlan

final def executeQuery[T](query: ⇒ T): T

Executes a query after preparing the query and adding query plan information to created RDDs for visualization.

Attributes: protected
Definition Classes: SparkPlan

def executeTail(n: Int): Array[InternalRow]

Runs this query returning the last n rows as an array.

This is modeled after RDD.take but never runs any job locally on the driver.

Definition Classes: SparkPlan

def executeTake(n: Int): Array[InternalRow]

Runs this query returning the first n rows as an array.

This is modeled after RDD.take but never runs any job locally on the driver.

Definition Classes: SparkPlan
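Being "modeled after RDD.take" means partitions are scanned in rounds of growing size until enough rows are found, instead of computing every partition up front. The sketch below is a simplified model over in-memory partitions. `TakeSketch` is a hypothetical name. The plain multiplicative growth is an assumption: Spark reads its scale-up factor from configuration (spark.sql.limit.scaleUpFactor) and also estimates how many more partitions are needed from the rows already collected.

```scala
import scala.collection.mutable.ArrayBuffer

object TakeSketch {
  // Simplified incremental take(n): round 1 scans one partition, and each
  // later round scans partsScanned * scaleUpFactor more, until n rows are
  // found or the partitions run out. Returns the rows and the number of
  // rounds ("jobs") used.
  def take[T](partitions: IndexedSeq[Seq[T]], n: Int, scaleUpFactor: Int = 4): (List[T], Int) = {
    val buf = ArrayBuffer.empty[T]
    var partsScanned = 0
    var rounds = 0
    while (buf.size < n && partsScanned < partitions.length) {
      val numPartsToTry = if (partsScanned == 0) 1 else partsScanned * scaleUpFactor
      val end = math.min(partsScanned + numPartsToTry, partitions.length)
      val left = n - buf.size
      // One "job": each scanned partition returns at most the rows still needed.
      partitions.slice(partsScanned, end).foreach(p => buf ++= p.take(left))
      partsScanned = end
      rounds += 1
    }
    (buf.take(n).toList, rounds)
  }
}
```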

def executeToIterator(): Iterator[InternalRow]

Runs this query returning the result as an iterator of InternalRow.

Definition Classes: SparkPlan
Note: Triggers multiple jobs (one for each partition).
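Running one job per partition lets the iterator fetch results lazily: a partition is computed only when the caller reaches it. In this plain-Scala sketch, `ToIteratorSketch` and the `fetchPartition` callback are hypothetical. The callback stands in for running a job on a single partition.

```scala
object ToIteratorSketch {
  // `fetchPartition(i)` stands in for a job that computes partition i.
  // Partitions are fetched on demand, so reading only the first few rows
  // triggers only the jobs needed to produce them.
  def toIterator[T](numPartitions: Int, fetchPartition: Int => Seq[T]): Iterator[T] =
    Iterator.range(0, numPartitions).flatMap(i => fetchPartition(i))
}
```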

final def expressions: Seq[Expression]

Definition Classes: QueryPlan

def fastEquals(other: TreeNode[_]): Boolean

Definition Classes: TreeNode

def finalize(): Unit

Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( classOf[java.lang.Throwable] )

def find(f: (SparkPlan) ⇒ Boolean): Option[SparkPlan]

Definition Classes: TreeNode

def flatMap[A](f: (SparkPlan) ⇒ TraversableOnce[A]): Seq[A]

Definition Classes: TreeNode

def foreach(f: (SparkPlan) ⇒ Unit): Unit

Definition Classes: TreeNode

def foreachUp(f: (SparkPlan) ⇒ Unit): Unit

Definition Classes: TreeNode

def formattedNodeName: String

Attributes: protected
Definition Classes: QueryPlan

def generateTreeString(depth: Int, lastChildren: Seq[Boolean], append: (String) ⇒ Unit, verbose: Boolean, prefix: String, addSuffix: Boolean, maxFields: Int, printNodeId: Boolean): Unit

Definition Classes: TreeNode

final def getClass(): Class[_]

Definition Classes: AnyRef → Any
Annotations: @native()

def getTagValue[T](tag: TreeNodeTag[T]): Option[T]

Definition Classes: TreeNode

def hashCode(): Int

Definition Classes: TreeNode → AnyRef → Any

val id: Int

Definition Classes: SparkPlan

def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

Attributes: protected
Definition Classes: Logging

def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes: protected
Definition Classes: Logging

def innerChildren: Seq[QueryPlan[_]]

Definition Classes: QueryPlan → TreeNode

def inputRDDs(): Seq[RDD[InternalRow]]

Returns all the RDDs of InternalRow that generate the input rows.

Definition Classes: SampleExec → CodegenSupport
Note: Right now we support up to two RDDs.

def inputSet: AttributeSet

Definition Classes: QueryPlan

def isCanonicalizedPlan: Boolean

Attributes: protected
Definition Classes: QueryPlan

final def isInstanceOf[T0]: Boolean

Definition Classes: Any

def isTraceEnabled(): Boolean

Attributes: protected
Definition Classes: Logging

def jsonFields: List[JField]

Attributes: protected
Definition Classes: TreeNode

def limitNotReachedChecks: Seq[String]

A sequence of checks that evaluate to true if the downstream Limit operators have not yet received enough records to reach the limit. If the current node is a data-producing node, it can use this information to stop producing data and complete the data flow earlier. Common data-producing nodes are leaf nodes like Range and Scan, and blocking nodes like Sort and Aggregate. These checks should be put into the loop condition of the data-producing loop.

Definition Classes: CodegenSupport

final def limitNotReachedCond: String

A helper method to generate the data producing loop condition according to the limit-not-reached checks.

Definition Classes: CodegenSupport
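Here is a plain-Scala sketch of how such checks can be folded into the loop condition of a data-producing loop. In this sketch, the checks are joined with `&&` and placed in front of the loop's own condition; with no checks, the loop is unchanged. `LimitCondSketch`, `producingLoop` and the `count1 < 10` check are hypothetical.

```scala
object LimitCondSketch {
  // Combine the downstream limit-not-reached checks into a prefix for the
  // producing loop's condition; an empty list yields an empty prefix.
  def limitNotReachedCond(checks: Seq[String]): String =
    if (checks.isEmpty) "" else checks.mkString("", " && ", " && ")

  // Hypothetical data-producing loop of a leaf node such as a scan.
  def producingLoop(checks: Seq[String]): String =
    s"while (${limitNotReachedCond(checks)}input.hasNext()) { /* produce a row */ }"
}
```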

def log: Logger

Attributes: protected
Definition Classes: Logging

def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logDebug(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logError(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logInfo(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logName: String

Attributes: protected
Definition Classes: Logging

def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logTrace(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes: protected
Definition Classes: Logging

def logWarning(msg: ⇒ String): Unit

Attributes: protected
Definition Classes: Logging

def logicalLink: Option[LogicalPlan]

returns: The logical plan this plan is linked to.

Definition Classes: SparkPlan

def longMetric(name: String): SQLMetric

returns: SQLMetric for the name.

Definition Classes: SparkPlan

val lowerBound: Double

def makeCopy(newArgs: Array[AnyRef]): SparkPlan

The overridden makeCopy also propagates sqlContext to the copied plan.

Definition Classes: SparkPlan → TreeNode

def map[A](f: (SparkPlan) ⇒ A): Seq[A]

Definition Classes: TreeNode

def mapChildren(f: (SparkPlan) ⇒ SparkPlan): SparkPlan

Definition Classes: TreeNode

def mapExpressions(f: (Expression) ⇒ Expression): SampleExec.this.type

Definition Classes: QueryPlan

def mapProductIterator[B](f: (Any) ⇒ B)(implicit arg0: ClassTag[B]): Array[B]

Attributes: protected
Definition Classes: TreeNode

def metricTerm(ctx: CodegenContext, name: String): String

Creates a metric using the specified name.

returns: name of the variable representing the metric

Definition Classes: CodegenSupport

lazy val metrics: Map[String, SQLMetric]

returns: All metrics of this SparkPlan.

Definition Classes: SampleExec → SparkPlan

final def missingInput: AttributeSet

Definition Classes: QueryPlan

final def ne(arg0: AnyRef): Boolean

Definition Classes: AnyRef

def needCopyResult: Boolean

Whether or not the result rows of this operator should be copied before putting into a buffer.

If any operator inside WholeStageCodegen generates multiple rows from a single row (for example, Join), this should be true.

If an operator starts a new pipeline, this should be false.

Definition Classes: SampleExec → CodegenSupport
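Copying matters because generated code commonly writes every output row into one reused row object, such as an UnsafeRow produced by a reused projection. If a buffer stores references to that object, every buffered entry ends up showing the last value written. The plain-Scala sketch below shows this effect. `CopyResultSketch`, `MutableRow` and `bufferOutputs` are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

object CopyResultSketch {
  // Hypothetical mutable row reused for every output row.
  final class MutableRow(var value: Int) {
    def copy(): MutableRow = new MutableRow(value)
  }

  // An operator emits `fanOut` rows per input row (as a join might), and a
  // parent buffers them. `needCopy` decides whether each row is copied first.
  def bufferOutputs(inputs: Seq[Int], fanOut: Int, needCopy: Boolean): Seq[Int] = {
    val shared = new MutableRow(0)
    val buffer = ArrayBuffer.empty[MutableRow]
    for (in <- inputs; i <- 0 until fanOut) {
      shared.value = in * 10 + i // overwrite the reused row in place
      buffer += (if (needCopy) shared.copy() else shared)
    }
    buffer.map(_.value).toList
  }
}
```

With `needCopy = false`, all buffered entries alias one object and report the last value written. That is the situation needCopyResult guards against.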

def needStopCheck: Boolean

Whether or not the children of this operator should generate a stop check when consuming input rows. This is used to suppress shouldStop() in a loop of WholeStageCodegen.

This should be false if an operator starts a new pipeline, which means it consumes all rows produced by its children but doesn't output rows to a buffer by calling append(), so the children don't require shouldStop() in their row-producing loops.

Definition Classes: CodegenSupport

def nodeName: String

Definition Classes: TreeNode

final def notify(): Unit

Definition Classes: AnyRef
Annotations: @native()

final def notifyAll(): Unit

Definition Classes: AnyRef
Annotations: @native()

def numberedTreeString: String

Definition Classes: TreeNode

val origin: Origin

Definition Classes: TreeNode

def otherCopyArgs: Seq[AnyRef]

Attributes: protected
Definition Classes: TreeNode

def output: Seq[Attribute]

Definition Classes: SampleExec → QueryPlan

def outputOrdering: Seq[SortOrder]

Specifies how data is ordered in each partition.

Definition Classes: SparkPlan

def outputPartitioning: Partitioning

Specifies how data is partitioned across different nodes in the cluster.

Definition Classes: SampleExec → SparkPlan

lazy val outputSet: AttributeSet

Definition Classes: QueryPlan
Annotations: @transient()

def p(number: Int): SparkPlan

Definition Classes: TreeNode

val parent: CodegenSupport

Which SparkPlan is calling produce() of this one. It's itself for the first SparkPlan.

Attributes: protected
Definition Classes: CodegenSupport

final def prepare(): Unit

Prepares this SparkPlan for execution. It's idempotent.

Definition Classes: SparkPlan

def prepareSubqueries(): Unit

Finds scalar subquery expressions in this plan node and starts evaluating them.

Attributes: protected
Definition Classes: SparkPlan

def prettyJson: String

Definition Classes: TreeNode

def printSchema(): Unit

Definition Classes: QueryPlan

final def produce(ctx: CodegenContext, parent: CodegenSupport): String

Returns Java source code to process the rows from input RDD.

Definition Classes: CodegenSupport

def producedAttributes: AttributeSet

Definition Classes: QueryPlan

lazy val references: AttributeSet

Definition Classes: QueryPlan
Annotations: @transient()

def requiredChildDistribution: Seq[Distribution]

Specifies the data distribution requirements of all the children for this operator. By default it's UnspecifiedDistribution for each child, which means each child can have any distribution.

If an operator overrides this method and specifies distribution requirements (excluding UnspecifiedDistribution and BroadcastDistribution) for more than one child, Spark guarantees that the outputs of these children will have the same number of partitions, so the operator can safely zip the partitions of these children's result RDDs. Some operators can use this guarantee to satisfy more specific requirements. For example, a non-broadcast join can specify HashClusteredDistribution(a,b) for its left child and HashClusteredDistribution(c,d) for its right child. The left and right children are then guaranteed to be co-partitioned by a,b/c,d, which means tuples with the same values are in partitions with the same index; e.g., (a=1,b=2) and (c=1,d=2) are both in the second partition of the left and right children.

Definition Classes: SparkPlan
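The co-partitioning guarantee can be illustrated with a plain-Scala model. When both sides are partitioned with the same hash function and the same number of partitions, equal key tuples land at the same partition index. `CoPartitionSketch` is hypothetical, and it hashes with Scala's `##`. Spark's HashPartitioning instead applies a Murmur3 hash followed by a positive modulo.

```scala
object CoPartitionSketch {
  // Non-negative modulo, so negative hash codes still map to a valid index.
  private def pmod(x: Int, n: Int): Int = ((x % n) + n) % n

  // Same hash function and same partition count are used on both sides.
  def partitionIndex(key: Any, numPartitions: Int): Int = pmod(key.##, numPartitions)

  // Split rows into numPartitions buckets by the given key extractor.
  def partition[R](rows: Seq[R], key: R => Any, numPartitions: Int): IndexedSeq[Seq[R]] =
    (0 until numPartitions).map(p => rows.filter(r => partitionIndex(key(r), numPartitions) == p))
}
```

Partitioning a left side by `(a, b)` and a right side by `(c, d)` this way places matching tuples at the same index, so the two sides can be zipped partition by partition.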

def requiredChildOrdering: Seq[Seq[SortOrder]]

Specifies the sort-order requirements for each partition of the input data for this operator.

Definition Classes: SparkPlan

def resetMetrics(): Unit

Resets all the metrics.

Definition Classes: SparkPlan

final def sameResult(other: SparkPlan): Boolean

Definition Classes: QueryPlan

lazy val schema: StructType

Definition Classes: QueryPlan

def schemaString: String

Definition Classes: QueryPlan

val seed: Long

final def semanticHash(): Int

Definition Classes: QueryPlan

def setLogicalLink(logicalPlan: LogicalPlan): Unit

Set logical plan link recursively if unset.

Definition Classes: SparkPlan

def setTagValue[T](tag: TreeNodeTag[T], value: T): Unit

Definition Classes: TreeNode

def shouldStopCheckCode: String

Helper that returns the default shouldStop() check code.

Definition Classes: CodegenSupport

def simpleString(maxFields: Int): String

Definition Classes: QueryPlan → TreeNode

def simpleStringWithNodeId(): String

Definition Classes: QueryPlan → TreeNode

def sparkContext: SparkContext

Attributes: protected
Definition Classes: SparkPlan

final val sqlContext: SQLContext

A handle to the SQL Context that was used to create this plan. Since many operators need access to the sqlContext for RDD operations or configuration this field is automatically populated by the query planning infrastructure.

Definition Classes: SparkPlan

def statePrefix: String

Attributes: protected
Definition Classes: QueryPlan

def stringArgs: Iterator[Any]

Attributes: protected
Definition Classes: TreeNode

def subqueries: Seq[SparkPlan]

Definition Classes: QueryPlan

def subqueriesAll: Seq[SparkPlan]

Definition Classes: QueryPlan

def supportCodegen: Boolean

Whether this SparkPlan supports whole stage codegen or not.

Definition Classes: CodegenSupport

def supportsColumnar: Boolean

Return true if this stage of the plan supports columnar execution.

Definition Classes: SparkPlan

final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes: AnyRef

def toJSON: String

Definition Classes: TreeNode

def toString(): String

Definition Classes: TreeNode → AnyRef → Any

def transform(rule: PartialFunction[SparkPlan, SparkPlan]): SparkPlan

Definition Classes: TreeNode

def transformAllExpressions(rule: PartialFunction[Expression, Expression]): SampleExec.this.type

Definition Classes: QueryPlan

def transformDown(rule: PartialFunction[SparkPlan, SparkPlan]): SparkPlan

Definition Classes: TreeNode

def transformExpressions(rule: PartialFunction[Expression, Expression]): SampleExec.this.type

Definition Classes: QueryPlan

def transformExpressionsDown(rule: PartialFunction[Expression, Expression]): SampleExec.this.type

Definition Classes: QueryPlan

def transformExpressionsUp(rule: PartialFunction[Expression, Expression]): SampleExec.this.type

Definition Classes: QueryPlan

def transformUp(rule: PartialFunction[SparkPlan, SparkPlan]): SparkPlan

Definition Classes: TreeNode

def treeString(append: (String) ⇒ Unit, verbose: Boolean, addSuffix: Boolean, maxFields: Int, printOperatorId: Boolean): Unit

Definition Classes: TreeNode

final def treeString(verbose: Boolean, addSuffix: Boolean, maxFields: Int, printOperatorId: Boolean): String

Definition Classes: TreeNode

final def treeString: String

Definition Classes: TreeNode

def unsetTagValue[T](tag: TreeNodeTag[T]): Unit

Definition Classes: TreeNode

val upperBound: Double

def usedInputs: AttributeSet

The subset of inputSet that should be evaluated before this plan.

We will use this to insert some code to access those columns that are actually used by current plan before calling doConsume().

Definition Classes: SampleExec → CodegenSupport

def vectorTypes: Option[Seq[String]]

The exact java types of the columns that are output in columnar processing mode. This is a performance optimization for code generation and is optional.

Definition Classes: SparkPlan

def verboseString(maxFields: Int): String

Definition Classes: QueryPlan → TreeNode

def verboseStringWithOperatorId(): String

Definition Classes: UnaryExecNode → QueryPlan

def verboseStringWithSuffix(maxFields: Int): String

Definition Classes: TreeNode

final def wait(): Unit

Definition Classes: AnyRef
Annotations: @throws( ... )

final def wait(arg0: Long, arg1: Int): Unit

Definition Classes: AnyRef
Annotations: @throws( ... )

final def wait(arg0: Long): Unit

Definition Classes: AnyRef
Annotations: @throws( ... ) @native()

def waitForSubqueries(): Unit

Blocks the thread until all subqueries finish evaluation and update the results.

Attributes: protected
Definition Classes: SparkPlan

def withNewChildren(newChildren: Seq[SparkPlan]): SparkPlan

Definition Classes: TreeNode

val withReplacement: Boolean

SampleExec

case class SampleExec(lowerBound: Double, upperBound: Double, withReplacement: Boolean, seed: Long, child: SparkPlan) extends SparkPlan with UnaryExecNode with CodegenSupport with Product with Serializable
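To illustrate what lowerBound, upperBound, withReplacement and seed mean, here is a plain-Scala sketch over an in-memory `Seq`. It assumes the behaviour of Spark's sampling operators. Without replacement, each row draws one uniform number and is kept when the number falls in `[lowerBound, upperBound)`. With replacement, each row is repeated a Poisson-distributed number of times with mean `upperBound - lowerBound`. `SampleSketch` and its methods are hypothetical. The Poisson draw uses Knuth's method as a stand-in for Spark's own generator, and partitioning and per-partition seeding are ignored.

```scala
import scala.util.Random

object SampleSketch {
  // Without replacement: each row draws one uniform x in [0, 1) from a seeded
  // RNG and is kept iff lowerBound <= x < upperBound.
  def bernoulliCell[T](rows: Seq[T], lowerBound: Double, upperBound: Double, seed: Long): Seq[T] = {
    val rng = new Random(seed)
    rows.filter { _ =>
      val x = rng.nextDouble()
      x >= lowerBound && x < upperBound
    }
  }

  // With replacement: each row is emitted k times, where k is drawn from a
  // Poisson distribution with mean (upperBound - lowerBound) using Knuth's
  // method (a stand-in, not Spark's PoissonSampler generator).
  def poisson[T](rows: Seq[T], lowerBound: Double, upperBound: Double, seed: Long): Seq[T] = {
    val rng = new Random(seed)
    val limit = math.exp(-(upperBound - lowerBound))
    rows.flatMap { row =>
      var k = 0
      var p = rng.nextDouble()
      while (p > limit) { k += 1; p *= rng.nextDouble() }
      Seq.fill(k)(row)
    }
  }
}
```

Two calls with the same seed and adjacent ranges, such as `[0.0, 0.3)` and `[0.3, 1.0)`, select disjoint rows that together cover the input. Dataset.randomSplit relies on this: it builds one Sample per split with a shared seed and consecutive cumulative-weight ranges. Dataset.sample(withReplacement, fraction, seed) uses the range `[0.0, fraction)`.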
