Package org.apache.spark.sql.catalyst.plans.logical

package logical

Type Members

  1. case class Aggregate(groupingExpressions: Seq[Expression], aggregateExpressions: Seq[NamedExpression], child: LogicalPlan) extends UnaryNode with Product with Serializable

  2. case class AppendColumns(func: (Any) ⇒ Any, argumentClass: Class[_], argumentSchema: StructType, deserializer: Expression, serializer: Seq[NamedExpression], child: LogicalPlan) extends UnaryNode with Product with Serializable

    A relation produced by applying func to each element of the child, concatenating the resulting columns at the end of the input row.

    deserializer

    used to extract the input to func from an input row.

    serializer

    used to serialize the output of func.

  3. case class AppendColumnsWithObject(func: (Any) ⇒ Any, childSerializer: Seq[NamedExpression], newColumnsSerializer: Seq[NamedExpression], child: LogicalPlan) extends UnaryNode with ObjectConsumer with Product with Serializable

    An optimized version of AppendColumns that can be executed directly on deserialized objects.

  4. abstract class BinaryNode extends LogicalPlan

    A logical plan node with a left and right child.

  5. case class CoGroup(func: (Any, Iterator[Any], Iterator[Any]) ⇒ TraversableOnce[Any], keyDeserializer: Expression, leftDeserializer: Expression, rightDeserializer: Expression, leftGroup: Seq[Attribute], rightGroup: Seq[Attribute], leftAttr: Seq[Attribute], rightAttr: Seq[Attribute], outputObjAttr: Attribute, left: LogicalPlan, right: LogicalPlan) extends BinaryNode with ObjectProducer with Product with Serializable

    A relation produced by applying func to each grouping key and associated values from left and right children.

  6. case class ColumnStat(distinctCount: BigInt, min: Option[Any], max: Option[Any], nullCount: BigInt, avgLen: Long, maxLen: Long) extends Product with Serializable

    Statistics collected for a column.

    1. Supported data types are defined in ColumnStat.supportsType.
    2. The JVM data type stored in min/max is the internal data type for the corresponding Catalyst data type. For example, the internal type of DateType is Int, and the internal type of TimestampType is Long.
    3. There is no guarantee that the statistics collected are accurate. Approximation algorithms (sketches) might have been used, and the data collected can also be stale.

    distinctCount

    number of distinct values

    min

    minimum value

    max

    maximum value

    nullCount

    number of nulls

    avgLen

    average length of the values. For fixed-length types, this should be a constant.

    maxLen

    maximum length of the values. For fixed-length types, this should be a constant.
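
    The note about internal types can be illustrated with a small sketch in plain Scala. It assumes that Catalyst encodes DateType as days since 1970-01-01 and TimestampType as microseconds since the epoch; the object InternalValueSketch and the case class MinMax are hypothetical names used only for illustration and are not part of this package.

    ```scala
    import java.time.{Instant, LocalDate}
    import java.time.temporal.ChronoUnit

    object InternalValueSketch {
      // Assumption: DateType is stored internally as an Int counting days since 1970-01-01.
      def dateToInternal(d: LocalDate): Int = d.toEpochDay.toInt

      // Assumption: TimestampType is stored internally as a Long counting microseconds since the epoch.
      def timestampToInternal(t: Instant): Long =
        ChronoUnit.MICROS.between(Instant.EPOCH, t)

      // Hypothetical stand-in for the min/max fields of a column statistic.
      final case class MinMax(min: Option[Any], max: Option[Any])

      // Builds min/max values for a date column using the internal Int encoding.
      def dateRange(from: LocalDate, to: LocalDate): MinMax =
        MinMax(Some(dateToInternal(from)), Some(dateToInternal(to)))
    }
    ```

    Under these assumptions, a date column spanning 1970-01-01 to 1970-01-11 would carry min and max values of 0 and 10 rather than date objects.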

  7. trait Command extends LeafNode

    A logical node that represents a non-query command to be executed by the system. For example, commands can be used by parsers to represent DDL operations. Commands, unlike queries, are eagerly executed.

  8. case class Deduplicate(keys: Seq[Attribute], child: LogicalPlan, streaming: Boolean) extends UnaryNode with Product with Serializable

    A logical plan for dropDuplicates.

  9. case class DeserializeToObject(deserializer: Expression, outputObjAttr: Attribute, child: LogicalPlan) extends UnaryNode with ObjectProducer with Product with Serializable

    Takes the input row from child and turns it into an object using the given deserializer expression.

  10. case class Distinct(child: LogicalPlan) extends UnaryNode with Product with Serializable

    Returns a new logical plan that deduplicates input rows.

  11. case class EventTimeWatermark(eventTime: Attribute, delay: CalendarInterval, child: LogicalPlan) extends LogicalPlan with Product with Serializable

    Used to mark a user-specified column as holding the event time for a row.

  12. case class Except(left: LogicalPlan, right: LogicalPlan) extends SetOperation with Product with Serializable

  13. case class Expand(projections: Seq[Seq[Expression]], output: Seq[Attribute], child: LogicalPlan) extends UnaryNode with Product with Serializable

    Applies a number of projections to every input row, so each input row yields multiple output rows.

    projections

    the projections to apply

    output

    the output attributes of all projections

    child

    the child operator
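
    The row-multiplying behaviour of Expand can be sketched in plain Scala. This is only a model of the semantics described above, not the Catalyst implementation: the names ExpandSketch, Row, and Projection are hypothetical, and a projection is represented as an ordinary function from row to row.

    ```scala
    object ExpandSketch {
      type Row = Seq[Any]
      // Hypothetical projection: a function from an input row to an output row.
      type Projection = Row => Row

      // Applies every projection to every input row, so n projections
      // turn each input row into n output rows.
      def expand(projections: Seq[Projection], input: Seq[Row]): Seq[Row] =
        input.flatMap(row => projections.map(p => p(row)))
    }
    ```

    For example, with the single input row Seq("a", 1) and two projections that each null out one column, the sketch yields the two rows Seq("a", null) and Seq(null, 1).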

  14. case class Filter(condition: Expression, child: LogicalPlan) extends UnaryNode with PredicateHelper with Product with Serializable

  15. case class FlatMapGroupsInR(func: Array[Byte], packageNames: Array[Byte], broadcastVars: Array[Broadcast[AnyRef]], inputSchema: StructType, outputSchema: StructType, keyDeserializer: Expression, valueDeserializer: Expression, groupingAttributes: Seq[Attribute], dataAttributes: Seq[Attribute], outputObjAttr: Attribute, child: LogicalPlan) extends UnaryNode with ObjectProducer with Product with Serializable

  16. case class FlatMapGroupsWithState(func: (Any, Iterator[Any], LogicalGroupState[Any]) ⇒ Iterator[Any], keyDeserializer: Expression, valueDeserializer: Expression, groupingAttributes: Seq[Attribute], dataAttributes: Seq[Attribute], outputObjAttr: Attribute, stateEncoder: ExpressionEncoder[Any], outputMode: OutputMode, isMapGroupsWithState: Boolean = false, timeout: GroupStateTimeout, child: LogicalPlan) extends UnaryNode with ObjectProducer with Product with Serializable

    Applies func to each unique group in child, based on the evaluation of groupingAttributes, while using state data. func is invoked with an object representation of the grouping key and an iterator containing the object representations of all the rows with that key.

    func

    function called on each group

    keyDeserializer

    used to extract the key object for each group.

    valueDeserializer

    used to extract the items in the iterator from an input row.

    groupingAttributes

    used to group the data

    dataAttributes

    used to read the data

    outputObjAttr

    used to define the output object

    stateEncoder

    used to serialize/deserialize state before calling func

    outputMode

    the output mode of func

    isMapGroupsWithState

    whether it is created by the mapGroupsWithState method

    timeout

    used to time out groups that have not received data for a while

  17. case class Generate(generator: Generator, join: Boolean, outer: Boolean, qualifier: Option[String], generatorOutput: Seq[Attribute], child: LogicalPlan) extends UnaryNode with Product with Serializable

    Applies a Generator to a stream of input rows, combining the output of each into a new stream of rows. This operation is similar to a flatMap in functional programming with one important additional feature, which allows the input rows to be joined with their output.

    generator

    the generator expression

    join

    when true, each output row is implicitly joined with the input tuple that produced it.

    outer

    when true, each input row will be output at least once, even if the output of the given generator is empty.

    qualifier

    Qualifier for the attributes of the generator (UDTF).

    generatorOutput

    The output schema of the Generator.

    child

    The child logical plan node.
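
    The interaction of the join and outer flags can be sketched as a flatMap over plain Scala collections. This is a simplified model, not the Catalyst operator: GenerateSketch and generatorWidth are hypothetical names, the generator is an ordinary function, and filling the generator columns with null for an empty result is an assumption about how the "at least once" guarantee is realized.

    ```scala
    object GenerateSketch {
      type Row = Seq[Any]

      // generator: hypothetical stand-in for a Generator, producing zero or more rows per input row.
      // generatorWidth: hypothetical number of columns the generator emits.
      def generate(generator: Row => Seq[Row], join: Boolean, outer: Boolean,
                   generatorWidth: Int, input: Seq[Row]): Seq[Row] =
        input.flatMap { in =>
          val produced = generator(in)
          // outer = true: emit one row of nulls when the generator yields nothing (assumption).
          val out =
            if (produced.isEmpty && outer) Seq(Seq.fill[Any](generatorWidth)(null))
            else produced
          // join = true: prefix each generated row with the input row that produced it.
          if (join) out.map(in ++ _) else out
        }
    }
    ```

    With an explode-like generator, an input row whose collection is empty disappears from the result when outer is false and survives with a null generator column when outer is true.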

  18. case class GlobalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryNode with Product with Serializable

  19. case class GroupingSets(selectedGroupByExprs: Seq[Seq[Expression]], groupByExprs: Seq[Expression], child: LogicalPlan, aggregations: Seq[NamedExpression]) extends UnaryNode with Product with Serializable

    A GROUP BY clause with GROUPING SETS can generate a result set equivalent to that generated by a UNION ALL of multiple simple GROUP BY clauses.

    The Analyzer transforms GROUPING SETS into the logical plan Aggregate(.., Expand).

    selectedGroupByExprs

    A sequence of selected GROUP BY expression sets; every expression should exist in groupByExprs.

    groupByExprs

    The candidate GROUP BY expressions.

    child

    Child operator

    aggregations

    The aggregation expressions. A GROUP BY expression that is not selected in a grouping set is treated as a constant null wherever it appears in these expressions.
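
    The UNION ALL equivalence can be modelled on plain Scala collections. The sketch below is only an illustration of the semantics, not how the Analyzer rewrites the plan: GroupingSetsSketch is a hypothetical name, rows are maps from column name to value, and the aggregate is fixed to a sum over a hypothetical "amount" column.

    ```scala
    object GroupingSetsSketch {
      type Row = Map[String, Any]

      // Runs one simple GROUP BY per selected set and concatenates the results
      // (the UNION ALL). Group-by columns that are not in the selected set are
      // reported as null. The aggregate is a sum over the hypothetical "amount" column.
      def groupingSets(rows: Seq[Row], groupByCols: Seq[String],
                       selected: Seq[Seq[String]]): Seq[Seq[Any]] =
        selected.flatMap { set =>
          rows
            .groupBy(r => groupByCols.map(c => if (set.contains(c)) r(c) else null))
            .toSeq
            .map { case (key, group) => key :+ group.map(_("amount").asInstanceOf[Int]).sum }
        }
    }
    ```

    The order of the output rows within each grouping set is unspecified in this sketch, because it relies on the iteration order of groupBy.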

  20. case class HintInfo(isBroadcastable: Option[Boolean] = None) extends Product with Serializable

  21. case class InsertIntoTable(table: LogicalPlan, partition: Map[String, Option[String]], query: LogicalPlan, overwrite: Boolean, ifPartitionNotExists: Boolean) extends LogicalPlan with Product with Serializable

    Insert some data into a table. Note that this plan is unresolved and has to be replaced by the concrete implementations during analysis.

    table

    the logical plan representing the table. In the future this should be an org.apache.spark.sql.catalyst.catalog.CatalogTable once we converge Hive tables and data source tables.

    partition

    a map from the partition key to the partition value (optional). If a partition value is absent (None), a dynamic partition insert will be performed for that key. As an example, INSERT INTO tbl PARTITION (a=1, b=2) AS ... would have Map('a' -> Some('1'), 'b' -> Some('2')), and INSERT INTO tbl PARTITION (a=1, b) AS ... would have Map('a' -> Some('1'), 'b' -> None).

    query

    the logical plan representing the data to write.

    overwrite

    overwrite existing table or partitions.

    ifPartitionNotExists

    If true, only write if the partition does not exist. Only valid for static partitions.
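
    The shape of the partition map can be reproduced with a small helper in plain Scala. This is a sketch for illustration only: PartitionSpecSketch and its parse method are hypothetical and do not correspond to the Spark SQL parser; the helper only mirrors the two examples given for the partition parameter.

    ```scala
    object PartitionSpecSketch {
      // Turns one "key=value" or bare "key" fragment into a map entry.
      // A bare key maps to None, which denotes a dynamic partition.
      private def entry(part: String): (String, Option[String]) =
        part.split("=", 2) match {
          case Array(k, v) => (k.trim, Some(v.trim))
          case _           => (part.trim, None)
        }

      // Parses the inside of PARTITION (...), e.g. "a=1, b", into key -> optional value.
      def parse(spec: String): Map[String, Option[String]] =
        spec.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map(entry).toMap
    }
    ```

    Here parse("a=1, b=2") gives Map("a" -> Some("1"), "b" -> Some("2")), a fully static spec, while parse("a=1, b") gives Map("a" -> Some("1"), "b" -> None), where b is resolved dynamically.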

  22. case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation with Product with Serializable

  23. case class Join(left: LogicalPlan, right: LogicalPlan, joinType: JoinType, condition: Option[Expression]) extends BinaryNode with PredicateHelper with Product with Serializable

  24. abstract class LeafNode extends LogicalPlan

    A logical plan node with no children.

  25. case class LocalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryNode with Product with Serializable

  26. case class LocalRelation(output: Seq[Attribute], data: Seq[InternalRow] = Nil) extends LeafNode with MultiInstanceRelation with Product with Serializable

  27. trait LogicalGroupState[S] extends AnyRef

    Internal class representing State

  28. abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging

  29. case class MapElements(func: AnyRef, argumentClass: Class[_], argumentSchema: StructType, outputObjAttr: Attribute, child: LogicalPlan) extends UnaryNode with ObjectConsumer with ObjectProducer with Product with Serializable

    A relation produced by applying func to each element of the child.

  30. case class MapGroups(func: (Any, Iterator[Any]) ⇒ TraversableOnce[Any], keyDeserializer: Expression, valueDeserializer: Expression, groupingAttributes: Seq[Attribute], dataAttributes: Seq[Attribute], outputObjAttr: Attribute, child: LogicalPlan) extends UnaryNode with ObjectProducer with Product with Serializable

    Applies func to each unique group in child, based on the evaluation of groupingAttributes. func is invoked with an object representation of the grouping key and an iterator containing the object representations of all the rows with that key.

    keyDeserializer

    used to extract the key object for each group.

    valueDeserializer

    used to extract the items in the iterator from an input row.
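
    The calling convention of func can be sketched with plain Scala collections. This models only the semantics described above: MapGroupsSketch, keyOf, and valueOf are hypothetical names, with keyOf and valueOf standing in for the key and value deserializers, and func returns an Iterator here for simplicity.

    ```scala
    object MapGroupsSketch {
      // Groups rows by key, then calls func once per unique key with an
      // iterator over the values of that group.
      // keyOf and valueOf are hypothetical stand-ins for the deserializers.
      def mapGroups[R, K, V, U](rows: Seq[R])(keyOf: R => K, valueOf: R => V)(
          func: (K, Iterator[V]) => Iterator[U]): Seq[U] =
        rows.groupBy(keyOf).toSeq.flatMap { case (key, group) =>
          func(key, group.iterator.map(valueOf))
        }
    }
    ```

    A call such as mapGroups(Seq("a" -> 1, "b" -> 2, "a" -> 3))(_._1, _._2)((k, vs) => Iterator(k -> vs.sum)) invokes func twice, once per key; the order of the resulting groups is unspecified in this sketch.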

  31. case class MapPartitions(func: (Iterator[Any]) ⇒ Iterator[Any], outputObjAttr: Attribute, child: LogicalPlan) extends UnaryNode with ObjectConsumer with ObjectProducer with Product with Serializable

    A relation produced by applying func to each partition of the child.

  32. case class MapPartitionsInR(func: Array[Byte], packageNames: Array[Byte], broadcastVars: Array[Broadcast[AnyRef]], inputSchema: StructType, outputSchema: StructType, outputObjAttr: Attribute, child: LogicalPlan) extends UnaryNode with ObjectConsumer with ObjectProducer with Product with Serializable

    A relation produced by applying a serialized R function func to each partition of the child.

  33. trait ObjectConsumer extends UnaryNode

    A trait for logical operators that consume domain objects as input. The output of its child must be a single-field row containing the input object.

  34. trait ObjectProducer extends LogicalPlan

    A trait for logical operators that produce domain objects as output. The output of this operator is a single-field safe row containing the produced object.

  35. case class Pivot(groupByExprs: Seq[NamedExpression], pivotColumn: Expression, pivotValues: Seq[Literal], aggregates: Seq[Expression], child: LogicalPlan) extends UnaryNode with Product with Serializable

  36. case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extends UnaryNode with Product with Serializable

  37. case class Range(start: Long, end: Long, step: Long, numSlices: Option[Int], output: Seq[Attribute]) extends LeafNode with MultiInstanceRelation with Product with Serializable

  38. case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan) extends RepartitionOperation with Product with Serializable

    Returns a new RDD that has exactly numPartitions partitions. Differs from RepartitionByExpression in that this node is created directly by DataFrame operations, because the user asked for coalesce or repartition. RepartitionByExpression is used when the consumer of the output requires some specific ordering or distribution of the data.

  39. case class RepartitionByExpression(partitionExpressions: Seq[Expression], child: LogicalPlan, numPartitions: Int) extends RepartitionOperation with Product with Serializable

    This method repartitions data using Expressions into numPartitions, and receives information about the number of partitions during execution. Used when a specific ordering or distribution is expected by the consumer of the query result. Use Repartition for RDD-like coalesce and repartition.

  40. abstract class RepartitionOperation extends UnaryNode

    A base interface for RepartitionByExpression and Repartition

  41. case class ResolvedHint(child: LogicalPlan, hints: HintInfo = HintInfo()) extends UnaryNode with Product with Serializable

    A resolved hint node. The analyzer should convert all UnresolvedHint into ResolvedHint.

  42. case class ReturnAnswer(child: LogicalPlan) extends UnaryNode with Product with Serializable

    When planning take() or collect() operations, this special node is inserted at the top of the logical plan before invoking the query planner.

    Rules can pattern-match on this node in order to apply transformations that only take effect at the top of the logical query plan.

  43. case class Sample(lowerBound: Double, upperBound: Double, withReplacement: Boolean, seed: Long, child: LogicalPlan)(isTableSample: Boolean = false) extends UnaryNode with Product with Serializable

    Sample the dataset.

    lowerBound

    Lower-bound of the sampling probability (usually 0.0)

    upperBound

    Upper-bound of the sampling probability. The expected fraction sampled will be upperBound - lowerBound.

    withReplacement

    Whether to sample with replacement.

    seed

    the random seed

    child

    the LogicalPlan

    isTableSample

    Whether this node is created from TABLESAMPLE in the parser.
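
    The role of the two bounds can be sketched for the case without replacement. The code below is an assumption about one way such a sampler could work, not the sampler Spark actually uses: SampleSketch is a hypothetical name, and a row is kept when a uniform draw in [0, 1) falls inside [lowerBound, upperBound).

    ```scala
    import scala.util.Random

    object SampleSketch {
      // Expected fraction of rows kept: upperBound - lowerBound.
      def expectedFraction(lowerBound: Double, upperBound: Double): Double =
        upperBound - lowerBound

      // Without replacement: keep a row when a uniform draw in [0, 1)
      // falls in [lowerBound, upperBound). The seed makes the draws repeatable.
      def sample[A](rows: Seq[A], lowerBound: Double, upperBound: Double, seed: Long): Seq[A] = {
        val rng = new Random(seed)
        rows.filter { _ =>
          val x = rng.nextDouble()
          x >= lowerBound && x < upperBound
        }
      }
    }
    ```

    One consequence of this design is that adjacent ranges drawn with the same seed, such as [0.0, 0.5) and [0.5, 1.0), split the input into disjoint pieces that together cover every row.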

  44. case class ScriptInputOutputSchema(inputRowFormat: Seq[(String, String)], outputRowFormat: Seq[(String, String)], inputSerdeClass: Option[String], outputSerdeClass: Option[String], inputSerdeProps: Seq[(String, String)], outputSerdeProps: Seq[(String, String)], recordReaderClass: Option[String], recordWriterClass: Option[String], schemaLess: Boolean) extends Product with Serializable

    Input and output properties when passing data to a script. For example, in Hive this would specify which SerDes to use.

  45. case class ScriptTransformation(input: Seq[Expression], script: String, output: Seq[Attribute], child: LogicalPlan, ioschema: ScriptInputOutputSchema) extends UnaryNode with Product with Serializable

    Transforms the input by forking and running the specified script.

    input

    the set of expressions that should be passed to the script.

    script

    the command that should be executed.

    output

    the attributes that are produced by the script.

    ioschema

    the input and output schema applied in the execution of the script.

  46. case class SerializeFromObject(serializer: Seq[NamedExpression], child: LogicalPlan) extends UnaryNode with ObjectConsumer with Product with Serializable

    Takes the input object from child and turns it into an unsafe row using the given serializer expression.

  47. abstract class SetOperation extends BinaryNode

  48. case class Sort(order: Seq[SortOrder], global: Boolean, child: LogicalPlan) extends UnaryNode with Product with Serializable

    order

    The ordering expressions

    global

    True means the sort applies globally to the entire data set; false means the sort applies only within each partition.

    child

    Child logical plan

  49. case class Statistics(sizeInBytes: BigInt, rowCount: Option[BigInt] = None, attributeStats: AttributeMap[ColumnStat] = AttributeMap(Nil), hints: HintInfo = HintInfo()) extends Product with Serializable

    Estimates of various statistics. The default estimation logic simply lazily multiplies the corresponding statistic produced by the children. To override this behavior, override statistics and assign it an overridden version of Statistics.

    NOTE: concrete and/or overridden versions of statistics fields should pay attention to the performance of the implementations. The reason is that estimations might get triggered in performance-critical processes, such as query planning.

    Note that we are using a BigInt here since it is easy to overflow a 64-bit integer in cardinality estimation (e.g. cartesian joins).

    sizeInBytes

    Physical size in bytes. For leaf operators this defaults to 1, otherwise it defaults to the product of children's sizeInBytes.

    rowCount

    Estimated number of rows.

    attributeStats

    Statistics for Attributes.

    hints

    Query hints.
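
    The default estimation rule and the reason for BigInt can be sketched in plain Scala. The Node class below is a hypothetical stand-in for a plan node that models only the size estimate; it is not the Statistics or LogicalPlan class, and the leafSize parameter is added purely so that the example can use leaves larger than the default of 1.

    ```scala
    object StatisticsSketch {
      // Hypothetical stand-in for a plan node; only the size estimate is modeled.
      final case class Node(children: Seq[Node], leafSize: BigInt = 1) {
        // Leaves use leafSize (default 1); inner nodes multiply their children's sizes.
        // lazy val mirrors the lazy evaluation mentioned in the description.
        lazy val sizeInBytes: BigInt =
          if (children.isEmpty) leafSize else children.map(_.sizeInBytes).product
      }
    }
    ```

    With two leaves of 2^40 bytes each under one parent, the product is 2^80, which exceeds Long.MaxValue (2^63 - 1) and would overflow a 64-bit integer.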

  50. case class Subquery(child: LogicalPlan) extends UnaryNode with Product with Serializable

    This node is inserted at the top of a subquery when it is optimized. This makes sure we can recognize a subquery as such, and it allows us to write subquery-aware transformations.

  51. case class SubqueryAlias(alias: String, child: LogicalPlan) extends UnaryNode with Product with Serializable

  52. case class TypedFilter(func: AnyRef, argumentClass: Class[_], argumentSchema: StructType, deserializer: Expression, child: LogicalPlan) extends UnaryNode with Product with Serializable

    A relation produced by applying func to each element of the child and filtering the elements by the resulting boolean value.

    This is logically equivalent to a normal Filter operator whose condition expression decodes the input row to an object and applies the given function to the decoded object. However, the encapsulation of TypedFilter makes the concept clearer and makes it easier to write optimizer rules.

  53. abstract class UnaryNode extends LogicalPlan

    A logical plan node with a single child.

  54. case class Union(children: Seq[LogicalPlan]) extends LogicalPlan with Product with Serializable

  55. case class UnresolvedHint(name: String, parameters: Seq[Any], child: LogicalPlan) extends UnaryNode with Product with Serializable

    A general hint for the child that is not yet resolved. This node is generated by the parser and is eliminated after analysis.

    name

    the name of the hint

    parameters

    the parameters of the hint

    child

    the LogicalPlan on which this hint applies

  56. case class View(desc: CatalogTable, output: Seq[Attribute], child: LogicalPlan) extends LogicalPlan with MultiInstanceRelation with Product with Serializable

    A container for holding the view description (CatalogTable) and the output of the view. The child should be a logical plan parsed from CatalogTable.viewText; an error should be thrown if viewText is not defined. This operator will be removed at the end of the analysis stage.

    desc

    A view description (CatalogTable) that provides the information necessary to resolve the view.

    output

    The output of the view operator. It is generated while planning the view, so that the output can be decoupled from the underlying structure.

    child

    The logical plan of the view operator. It should be a logical plan parsed from CatalogTable.viewText; an error should be thrown if viewText is not defined.

  57. case class Window(windowExpressions: Seq[NamedExpression], partitionSpec: Seq[Expression], orderSpec: Seq[SortOrder], child: LogicalPlan) extends UnaryNode with Product with Serializable

  58. case class With(child: LogicalPlan, cteRelations: Seq[(String, SubqueryAlias)]) extends UnaryNode with Product with Serializable

    A container for holding named common table expressions (CTEs) and a query plan. This operator will be removed during analysis and the relations will be substituted into child.

    child

    The final query of this CTE.

    cteRelations

    A sequence of (alias, CTE definition) pairs that this CTE defines. Each CTE can see only the base tables and the previously defined CTEs.

  59. case class WithWindowDefinition(windowDefinitions: Map[String, WindowSpecDefinition], child: LogicalPlan) extends UnaryNode with Product with Serializable

Value Members

  1. object AppendColumns extends Serializable

    Factory for constructing new AppendColumns nodes.

  2. object CatalystSerde

  3. object CoGroup extends Serializable

    Factory for constructing new CoGroup nodes.

  4. object ColumnStat extends Logging with Serializable

  5. object EventTimeTimeout extends GroupStateTimeout with Product with Serializable

  6. object EventTimeWatermark extends Serializable

  7. object Expand extends Serializable

  8. object FlatMapGroupsInR extends Serializable

    Factory for constructing new FlatMapGroupsInR nodes.

  9. object FlatMapGroupsWithState extends Serializable

    Factory for constructing new FlatMapGroupsWithState nodes.

  10. object FunctionUtils

  11. object Limit

  12. object LocalRelation extends Serializable

  13. object MapElements extends Serializable

  14. object MapGroups extends Serializable

    Factory for constructing new MapGroups nodes.

  15. object MapPartitions extends Serializable

  16. object MapPartitionsInR extends Serializable

  17. object NoTimeout extends GroupStateTimeout with Product with Serializable

    Types of timeouts used in FlatMapGroupsWithState

  18. object OneRowRelation extends LeafNode with Product with Serializable

    A relation with one row. This is used in "SELECT ..." without a FROM clause.

  19. object ProcessingTimeTimeout extends GroupStateTimeout with Product with Serializable

  20. object Range extends Serializable

    Factory for constructing new Range nodes.

  21. object SetOperation

  22. object TypedFilter extends Serializable

  23. object Union extends Serializable

    Factory for constructing new Union nodes.

  24. package statsEstimation
