package org.apache.spark.sql.catalyst.plans.logical

Type Members

  1. case class Aggregate(groupingExpressions: Seq[Expression], aggregateExpressions: Seq[NamedExpression], child: LogicalPlan) extends UnaryNode with Product with Serializable

  2. case class AppendColumns(func: (Any) ⇒ Any, argumentClass: Class[_], argumentSchema: StructType, deserializer: Expression, serializer: Seq[NamedExpression], child: LogicalPlan) extends UnaryNode with Product with Serializable

    A relation produced by applying func to each element of the child, concatenating the resulting columns at the end of the input row.

    deserializer

    used to extract the input to func from an input row.

    serializer

    used to serialize the output of func.

  3. case class AppendColumnsWithObject(func: (Any) ⇒ Any, childSerializer: Seq[NamedExpression], newColumnsSerializer: Seq[NamedExpression], child: LogicalPlan) extends UnaryNode with ObjectConsumer with Product with Serializable

    An optimized version of AppendColumns that can be executed directly on deserialized objects.

  4. abstract class BinaryNode extends LogicalPlan

    A logical plan node with a left and right child.

  5. case class BroadcastHint(child: LogicalPlan) extends UnaryNode with Product with Serializable

    A hint for the optimizer that we should broadcast the child if used in a join operator.

  6. case class CoGroup(func: (Any, Iterator[Any], Iterator[Any]) ⇒ TraversableOnce[Any], keyDeserializer: Expression, leftDeserializer: Expression, rightDeserializer: Expression, leftGroup: Seq[Attribute], rightGroup: Seq[Attribute], leftAttr: Seq[Attribute], rightAttr: Seq[Attribute], outputObjAttr: Attribute, left: LogicalPlan, right: LogicalPlan) extends BinaryNode with ObjectProducer with Product with Serializable

    A relation produced by applying func to each grouping key and associated values from left and right children.

  7. case class ColumnStat(distinctCount: BigInt, min: Option[Any], max: Option[Any], nullCount: BigInt, avgLen: Long, maxLen: Long) extends Product with Serializable

    Statistics collected for a column.

    1. Supported data types are defined in ColumnStat.supportsType.
    2. The JVM data type stored in min/max is the external data type (used in Row) for the corresponding Catalyst data type. For example, for DateType we store java.sql.Date, and for TimestampType we store java.sql.Timestamp.
    3. Integral types are all upcast to longs; for example, shorts are stored as longs.
    4. There is no guarantee that the statistics collected are accurate. Approximation algorithms (sketches) might have been used, and the data collected can also be stale.

    distinctCount

    number of distinct values

    min

    minimum value

    max

    maximum value

    nullCount

    number of nulls

    avgLen

    average length of the values. For fixed-length types, this should be a constant.

    maxLen

    maximum length of the values. For fixed-length types, this should be a constant.

  8. trait Command extends LeafNode

    A logical node that represents a non-query command to be executed by the system. For example, commands can be used by parsers to represent DDL operations. Commands, unlike queries, are eagerly executed.

  9. case class DeserializeToObject(deserializer: Expression, outputObjAttr: Attribute, child: LogicalPlan) extends UnaryNode with ObjectProducer with Product with Serializable

    Takes the input row from child and turns it into an object using the given deserializer expression.

  10. case class Distinct(child: LogicalPlan) extends UnaryNode with Product with Serializable

    Returns a new logical plan that deduplicates input rows.

  11. case class EventTimeWatermark(eventTime: Attribute, delay: CalendarInterval, child: LogicalPlan) extends LogicalPlan with Product with Serializable

    Used to mark a user-specified column as holding the event time for a row.

  12. case class Except(left: LogicalPlan, right: LogicalPlan) extends SetOperation with Product with Serializable

  13. case class Expand(projections: Seq[Seq[Expression]], output: Seq[Attribute], child: LogicalPlan) extends UnaryNode with Product with Serializable

    Applies a number of projections to every input row, so each input row yields multiple output rows.

    projections

    the projections to apply.

    output

    the output attributes of all projections.

    child

    the child operator.
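
    As a rough illustration of the row multiplication described above, the following self-contained sketch models rows as Seq[Any] and projections as lists of plain functions. The names Row, Projection, and expand are hypothetical and are not part of Catalyst, which evaluates Expression trees instead.

```scala
// Toy model only: a row is a Seq[Any] and a "projection" is a list of
// functions, one per output column. Catalyst uses Expression trees instead.
type Row = Seq[Any]
type Projection = Seq[Row => Any]

// Apply every projection to every input row, so N projections
// turn each input row into N output rows.
def expand(projections: Seq[Projection], input: Seq[Row]): Seq[Row] =
  for {
    row  <- input
    proj <- projections
  } yield proj.map(column => column(row))

val input: Seq[Row] = Seq(Seq("a", 1), Seq("b", 2))

val projections: Seq[Projection] = Seq(
  Seq((r: Row) => r(0), (r: Row) => r(1)), // keep both columns
  Seq((r: Row) => r(0), (_: Row) => null)  // null out the second column
)

// Two input rows and two projections give four output rows.
val expanded: Seq[Row] = expand(projections, input)
```

    Nulling out a column in one of the projections is the kind of pattern a GROUPING SETS rewrite would rely on, although the exact projections Catalyst builds are not shown here.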

  14. case class Filter(condition: Expression, child: LogicalPlan) extends UnaryNode with PredicateHelper with Product with Serializable

  15. case class FlatMapGroupsInR(func: Array[Byte], packageNames: Array[Byte], broadcastVars: Array[Broadcast[AnyRef]], inputSchema: StructType, outputSchema: StructType, keyDeserializer: Expression, valueDeserializer: Expression, groupingAttributes: Seq[Attribute], dataAttributes: Seq[Attribute], outputObjAttr: Attribute, child: LogicalPlan) extends UnaryNode with ObjectProducer with Product with Serializable

  16. case class Generate(generator: Generator, join: Boolean, outer: Boolean, qualifier: Option[String], generatorOutput: Seq[Attribute], child: LogicalPlan) extends UnaryNode with Product with Serializable

    Applies a Generator to a stream of input rows, combining the output of each into a new stream of rows. This operation is similar to a flatMap in functional programming with one important additional feature, which allows the input rows to be joined with their output.

    generator

    the generator expression

    join

    when true, each output row is implicitly joined with the input tuple that produced it.

    outer

    when true, each input row will be output at least once, even if the output of the given generator is empty. outer has no effect when join is false.

    qualifier

    Qualifier for the attributes of the generator (UDTF).

    generatorOutput

    The output schema of the Generator.

    child

    The child logical plan node.
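
    The interaction of join and outer can be sketched with ordinary collections. This is a toy model under stated assumptions, not Catalyst code: Row, generate, explode, and generatorWidth are hypothetical names, and padding the generator columns with null for an empty result is an assumption about how "output at least once" would look.

```scala
// Toy model: a row is a Seq[Any]; a "generator" maps one input row to
// zero or more output rows.
type Row = Seq[Any]

def generate(
    input: Seq[Row],
    generator: Row => Seq[Row],
    join: Boolean,
    outer: Boolean,
    generatorWidth: Int): Seq[Row] =
  input.flatMap { row =>
    val generated = generator(row)
    if (!join) generated // outer has no effect when join is false
    else if (generated.isEmpty && outer)
      Seq(row ++ Seq.fill(generatorWidth)(null)) // ASSUMPTION: pad with nulls
    else generated.map(out => row ++ out)        // join input row with its output
  }

// A hypothetical generator that emits one row per element of column 1.
val explode: Row => Seq[Row] =
  row => row(1).asInstanceOf[Seq[Any]].map(v => Seq(v))

val input: Seq[Row] = Seq(Seq(1, Seq("x", "y")), Seq(2, Seq.empty[Any]))

val joined    = generate(input, explode, join = true, outer = false, generatorWidth = 1)
val withOuter = generate(input, explode, join = true, outer = true, generatorWidth = 1)
val noJoin    = generate(input, explode, join = false, outer = true, generatorWidth = 1)
```

    In this sketch the second input row disappears from joined, reappears once in withOuter, and noJoin contains only the generator output.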

  17. case class GlobalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryNode with Product with Serializable

  18. case class GroupingSets(bitmasks: Seq[Int], groupByExprs: Seq[Expression], child: LogicalPlan, aggregations: Seq[NamedExpression]) extends UnaryNode with Product with Serializable

    A GROUP BY clause with GROUPING SETS can generate a result set equivalent to that generated by a UNION ALL of multiple simple GROUP BY clauses.

    The Analyzer transforms GROUPING SETS into the logical plan Aggregate(.., Expand).

    bitmasks

    A list of bitmasks; each bitmask indicates the selected Group By expressions.

    groupByExprs

    The candidate Group By expressions; each takes effect only if its associated bit in the bitmask is set to 1.

    child

    Child operator

    aggregations

    The aggregation expressions. Group By expressions that are not selected are treated as constant null if they appear in these expressions.
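
    The equivalence with a UNION ALL of simple GROUP BY clauses can be illustrated on an in-memory collection. This is only a sketch: Rec, data, and groupingSet are hypothetical, and the bit layout (bit 0 selects a, bit 1 selects b) is an assumption made for the example rather than a statement about Catalyst's encoding.

```scala
// Toy data set: columns a and b are grouping candidates, v is aggregated.
case class Rec(a: String, b: String, v: Int)

val data = Seq(Rec("x", "p", 1), Rec("x", "q", 2), Rec("y", "p", 3))

// ASSUMPTION (illustration only): bit 0 selects column a and bit 1 selects
// column b. Catalyst's actual bit layout may differ.
def groupingSet(bitmask: Int): Seq[(Option[String], Option[String], Int)] =
  data
    .groupBy { r =>
      // None stands in for the constant null of a non-selected expression.
      val a = if ((bitmask & 1) != 0) Some(r.a) else None
      val b = if ((bitmask & 2) != 0) Some(r.b) else None
      (a, b)
    }
    .toSeq
    .map { case ((a, b), rows) => (a, b, rows.map(_.v).sum) }

// GROUPING SETS ((a, b), (a), ()) written as a UNION ALL of simple GROUP BYs:
// one pass per bitmask, results concatenated.
val result = Seq(3, 1, 0).flatMap(groupingSet)
```

    Each bitmask contributes the rows of one simple GROUP BY, and concatenating them plays the role of UNION ALL.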

  19. case class InsertIntoTable(table: LogicalPlan, partition: Map[String, Option[String]], child: LogicalPlan, overwrite: OverwriteOptions, ifNotExists: Boolean) extends LogicalPlan with Product with Serializable

    Insert some data into a table.

    table

    the logical plan representing the table. In the future this should be a org.apache.spark.sql.catalyst.catalog.CatalogTable once we converge Hive tables and data source tables.

    partition

    a map from the partition key to the partition value (optional). If the partition value is None, a dynamic partition insert will be performed. As an example, INSERT INTO tbl PARTITION (a=1, b=2) AS ... would have Map('a' -> Some('1'), 'b' -> Some('2')), and INSERT INTO tbl PARTITION (a=1, b) AS ... would have Map('a' -> Some('1'), 'b' -> None).

    child

    the logical plan representing the data to write.

    overwrite

    overwrite existing table or partitions.

    ifNotExists

    If true, only write if the table or partition does not exist.
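
    The two partition maps from the description can be written out in Scala as follows. The helper dynamicPartitionKeys is hypothetical and not part of Catalyst; it only shows how keys with a None value could be picked out as the dynamically resolved ones.

```scala
// INSERT INTO tbl PARTITION (a=1, b=2) ...: every partition value is given.
val fullyStatic: Map[String, Option[String]] =
  Map("a" -> Some("1"), "b" -> Some("2"))

// INSERT INTO tbl PARTITION (a=1, b) ...: b has no value, so it is dynamic.
val partlyDynamic: Map[String, Option[String]] =
  Map("a" -> Some("1"), "b" -> None)

// Hypothetical helper: the keys whose value is missing (None).
def dynamicPartitionKeys(partition: Map[String, Option[String]]): Set[String] =
  partition.collect { case (key, None) => key }.toSet

val staticCase  = dynamicPartitionKeys(fullyStatic)
val dynamicCase = dynamicPartitionKeys(partlyDynamic)
```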

  20. case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation with Product with Serializable

  21. case class Join(left: LogicalPlan, right: LogicalPlan, joinType: JoinType, condition: Option[Expression]) extends BinaryNode with PredicateHelper with Product with Serializable

  22. abstract class LeafNode extends LogicalPlan

    A logical plan node with no children.

  23. case class LocalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryNode with Product with Serializable

  24. case class LocalRelation(output: Seq[Attribute], data: Seq[InternalRow] = Nil) extends LeafNode with MultiInstanceRelation with Product with Serializable

  25. abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging

  26. case class MapElements(func: AnyRef, argumentClass: Class[_], argumentSchema: StructType, outputObjAttr: Attribute, child: LogicalPlan) extends UnaryNode with ObjectConsumer with ObjectProducer with Product with Serializable

    A relation produced by applying func to each element of the child.

  27. case class MapGroups(func: (Any, Iterator[Any]) ⇒ TraversableOnce[Any], keyDeserializer: Expression, valueDeserializer: Expression, groupingAttributes: Seq[Attribute], dataAttributes: Seq[Attribute], outputObjAttr: Attribute, child: LogicalPlan) extends UnaryNode with ObjectProducer with Product with Serializable

    Applies func to each unique group in child, based on the evaluation of groupingAttributes. func is invoked with an object representation of the grouping key and an iterator containing the object representations of all the rows with that key.

    keyDeserializer

    used to extract the key object for each group.

    valueDeserializer

    used to extract the items in the iterator from an input row.

  28. case class MapPartitions(func: (Iterator[Any]) ⇒ Iterator[Any], outputObjAttr: Attribute, child: LogicalPlan) extends UnaryNode with ObjectConsumer with ObjectProducer with Product with Serializable

    A relation produced by applying func to each partition of the child.

  29. case class MapPartitionsInR(func: Array[Byte], packageNames: Array[Byte], broadcastVars: Array[Broadcast[AnyRef]], inputSchema: StructType, outputSchema: StructType, outputObjAttr: Attribute, child: LogicalPlan) extends UnaryNode with ObjectConsumer with ObjectProducer with Product with Serializable

    A relation produced by applying a serialized R function func to each partition of the child.

  30. trait ObjectConsumer extends UnaryNode

    A trait for logical operators that consume domain objects as input. The output of its child must be a single-field row containing the input object.

  31. trait ObjectProducer extends LogicalPlan

    A trait for logical operators that produce domain objects as output. The output of this operator is a single-field safe row containing the produced object.

  32. case class OverwriteOptions(enabled: Boolean, staticPartitionKeys: TablePartitionSpec = Map.empty) extends Product with Serializable

    Options for writing new data into a table.

    enabled

    whether to overwrite existing data in the table.

    staticPartitionKeys

    if non-empty, specifies that we only want to overwrite partitions that match this partial partition spec. If empty, all partitions will be overwritten.
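
    One way to read "partitions that match this partial partition spec" is sketched below. The alias TablePartitionSpec = Map[String, String] and the helper isOverwritten are assumptions made for this example; the actual matching logic lives elsewhere in Spark and is not shown on this page.

```scala
// ASSUMPTION: a partition spec maps partition column names to values.
type TablePartitionSpec = Map[String, String]

// Hypothetical helper: a partition matches when it agrees with every
// key/value pair in the partial spec. An empty spec matches everything.
def isOverwritten(
    staticPartitionKeys: TablePartitionSpec,
    partition: TablePartitionSpec): Boolean =
  staticPartitionKeys.forall { case (key, value) =>
    partition.get(key).contains(value)
  }

val partitions: Seq[TablePartitionSpec] = Seq(
  Map("year" -> "2016", "month" -> "01"),
  Map("year" -> "2016", "month" -> "02"),
  Map("year" -> "2017", "month" -> "01")
)

// Only the partitions under year=2016 would be overwritten.
val only2016 = partitions.filter(p => isOverwritten(Map("year" -> "2016"), p))

// An empty partial spec selects all partitions.
val everything = partitions.filter(p => isOverwritten(Map.empty, p))
```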

  33. case class Pivot(groupByExprs: Seq[NamedExpression], pivotColumn: Expression, pivotValues: Seq[Literal], aggregates: Seq[Expression], child: LogicalPlan) extends UnaryNode with Product with Serializable

  34. case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extends UnaryNode with Product with Serializable

  35. case class Range(start: Long, end: Long, step: Long, numSlices: Option[Int], output: Seq[Attribute]) extends LeafNode with MultiInstanceRelation with Product with Serializable

  36. case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan) extends UnaryNode with Product with Serializable

    Returns a new RDD that has exactly numPartitions partitions. Differs from RepartitionByExpression as this method is called directly by DataFrames, because the user asked for coalesce or repartition. RepartitionByExpression is used when the consumer of the output requires some specific ordering or distribution of the data.

  37. case class RepartitionByExpression(partitionExpressions: Seq[Expression], child: LogicalPlan, numPartitions: Option[Int] = None) extends UnaryNode with Product with Serializable

    This method repartitions data using Expressions into numPartitions, and receives information about the number of partitions during execution. Used when a specific ordering or distribution is expected by the consumer of the query result. Use Repartition for RDD-like coalesce and repartition. If numPartitions is not specified, the number of partitions will be the number set by spark.sql.shuffle.partitions.

  38. case class ReturnAnswer(child: LogicalPlan) extends UnaryNode with Product with Serializable

    When planning take() or collect() operations, this special node is inserted at the top of the logical plan before invoking the query planner.

    Rules can pattern-match on this node in order to apply transformations that only take effect at the top of the logical query plan.

  39. case class Sample(lowerBound: Double, upperBound: Double, withReplacement: Boolean, seed: Long, child: LogicalPlan)(isTableSample: Boolean = false) extends UnaryNode with Product with Serializable

    Sample the dataset.

    lowerBound

    Lower-bound of the sampling probability (usually 0.0)

    upperBound

    Upper-bound of the sampling probability. The expected fraction sampled will be upperBound - lowerBound.

    withReplacement

    Whether to sample with replacement.

    seed

    the random seed

    child

    the LogicalPlan

    isTableSample

    Whether this node was created from TABLESAMPLE in the parser.
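
    The relationship between the bounds and the expected sampled fraction can be checked with a small simulation. This is a sketch of one possible sampling scheme without replacement (keep a row when a uniform draw falls in [lowerBound, upperBound)); the function sample is hypothetical and is not claimed to be how Spark executes this node.

```scala
import scala.util.Random

// Hypothetical sampler: keep a row when a uniform draw from [0, 1)
// lands inside [lowerBound, upperBound). Models withReplacement = false only.
def sample[T](rows: Seq[T], lowerBound: Double, upperBound: Double, seed: Long): Seq[T] = {
  val rng = new Random(seed)
  rows.filter { _ =>
    val x = rng.nextDouble()
    x >= lowerBound && x < upperBound
  }
}

val rows = 1 to 100000
val kept = sample(rows, lowerBound = 0.0, upperBound = 0.25, seed = 42L)

// The observed fraction should be close to upperBound - lowerBound = 0.25.
val fraction = kept.size.toDouble / rows.size
```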

  40. case class ScriptInputOutputSchema(inputRowFormat: Seq[(String, String)], outputRowFormat: Seq[(String, String)], inputSerdeClass: Option[String], outputSerdeClass: Option[String], inputSerdeProps: Seq[(String, String)], outputSerdeProps: Seq[(String, String)], recordReaderClass: Option[String], recordWriterClass: Option[String], schemaLess: Boolean) extends Product with Serializable

    Input and output properties when passing data to a script. For example, in Hive this would specify which SerDes to use.

  41. case class ScriptTransformation(input: Seq[Expression], script: String, output: Seq[Attribute], child: LogicalPlan, ioschema: ScriptInputOutputSchema) extends UnaryNode with Product with Serializable

    Transforms the input by forking and running the specified script.

    input

    the set of expressions that should be passed to the script.

    script

    the command that should be executed.

    output

    the attributes that are produced by the script.

    ioschema

    the input and output schema applied in the execution of the script.

  42. case class SerializeFromObject(serializer: Seq[NamedExpression], child: LogicalPlan) extends UnaryNode with ObjectConsumer with Product with Serializable

    Takes the input object from child and turns it into an unsafe row using the given serializer expression.

  43. abstract class SetOperation extends BinaryNode

  44. case class Sort(order: Seq[SortOrder], global: Boolean, child: LogicalPlan) extends UnaryNode with Product with Serializable

    order

    The ordering expressions

    global

    True means the sort applies globally to the entire data set; False means the sort applies only within each partition.

    child

    Child logical plan

  45. case class Statistics(sizeInBytes: BigInt, rowCount: Option[BigInt] = None, colStats: Map[String, ColumnStat] = Map.empty, isBroadcastable: Boolean = false) extends Product with Serializable

    Estimates of various statistics. The default estimation logic simply lazily multiplies the corresponding statistic produced by the children. To override this behavior, override statistics and assign it an overridden version of Statistics.

    NOTE: concrete and/or overridden versions of statistics fields should pay attention to the performance of the implementations. The reason is that estimations might get triggered in performance-critical processes, such as query plan planning.

    Note that we are using a BigInt here since it is easy to overflow a 64-bit integer in cardinality estimation (e.g. cartesian joins).

    sizeInBytes

    Physical size in bytes. For leaf operators this defaults to 1, otherwise it defaults to the product of children's sizeInBytes.

    rowCount

    Estimated number of rows.

    colStats

    Column-level statistics.

    isBroadcastable

    If true, output is small enough to be used in a broadcast join.
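
    The overflow concern behind the choice of BigInt can be shown with plain arithmetic. The sketch below is not Catalyst code: productOfSizes is a hypothetical helper that multiplies children's size estimates as described for sizeInBytes, and the 2^40-byte child sizes are made-up values.

```scala
// Toy model (not Catalyst code): combine children's size estimates by
// multiplication, as the description of sizeInBytes states.
def productOfSizes(childSizes: Seq[BigInt]): BigInt = childSizes.product

// Two children of 2^40 bytes each, e.g. the two sides of a cartesian join.
val childSizes: Seq[BigInt] = Seq(BigInt(1) << 40, BigInt(1) << 40)

// BigInt keeps the exact value 2^80.
val asBigInt: BigInt = productOfSizes(childSizes)

// The same product in 64-bit arithmetic silently wraps around.
val asLong: Long = childSizes.map(_.toLong).product

// With no children the product is 1, matching the stated default for leaf operators.
val leafDefault: BigInt = productOfSizes(Seq.empty)
```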

  46. case class Subquery(child: LogicalPlan) extends UnaryNode with Product with Serializable

    This node is inserted at the top of a subquery when it is optimized. This makes sure we can recognize a subquery as such, and it allows us to write subquery-aware transformations.

  47. case class SubqueryAlias(alias: String, child: LogicalPlan, view: Option[TableIdentifier]) extends UnaryNode with Product with Serializable

  48. case class TypedFilter(func: AnyRef, argumentClass: Class[_], argumentSchema: StructType, deserializer: Expression, child: LogicalPlan) extends UnaryNode with Product with Serializable

    A relation produced by applying func to each element of the child and filtering them by the resulting boolean value.

    This is logically equal to a normal Filter operator whose condition expression decodes the input row to an object and applies the given function to the decoded object. However, we need the encapsulation of TypedFilter to make the concept clearer and to make it easier to write optimizer rules.

  49. abstract class UnaryNode extends LogicalPlan

    A logical plan node with a single child.

  50. case class Union(children: Seq[LogicalPlan]) extends LogicalPlan with Product with Serializable

  51. case class Window(windowExpressions: Seq[NamedExpression], partitionSpec: Seq[Expression], orderSpec: Seq[SortOrder], child: LogicalPlan) extends UnaryNode with Product with Serializable

  52. case class With(child: LogicalPlan, cteRelations: Seq[(String, SubqueryAlias)]) extends UnaryNode with Product with Serializable

    A container for holding named common table expressions (CTEs) and a query plan. This operator will be removed during analysis and the relations will be substituted into child.

    child

    The final query of this CTE.

    cteRelations

    A sequence of pairs (alias, the CTE definition) defined by this CTE. Each CTE can see only the base tables and the previously defined CTEs.

  53. case class WithWindowDefinition(windowDefinitions: Map[String, WindowSpecDefinition], child: LogicalPlan) extends UnaryNode with Product with Serializable

Value Members

  1. object AppendColumns extends Serializable

    Factory for constructing new AppendColumns nodes.

  2. object CatalystSerde

  3. object CoGroup extends Serializable

    Factory for constructing new CoGroup nodes.

  4. object ColumnStat extends Logging with Serializable

  5. object EventTimeWatermark extends Serializable

  6. object Expand extends Serializable

  7. object FlatMapGroupsInR extends Serializable

    Factory for constructing new FlatMapGroupsInR nodes.

  8. object Limit

  9. object LocalRelation extends Serializable

  10. object MapElements extends Serializable

  11. object MapGroups extends Serializable

    Factory for constructing new MapGroups nodes.

  12. object MapPartitions extends Serializable

  13. object MapPartitionsInR extends Serializable

  14. object OneRowRelation extends LeafNode with Product with Serializable

    A relation with one row. This is used in "SELECT ..." without a FROM clause.

  15. object Range extends Serializable

    Factory for constructing new Range nodes.

  16. object SetOperation

  17. object TypedFilter extends Serializable

  18. object Union extends Serializable

    Factory for constructing new Union nodes.
