package org.apache.spark.sql.catalyst.analysis

Provides a logical query plan Analyzer and supporting classes for performing analysis. Analysis consists of translating UnresolvedAttributes and UnresolvedRelations into fully typed objects using information in a schema Catalog.

Linear Supertypes
AnyRef, Any

Type Members

  1. case class AliasViewChild(conf: SQLConf) extends Rule[LogicalPlan] with CastSupport with Product with Serializable

    Makes sure that a view's child plan produces the view's output attributes. We try to wrap the child as follows:

    1. Generate the queryOutput:
       1.1. If the query column names are defined, map the column names to attributes in the child output by name. This mostly handles view queries like SELECT * FROM ..., where the schema of the referenced table/view may change after the view has been created. We therefore save the output of the query to viewQueryColumnNames and restore it during view resolution; this way we get the correct view column ordering and omit any extra columns that we don't require.
       1.2. Otherwise, use the child output attributes as queryOutput.
    2. Map queryOutput to the view output by index. If the corresponding attributes don't match, try to up-cast and alias the attribute in queryOutput to the attribute in the view output.
    3. Add a Project over the child, with the new output generated by the previous steps.

    If the number of columns in the view output matches neither the child output nor the query column names, throw an AnalysisException.

    This should only be done after the Resolution batch, because the view attributes are not completely resolved during that batch.
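
    The steps above can be illustrated with a small, self-contained Scala model that does not depend on Spark. Attr, AliasViewChildSketch, and canUpCast are hypothetical names, the three-entry up-cast table stands in for Spark's real up-cast rules, and the resulting Project list is rendered as strings rather than Catalyst expressions.

```scala
// Hypothetical toy model of AliasViewChild; not Spark's implementation.
final case class Attr(name: String, dataType: String)

object AliasViewChildSketch {
  // Hypothetical lossless widenings standing in for Spark's up-cast rules.
  private val upCasts: Set[(String, String)] =
    Set("int" -> "long", "int" -> "double", "long" -> "double")

  def canUpCast(from: String, to: String): Boolean =
    from == to || upCasts.contains(from -> to)

  /** Returns the project list to place over the child (step 3), or an error message. */
  def aliasViewChild(
      viewOutput: Seq[Attr],
      childOutput: Seq[Attr],
      queryColumnNames: Seq[String]): Either[String, Seq[String]] = {
    // Step 1.1: map the saved column names to child attributes by name.
    // Step 1.2: otherwise use the child output as-is.
    val queryOutput: Either[String, Seq[Attr]] =
      if (queryColumnNames.isEmpty) Right(childOutput)
      else {
        val found = queryColumnNames.map { n =>
          childOutput.find(_.name == n).toRight(s"cannot resolve column $n")
        }
        found.collectFirst { case Left(err) => err }
          .toLeft(found.collect { case Right(a) => a })
      }

    queryOutput.flatMap { qo =>
      if (qo.length != viewOutput.length)
        Left(s"view has ${viewOutput.length} columns, query produces ${qo.length}")
      else
        // Step 2: pair attributes by index, aliasing or up-casting mismatches.
        qo.zip(viewOutput).foldLeft[Either[String, Seq[String]]](Right(Vector.empty)) {
          case (Left(err), _) => Left(err)
          case (Right(acc), (q, v)) =>
            if (q == v) Right(acc :+ q.name)
            else if (q.dataType == v.dataType) Right(acc :+ s"${q.name} AS ${v.name}")
            else if (canUpCast(q.dataType, v.dataType))
              Right(acc :+ s"CAST(${q.name} AS ${v.dataType}) AS ${v.name}")
            else Left(s"cannot up-cast ${q.name} from ${q.dataType} to ${v.dataType}")
        }
    }
  }
}
```

    For example, with a view (id: long, name: string) over a child that now outputs (name, extra, id: int), passing the saved names Seq("id", "name") yields Right(Seq("CAST(id AS long) AS id", "name")), while passing no saved names yields a Left because the column counts differ.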

  2. case class AnalysisContext(defaultDatabase: Option[String] = None, nestedViewDepth: Int = 0) extends Product with Serializable

    Provides a way to keep state during the analysis. This enables us to decouple the concerns of the analysis environment from the catalog.

    Note that this is thread-local.

    defaultDatabase

    The default database used in view resolution; this overrides the current catalog database.

    nestedViewDepth

    The nesting depth in view resolution; this enables us to limit the depth of nested views.
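
    A thread-local context that tracks the default database and caps view nesting could look roughly like the following sketch. AnalysisCtx, withNestedView, and the maxDepth parameter are hypothetical names chosen for illustration; Spark's AnalysisContext companion object has its own API, which this does not reproduce.

```scala
// Hypothetical thread-local analysis context; not Spark's AnalysisContext API.
final case class AnalysisCtx(defaultDatabase: Option[String] = None, nestedViewDepth: Int = 0)

object AnalysisCtx {
  // Each thread sees its own context, starting from the defaults.
  private val value = new ThreadLocal[AnalysisCtx] {
    override protected def initialValue(): AnalysisCtx = AnalysisCtx()
  }

  def get: AnalysisCtx = value.get()

  /** Runs `f` one view-nesting level deeper, restoring the previous context afterwards. */
  def withNestedView[A](viewDb: Option[String], maxDepth: Int)(f: => A): A = {
    val previous = value.get()
    val next = AnalysisCtx(viewDb, previous.nestedViewDepth + 1)
    // Refuse to go deeper than the configured limit before touching the state.
    if (next.nestedViewDepth > maxDepth)
      throw new IllegalStateException(s"view nesting exceeds maximum depth $maxDepth")
    value.set(next)
    try f finally value.set(previous)
  }
}
```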

  3. implicit class AnalysisErrorAt extends AnyRef

  4. class Analyzer extends RuleExecutor[LogicalPlan] with CheckAnalysis

    Provides a logical query plan analyzer, which translates UnresolvedAttributes and UnresolvedRelations into fully typed objects using information in a SessionCatalog and a FunctionRegistry.

  5. trait CastSupport extends AnyRef

    Mix-in trait for constructing valid Cast expressions.

  6. trait CheckAnalysis extends PredicateHelper

    Throws user facing errors when passed invalid queries that fail to analyze.

  7. class DatabaseAlreadyExistsException extends AnalysisException

    Thrown by a catalog when an item already exists. The analyzer will rethrow the exception as an org.apache.spark.sql.AnalysisException with the correct position information.

  8. class FunctionAlreadyExistsException extends AnalysisException

  9. trait FunctionRegistry extends AnyRef

    A catalog for looking up user defined functions, used by an Analyzer.

    Note: The implementation should be thread-safe to allow concurrent access.

  10. case class GetColumnByOrdinal(ordinal: Int, dataType: DataType) extends LeafExpression with Unevaluable with NonSQLExpression with Product with Serializable

  11. case class MultiAlias(child: Expression, names: Seq[String]) extends UnaryExpression with NamedExpression with CodegenFallback with Product with Serializable

    Used to assign new names to a Generator's output, such as that of a Hive UDTF. For example, the SQL expression "stack(2, key, value, key, value) as (a, b)" could be represented as MultiAlias(stack_function, Seq(a, b)).

    child

    the computation being performed

    names

    the names to be associated with each output of computing child.
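
    To make the stack example concrete, the sketch below models the generator's rows and the naming step on plain Scala values. StackSketch, stack, and multiAlias are hypothetical names; the padding of short rows with None imitates SQL nulls, and the length check on the name list is an assumption of this sketch.

```scala
// Hypothetical toy model of a generator whose output columns are named by a MultiAlias.
object StackSketch {
  /** Mimics SQL stack(n, v1, ..., vk): n rows of ceil(k / n) columns, padded with None. */
  def stack(n: Int, values: Seq[Any]): Seq[Seq[Option[Any]]] = {
    require(n > 0, "n must be positive")
    val width = (values.length + n - 1) / n
    (0 until n).map(row => (0 until width).map(col => values.lift(row * width + col)))
  }

  /** Pairs each generated column with a name, as MultiAlias(generator, names) would. */
  def multiAlias(rows: Seq[Seq[Option[Any]]], names: Seq[String]): Seq[Map[String, Option[Any]]] =
    rows.map { row =>
      require(row.length == names.length,
        s"generator produces ${row.length} columns but ${names.length} names were given")
      names.zip(row).toMap
    }
}
```

    With stack(2, "k1", 10, "k2", 20) and the names Seq("a", "b"), the result is two rows: a = "k1", b = 10 and a = "k2", b = 20.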

  12. trait MultiInstanceRelation extends AnyRef

    A trait that should be mixed into query operators where a single instance might appear multiple times in a logical query plan. It is invalid to have multiple copies of the same attribute produced by distinct operators in a query tree as this breaks the guarantee that expression ids, which are used to differentiate attributes, are unique.

    During analysis, operators that include this trait may be asked to produce a new version of themselves with globally unique expression IDs.
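
    The self-join case is the typical reason for this: the same relation appears twice, so one side needs fresh expression IDs. The following toy model shows that idea; AttrRef, ExprIds, MultiInstance, and ToyRelation are hypothetical names and do not reproduce Catalyst's classes.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical attribute carrying an expression ID that must be unique within a plan.
final case class AttrRef(name: String, exprId: Long)

object ExprIds {
  private val counter = new AtomicLong(0L)
  // Globally unique, monotonically increasing expression IDs.
  def next(): Long = counter.getAndIncrement()
}

trait MultiInstance[T] {
  /** Returns a copy of this operator whose output attributes carry fresh expression IDs. */
  def newInstance(): T
}

final case class ToyRelation(table: String, output: Seq[AttrRef]) extends MultiInstance[ToyRelation] {
  def newInstance(): ToyRelation =
    copy(output = output.map(a => a.copy(exprId = ExprIds.next())))
}

object ToyRelation {
  def fromColumns(table: String, columns: String*): ToyRelation =
    ToyRelation(table, columns.map(c => AttrRef(c, ExprIds.next())))
}
```

    Both sides of a self-join keep the same column names, but the right side's attributes no longer share expression IDs with the left side's.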

  13. class NoSuchDatabaseException extends AnalysisException

    Thrown by a catalog when an item cannot be found. The analyzer will rethrow the exception as an org.apache.spark.sql.AnalysisException with the correct position information.

  14. class NoSuchFunctionException extends AnalysisException

  15. class NoSuchPartitionException extends AnalysisException

  16. class NoSuchPartitionsException extends AnalysisException

  17. class NoSuchPermanentFunctionException extends AnalysisException

  18. class NoSuchTableException extends AnalysisException

  19. class NoSuchTempFunctionException extends AnalysisException

  20. class PartitionAlreadyExistsException extends AnalysisException

  21. class PartitionsAlreadyExistException extends AnalysisException

  22. case class ResolveInlineTables(conf: SQLConf) extends Rule[LogicalPlan] with CastSupport with Product with Serializable

    An analyzer rule that replaces UnresolvedInlineTable with LocalRelation.

  23. case class ResolveTimeZone(conf: SQLConf) extends Rule[LogicalPlan] with Product with Serializable

    Replaces each TimeZoneAwareExpression that has no time zone ID with a copy that uses the session-local time zone.

  24. case class ResolvedStar(expressions: Seq[NamedExpression]) extends Star with Unevaluable with Product with Serializable

    Represents all the resolved input attributes to a given relational operator. This is used in the DataFrame DSL.

    expressions

    Expressions to expand.

  25. type Resolver = (String, String) ⇒ Boolean

    Resolver should return true if the first string refers to the same entity as the second string, for example by using case-insensitive equality.
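
    A resolver is just a two-argument predicate, so name lookup can be parameterized by it. In the sketch below, the two resolver definitions are plausible bodies for the package's caseSensitiveResolution and caseInsensitiveResolution values, assumed rather than copied from Spark; ResolverSketch and matches are hypothetical names.

```scala
object ResolverSketch {
  // Same shape as the package's Resolver type alias.
  type Resolver = (String, String) => Boolean

  // Assumed definitions of the two resolvers the package exposes.
  val caseSensitiveResolution: Resolver = (a, b) => a == b
  val caseInsensitiveResolution: Resolver = (a, b) => a.equalsIgnoreCase(b)

  /** Returns every column that a user-supplied name refers to under `resolver`. */
  def matches(name: String, columns: Seq[String], resolver: Resolver): Seq[String] =
    columns.filter(col => resolver(col, name))
}
```

    A caller would typically treat more than one match (for example "Id" against columns id and ID under case-insensitive resolution) as an ambiguous reference.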

  26. class SimpleFunctionRegistry extends FunctionRegistry

  27. abstract class Star extends LeafExpression with NamedExpression

    Represents all of the input attributes to a given relational operator, for example in "SELECT * FROM ...". A Star gets automatically expanded during analysis.

  28. class SubstituteUnresolvedOrdinals extends Rule[LogicalPlan]

    Replaces ordinals in ORDER BY or GROUP BY clauses with UnresolvedOrdinal expressions.

  29. class TableAlreadyExistsException extends AnalysisException

  30. class TempTableAlreadyExistsException extends AnalysisException

  31. trait TypeCheckResult extends AnyRef

    Represents the result of Expression.checkInputDataTypes. CheckAnalysis throws an AnalysisException if isFailure is true.

  32. case class UnresolvedAlias(child: Expression, aliasFunc: Option[(Expression) ⇒ String] = None) extends UnaryExpression with NamedExpression with Unevaluable with Product with Serializable

    Holds the expression that has yet to be aliased.

    child

    The computation that needs to be resolved during analysis.

    aliasFunc

    The function, if specified, that is called to generate an alias to associate with the result of computing child.

  33. case class UnresolvedAttribute(nameParts: Seq[String]) extends Attribute with Unevaluable with Product with Serializable

    Holds the name of an attribute that has yet to be resolved.

  34. case class UnresolvedDeserializer(deserializer: Expression, inputAttributes: Seq[Attribute] = Nil) extends UnaryExpression with Unevaluable with NonSQLExpression with Product with Serializable

    Holds the deserializer expression and the attributes that are available during its resolution. A deserializer expression is a special kind of expression that is not always resolved by the children's output but by given attributes; for example, the keyDeserializer in MapGroups should be resolved by groupingAttributes instead of the children's output.

    deserializer

    The unresolved deserializer expression.

    inputAttributes

    The input attributes used to resolve the deserializer expression. This can be empty if the deserializer should be resolved by the children's output.

  35. class UnresolvedException[TreeType <: TreeNode[_]] extends TreeNodeException[TreeType]

    Thrown when an invalid attempt is made to access a property of a tree that has yet to be fully resolved.

  36. case class UnresolvedExtractValue(child: Expression, extraction: Expression) extends UnaryExpression with Unevaluable with Product with Serializable

    Extracts a value or values from an Expression.

    child

    The expression to extract a value from; it can be a Map, an Array, a Struct, or an array of Structs.

    extraction

    The expression describing the extraction; it can be a Map key, an Array index, or a Struct field name.
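
    Which extraction happens depends on the shape of child. The sketch below dispatches on plain runtime values instead of Catalyst expressions and data types; Datum and its subclasses, ExtractSketch, and the use of None for "not extractable" are hypothetical choices for illustration.

```scala
// Hypothetical runtime values; Catalyst works on expressions and DataTypes instead.
sealed trait Datum
case object DNull extends Datum
final case class DInt(v: Int) extends Datum
final case class DStr(v: String) extends Datum
final case class DArray(items: Vector[Datum]) extends Datum
final case class DMap(entries: Map[Datum, Datum]) extends Datum
final case class DStruct(fields: Map[String, Datum]) extends Datum

object ExtractSketch {
  /** Applies child[extraction] according to the shape of `child`. */
  def extract(child: Datum, extraction: Datum): Option[Datum] = (child, extraction) match {
    case (DMap(entries), key)          => entries.get(key)   // map key lookup
    case (DArray(items), DInt(i))      => items.lift(i)      // 0-based array index
    case (DStruct(fields), DStr(name)) => fields.get(name)   // struct field
    case (DArray(items), DStr(name))   =>                    // field of every struct in an array
      Some(DArray(items.map(item => extract(item, DStr(name)).getOrElse(DNull))))
    case _                             => None               // not extractable
  }
}
```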

  37. case class UnresolvedFunction(name: FunctionIdentifier, children: Seq[Expression], isDistinct: Boolean) extends Expression with Unevaluable with Product with Serializable

  38. case class UnresolvedGenerator(name: FunctionIdentifier, children: Seq[Expression]) extends Expression with Generator with Product with Serializable

    Represents an unresolved generator, which will be created by the parser for the org.apache.spark.sql.catalyst.plans.logical.Generate operator. The analyzer will resolve this generator.

  39. case class UnresolvedInlineTable(names: Seq[String], rows: Seq[Seq[Expression]]) extends LeafNode with Product with Serializable

    An inline table that has not been resolved yet. Once resolved, the analyzer turns it into an org.apache.spark.sql.catalyst.plans.logical.LocalRelation.

    names

    list of column names

    rows

    expressions for the data

  40. case class UnresolvedOrdinal(ordinal: Int) extends LeafExpression with Unevaluable with NonSQLExpression with Product with Serializable

    Represents an unresolved ordinal used in ORDER BY or GROUP BY.

    For example:

      select a from table order by 1
      select a from table group by 1

    ordinal

    The ordinal; it starts from 1, not 0.
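
    Because ordinals are 1-based, resolving one against the select list is an off-by-one lookup with a range check. OrdinalSketch and resolveOrdinal are hypothetical names, and the error text is illustrative rather than Spark's actual message.

```scala
object OrdinalSketch {
  /** Resolves a 1-based ORDER BY / GROUP BY ordinal against the select list. */
  def resolveOrdinal(ordinal: Int, selectList: Seq[String]): Either[String, String] =
    if (ordinal >= 1 && ordinal <= selectList.length) Right(selectList(ordinal - 1))
    else Left(s"position $ordinal is not in select list (valid range is [1, ${selectList.length}])")
}
```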

  41. case class UnresolvedRelation(tableIdentifier: TableIdentifier) extends LeafNode with Product with Serializable

    Holds the name of a relation that has yet to be looked up in a catalog.

  42. case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable with Product with Serializable

    Represents all of the input attributes to a given relational operator, for example in "SELECT * FROM ...".

    This is also used to expand structs. For example: "SELECT record.* from (SELECT struct(a,b,c) as record ...)"

    target

    An optional name that should be the target of the expansion. If omitted, all targets' columns are produced. This can be either a table name or a struct name. It is a list of identifiers that forms the path of the expansion.
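
    The three forms of target (none, a table qualifier, or a struct column path) can be sketched against a flat list of input attributes. InAttr, StarSketch, and expand are hypothetical names; the sketch only accepts paths of length one or two and ignores the ambiguity checks a real analyzer would need.

```scala
// Hypothetical input attribute: its table qualifier, its name, and its struct fields (if any).
final case class InAttr(qualifier: String, name: String, structFields: Seq[String] = Nil)

object StarSketch {
  /** Expands `*`, `t.*`, `record.*`, or `t.record.*` into output column names. */
  def expand(target: Option[Seq[String]], input: Seq[InAttr]): Either[String, Seq[String]] =
    target match {
      // SELECT *: every input column.
      case None => Right(input.map(_.name))
      // SELECT t.* where t is a table qualifier.
      case Some(Seq(t)) if input.exists(_.qualifier == t) =>
        Right(input.filter(_.qualifier == t).map(_.name))
      // SELECT record.* or SELECT t.record.* where record is a struct column.
      case Some(path) if path.nonEmpty && path.length <= 2 =>
        val column = path.last
        val qualifier = path.init.headOption
        input
          .find(a => a.name == column && a.structFields.nonEmpty && qualifier.forall(_ == a.qualifier))
          .map(_.structFields)
          .toRight(s"cannot resolve '${path.mkString(".")}.*'")
      case Some(path) => Left(s"unsupported target '${path.mkString(".")}.*'")
    }
}
```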

  43. case class UnresolvedTableValuedFunction(functionName: String, functionArgs: Seq[Expression]) extends LeafNode with Product with Serializable

    A table-valued function, e.g.

    select * from range(10);

Value Members

  1. object AnalysisContext extends Serializable

  2. object CleanupAliases extends Rule[LogicalPlan]

    Cleans up unnecessary Aliases inside the plan. We only need Alias as a top-level expression in Project (the project list), Aggregate (the aggregate expressions), or Window (the window expressions). Note that if an expression has other expression parameters that are not among its children, e.g. RuntimeReplaceable, this rule's alias transformation cannot reach those parameters.
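
    The core of that cleanup can be sketched on a toy expression tree: an alias directly at the top of a project-list entry is kept, and every alias nested below it is stripped. Expr, Col, Add, Alias, and the two helper names are hypothetical stand-ins, not Catalyst's classes.

```scala
// Hypothetical toy expression tree.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
final case class Alias(child: Expr, name: String) extends Expr

object CleanupAliasesSketch {
  /** Removes every Alias from an expression tree. */
  def trimAliases(e: Expr): Expr = e match {
    case Alias(child, _) => trimAliases(child)
    case Add(l, r)       => Add(trimAliases(l), trimAliases(r))
    case c: Col          => c
  }

  /** Keeps a top-level Alias (as in a project list) but strips all nested ones. */
  def trimNonTopLevelAliases(e: Expr): Expr = e match {
    case Alias(child, name) => Alias(trimAliases(child), name)
    case other              => trimAliases(other)
  }
}
```

    For instance, Alias(Add(Alias(Col("a"), "x"), Col("b")), "total") becomes Alias(Add(Col("a"), Col("b")), "total").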

  3. object DecimalPrecision extends Rule[LogicalPlan]

    Calculates and propagates precision for fixed-precision decimals. Hive has a number of rules for this based on the SQL standard and MS SQL: https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf https://msdn.microsoft.com/en-us/library/ms190476.aspx

    In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2 respectively, then the following operations have the following precision / scale:

    Operation     Result Precision                       Result Scale
    ------------------------------------------------------------------------
    e1 + e2       max(s1, s2) + max(p1-s1, p2-s2) + 1    max(s1, s2)
    e1 - e2       max(s1, s2) + max(p1-s1, p2-s2) + 1    max(s1, s2)
    e1 * e2       p1 + p2 + 1                            s1 + s2
    e1 / e2       p1 - s1 + s2 + max(6, s1 + p2 + 1)     max(6, s1 + p2 + 1)
    e1 % e2       min(p1-s1, p2-s2) + max(s1, s2)        max(s1, s2)
    e1 union e2   max(s1, s2) + max(p1-s1, p2-s2)        max(s1, s2)
    sum(e1)       p1 + 10                                s1
    avg(e1)       p1 + 4                                 s1 + 4

    To implement the rules for fixed-precision types, we introduce casts to turn them to unlimited precision, do the math on unlimited-precision numbers, then introduce casts back to the required fixed precision. This allows us to do all rounding and overflow handling in the cast-to-fixed-precision operator.

    In addition, when mixing non-decimal types with decimals, we use the following rules:

    - BYTE gets turned into DECIMAL(3, 0)
    - SHORT gets turned into DECIMAL(5, 0)
    - INT gets turned into DECIMAL(10, 0)
    - LONG gets turned into DECIMAL(20, 0)
    - FLOAT and DOUBLE cause fixed-length decimals to turn into DOUBLE
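
    The arithmetic rows of the table translate directly into code. This sketch applies the formulas literally; Dec and DecimalPrecisionSketch are hypothetical names, and Spark additionally bounds results to its maximum decimal precision, which this sketch omits.

```scala
object DecimalPrecisionSketch {
  final case class Dec(precision: Int, scale: Int)

  // Number of digits to the left of the decimal point.
  private def intDigits(d: Dec): Int = d.precision - d.scale

  // e1 + e2 and e1 - e2 share the same result type.
  def addOrSubtract(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    Dec(scale + math.max(intDigits(a), intDigits(b)) + 1, scale)
  }

  def multiply(a: Dec, b: Dec): Dec = Dec(a.precision + b.precision + 1, a.scale + b.scale)

  def divide(a: Dec, b: Dec): Dec = {
    val scale = math.max(6, a.scale + b.precision + 1)
    Dec(intDigits(a) + b.scale + scale, scale)
  }

  def remainder(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    Dec(math.min(intDigits(a), intDigits(b)) + scale, scale)
  }

  // Decimal types that integral types are treated as when mixed with decimals.
  val integralAsDecimal: Map[String, Dec] =
    Map("BYTE" -> Dec(3, 0), "SHORT" -> Dec(5, 0), "INT" -> Dec(10, 0), "LONG" -> Dec(20, 0))
}
```

    For example, DECIMAL(5, 2) + DECIMAL(10, 4) yields DECIMAL(11, 4), and INT + DECIMAL(5, 2) is treated as DECIMAL(10, 0) + DECIMAL(5, 2), giving DECIMAL(13, 2).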

  4. object EliminateEventTimeWatermark extends Rule[LogicalPlan]

    Ignores event time watermarks in batch queries, since watermarks are only supported in Structured Streaming. TODO: add this rule to the analyzer rule list.

  5. object EliminateSubqueryAliases extends Rule[LogicalPlan]

    Removes SubqueryAlias operators from the plan. Subqueries are only required to provide scoping information for attributes and can be removed once analysis is complete.
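
    Once scoping is no longer needed, removing the aliases is a simple tree rewrite that splices each SubqueryAlias's child into its place. The plan classes below (Plan, Relation, SubqueryAlias, Filter, Join) are hypothetical toy stand-ins, not Catalyst's logical operators.

```scala
// Hypothetical toy logical plan.
sealed trait Plan
final case class Relation(name: String) extends Plan
final case class SubqueryAlias(alias: String, child: Plan) extends Plan
final case class Filter(condition: String, child: Plan) extends Plan
final case class Join(left: Plan, right: Plan) extends Plan

object EliminateSubqueryAliasesSketch {
  /** Removes every SubqueryAlias node, keeping its child in place. */
  def apply(plan: Plan): Plan = plan match {
    case SubqueryAlias(_, child) => apply(child)
    case Filter(cond, child)     => Filter(cond, apply(child))
    case Join(l, r)              => Join(apply(l), apply(r))
    case r: Relation             => r
  }
}
```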

  6. object EliminateUnions extends Rule[LogicalPlan]

    Removes Union operators from the plan if they have only one child.

  7. object EliminateView extends Rule[LogicalPlan]

    Removes View operators from the plan. The operator is kept until the end of the analysis stage because we want to see which part of an analyzed logical plan is generated from a view.

  8. object EmptyFunctionRegistry extends FunctionRegistry

    A trivial catalog that returns an error when a function is requested. Used for testing when all functions are already filled in and the analyzer needs only to resolve attribute references.

  9. object FunctionRegistry

  10. object ResolveCreateNamedStruct extends Rule[LogicalPlan]

    Resolves a CreateNamedStruct if it contains NamePlaceholders.

  11. object ResolveHints

    Collection of rules related to hints. The only hint currently available is the broadcast join hint.

    Note that this is split into two rules because in the future we might introduce new hint rules that have different ordering requirements from broadcast.

  12. object ResolveTableValuedFunctions extends Rule[LogicalPlan]

    Rule that resolves table-valued function references.

  13. object SimpleAnalyzer extends Analyzer

    A trivial Analyzer with a dummy SessionCatalog and EmptyFunctionRegistry. Used for testing when all relations are already filled in and the analyzer needs only to resolve attribute references.

  14. object TimeWindowing extends Rule[LogicalPlan]

    Maps a time column to multiple time windows using the Expand operator. Since it's non-trivial to figure out how many windows a time column can map to, we over-estimate the number of windows and filter out the rows where the time column is not inside the time window.
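
    The over-estimate-then-filter idea can be shown on plain Long timestamps. This is a sketch under our own assumptions: TimeWindowSketch and windowsFor are hypothetical names, and the candidate formula is a reconstruction of the approach described above, not a copy of the rule's Catalyst expressions.

```scala
object TimeWindowSketch {
  // Ceiling division that is also correct for negative numerators (b must be positive).
  private def ceilDiv(a: Long, b: Long): Long = -Math.floorDiv(-a, b)

  /** Returns the [start, end) windows of length windowDuration, sliding by slideDuration, that contain ts. */
  def windowsFor(ts: Long, windowDuration: Long, slideDuration: Long, startTime: Long = 0L): Seq[(Long, Long)] = {
    require(windowDuration > 0 && slideDuration > 0, "durations must be positive")
    // Upper bound on how many windows can overlap a single timestamp.
    val maxNumOverlapping = ceilDiv(windowDuration, slideDuration)
    val windowId = ceilDiv(ts - startTime, slideDuration)
    // Over-estimate: emit maxNumOverlapping + 1 candidate windows per row ...
    val candidates = (0L to maxNumOverlapping).map { i =>
      val windowStart = (windowId + i - maxNumOverlapping) * slideDuration + startTime
      (windowStart, windowStart + windowDuration)
    }
    // ... then keep only the windows that actually contain the timestamp.
    candidates.filter { case (start, end) => ts >= start && ts < end }
  }
}
```

    With a 10-unit window sliding every 5 units, timestamp 12 falls into [5, 15) and [10, 20); the third candidate, [15, 25), is filtered out.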

  15. object TypeCheckResult

  16. object TypeCoercion

    A collection of Rules that can be used to coerce differing types that participate in operations into compatible ones.

    Notes about type widening / tightest common types: Broadly, there are two cases when we need to widen data types (e.g. union, binary comparison). In case 1, we are looking for a common data type for two or more data types, and no loss of precision is allowed. Examples include type inference in JSON (e.g. what is the column's data type if one row is an integer while the other row is a long?). In case 2, we are looking for a widened data type with some acceptable loss of precision. For example, there is no common type for double and decimal, because double's range is larger than decimal's while decimal is more precise than double; in a union, however, we would cast the decimal into double.
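
    The two cases can be contrasted with a tiny type lattice. Everything here is hypothetical: the DType classes, the Int, Long, Double precedence list, and the function names are simplifications chosen to illustrate the distinction, not Spark's actual coercion tables.

```scala
object TypeCoercionSketch {
  sealed trait DType
  case object IntT extends DType
  case object LongT extends DType
  case object DoubleT extends DType
  final case class DecimalT(precision: Int, scale: Int) extends DType

  // Hypothetical widening order for the non-decimal numeric types.
  private val numericPrecedence: Seq[DType] = Seq(IntT, LongT, DoubleT)

  /** Case 1: a common type with no loss of precision, or None if there is none. */
  def tightestCommonType(a: DType, b: DType): Option[DType] = (a, b) match {
    case (x, y) if x == y => Some(x)
    case (x, y) if numericPrecedence.contains(x) && numericPrecedence.contains(y) =>
      Some(if (numericPrecedence.indexOf(x) >= numericPrecedence.indexOf(y)) x else y)
    case _ => None // e.g. DoubleT vs DecimalT: neither type contains the other
  }

  /** Case 2: a widened type that may lose precision, as a UNION would need. */
  def widerType(a: DType, b: DType): Option[DType] =
    tightestCommonType(a, b).orElse((a, b) match {
      case (DoubleT, _: DecimalT) | (_: DecimalT, DoubleT) => Some(DoubleT)
      case _ => None
    })
}
```

    Integer and long share the tightest type long, as in the JSON example; double and decimal have no tightest common type, but the wider type for a union is double.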

  17. object UnresolvedAttribute extends Serializable

  18. object UnresolvedFunction extends Serializable

  19. object UnsupportedOperationChecker

    Analyzes the presence of unsupported operations in a logical plan.

  20. object UpdateOuterReferences extends Rule[LogicalPlan]

    The aggregate expressions in a subquery that reference the outer query block are pushed down to the outer query block for evaluation. This rule updates such outer references to AttributeReferences that refer to attributes from the parent/outer query block.

    For example (SQL):

    SELECT l.a FROM l GROUP BY 1 HAVING EXISTS (SELECT 1 FROM r WHERE r.d < min(l.b))

    Plan before the rule:

      Project [a#226]
      +- Filter exists#245 [min(b#227)#249]
         :  +- Project [1 AS 1#247]
         :     +- Filter (d#238 < min(outer(b#227)))       <-----
         :        +- SubqueryAlias r
         :           +- Project [_1#234 AS c#237, _2#235 AS d#238]
         :              +- LocalRelation [_1#234, _2#235]
         +- Aggregate [a#226], [a#226, min(b#227) AS min(b#227)#249]
            +- SubqueryAlias l
               +- Project [_1#223 AS a#226, _2#224 AS b#227]
                  +- LocalRelation [_1#223, _2#224]

    Plan after the rule:

      Project [a#226]
      +- Filter exists#245 [min(b#227)#249]
         :  +- Project [1 AS 1#247]
         :     +- Filter (d#238 < outer(min(b#227)#249))   <-----
         :        +- SubqueryAlias r
         :           +- Project [_1#234 AS c#237, _2#235 AS d#238]
         :              +- LocalRelation [_1#234, _2#235]
         +- Aggregate [a#226], [a#226, min(b#227) AS min(b#227)#249]
            +- SubqueryAlias l
               +- Project [_1#223 AS a#226, _2#224 AS b#227]
                  +- LocalRelation [_1#223, _2#224]

  21. val caseInsensitiveResolution: (String, String) ⇒ Boolean

  22. val caseSensitiveResolution: (String, String) ⇒ Boolean

  23. def withPosition[A](t: TreeNode[_])(f: ⇒ A): A

    Catches any AnalysisExceptions thrown by f and attaches t's position if any.
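
    The catch-and-rethrow pattern can be sketched with stand-in types. AnalysisError, Origin, PlanNode, and PositionSketch are hypothetical names that only loosely model an AnalysisException and a tree node's origin; they are not Spark's classes.

```scala
// Hypothetical stand-ins for a tree node's source position and for AnalysisException.
final case class Origin(line: Option[Int], startPosition: Option[Int])
final case class PlanNode(description: String, origin: Origin)

class AnalysisError(val message: String, val line: Option[Int] = None, val startPosition: Option[Int] = None)
    extends Exception(message) {
  // Returns a copy of this error that carries the given position.
  def withPosition(line: Option[Int], startPosition: Option[Int]): AnalysisError =
    new AnalysisError(message, line, startPosition)
}

object PositionSketch {
  /** Runs `f`; if it throws an AnalysisError, rethrows a copy carrying `t`'s position. */
  def withPosition[A](t: PlanNode)(f: => A): A =
    try f
    catch {
      case e: AnalysisError => throw e.withPosition(t.origin.line, t.origin.startPosition)
    }
}
```

    Successful results pass through unchanged; only errors are enriched with the node's line and start position.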
