Package org.apache.spark.sql.catalyst.analysis

package analysis

Provides a logical query plan Analyzer and supporting classes for performing analysis. Analysis consists of translating UnresolvedAttributes and UnresolvedRelations into fully typed objects using information in a schema Catalog.

Linear Supertypes: AnyRef, Any

Type Members

  1. implicit class AnalysisErrorAt extends AnyRef

  2. class Analyzer extends RuleExecutor[LogicalPlan] with CheckAnalysis

    Provides a logical query plan analyzer, which translates UnresolvedAttributes and UnresolvedRelations into fully typed objects using information in a SessionCatalog and a FunctionRegistry.
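
    A minimal sketch (test-oriented, assuming the Catalyst DSL in org.apache.spark.sql.catalyst.dsl and LocalRelation from the logical plans package) of running the test-only SimpleAnalyzer over an unresolved plan:

    import org.apache.spark.sql.catalyst.dsl.expressions._
    import org.apache.spark.sql.catalyst.dsl.plans._
    import org.apache.spark.sql.catalyst.plans.logical.LocalRelation

    val relation = LocalRelation('a.int, 'b.string)
    // 'a in the projection starts as an UnresolvedAttribute; analysis binds it to
    // the attribute produced by the relation, after which the plan is resolved.
    val analyzed = SimpleAnalyzer.execute(relation.select('a))
    assert(analyzed.resolved)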

  3. trait CheckAnalysis extends PredicateHelper

    Throws user-facing errors when passed invalid queries that fail to analyze.

  4. class DatabaseAlreadyExistsException extends AnalysisException

    Thrown by a catalog when an item already exists. The analyzer will rethrow the exception as an org.apache.spark.sql.AnalysisException with the correct position information.

  5. class FunctionAlreadyExistsException extends AnalysisException

  6. trait FunctionRegistry extends AnyRef

    A catalog for looking up user defined functions, used by an Analyzer.

    Note: The implementation should be thread-safe to allow concurrent access.
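
    A minimal sketch (the registerFunction and lookupFunction signatures are assumed, not documented on this page) of registering a builder in a SimpleFunctionRegistry and resolving a call by name:

    import org.apache.spark.sql.catalyst.expressions.{Add, Expression, Literal}

    val registry = new SimpleFunctionRegistry
    // A function builder turns the argument expressions into a concrete expression.
    registry.registerFunction("my_add", (args: Seq[Expression]) => Add(args(0), args(1)))
    val resolved = registry.lookupFunction("my_add", Seq(Literal(1), Literal(2)))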

  7. case class MultiAlias(child: Expression, names: Seq[String]) extends UnaryExpression with NamedExpression with CodegenFallback with Product with Serializable

    Used to assign new names to a Generator's output, such as a Hive UDTF. For example, the SQL expression "stack(2, key, value, key, value) as (a, b)" could be represented as follows: MultiAlias(stack_function, Seq(a, b)).

    child

    the computation being performed

    names

    the names to be associated with each output of computing child.

  8. trait MultiInstanceRelation extends AnyRef

    A trait that should be mixed into query operators where a single instance might appear multiple times in a logical query plan. It is invalid to have multiple copies of the same attribute produced by distinct operators in a query tree, as this breaks the guarantee that expression ids, which are used to differentiate attributes, are unique.

    During analysis, operators that include this trait may be asked to produce a new version of themselves with globally unique expression ids.

  9. class NoSuchDatabaseException extends AnalysisException

    Thrown by a catalog when an item cannot be found. The analyzer will rethrow the exception as an org.apache.spark.sql.AnalysisException with the correct position information.

  10. class NoSuchFunctionException extends AnalysisException

  11. class NoSuchPartitionException extends AnalysisException

  12. class NoSuchPartitionsException extends AnalysisException

  13. class NoSuchPermanentFunctionException extends AnalysisException

  14. class NoSuchTableException extends AnalysisException

  15. class NoSuchTempFunctionException extends AnalysisException

  16. sealed trait OutputMode extends AnyRef

  17. class PartitionAlreadyExistsException extends AnalysisException

  18. class PartitionsAlreadyExistException extends AnalysisException

  19. case class ResolvedStar(expressions: Seq[NamedExpression]) extends Star with Unevaluable with Product with Serializable

    Represents all the resolved input attributes to a given relational operator. This is used in the DataFrame DSL.

    expressions

    Expressions to expand.

  20. type Resolver = (String, String) ⇒ Boolean

    A Resolver should return true if the first string refers to the same entity as the second string, for example by using case-insensitive equality.
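
    A minimal sketch (plain Scala) of the two resolution strategies this type describes; the package's caseSensitiveResolution and caseInsensitiveResolution values play the same roles:

    val caseSensitive: Resolver   = (a, b) => a == b
    val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

    caseInsensitive("Name", "name")  // true
    caseSensitive("Name", "name")    // false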

  21. class SimpleFunctionRegistry extends FunctionRegistry

  22. abstract class Star extends LeafExpression with NamedExpression

    Represents all of the input attributes to a given relational operator, for example in "SELECT * FROM ...". A Star gets automatically expanded during analysis.

  23. class TableAlreadyExistsException extends AnalysisException

  24. class TempFunctionAlreadyExistsException extends AnalysisException

  25. class TempTableAlreadyExistsException extends AnalysisException

  26. trait TypeCheckResult extends AnyRef

    Represents the result of Expression.checkInputDataTypes. We will throw an AnalysisException in CheckAnalysis if isFailure is true.

  27. case class UnresolvedAlias(child: Expression, aliasName: Option[String] = None) extends UnaryExpression with NamedExpression with Unevaluable with Product with Serializable

    Holds the expression that has yet to be aliased.

    child

    The computation that needs to be resolved during analysis.

    aliasName

    The name, if specified, to be associated with the result of computing child.

  28. case class UnresolvedAttribute(nameParts: Seq[String]) extends Attribute with Unevaluable with Product with Serializable

    Holds the name of an attribute that has yet to be resolved.

  29. case class UnresolvedDeserializer(deserializer: Expression, inputAttributes: Seq[Attribute] = Nil) extends UnaryExpression with Unevaluable with NonSQLExpression with Product with Serializable

    Holds the deserializer expression and the attributes that are available during its resolution. A deserializer expression is a special kind of expression that is not always resolved against its children's output but against given attributes; for example, the keyDeserializer in MapGroups should be resolved against groupingAttributes instead of the children's output.

    deserializer

    The unresolved deserializer expression

    inputAttributes

    The input attributes used to resolve the deserializer expression; can be empty if the deserializer should be resolved against the children's output.

  30. class UnresolvedException[TreeType <: TreeNode[_]] extends TreeNodeException[TreeType]

    Thrown when an invalid attempt is made to access a property of a tree that has yet to be fully resolved.

  31. case class UnresolvedExtractValue(child: Expression, extraction: Expression) extends UnaryExpression with Unevaluable with Product with Serializable

    Extracts a value or values from an Expression.

    child

    The expression to extract a value from; can be a Map, Array, Struct, or array of Structs.

    extraction

    The expression describing the extraction; can be a key of a Map, an index into an Array, or a field name of a Struct.
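
    A minimal sketch (identifiers chosen for illustration) of the unresolved tree a parser could build for the SQL fragment "m['key']"; both the attribute and the extraction stay unresolved until analysis determines m's data type:

    import org.apache.spark.sql.catalyst.expressions.Literal

    val extract = UnresolvedExtractValue(UnresolvedAttribute("m"), Literal("key"))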

  32. case class UnresolvedFunction(name: FunctionIdentifier, children: Seq[Expression], isDistinct: Boolean) extends Expression with Unevaluable with Product with Serializable

  33. case class UnresolvedGenerator(name: FunctionIdentifier, children: Seq[Expression]) extends Expression with Generator with Product with Serializable

    Represents an unresolved generator, which will be created by the parser for the org.apache.spark.sql.catalyst.plans.logical.Generate operator. The analyzer will resolve this generator.

  34. case class UnresolvedRelation(tableIdentifier: TableIdentifier, alias: Option[String] = None) extends LeafNode with Product with Serializable

    Holds the name of a relation that has yet to be looked up in a catalog.

  35. case class UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable with Product with Serializable

    Represents all of the input attributes to a given relational operator, for example in "SELECT * FROM ...".

    This is also used to expand structs. For example: "SELECT record.* from (SELECT struct(a,b,c) as record ...)".

    target

    an optional target of the expansion, given as a list of identifiers forming the path of the expansion (either a table name or a struct name). If omitted, all input attributes' columns are produced.

Value Members

  1. object Append extends OutputMode with Product with Serializable

  2. object CleanupAliases extends Rule[LogicalPlan]

    Cleans up unnecessary Aliases inside the plan. Basically we only need Alias as a top-level expression in Project (project list), Aggregate (aggregate expressions), or Window (window expressions).

  3. object DecimalPrecision extends Rule[LogicalPlan]

    Calculates and propagates precision for fixed-precision decimals. Hive has a number of rules for this based on the SQL standard and MS SQL: https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf https://msdn.microsoft.com/en-us/library/ms190476.aspx

    In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2 respectively, then the following operations have the following precision / scale:

    Operation     Result Precision                           Result Scale
    ---------------------------------------------------------------------------
    e1 + e2       max(s1, s2) + max(p1-s1, p2-s2) + 1        max(s1, s2)
    e1 - e2       max(s1, s2) + max(p1-s1, p2-s2) + 1        max(s1, s2)
    e1 * e2       p1 + p2 + 1                                s1 + s2
    e1 / e2       p1 - s1 + s2 + max(6, s1 + p2 + 1)         max(6, s1 + p2 + 1)
    e1 % e2       min(p1-s1, p2-s2) + max(s1, s2)            max(s1, s2)
    e1 union e2   max(s1, s2) + max(p1-s1, p2-s2)            max(s1, s2)
    sum(e1)       p1 + 10                                    s1
    avg(e1)       p1 + 4                                     s1 + 4

    To implement the rules for fixed-precision types, we introduce casts to turn them to unlimited precision, do the math on unlimited-precision numbers, then introduce casts back to the required fixed precision. This allows us to do all rounding and overflow handling in the cast-to-fixed-precision operator.

    In addition, when mixing non-decimal types with decimals, we use the following rules:

    - BYTE gets turned into DECIMAL(3, 0)
    - SHORT gets turned into DECIMAL(5, 0)
    - INT gets turned into DECIMAL(10, 0)
    - LONG gets turned into DECIMAL(20, 0)
    - FLOAT and DOUBLE cause fixed-length decimals to turn into DOUBLE
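
    A worked example (a sketch that simply evaluates the addition rule above; it is not the rule's actual implementation): adding a DECIMAL(5, 2) to a DECIMAL(7, 3) gives precision max(2, 3) + max(5-2, 7-3) + 1 = 8 and scale max(2, 3) = 3, i.e. DECIMAL(8, 3).

    // Result precision/scale of e1 + e2 per the table above.
    def addResultType(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
      val scale = math.max(s1, s2)
      val precision = scale + math.max(p1 - s1, p2 - s2) + 1
      (precision, scale)
    }

    addResultType(5, 2, 7, 3)  // (8, 3), i.e. DECIMAL(8, 3)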

  4. object DistinctAggregationRewriter extends Rule[LogicalPlan]

    This rule rewrites an aggregate query with distinct aggregations into an expanded double aggregation in which the regular aggregation expressions and every distinct clause is aggregated in a separate group. The results are then combined in a second aggregate.

    For example (in Scala):

    val data = Seq(
      ("a", "ca1", "cb1", 10),
      ("a", "ca1", "cb2", 5),
      ("b", "ca1", "cb1", 13))
      .toDF("key", "cat1", "cat2", "value")
    data.createOrReplaceTempView("data")
    
    val agg = data.groupBy($"key")
      .agg(
        countDistinct($"cat1").as("cat1_cnt"),
        countDistinct($"cat2").as("cat2_cnt"),
        sum($"value").as("total"))

    This translates to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [COUNT(DISTINCT 'cat1),
                    COUNT(DISTINCT 'cat2),
                    sum('value)]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      LocalTableScan [...]

    This rule rewrites this logical plan to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [count(if (('gid = 1)) 'cat1 else null),
                    count(if (('gid = 2)) 'cat2 else null),
                    first(if (('gid = 0)) 'total else null) ignore nulls]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      Aggregate(
         key = ['key, 'cat1, 'cat2, 'gid]
         functions = [sum('value)]
         output = ['key, 'cat1, 'cat2, 'gid, 'total])
        Expand(
           projections = [('key, null, null, 0, cast('value as bigint)),
                          ('key, 'cat1, null, 1, null),
                          ('key, null, 'cat2, 2, null)]
           output = ['key, 'cat1, 'cat2, 'gid, 'value])
          LocalTableScan [...]

    The rule does the following things here:

    1. Expand the data. There are three aggregation groups in this query:
       i. the non-distinct group;
       ii. the distinct 'cat1 group;
       iii. the distinct 'cat2 group.
       An expand operator is inserted to expand the child data for each group. The expand will null out all unused columns for the given group; this must be done in order to ensure correctness later on. Groups can be identified by a group id (gid) column added by the expand operator.
    2. De-duplicate the distinct paths and aggregate the non-distinct path. The group by clause of this aggregate consists of the original group by clause, all the requested distinct columns and the group id. Both the de-duplication of distinct columns and the aggregation of the non-distinct group take advantage of the fact that we group by the group id (gid) and that we have nulled out all non-relevant columns for the given group.
    3. Aggregate the distinct groups and combine this with the results of the non-distinct aggregation. In this step we use the group id to filter the inputs for the aggregate functions. The results of the non-distinct group are 'aggregated' by using the first operator; it might be more elegant to use the native UDAF merge mechanism for this in the future.

    This rule duplicates the input data two or more times (# distinct groups + an optional non-distinct group). This will put quite a bit of memory pressure on the aggregate and exchange operators involved. Keeping the number of distinct groups as low as possible should be a priority; we could improve this in the current rule by applying more advanced expression canonicalization techniques.

  5. object EliminateSubqueryAliases extends Rule[LogicalPlan]

    Removes SubqueryAlias operators from the plan. Subqueries are only required to provide scoping information for attributes and can be removed once analysis is complete.

  6. object EliminateUnions extends Rule[LogicalPlan]

    Removes Union operators from the plan if they have just one child.

  7. object EmptyFunctionRegistry extends FunctionRegistry

    A trivial catalog that returns an error when a function is requested. Used for testing when all functions are already filled in and the analyzer needs only to resolve attribute references.

  8. object FunctionRegistry

  9. object SimpleAnalyzer extends Analyzer

    A trivial Analyzer with a dummy SessionCatalog and EmptyFunctionRegistry. Used for testing when all relations are already filled in and the analyzer needs only to resolve attribute references.

  10. object TimeWindowing extends Rule[LogicalPlan]

    Maps a time column to multiple time windows using the Expand operator. Since it's non-trivial to figure out how many windows a time column can map to, we over-estimate the number of windows and filter out the rows where the time column is not inside the time window.
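
    A minimal sketch (assuming a SparkSession named spark and a DataFrame events with a timestamp column "time") of the DataFrame API whose plans this rule rewrites:

    import org.apache.spark.sql.functions.window
    import spark.implicits._

    // Sliding windows of 10 minutes that start every 5 minutes; a single row can fall
    // into several overlapping windows, which TimeWindowing expands via Expand.
    val counts = events
      .groupBy(window($"time", "10 minutes", "5 minutes"))
      .count()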

  11. object TypeCheckResult

  12. object TypeCoercion

    A collection of Rules that can be used to coerce differing types that participate in operations into compatible ones.

    Notes about type widening / tightest common types: Broadly, there are two cases when we need to widen data types (e.g. union, binary comparison). In case 1, we are looking for a common data type for two or more data types, and in this case no loss of precision is allowed. Examples include type inference in JSON (e.g. what's the column's data type if one row is an integer while the other row is a long?). In case 2, we are looking for a widened data type with some acceptable loss of precision (e.g. there is no common type for double and decimal because double's range is larger than decimal, and yet decimal is more precise than double, but in union we would cast the decimal into double).
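
    A minimal sketch (assuming a SparkSession named spark) illustrating both widening cases through unions:

    import org.apache.spark.sql.types.{DoubleType, LongType}

    // INT union BIGINT: a lossless common type exists, so both sides widen to BIGINT.
    val ints  = spark.range(3).selectExpr("CAST(id AS INT) AS v")
    val longs = spark.range(3).selectExpr("CAST(id AS BIGINT) AS v")
    assert(ints.union(longs).schema("v").dataType == LongType)

    // DECIMAL union DOUBLE: no lossless common type exists, so the decimal side is cast to DOUBLE.
    val decimals = spark.range(3).selectExpr("CAST(id AS DECIMAL(38, 18)) AS v")
    val doubles  = spark.range(3).selectExpr("CAST(id AS DOUBLE) AS v")
    assert(decimals.union(doubles).schema("v").dataType == DoubleType)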

  13. object UnresolvedAttribute extends Serializable

  14. object UnresolvedFunction extends Serializable

  15. object UnsupportedOperationChecker

    Analyzes the presence of unsupported operations in a logical plan.

  16. object Update extends OutputMode with Product with Serializable

  17. val caseInsensitiveResolution: (String, String) ⇒ Boolean

  18. val caseSensitiveResolution: (String, String) ⇒ Boolean

  19. def withPosition[A](t: TreeNode[_])(f: ⇒ A): A

    Catches any AnalysisExceptions thrown by f and attaches t's position if any.
