Optimized cast for a column in a row to double.
Cast a given column in a schema to epoch time in long milliseconds.
A relation having a parent-child relationship with a base relation.
::DeveloperApi:: Implemented by objects that produce relations for a specific kind of data source with a given schema. When Spark SQL is given a DDL operation with a USING clause (specifying the implementing SchemaRelationProvider) and a user-defined schema, this interface is used to pass in the parameters specified by the user.
Users may specify the fully qualified class name of a given data source. When that class is not found, Spark SQL will append the class name DefaultSource to the path, allowing for less verbose invocation. For example, 'org.apache.spark.sql.json' would resolve to the data source 'org.apache.spark.sql.json.DefaultSource'.
A new instance of this class will be instantiated each time a DDL call is made.
The difference between a SchemaRelationProvider and an ExternalSchemaRelationProvider is that the latter accepts the schema and other clauses in the DDL string and passes them to the backend as-is, while the schema specified for the former is parsed by Spark SQL. A relation provider can inherit both SchemaRelationProvider and ExternalSchemaRelationProvider if it can support both a Spark SQL schema and a backend-specific schema.
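The DefaultSource resolution rule above can be sketched in plain Scala. `resolveProvider` and the `classExists` predicate are illustrative names, not the actual Spark SQL code (which goes through `Class.forName`); the predicate stands in for a classpath lookup so the rule is testable in isolation:

```scala
// Hypothetical sketch: if the given class is not found, fall back to
// "<name>.DefaultSource", allowing the shorter package-only invocation.
def resolveProvider(name: String, classExists: String => Boolean): Option[String] = {
  if (classExists(name)) Some(name)
  else {
    val fallback = name + ".DefaultSource"
    if (classExists(fallback)) Some(fallback) else None
  }
}
```

For example, with only `org.apache.spark.sql.json.DefaultSource` available, resolving the shorthand `org.apache.spark.sql.json` yields the `DefaultSource` class name.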
Unlike Spark's InsertIntoTable, this plan provides the count of rows inserted as its output.
Some extensions to JdbcDialect used by the Snappy implementation.
Trait to apply different join order policies: replicated tables with filters first, then the largest colocated group, and finally non-colocated tables with filters, if any.
One can change the ordering policies via query hints; later, these can be provided externally by an admin against a regex-based query pattern.
e.g. select * from /*+ joinOrder(replicates+filters, non-colocated+filters) */ table1, table2 where ....
Note: I think this should be at the query level instead of per-SELECT scope, i.e. something like /*+ joinOrder(replicates+filters, non-colocated+filters) */ select * from tab1, (select xx from tab2, tab3 where ... ), tab4 where ...
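The policy order described above (replicates with filters, then the largest colocated group, then non-colocated tables with filters) can be sketched with plain collections. `Table` and `orderTables` here are hypothetical stand-ins for the planner's types, not the actual implementation:

```scala
// Illustrative table descriptor: whether it is replicated, which
// colocated group (if any) it belongs to, and whether it has filters.
case class Table(name: String, replicated: Boolean,
    colocatedGroup: Option[String], hasFilter: Boolean)

def orderTables(tables: Seq[Table]): Seq[Table] = {
  // 1. Replicated tables with filters come first.
  val (replFiltered, rest) = tables.partition(t => t.replicated && t.hasFilter)
  // 2. Then colocated groups, largest group first.
  val colocated = rest.filter(_.colocatedGroup.isDefined)
      .groupBy(_.colocatedGroup.get).values.toSeq.sortBy(-_.size).flatten
  // 3. Finally non-colocated tables with filters, then everything else.
  val remaining = rest.filter(_.colocatedGroup.isEmpty)
  val (ncFiltered, others) = remaining.partition(_.hasFilter)
  replFiltered ++ colocated ++ ncFiltered ++ others
}
```

A real implementation would also weigh statistics and colocation metadata; this only shows the relative ordering of the three buckets.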
::DeveloperApi
::DeveloperApi
API for updates and deletes to a relation.
A relation having a parent-child relationship with one or more DependentRelations as children.
::DeveloperApi:: A BaseRelation that can eliminate unneeded columns and filter using selected predicates before producing an RDD containing all matching tuples as UnsafeRow objects.
The actual filter should be the conjunction of all filters, i.e. they should be ANDed together.
The pushed-down filters are currently purely an optimization as they will all be evaluated again. This means it is safe to use them with methods that produce false positives, such as filtering partitions based on a Bloom filter.
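The conjunction semantics can be sketched with a simplified row type (a `Map` here, standing in for Spark's `Row`; `conjoin` is an illustrative name, not part of the API):

```scala
// Combine pushed-down filters by ANDing them. Since all filters are
// re-evaluated later anyway, a predicate that lets false positives
// through (e.g. a Bloom-filter check) is still safe to include.
def conjoin(filters: Seq[Map[String, Any] => Boolean]): Map[String, Any] => Boolean =
  row => filters.forall(f => f(row))
```

Note that an empty filter list conjoins to `true`, i.e. every row matches.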
1.3.0
Table-to-table or table-to-index replacement.
A set of possible replacements of tables with indexes.
Note: if the chain consists of multiple partitioned tables, they must satisfy the colocation criteria.
Multiple replacements.
User-provided join and filter conditions.
Replace a table with an index if the colocation criteria are satisfied.
Replace table with index hint
::DeveloperApi
::DeveloperApi
An extension to InsertableRelation that allows data (possibly having a different schema) to be inserted into the target relation after comparing against the result of insertSchema.
A class for tracking the statistics of a set of numbers (count, mean and variance) in a numerically robust way. Includes support for merging two StatVarianceCounters.
Taken from Spark's StatCounter implementation, with max and min removed.
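A minimal sketch of such a counter, using Welford's online update for single values and the standard parallel-merge formula for combining two counters. This follows the shape of Spark's StatCounter with max/min dropped, but it is not the actual class:

```scala
// Numerically robust running count/mean/variance (sketch).
class StatVarianceCounter {
  var count: Long = 0L
  var mean: Double = 0.0
  private var m2: Double = 0.0 // sum of squared deviations from the mean

  // Welford's single-value update: stable even for large counts.
  def merge(value: Double): this.type = {
    count += 1
    val delta = value - mean
    mean += delta / count
    m2 += delta * (value - mean)
    this
  }

  // Parallel merge of two counters (Chan et al. update for m2).
  def merge(other: StatVarianceCounter): this.type = {
    if (other.count > 0) {
      val delta = other.mean - mean
      val total = count + other.count
      mean = (mean * count + other.mean * other.count) / total
      m2 += other.m2 + delta * delta * count * other.count / total
      count = total
    }
    this
  }

  // Population variance of the values seen so far.
  def variance: Double = if (count == 0) Double.NaN else m2 / count
}
```

Merging two counters built over disjoint partitions yields the same statistics as a single counter over all the values, which is what makes the class usable in a distributed aggregation.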
Simply assemble the rest of the tables as per the user-defined join order.
Pick the current colocated group and put tables having filters into the currently built plan.
This doesn't require any alteration to joinOrder as such.
Tracks the child DependentRelations for all ParentRelations. This is an optimization for faster access, to avoid scanning the entire catalog.
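The tracking structure can be sketched as a simple parent-to-children map; `DependencyTracker` and its method names are illustrative, not the actual catalog code:

```scala
import scala.collection.mutable

// Sketch of catalog-side dependency tracking: a map from a parent
// relation's name to the names of its dependent children, so lookups
// need not scan the whole catalog.
class DependencyTracker {
  private val dependents = mutable.Map.empty[String, mutable.Set[String]]

  def addDependent(parent: String, child: String): Unit =
    dependents.getOrElseUpdate(parent, mutable.Set.empty) += child

  def removeDependent(parent: String, child: String): Unit =
    dependents.get(parent).foreach(_ -= child)

  def getDependents(parent: String): Set[String] =
    dependents.get(parent).map(_.toSet).getOrElse(Set.empty)
}
```

A real implementation would also key by fully qualified names and keep the map consistent with catalog drop/rename operations.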
We have to copy this from Spark's patterns.scala because we want to handle a single table with filters as well.
This will have another advantage later if we decide to move our rule to the end instead of injecting it just after ReorderJoin, whereby additional nodes like Project require handling.
This hint too doesn't require any implementation as such.
Put the rest of the colocated table joins after applying ColocatedWithFilters.
Tables that are non-colocated according to currentColocatedGroup but have filters are put into the join condition.
Put replicated tables with filters first. If only one replicated table with a filter is found, we try it with the largest colocated group.
Support for DML and other operations on external tables.