A filter that evaluates to true iff both left and right evaluate to true.
1.3.0
::DeveloperApi:: Represents a collection of tuples with a known schema. Classes that extend BaseRelation must be able to produce the schema of their data in the form of a StructType. Concrete implementations should inherit from one of the descendant Scan classes, which define various abstract methods for execution.
BaseRelations must also define an equality function that returns true only when the two instances will return the same data. This equality function is used to determine when it is safe to substitute cached results for a given relation.
1.3.0
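The equality requirement can be illustrated with a small sketch that does not depend on Spark. `FileBackedRelation` and `ResultCache` are hypothetical stand-ins, not Spark classes, and the sketch assumes that two relations return the same data exactly when their path and options match.

```scala
// Hypothetical stand-ins; none of these names come from Spark itself.
object RelationEquality {
  // A case class gets structural equals/hashCode, so two instances built
  // from the same path and options compare equal.
  final case class FileBackedRelation(path: String, options: Map[String, String])

  // A toy cache keyed by relation: a lookup hits only when equals returns true.
  final class ResultCache {
    private val cached =
      scala.collection.mutable.Map.empty[FileBackedRelation, Seq[String]]

    def put(relation: FileBackedRelation, rows: Seq[String]): Unit =
      cached.update(relation, rows)

    def lookup(relation: FileBackedRelation): Option[Seq[String]] =
      cached.get(relation)
  }
}
```

If equality were left as reference equality, a second instance describing the same data would never hit the cache; if it were too permissive, stale or wrong results could be substituted.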
::Experimental:: An interface for experimenting with a more direct connection to the query planner. Compared to PrunedFilteredScan, this operator receives the raw expressions from the org.apache.spark.sql.catalyst.plans.logical.LogicalPlan. Unlike the other APIs, this interface is NOT designed to be binary compatible across releases and thus should only be used for experimentation.
1.3.0
1.3.0
::DeveloperApi:: Data sources should implement this trait so that they can register an alias to their data source. This allows users to give the data source alias as the format type instead of the fully qualified class name.
A new instance of this class will be instantiated each time a DDL call is made.
1.5.0
Performs equality comparison, similar to EqualTo.
A filter that evaluates to true iff the attribute evaluates to a value equal to value.
1.3.0
A filter predicate for data sources.
1.3.0
A filter that evaluates to true iff the attribute evaluates to a value greater than value.
1.3.0
A filter that evaluates to true iff the attribute evaluates to a value greater than or equal to value.
1.3.0
::Experimental:: A BaseRelation that provides much of the common code required for relations that store their data to an HDFS compatible filesystem.
For the read path, similar to PrunedFilteredScan, it can eliminate unneeded columns and filter using selected predicates before producing an RDD containing all matching tuples as Row objects. In addition, when reading from Hive style partitioned tables stored in file systems, it is able to discover partitioning information from the paths of input directories and perform partition pruning before it starts reading the data. Subclasses of HadoopFsRelation must override one of the four buildScan methods to implement the read path.
For the write path, it provides the ability to write to both non-partitioned and partitioned tables. The directory layout of the partitioned tables is compatible with Hive.
1.4.0
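Partition discovery from Hive style paths can be sketched without Spark as follows. `PartitionDiscovery`, `parsePartitions`, and `prune` are hypothetical names used only for illustration; the sketch assumes each partition directory is named `column=value` and is not how HadoopFsRelation is actually implemented.

```scala
// Hypothetical helper; the names are illustrative, not part of Spark.
object PartitionDiscovery {
  // Extracts (column, value) pairs from the `column=value` segments of a path.
  def parsePartitions(path: String): Seq[(String, String)] =
    path.split("/").toSeq
      .map(_.split("=", 2))
      .collect { case Array(column, value) if column.nonEmpty => column -> value }

  // Partition pruning: keep only the paths whose partition value matches,
  // so files under other directories never need to be opened.
  def prune(paths: Seq[String], column: String, wanted: String): Seq[String] =
    paths.filter(p => parsePartitions(p).contains(column -> wanted))
}
```

For a path such as `/data/year=2015/month=07/part-0`, the sketch recovers the pairs `year -> 2015` and `month -> 07` purely from the directory names.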
::Experimental:: Implemented by objects that produce relations for a specific kind of data source with a given schema and partition columns. When Spark SQL is given a DDL operation with a USING clause specified (to specify the implemented HadoopFsRelationProvider), a user defined schema, and an optional list of partition columns, this interface is used to pass in the parameters specified by a user.
Users may specify the fully qualified class name of a given data source. When that class is not found, Spark SQL will append the class name DefaultSource to the path, allowing for less verbose invocation. For example, 'org.apache.spark.sql.json' would resolve to the data source 'org.apache.spark.sql.json.DefaultSource'.
A new instance of this class will be instantiated each time a DDL call is made.
The difference between a RelationProvider and a HadoopFsRelationProvider is that users need to provide a schema and a (possibly empty) list of partition columns when using a HadoopFsRelationProvider. A relation provider can inherit from both RelationProvider and HadoopFsRelationProvider if it can support schema inference, user-specified schemas, and accessing partitioned relations.
1.4.0
A filter that evaluates to true iff the attribute evaluates to one of the values in the array.
1.3.0
::DeveloperApi:: A BaseRelation that can be used to insert data into it through the insert method. If overwrite in the insert method is true, the old data in the relation should be overwritten with the new data. If overwrite in the insert method is false, the new data should be appended.
InsertableRelation makes the following three assumptions.
1. It assumes that the data (Rows in the DataFrame) provided to the insert method exactly matches the ordinal of fields in the schema of the BaseRelation.
2. It assumes that the schema of this relation will not be changed. Even if the insert method updates the schema (e.g. a relation of JSON or Parquet data may have a schema update after an insert operation), the new schema will not be used.
3. It assumes that fields of the data provided in the insert method are nullable. If a data source needs to check the actual nullability of a field, it needs to do it in the insert method.
1.3.0
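The overwrite and append behaviour described above can be sketched with an in-memory stand-in. `InMemoryRelation` is hypothetical and models rows as plain `Seq[Any]` values in schema order; the real insert method receives a DataFrame, which this sketch does not attempt to reproduce.

```scala
// Hypothetical in-memory relation; not a Spark class.
object InsertSemantics {
  final class InMemoryRelation(val fieldNames: Seq[String]) {
    private var rows: Vector[Seq[Any]] = Vector.empty

    def insert(data: Seq[Seq[Any]], overwrite: Boolean): Unit = {
      // Values are matched to fields by position (assumption 1 in the text),
      // so the only structural check possible here is the arity.
      require(data.forall(_.length == fieldNames.length),
        "row arity must match the schema")
      // overwrite = true replaces the old data; overwrite = false appends.
      rows = if (overwrite) data.toVector else rows ++ data
    }

    def currentRows: Vector[Seq[Any]] = rows
  }
}
```

A data source that needs stricter guarantees, such as rejecting nulls in a field, would add that check inside `insert`, as the third assumption suggests.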
A filter that evaluates to true iff the attribute evaluates to a non-null value.
1.3.0
A filter that evaluates to true iff the attribute evaluates to null.
1.3.0
A filter that evaluates to true iff the attribute evaluates to a value less than value.
1.3.0
A filter that evaluates to true iff the attribute evaluates to a value less than or equal to value.
1.3.0
A filter that evaluates to true iff child evaluates to false.
1.3.0
A filter that evaluates to true iff at least one of left or right evaluates to true.
1.3.0
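How the logical filters compose can be sketched with a small evaluator. The case classes below are local stand-ins that only mirror the shape of the filters described in this package; they are not the Spark classes. Rows are modelled as `Map[String, Any]`, a missing attribute is treated as null, and SQL three-valued null logic is deliberately not modelled. All of these are simplifying assumptions.

```scala
// Local stand-ins mirroring the filter shapes; not the Spark classes themselves.
object FilterEval {
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class IsNull(attribute: String) extends Filter
  final case class And(left: Filter, right: Filter) extends Filter
  final case class Or(left: Filter, right: Filter) extends Filter
  final case class Not(child: Filter) extends Filter

  // Recursively evaluates a filter tree against a single row.
  def eval(filter: Filter, row: Map[String, Any]): Boolean = filter match {
    case EqualTo(attribute, value) =>
      row.get(attribute).exists(v => v != null && v == value)
    case IsNull(attribute) =>
      row.getOrElse(attribute, null) == null // missing is treated as null (assumption)
    case And(left, right) => eval(left, row) && eval(right, row) // both must hold
    case Or(left, right)  => eval(left, row) || eval(right, row) // at least one holds
    case Not(child)       => !eval(child, row)
  }
}
```

Because the filters form a tree, a data source that cannot handle one node can still push down the parts it understands and leave the rest to be evaluated later.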
::Experimental:: OutputWriter is used together with HadoopFsRelation for persisting rows to the underlying file system. Subclasses of OutputWriter must provide a zero-argument constructor. An OutputWriter instance is created and initialized when a new output file is opened on the executor side. This instance is used to persist rows to this single output file.
1.4.0
::Experimental:: A factory that produces OutputWriters. A new OutputWriterFactory is created on the driver side for each write job issued when writing to a HadoopFsRelation, and is then serialized to the executor side to create the actual OutputWriters on the fly.
1.4.0
::DeveloperApi:: A BaseRelation that can eliminate unneeded columns and filter using selected predicates before producing an RDD containing all matching tuples as Row objects.
The actual filter should be the conjunction of all filters, i.e. they should be ANDed together.
The pushed down filters are currently purely an optimization, as they will all be evaluated again. This means it is safe to use them with methods that produce false positives, such as filtering partitions based on a bloom filter.
1.3.0
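The combination of column pruning and conjunctive filtering can be sketched as below. This is a Spark-free illustration: `Row` and `Predicate` are hypothetical type aliases, and the `buildScan` here works on an in-memory `Seq` rather than returning an RDD of Row objects.

```scala
// Spark-free sketch; the type aliases and signature are assumptions for illustration.
object PrunedFilteredScanSketch {
  type Row = Map[String, Any]
  // A pushed-down filter is modelled as a plain predicate on a row.
  type Predicate = Row => Boolean

  def buildScan(
      data: Seq[Row],
      requiredColumns: Seq[String],
      filters: Seq[Predicate]): Seq[Seq[Any]] =
    data
      // Conjunction: a row survives only if every pushed filter accepts it.
      .filter(row => filters.forall(f => f(row)))
      // Pruning: emit only the requested columns, in the requested order.
      .map(row => requiredColumns.map(c => row.getOrElse(c, null)))
}
```

Since the filters are evaluated again after the scan, a source may let through rows it is unsure about (false positives), but it must never drop a row that actually matches.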
::DeveloperApi:: A BaseRelation that can eliminate unneeded columns before producing an RDD containing all of its tuples as Row objects.
1.3.0
::DeveloperApi:: Implemented by objects that produce relations for a specific kind of data source. When Spark SQL is given a DDL operation with a USING clause specified (to specify the implemented RelationProvider), this interface is used to pass in the parameters specified by a user.
Users may specify the fully qualified class name of a given data source. When that class is not found, Spark SQL will append the class name DefaultSource to the path, allowing for less verbose invocation. For example, 'org.apache.spark.sql.json' would resolve to the data source 'org.apache.spark.sql.json.DefaultSource'.
A new instance of this class will be instantiated each time a DDL call is made.
1.3.0
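The DefaultSource fallback described above amounts to a two-step lookup, sketched here under stated assumptions. `ProviderLookup`, `resolve`, and the `classExists` hook are hypothetical names; Spark's own resolution code is not reproduced. Only `Class.forName` and `scala.util.Try` are real library calls.

```scala
import scala.util.Try

// Hypothetical helper illustrating the fallback; not Spark's resolution code.
object ProviderLookup {
  // `classExists` stands in for a class loader lookup so the logic can be
  // exercised without any data source on the classpath.
  def resolve(provider: String, classExists: String => Boolean): Option[String] =
    if (classExists(provider)) Some(provider)
    else {
      // Fall back to <provider>.DefaultSource when the name itself is not a class.
      val fallback = s"$provider.DefaultSource"
      if (classExists(fallback)) Some(fallback) else None
    }

  // One possible `classExists` implementation backed by the JVM class loader.
  def onClasspath(name: String): Boolean = Try(Class.forName(name)).isSuccess
}
```

Calling `ProviderLookup.resolve("org.apache.spark.sql.json", ProviderLookup.onClasspath)` would follow the same path as the example in the text, provided the corresponding class is on the classpath.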
::DeveloperApi:: Implemented by objects that produce relations for a specific kind of data source with a given schema. When Spark SQL is given a DDL operation with a USING clause specified (to specify the implemented SchemaRelationProvider) and a user defined schema, this interface is used to pass in the parameters specified by a user.
Users may specify the fully qualified class name of a given data source. When that class is not found, Spark SQL will append the class name DefaultSource to the path, allowing for less verbose invocation. For example, 'org.apache.spark.sql.json' would resolve to the data source 'org.apache.spark.sql.json.DefaultSource'.
A new instance of this class will be instantiated each time a DDL call is made.
The difference between a RelationProvider and a SchemaRelationProvider is that users need to provide a schema when using a SchemaRelationProvider. A relation provider can inherit from both RelationProvider and SchemaRelationProvider if it can support both schema inference and user-specified schemas.
1.3.0
A filter that evaluates to true iff the attribute evaluates to a string that contains the string value.
1.3.1
A filter that evaluates to true iff the attribute evaluates to a string that ends with value.
1.3.1
A filter that evaluates to true iff the attribute evaluates to a string that starts with value.
1.3.1
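The three string filters map naturally onto standard string operations. In the sketch below the case classes are local stand-ins named after the contains, ends-with, and starts-with filters; they are not imported from Spark, and treating a missing or null attribute as a non-match is an assumption.

```scala
// Local stand-ins for the three string filters; not the Spark classes themselves.
object StringFilters {
  sealed trait StringFilter
  final case class StringContains(attribute: String, value: String) extends StringFilter
  final case class StringEndsWith(attribute: String, value: String) extends StringFilter
  final case class StringStartsWith(attribute: String, value: String) extends StringFilter

  def eval(filter: StringFilter, row: Map[String, String]): Boolean = {
    // A missing or null attribute never matches (assumption).
    def check(attribute: String)(p: String => Boolean): Boolean =
      row.get(attribute).exists(s => s != null && p(s))

    filter match {
      case StringContains(a, v)   => check(a)(_.contains(v))
      case StringEndsWith(a, v)   => check(a)(_.endsWith(v))
      case StringStartsWith(a, v) => check(a)(_.startsWith(v))
    }
  }
}
```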
::DeveloperApi:: A BaseRelation that can produce all of its tuples as an RDD of Row objects.
1.3.0
A set of APIs for adding data sources to Spark SQL.