Extends QueryExecution with Hive-specific features.
Analyzes the given table in the current database to generate statistics, which will be used in query optimizations.
Right now, it only supports Hive tables and only updates the size of a Hive table in the Hive metastore.
1.2.0
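A minimal sketch of triggering statistics collection, assuming an existing `HiveContext` named `hiveContext` and a Hive table named `sales` (both hypothetical):

```scala
// Collect size statistics for a Hive table so the optimizer can use
// them (e.g. to pick a broadcast join for a small table).
// Assumes `hiveContext: HiveContext` and a Hive table `sales` exist.
hiveContext.analyze("sales")
```

After this call, the table's size recorded in the Hive metastore reflects the data currently on disk.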
Overridden by child classes that need to set configuration before the client init.
When true, a table created by a Hive CTAS statement (no USING clause) will be converted to a data source table, using the data source set by spark.sql.sources.default. A table in a CTAS statement will be converted when it meets any of the following conditions:
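A sketch of enabling this conversion, assuming an existing `HiveContext` named `hiveContext` and a source table `src` (both hypothetical):

```scala
// Enable conversion of Hive CTAS (no USING clause) into data source tables.
hiveContext.setConf("spark.sql.hive.convertCTAS", "true")

// With the conversion enabled, this CTAS may produce a data source table
// backed by the default data source (spark.sql.sources.default)
// instead of a Hive SerDe table.
hiveContext.sql("CREATE TABLE t AS SELECT * FROM src")
```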
When true, enables an experimental feature where metastore tables that use the parquet SerDe are automatically converted to use the Spark SQL parquet table scan, instead of the Hive SerDe.
When true, also tries to merge possibly different but compatible Parquet schemas in different Parquet data files.
This configuration is only effective when "spark.sql.hive.convertMetastoreParquet" is true.
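The two Parquet-related settings above can be combined as follows; this is a sketch assuming an existing `HiveContext` named `hiveContext`:

```scala
// Use Spark SQL's native Parquet scan for metastore tables that
// use the parquet SerDe...
hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")

// ...and, additionally, merge compatible schemas found across the
// table's Parquet data files. Only takes effect when the setting
// above is true.
hiveContext.setConf("spark.sql.hive.convertMetastoreParquet.mergeSchema", "true")
```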
The copy of the Hive client that is used for execution. Currently this must always be Hive 13, as this is the version of Hive that is packaged with Spark SQL. This copy of the client is used for execution-related tasks like registering temporary functions or ensuring that the ThreadLocal SessionState is correctly populated. This copy of Hive is *not* used for storing persistent metadata, and only points to a dummy metastore in a temporary directory.
A comma separated list of class prefixes that should explicitly be reloaded for each version of Hive that Spark SQL is communicating with. For example, Hive UDFs that are declared in a prefix that typically would be shared (i.e. org.apache.spark.*).
The location of the jars that should be used to instantiate the HiveMetastoreClient. This property can be one of three options:
A comma separated list of class prefixes that should be loaded using the classloader that is shared between Spark SQL and a specific version of Hive. An example of classes that should be shared is JDBC drivers that are needed to talk to the metastore. Other classes that need to be shared are those that interact with classes that are already shared. For example, custom appenders that are used by log4j.
The version of the hive client that will be used to communicate with the metastore. Note that this does not necessarily need to be the same version of Hive that is used internally by Spark SQL for execution.
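These metastore client settings must be in place before the context is created, so they are typically set on the SparkConf (or in spark-defaults.conf) rather than at runtime. A sketch, where the version string and the shared JDBC-driver prefix are illustrative values, not defaults to copy verbatim:

```scala
import org.apache.spark.SparkConf

// Configure which Hive client talks to the metastore, independently of
// the Hive version Spark SQL uses internally for execution.
val conf = new SparkConf()
  .setAppName("hive-metastore-example")
  // Hypothetical metastore version; must match your hive-site.xml metastore.
  .set("spark.sql.hive.metastore.version", "0.13.1")
  // Where to find the client jars (e.g. resolved from Maven).
  .set("spark.sql.hive.metastore.jars", "maven")
  // Share the metastore's JDBC driver classes with the isolated client.
  .set("spark.sql.hive.metastore.sharedPrefixes", "com.mysql.jdbc")
```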
The copy of the Hive client that is used to retrieve metadata from the Hive MetaStore. The version of the Hive client that is used here must match the metastore that is configured in the hive-site.xml file.
Invalidate and refresh all the cached metadata of the given table. For performance reasons, Spark SQL or the external data source library it uses might cache certain metadata about a table, such as the location of blocks. When those change outside of Spark SQL, users should call this function to invalidate the cache.
1.3.0
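A sketch of invalidating the cache, assuming an existing `HiveContext` named `hiveContext` and a hypothetical table `events`:

```scala
// After files backing the table change outside of Spark SQL
// (e.g. new partitions written by another job), drop the cached
// metadata so the next query sees the current state.
hiveContext.refreshTable("events")
```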
(Since version 1.3.0) use createDataFrame
(Since version 1.4.0) use read.jdbc()
(Since version 1.4.0) Use read.json()
(Since version 1.4.0) Use read.format(source).schema(schema).options(options).load()
(Since version 1.4.0) Use read.format(source).options(options).load()
(Since version 1.4.0) Use read.format(source).load(path)
(Since version 1.4.0) Use read.load(path)
(Since version 1.4.0) Use read.parquet()
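The deprecated loading methods above all map onto the `read` (DataFrameReader) API. A migration sketch, assuming an existing `HiveContext` named `hiveContext`; the file paths are illustrative:

```scala
// Old: hiveContext.jsonFile("people.json")
val people = hiveContext.read.json("people.json")

// Old: hiveContext.parquetFile("events.parquet")
val events = hiveContext.read.parquet("events.parquet")

// Old: hiveContext.load(source, options)
val df = hiveContext.read
  .format("json")
  .options(Map("path" -> "data/input"))
  .load()
```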
An instance of the Spark SQL execution engine that integrates with data stored in Hive. Configuration for Hive is read from hive-site.xml on the classpath.
1.0.0
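A minimal sketch of creating the context; Hive settings are picked up from hive-site.xml on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Build the contexts; the app name is illustrative.
val sc = new SparkContext(new SparkConf().setAppName("hive-example"))
val hiveContext = new HiveContext(sc)

// Hive QL queries now run against the configured metastore.
hiveContext.sql("SHOW TABLES").show()
```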