Creates a MemSQL table with a schema matching the provided org.apache.spark.sql.DataFrame and loads the data into it.
If dbHost, dbPort, user, and password are not specified, the com.memsql.spark.context.MemSQLContext will determine where each partition's data is sent. If the Spark executors are colocated with writable MemSQL nodes, each Spark partition will insert into a randomly chosen colocated writable MemSQL node. Otherwise, Spark partitions will insert into writable MemSQL nodes in round-robin order.
The name of the database.
The name of the table.
The host of the database.
The port of the database.
The user for the database.
The password for the database.
If true, use CREATE TABLE IF NOT EXISTS.
A scala.List of com.memsql.spark.connector.dataframe.MemSQLKey specifications to add to the CREATE TABLE statement.
A scala.List of com.memsql.spark.connector.dataframe.MemSQLExtraColumn specifications to add to the CREATE TABLE statement.
If set, data is loaded directly into leaf partitions. This can increase performance at the expense of higher-variance sharding.
An org.apache.spark.sql.DataFrame containing the schema and the rows inserted into MemSQL.
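Taken together, the parameters above might be exercised as follows. This is a minimal sketch, not a definitive call: the implicit-conversion import, the method name createMemSQLTableAs, and the exact parameter and key-specification names are assumptions inferred from the package names in this documentation.

```scala
import org.apache.spark.sql.DataFrame
import com.memsql.spark.connector._  // assumed: implicits adding MemSQL methods to DataFrame
import com.memsql.spark.connector.dataframe.MemSQLKey

// Assuming `df` is an existing DataFrame. With dbHost/dbPort/user/password
// omitted, the MemSQLContext decides where each partition's data is sent.
val created: DataFrame = df.createMemSQLTableAs(
  dbName      = "analytics",          // hypothetical database name
  tableName   = "events",             // hypothetical table name
  ifNotExists = true,                 // assumed name for the IF NOT EXISTS flag
  keys        = List.empty[MemSQLKey] // key specifications for the CREATE TABLE statement
)
```

The returned DataFrame reflects the created table, so subsequent Spark queries can read back the rows that were loaded.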
Creates a MemSQL table with a schema matching the provided org.apache.spark.sql.DataFrame.
The name of the database.
The name of the table.
The master aggregator host.
The master aggregator port.
The user for the database.
The password for the database.
If true, use CREATE TABLE IF NOT EXISTS.
A scala.List of com.memsql.spark.connector.dataframe.MemSQLKey specifications to add to the CREATE TABLE statement.
A scala.List of com.memsql.spark.connector.dataframe.MemSQLExtraColumn specifications to add to the CREATE TABLE statement.
An org.apache.spark.sql.DataFrame containing the schema and the rows inserted into MemSQL.
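Because this variant targets the master aggregator explicitly, a call would supply the aggregator's host and port rather than relying on the MemSQLContext. A hedged sketch; the method name createMemSQLTableAs and the parameter names are assumptions based on the parameter list above:

```scala
import org.apache.spark.sql.DataFrame
import com.memsql.spark.connector._  // assumed: implicits adding MemSQL methods to DataFrame

// Assuming `df` is an existing DataFrame. The table is created (and the data
// routed) via the master aggregator named here.
val created: DataFrame = df.createMemSQLTableAs(
  dbName      = "analytics",          // hypothetical database name
  tableName   = "events",             // hypothetical table name
  dbHost      = "master-agg.internal",// hypothetical master aggregator host
  dbPort      = 3306,                 // hypothetical master aggregator port
  user        = "app_user",           // hypothetical credentials
  password    = "secret",
  ifNotExists = true                  // assumed name for the IF NOT EXISTS flag
)
```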
Saves a Spark org.apache.spark.sql.DataFrame to a MemSQL table with the same column names.
If dbHost, dbPort, user, and password are not specified, the com.memsql.spark.context.MemSQLContext will determine where each partition's data is sent. If the Spark executors are colocated with writable MemSQL nodes, each Spark partition will insert into a randomly chosen colocated writable MemSQL node. Otherwise, Spark partitions will insert into writable MemSQL nodes in round-robin order.
The name of the database.
The name of the table.
The host of the database.
The port of the database.
The user for the database.
The password for the database.
How to handle duplicate key errors when inserting rows. If this is OnDupKeyBehavior.Replace, we will replace existing rows with the rows being inserted. If this is OnDupKeyBehavior.Ignore, we will leave existing rows as they are. If this is OnDupKeyBehavior.Update, we will use the SQL code in onDuplicateKeySql. If this is None, we will throw an error if there are any duplicate key errors.
Optional SQL to include in the "ON DUPLICATE KEY UPDATE" clause of the INSERT queries we generate. If this is a non-empty string, onDuplicateKeyBehavior must be OnDupKeyBehavior.Update.
How many rows to insert per INSERT query. Has no effect if onDuplicateKeySql is not specified.
If set, data is loaded directly into leaf partitions. This can increase performance at the expense of higher-variance sharding.
The number of rows inserted into MemSQL.
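A save with upsert handling might look like the sketch below. The method name saveToMemSQL matches the description above, but the implicit import and the exact parameter names (onDuplicateKeyBehavior, onDuplicateKeySql, upsertBatchSize) are assumptions drawn from the parameter descriptions, not a confirmed signature:

```scala
import org.apache.spark.sql.DataFrame
import com.memsql.spark.connector._  // assumed: implicits adding saveToMemSQL to DataFrame

// Assuming `df` has column names matching the target table. Existing rows
// that hit a duplicate key are updated using the supplied SQL fragment.
val rowsInserted: Long = df.saveToMemSQL(
  dbName                 = "analytics",                       // hypothetical database name
  tableName              = "events",                          // hypothetical table name
  onDuplicateKeyBehavior = Some(OnDupKeyBehavior.Update),     // required when onDuplicateKeySql is non-empty
  onDuplicateKeySql      = "seen_count = seen_count + 1",     // hypothetical ON DUPLICATE KEY UPDATE clause
  upsertBatchSize        = 1000                               // assumed name for rows-per-INSERT setting
)
```

Note the contract stated above: a non-empty onDuplicateKeySql requires OnDupKeyBehavior.Update, and the batch-size setting only takes effect when onDuplicateKeySql is specified.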