public abstract class RDDAndDStreamCommonJavaFunctions&lt;T&gt;
extends java.lang.Object

Provides commonly used save methods for the Java API wrappers over an RDD or a DStream.
| Modifier and Type | Method and Description |
|---|---|
| `void` | `saveToCassandra(String keyspace, String table)` Saves the data from the underlying RDD or DStream to a Cassandra table. |
| `void` | `saveToCassandra(String keyspace, String table, ColumnMapper<T> columnMapper)` Saves the data from the underlying RDD or DStream to a Cassandra table. |
| `void` | `saveToCassandra(String keyspace, String table, Map<String,String> columnNameOverride)` Saves the data from the underlying RDD or DStream to a Cassandra table. |
| `abstract void` | `saveToCassandra(String keyspace, String table, RowWriterFactory<T> rowWriterFactory)` Saves the data from the underlying RDD or DStream to a Cassandra table. |
| `void` | `saveToCassandra(String keyspace, String table, String[] columnNames)` Saves the data from the underlying RDD or DStream to a Cassandra table. |
| `void` | `saveToCassandra(String keyspace, String table, String[] columnNames, ColumnMapper<T> columnMapper)` Saves the data from the underlying RDD or DStream to a Cassandra table. |
| `void` | `saveToCassandra(String keyspace, String table, String[] columnNames, int batchSize)` Saves the data from the underlying RDD or DStream to a Cassandra table in batches of the given size. |
| `void` | `saveToCassandra(String keyspace, String table, String[] columnNames, int batchSize, ColumnMapper<T> columnMapper)` Saves the data from the underlying RDD or DStream to a Cassandra table in batches of the given size. |
| `void` | `saveToCassandra(String keyspace, String table, String[] columnNames, int batchSize, Map<String,String> columnNameOverride)` Saves the data from the underlying RDD or DStream to a Cassandra table in batches of the given size. |
| `abstract void` | `saveToCassandra(String keyspace, String table, String[] columnNames, int batchSize, RowWriterFactory<T> rowWriterFactory)` Saves the data from the underlying RDD or DStream to a Cassandra table in batches of the given size. |
| `void` | `saveToCassandra(String keyspace, String table, String[] columnNames, Map<String,String> columnNameOverride)` Saves the data from the underlying RDD or DStream to a Cassandra table. |
| `abstract void` | `saveToCassandra(String keyspace, String table, String[] columnNames, RowWriterFactory<T> rowWriterFactory)` Saves the data from the underlying RDD or DStream to a Cassandra table. |
public abstract void saveToCassandra(java.lang.String keyspace, java.lang.String table, com.datastax.spark.connector.writer.RowWriterFactory<T> rowWriterFactory)
Saves the data from the underlying RDD or DStream to a Cassandra table.

This method works just like saveToCassandra(String, String). It additionally allows the specification of a factory of a custom RowWriter. By default, a factory of DefaultRowWriter is used, together with an underlying JavaBeanColumnMapper.

If the underlying data source is a DStream, all generated RDDs will be saved to Cassandra as if this method was called on each of them.

See Also: saveToCassandra(String, String)
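A minimal sketch of calling this overload, assuming the `test.words` table and `WordCount` bean from the full examples later in this section; `myRowWriterFactory` is a hypothetical user-supplied implementation:

```java
// Hypothetical user-supplied factory; implementing a custom RowWriterFactory
// is version-specific and not shown here.
RowWriterFactory<WordCount> myRowWriterFactory = ...;

JavaSparkContext jsc = ...;
JavaRDD<WordCount> rdd = jsc.parallelize(Arrays.asList(new WordCount("foo", 5, "bar")));

CassandraJavaUtil.javaFunctions(rdd, WordCount.class)
        .saveToCassandra("test", "words", myRowWriterFactory);
```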
public abstract void saveToCassandra(java.lang.String keyspace, java.lang.String table, java.lang.String[] columnNames, com.datastax.spark.connector.writer.RowWriterFactory<T> rowWriterFactory)
Saves the data from the underlying RDD or DStream to a Cassandra table.

This method works just like saveToCassandra(String, String, String[]). It additionally allows the specification of a factory of a custom RowWriter. By default, a factory of DefaultRowWriter is used, together with an underlying JavaBeanColumnMapper.

If the underlying data source is a DStream, all generated RDDs will be saved to Cassandra as if this method was called on each of them.
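A sketch under the same assumptions as above (hypothetical `myRowWriterFactory`, `rdd` of `WordCount` objects); only the listed columns are written:

```java
CassandraJavaUtil.javaFunctions(rdd, WordCount.class)
        .saveToCassandra("test", "words",
                new String[] {"word", "count"}, // columns to save
                myRowWriterFactory);            // hypothetical custom factory
```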
public abstract void saveToCassandra(java.lang.String keyspace, java.lang.String table, java.lang.String[] columnNames, int batchSize, com.datastax.spark.connector.writer.RowWriterFactory&lt;T&gt; rowWriterFactory)

Saves the data from the underlying RDD or DStream to a Cassandra table in batches of the given size.

This method works just like saveToCassandra(String, String, String[], int). It additionally allows the specification of a factory of a custom RowWriter. By default, a factory of DefaultRowWriter is used, together with an underlying JavaBeanColumnMapper.

If the underlying data source is a DStream, all generated RDDs will be saved to Cassandra as if this method was called on each of them.
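A sketch under the same assumptions; the batch size of 64 rows is an arbitrary illustrative value:

```java
CassandraJavaUtil.javaFunctions(rdd, WordCount.class)
        .saveToCassandra("test", "words",
                new String[] {"word", "count"}, // columns to save
                64,                             // rows per batch (illustrative)
                myRowWriterFactory);            // hypothetical custom factory
```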
public void saveToCassandra(java.lang.String keyspace, java.lang.String table, com.datastax.spark.connector.mapper.ColumnMapper&lt;T&gt; columnMapper)

Saves the data from the underlying RDD or DStream to a Cassandra table.

This method works just like saveToCassandra(String, String). It additionally allows the specification of a custom column mapper. By default, JavaBeanColumnMapper is used, which works well with Java bean-like classes.

If the underlying data source is a DStream, all generated RDDs will be saved to Cassandra as if this method was called on each of them.
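A minimal sketch, assuming the same `rdd` of `WordCount` objects as above; `myColumnMapper` is a hypothetical user-supplied `ColumnMapper<WordCount>`:

```java
// Hypothetical user-supplied mapper; by default JavaBeanColumnMapper is used.
ColumnMapper<WordCount> myColumnMapper = ...;

CassandraJavaUtil.javaFunctions(rdd, WordCount.class)
        .saveToCassandra("test", "words", myColumnMapper);
```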
public void saveToCassandra(java.lang.String keyspace, java.lang.String table, java.lang.String[] columnNames, com.datastax.spark.connector.mapper.ColumnMapper&lt;T&gt; columnMapper)

Saves the data from the underlying RDD or DStream to a Cassandra table.

This method works just like saveToCassandra(String, String, String[]). It additionally allows the specification of a custom column mapper. By default, JavaBeanColumnMapper is used, which works well with Java bean-like classes.

If the underlying data source is a DStream, all generated RDDs will be saved to Cassandra as if this method was called on each of them.
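The same sketch restricted to selected columns (hypothetical `myColumnMapper` as above):

```java
CassandraJavaUtil.javaFunctions(rdd, WordCount.class)
        .saveToCassandra("test", "words",
                new String[] {"word", "count"}, // columns to save
                myColumnMapper);                // hypothetical custom mapper
```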
public void saveToCassandra(java.lang.String keyspace, java.lang.String table, java.lang.String[] columnNames, int batchSize, com.datastax.spark.connector.mapper.ColumnMapper&lt;T&gt; columnMapper)

Saves the data from the underlying RDD or DStream to a Cassandra table in batches of the given size.

This method works just like saveToCassandra(String, String, String[], int). It additionally allows the specification of a custom column mapper. By default, JavaBeanColumnMapper is used, which works well with Java bean-like classes.

If the underlying data source is a DStream, all generated RDDs will be saved to Cassandra as if this method was called on each of them.
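And with an explicit batch size (64 rows is an arbitrary illustrative value):

```java
CassandraJavaUtil.javaFunctions(rdd, WordCount.class)
        .saveToCassandra("test", "words",
                new String[] {"word", "count"}, // columns to save
                64,                             // rows per batch (illustrative)
                myColumnMapper);                // hypothetical custom mapper
```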
public void saveToCassandra(java.lang.String keyspace, java.lang.String table, java.util.Map&lt;java.lang.String,java.lang.String&gt; columnNameOverride)

Saves the data from the underlying RDD or DStream to a Cassandra table.

This method works just like saveToCassandra(String, String). It additionally allows the specification of a custom mapping from RDD object properties to columns of the Cassandra table.

If the underlying data source is a DStream, all generated RDDs will be saved to Cassandra as if this method was called on each of them.
Example:
```sql
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE test.words (word VARCHAR PRIMARY KEY, count INT, other VARCHAR);
```

```java
// All Java classes that you want to use with Spark should be serializable.
public class WordCount implements Serializable {
    private String w;
    private Integer c;
    private String o;

    // constructors, setters, etc.

    public String getW() { return w; }
    public Integer getC() { return c; }
    public String getO() { return o; }
}

JavaSparkContext jsc = ...;
JavaRDD<WordCount> rdd = jsc.parallelize(Arrays.asList(new WordCount("foo", 5, "bar")));

Map<String, String> mapping = new HashMap<>(3);
mapping.put("w", "word");
mapping.put("c", "count");
mapping.put("o", "other");

CassandraJavaUtil.javaFunctions(rdd, WordCount.class).saveToCassandra("test", "words", mapping);
```
public void saveToCassandra(java.lang.String keyspace, java.lang.String table, java.lang.String[] columnNames, java.util.Map<java.lang.String,java.lang.String> columnNameOverride)
Saves the data from the underlying RDD or DStream to a Cassandra table.

This method works just like saveToCassandra(String, String, String[]). It additionally allows the specification of a custom mapping from RDD/DStream object properties to columns of the Cassandra table.

If the underlying data source is a DStream, all generated RDDs will be saved to Cassandra as if this method was called on each of them.
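A sketch combining a column selection with the property-to-column mapping from the example above (the `WordCount` variant with properties `w`, `c`, `o`):

```java
Map<String, String> mapping = new HashMap<>(3);
mapping.put("w", "word");
mapping.put("c", "count");
mapping.put("o", "other");

CassandraJavaUtil.javaFunctions(rdd, WordCount.class)
        .saveToCassandra("test", "words",
                new String[] {"word", "count"}, // columns to save
                mapping);                       // property-to-column mapping
```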
public void saveToCassandra(java.lang.String keyspace, java.lang.String table, java.lang.String[] columnNames, int batchSize, java.util.Map&lt;java.lang.String,java.lang.String&gt; columnNameOverride)

Saves the data from the underlying RDD or DStream to a Cassandra table in batches of the given size.

This method works just like saveToCassandra(String, String, String[], int). It additionally allows the specification of a custom mapping from RDD/DStream object properties to columns of the Cassandra table.

If the underlying data source is a DStream, all generated RDDs will be saved to Cassandra as if this method was called on each of them.
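The same sketch with an explicit batch size (64 rows is an arbitrary illustrative value):

```java
CassandraJavaUtil.javaFunctions(rdd, WordCount.class)
        .saveToCassandra("test", "words",
                new String[] {"word", "count"}, // columns to save
                64,                             // rows per batch (illustrative)
                mapping);                       // property-to-column mapping as above
```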
public void saveToCassandra(java.lang.String keyspace, java.lang.String table)

Saves the data from the underlying RDD or DStream to a Cassandra table. Saves all properties that have corresponding Cassandra columns. The underlying RDD class must provide data for all columns.

By default, writes are performed at ConsistencyLevel.ONE in order to leverage data locality and minimize network traffic. The write consistency level is controlled by the following property:

- spark.cassandra.output.consistency.level: consistency level for RDD writes; a string matching the ConsistencyLevel enum name.

If the underlying data source is a DStream, all generated RDDs will be saved to Cassandra as if this method was called on each of them.
Example:
```sql
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE test.words (word VARCHAR PRIMARY KEY, count INT, other VARCHAR);
```

```java
// All Java classes that you want to use with Spark should be serializable.
public class WordCount implements Serializable {
    private String word;
    private Integer count;
    private String other;

    // constructors, setters, etc.

    public String getWord() { return word; }
    public Integer getCount() { return count; }
    public String getOther() { return other; }
}

JavaSparkContext jsc = ...;
JavaRDD<WordCount> rdd = jsc.parallelize(Arrays.asList(new WordCount("foo", 5, "bar")));

CassandraJavaUtil.javaFunctions(rdd, WordCount.class).saveToCassandra("test", "words");
```
public void saveToCassandra(java.lang.String keyspace, java.lang.String table, java.lang.String[] columnNames)
Saves the data from the underlying RDD or DStream to a Cassandra table. The RDD object properties must match the Cassandra table column names. Columns that are not selected are left unchanged in Cassandra. All primary key columns must be selected.

By default, writes are performed at ConsistencyLevel.ONE in order to leverage data locality and minimize network traffic. The write consistency level is controlled by the following property:

- spark.cassandra.output.consistency.level: consistency level for RDD writes; a string matching the ConsistencyLevel enum name.

If the underlying data source is a DStream, all generated RDDs will be saved to Cassandra as if this method was called on each of them.
Example:
```sql
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE test.words (word VARCHAR PRIMARY KEY, count INT, other VARCHAR);
```

```java
// All Java classes that you want to use with Spark should be serializable.
public class WordCount implements Serializable {
    private String word;
    private Integer count;
    private String other;

    // constructors, setters, etc.

    public String getWord() { return word; }
    public Integer getCount() { return count; }
    public String getOther() { return other; }
}

JavaSparkContext jsc = ...;
JavaRDD<WordCount> rdd = jsc.parallelize(Arrays.asList(new WordCount("foo", 5, "bar")));

CassandraJavaUtil.javaFunctions(rdd, WordCount.class)
        .saveToCassandra("test", "words", new String[] {"word", "count"}); // will not save the "other" column
```
public void saveToCassandra(java.lang.String keyspace, java.lang.String table, java.lang.String[] columnNames, int batchSize)
Saves the data from the underlying RDD or DStream to a Cassandra table in batches of the given size.

Use this overload only if you find that the automatically tuned batch size does not yield optimal performance. Larger batches increase memory use in temporary buffers and may put more GC pressure on the server; smaller batches result in more round trips and lower throughput. Typically, sending a few kilobytes of data per batch is enough to achieve good performance.

By default, writes are performed at ConsistencyLevel.ONE in order to leverage data locality and minimize network traffic. The write consistency level is controlled by the following property:

- spark.cassandra.output.consistency.level: consistency level for RDD writes; a string matching the ConsistencyLevel enum name.

If the underlying data source is a DStream, all generated RDDs will be saved to Cassandra as if this method was called on each of them.
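A minimal sketch, assuming the `WordCount` bean and `test.words` table from the examples above; the batch size of 64 rows is an arbitrary illustrative value:

```java
CassandraJavaUtil.javaFunctions(rdd, WordCount.class)
        .saveToCassandra("test", "words",
                new String[] {"word", "count"}, // columns to save
                64);                            // rows per batch (illustrative)
```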