Maps each row into an object of a different type using the provided function, taking column value(s) as argument(s). Can be used to convert each row to a tuple or a case class object:

  sc.cassandraTable("ks", "table").select("column1").as((s: String) => s)                // yields CassandraRDD[String]
  sc.cassandraTable("ks", "table").select("column1", "column2").as((_: String, _: Long)) // yields CassandraRDD[(String, Long)]

  case class MyRow(key: String, value: Long)
  sc.cassandraTable("ks", "table").select("column1", "column2").as(MyRow)                // yields CassandraRDD[MyRow]
How many rows are fetched at once from the server.
Consistency level for reads.
Saves the data from the RDD to a Cassandra table. Uses the specified column names with an additional batch size.
the name of the Keyspace to use
the name of the Table to use
The batch size. By default, if the batch size is unspecified, the right amount is calculated automatically according to the average row size. Specify an explicit value here only if you find that the automatically tuned batch size does not result in optimal performance. Larger batches raise memory use by temporary buffers and may increase GC pressure on the server. Smaller batches result in more round trips and lower throughput. Typically, sending a few kilobytes of data per batch is enough to achieve good performance.
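As a sketch of how this overload might be called (the keyspace, table, and column names are placeholders, and the exact overload signature may vary between connector versions):

```scala
import com.datastax.spark.connector._

// Save only the listed columns, overriding the automatically tuned batch size.
// "test", "words", and the column names are illustrative, not from a real schema.
rdd.saveToCassandra("test", "words", Seq("word", "count"), batchSize = Some(64))
```

Passing `None` (or omitting the argument where it has a default) leaves the automatic batch sizing in effect.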
Saves the data from the RDD to a Cassandra table. Uses the specified column names.
the name of the Keyspace to use
the name of the Table to use
Saves the data from the RDD to a Cassandra table, writing all columns:

  rdd.saveToCassandra("test", "words")
the name of the Keyspace to use
the name of the Table to use
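A fuller sketch of the all-columns save, assuming a hypothetical table `test.words (word text PRIMARY KEY, count int)`; column names are taken from the case class fields:

```scala
import com.datastax.spark.connector._

// Hypothetical schema: CREATE TABLE test.words (word text PRIMARY KEY, count int)
case class WordCount(word: String, count: Int)

val counts = sc.parallelize(Seq(WordCount("foo", 10), WordCount("bar", 20)))
counts.saveToCassandra("test", "words") // columns mapped from WordCount's fields
```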
Narrows down the selected set of columns. Use this for better performance when you don't need all the columns in the result RDD. When called multiple times, it selects a subset of the already selected columns, so after a column has been removed by a previous select call, it is not possible to add it back.
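For example, chained selects can only narrow the projection further (the keyspace, table, and column names below are illustrative):

```scala
// First select narrows the row to two columns.
val rdd = sc.cassandraTable("ks", "table").select("key", "value")

// A later select may only pick from the columns already selected.
val keysOnly = rdd.select("key")      // OK: "key" is a subset of ("key", "value")
// rdd.select("key", "other")         // not possible: "other" was already dropped
```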
Returns the names of columns to be selected from the table.
How many rows to fetch in a single Spark task.
Adds a CQL WHERE predicate(s) to the query. Useful for leveraging secondary indexes in Cassandra. Implicitly adds an ALLOW FILTERING clause to the WHERE clause; however, beware that some predicates might be rejected by Cassandra, particularly when they filter on an unindexed, non-clustering column.
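A minimal sketch of server-side filtering with `where` (the keyspace, table, and column names are illustrative; the predicate is bound via the `?` placeholder):

```scala
// Push the predicate down to Cassandra instead of filtering in Spark.
// "ks", "users", and "email" are placeholder identifiers.
val matching = sc.cassandraTable("ks", "users").where("email = ?", "a@example.com")
```

This is most effective when "email" is a clustering column or has a secondary index; otherwise Cassandra may reject the query even with ALLOW FILTERING.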
RDD representing a Cassandra table for Spark Streaming. See also: com.datastax.spark.connector.rdd.CassandraRDD