Maps each row into an object of a different type using the provided function, which takes column value(s) as argument(s). Can be used to convert each row to a tuple or a case class object:
sc.cassandraTable("ks", "table").select("column1").as((s: String) => s) // yields CassandraRDD[String] sc.cassandraTable("ks", "table").select("column1", "column2").as((_: String, _: Long)) // yields CassandraRDD[(String, Long)] case class MyRow(key: String, value: Long) sc.cassandraTable("ks", "table").select("column1", "column2").as(MyRow) // yields CassandraRDD[MyRow]
Get the ClassLoader which loaded Spark.
Saves the data from the RDD to a Cassandra table. Uses the specified column names.
the name of the keyspace to use
the name of the table to use
an additional configuration object that allows setting the consistency level, batch size, etc.
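A minimal sketch of saving an RDD; the keyspace "ks" and the table "words" with columns word and count are assumptions made for illustration:

import com.datastax.spark.connector._

// Each tuple element is mapped to the correspondingly listed column.
val rows = sc.parallelize(Seq(("cat", 30), ("fox", 40)))
rows.saveToCassandra("ks", "words", SomeColumns("word", "count"))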
Narrows down the selected set of columns. Use this for better performance when you don't need all the columns in the result RDD. When called multiple times, it selects a subset of the already selected columns, so once a column has been removed by a previous select call, it is not possible to add it back.
The selected columns are NamedColumnRef instances. This type allows specifying columns for straightforward retrieval, as well as reading the TTL or write time of regular columns. Implicit conversions included in the com.datastax.spark.connector package make it possible to provide just column names (which is also backward compatible) and optionally add a .ttl or .writeTime suffix in order to create an appropriate NamedColumnRef instance.
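A sketch of both forms, assuming the implicit conversions above are in scope and a hypothetical table ks.table with columns column1 and column2:

import com.datastax.spark.connector._

// Plain column names — backward compatible:
sc.cassandraTable("ks", "table").select("column1", "column2")

// With suffixes, also fetch the TTL and write time of column2:
sc.cassandraTable("ks", "table").select("column1", "column2".ttl, "column2".writeTime)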
Returns the names of columns to be selected from the table.
Applies a function to each item, and groups consecutive items having the same value together. Unlike groupBy, items from the same group must already be next to each other in the original collection. Works locally on each partition, so items from different partitions will never be placed in the same group.
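A sketch, assuming a hypothetical table ks.events whose clustering order keeps rows with the same year and month adjacent:

import com.datastax.spark.connector._

sc.cassandraTable("ks", "events")
  .spanBy(row => (row.getInt("year"), row.getInt("month")))
// yields an RDD of ((year, month), Iterable[CassandraRow]) pairs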
Groups items with the same key, assuming the items with the same key are next to each other in the collection. It does not perform a shuffle, so it is much faster than the far more general Spark RDD groupByKey. For this method to be useful with Cassandra tables, the key must represent a prefix of the primary key, containing at least the partition key of the Cassandra table.
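A sketch, assuming a hypothetical table ks.events with primary key ((customer_id), time), so the pair key is a prefix of the primary key:

import com.datastax.spark.connector._

sc.cassandraTable[(String, Long)]("ks", "events")
  .select("customer_id", "time")
  .spanByKey  // yields an RDD of (customer_id, Seq[time]) pairs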
Adds CQL WHERE predicate(s) to the query. Useful for leveraging secondary indexes in Cassandra. Implicitly adds an ALLOW FILTERING clause to the WHERE clause; however, beware that some predicates might be rejected by Cassandra, particularly when they filter on an unindexed, non-clustering column.
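For example (a sketch; the tables, the secondary index on zip_code, and the clustering column time are assumptions):

// Leverage a secondary index:
sc.cassandraTable("ks", "users").where("zip_code = ?", "94110")

// Filter on a clustering column (here time is assumed to be a bigint):
sc.cassandraTable("ks", "events").where("time > ?", 1000L)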
Returns a copy of this Cassandra RDD with the specified connector.
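A minimal sketch; here the CassandraConnector is built from the Spark configuration, but any custom connector (e.g. one pointing at a different cluster) could be passed instead:

import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector

val connector = CassandraConnector(sc.getConf)
sc.cassandraTable("ks", "table").withConnector(connector)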
Allows setting a custom read configuration, e.g. consistency level or fetch size.
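A sketch using connector 1.2-style ReadConf parameter names, which may differ between connector versions:

import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd.ReadConf
import com.datastax.driver.core.ConsistencyLevel

sc.cassandraTable("ks", "table")
  .withReadConf(ReadConf(fetchSize = 5000, consistencyLevel = ConsistencyLevel.LOCAL_ONE))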
RDD representing a Cassandra table for Spark Streaming.
com.datastax.spark.connector.rdd.CassandraRDD