package connector
The root package of the Cassandra connector for Apache Spark. Offers handy implicit conversions that add Cassandra-specific methods to SparkContext and RDD.
Call the cassandraTable method on the SparkContext object to create a CassandraRDD exposing Cassandra tables as Spark RDDs.
Call the RDDFunctions saveToCassandra function on any RDD to save a distributed collection to a Cassandra table.
Example:

```
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE test.words (word text PRIMARY KEY, count int);
INSERT INTO test.words(word, count) VALUES ('and', 50);
```

```scala
import com.datastax.spark.connector._

val sparkMasterHost = "127.0.0.1"
val cassandraHost = "127.0.0.1"
val keyspace = "test"
val table = "words"

// Tell Spark the address of one Cassandra node:
val conf = new SparkConf(true).set("spark.cassandra.connection.host", cassandraHost)

// Connect to the Spark cluster:
val sc = new SparkContext("spark://" + sparkMasterHost + ":7077", "example", conf)

// Read the table and print its contents:
val rdd = sc.cassandraTable(keyspace, table)
rdd.toArray().foreach(println)

// Write two rows to the table:
val col = sc.parallelize(Seq(("of", 1200), ("the", 863)))
col.saveToCassandra(keyspace, table)

sc.stop()
```
Type Members
- sealed trait BatchSize extends AnyRef
- case class BytesInBatch(batchSize: Int) extends BatchSize with Product with Serializable
- final class CassandraRow extends ScalaGettableData with Serializable
Represents a single row fetched from Cassandra. Offers getters to read individual fields by column name or column index. The getters try to convert the value to the desired type whenever possible. Most of the column types can be converted to a String. For nullable columns, use the getXXXOption getters, which convert nulls to None values; otherwise a NullPointerException would be thrown. All getters throw an exception if the column name/index is not found. Column indexes start at 0.
If the value cannot be converted to the desired type, com.datastax.spark.connector.types.TypeConversionException is thrown.
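A minimal sketch of these getters in use, assuming the test.words table from the example above and an existing SparkContext sc configured for the connector:

```scala
import com.datastax.spark.connector._

// Fetch one row from the test.words table (sc is assumed to exist):
val row = sc.cassandraTable("test", "words").first()

// Read by column name; throws ColumnNotFoundException if the name is unknown:
val word: String = row.getString("word")

// Read by column index; indexes start at 0:
val count: Int = row.getInt(1)

// For nullable columns, prefer the Option getters; null becomes None:
val maybeCount: Option[Int] = row.getIntOption("count")
```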
Recommended getters for Cassandra types:
- ascii: getString, getStringOption
- bigint: getLong, getLongOption
- blob: getBytes, getBytesOption
- boolean: getBool, getBoolOption
- counter: getLong, getLongOption
- decimal: getDecimal, getDecimalOption
- double: getDouble, getDoubleOption
- float: getFloat, getFloatOption
- inet: getInet, getInetOption
- int: getInt, getIntOption
- text: getString, getStringOption
- timestamp: getDate, getDateOption
- timeuuid: getUUID, getUUIDOption
- uuid: getUUID, getUUIDOption
- varchar: getString, getStringOption
- varint: getVarInt, getVarIntOption
- list: getList[T]
- set: getSet[T]
- map: getMap[K, V]
Collection getters getList, getSet and getMap require you to explicitly pass an appropriate item type:

```scala
row.getList[String]("a_list")
row.getList[Int]("a_list")
row.getMap[Int, String]("a_map")
```
Generic get can automatically convert collections to other collection types. Supported containers:
- scala.collection.immutable.List
- scala.collection.immutable.Set
- scala.collection.immutable.TreeSet
- scala.collection.immutable.Vector
- scala.collection.immutable.Map
- scala.collection.immutable.TreeMap
- scala.collection.Iterable
- scala.collection.IndexedSeq
- java.util.ArrayList
- java.util.HashSet
- java.util.HashMap
Example:

```scala
row.get[List[Int]]("a_list")
row.get[Vector[Int]]("a_list")
row.get[java.util.ArrayList[Int]]("a_list")
row.get[TreeMap[Int, String]]("a_map")
```
Timestamps can be converted to other date types by using generic get. Supported date types: java.util.Date, java.sql.Date, org.joda.time.DateTime
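A short sketch of these date conversions; row is assumed to be a CassandraRow holding a timestamp column, and the column name created_at is hypothetical:

```scala
// Recommended getter for the timestamp type:
val asUtilDate: java.util.Date = row.getDate("created_at")

// Generic get can target the other supported date types:
val asSqlDate: java.sql.Date = row.get[java.sql.Date]("created_at")
val asJoda: org.joda.time.DateTime = row.get[org.joda.time.DateTime]("created_at")
```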
- case class CassandraRowMetadata(columnNames: IndexedSeq[String], resultSetColumnNames: Option[IndexedSeq[String]] = None, codecs: IndexedSeq[TypeCodec[AnyRef]] = null) extends Product with Serializable
Data shared by all CassandraRows.
- columnNames: row column names
- resultSetColumnNames: column names from the Java driver row result set, without connector aliases
- codecs: cached Java driver codecs to avoid registry lookups
- final class CassandraTableScanPairRDDFunctions[K, V] extends Serializable
- final class CassandraTableScanRDDFunctions[R] extends Serializable
- sealed trait CollectionBehavior extends AnyRef
Insert behaviors for Collections.
- case class CollectionColumnName(columnName: String, alias: Option[String] = None, collectionBehavior: CollectionBehavior = CollectionOverwrite) extends ColumnRef with Product with Serializable
References a collection column by name with insert instructions
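A sketch of choosing a collection behavior when saving. The table test.collections and its list column lst are hypothetical, the append syntax comes from the ColumnNameFunctions implicits, and sc is assumed to be an existing SparkContext:

```scala
import com.datastax.spark.connector._

// Hypothetical table: CREATE TABLE test.collections (key text PRIMARY KEY, lst list<int>);
val rdd = sc.parallelize(Seq(("k1", Vector(1, 2, 3))))

// Default behavior (CollectionOverwrite) replaces the stored list:
rdd.saveToCassandra("test", "collections", SomeColumns("key", "lst"))

// CollectionAppend: add elements to the existing list instead:
rdd.saveToCassandra("test", "collections", SomeColumns("key", "lst" append))
```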
- case class ColumnName(columnName: String, alias: Option[String] = None) extends ColumnRef with Product with Serializable
References a column by name.
- implicit final class ColumnNameFunctions extends AnyVal
- class ColumnNotFoundException extends Exception
Thrown when the requested column does not exist in the result set.
- sealed trait ColumnRef extends AnyRef
A column that can be selected from a CQL result set by name.
- sealed trait ColumnSelector extends AnyRef
- class DataFrameFunctions extends Serializable
Provides Cassandra-specific methods on org.apache.spark.sql.DataFrame
- case class FunctionCallRef(columnName: String, actualParams: Seq[Either[ColumnRef, String]] = Seq.empty, alias: Option[String] = None) extends ColumnRef with Product with Serializable
References a function call.
- trait GettableByIndexData extends Serializable
- trait GettableData extends GettableByIndexData
- class PairRDDFunctions[K, V] extends Serializable
- class RDDFunctions[T] extends WritableToCassandra[T] with Serializable
Provides Cassandra-specific methods on RDD
- case class RowsInBatch(batchSize: Int) extends BatchSize with Product with Serializable
- trait ScalaGettableByIndexData extends GettableByIndexData
- trait ScalaGettableData extends ScalaGettableByIndexData with GettableData
- case class SomeColumns(columns: ColumnRef*) extends ColumnSelector with Product with Serializable
- class SparkContextFunctions extends Serializable
Provides Cassandra-specific methods on SparkContext
- case class TTL(columnName: String, alias: Option[String] = None) extends ColumnRef with Product with Serializable
References TTL of a column.
- final case class TupleValue(values: Any*) extends ScalaGettableByIndexData with Product with Serializable
- final case class UDTValue(columnNames: IndexedSeq[String], columnValues: IndexedSeq[AnyRef]) extends ScalaGettableData with Product with Serializable
- case class WriteTime(columnName: String, alias: Option[String] = None) extends ColumnRef with Product with Serializable
References write time of a column.
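A sketch of selecting TTL and write time next to an ordinary column. The .ttl and .writeTime syntax comes from the ColumnNameFunctions implicits, the table reuses the test.words example, and sc is assumed to be a configured SparkContext:

```scala
import com.datastax.spark.connector._

val rdd = sc.cassandraTable("test", "words")
  .select("word", "count", "count".ttl as "count_ttl", "count".writeTime as "count_wt")

rdd.foreach { row =>
  val word = row.getString("word")
  val ttl  = row.getIntOption("count_ttl")   // None when no TTL is set
  val wt   = row.getLongOption("count_wt")   // write time in microseconds since the epoch
  println(s"$word ttl=$ttl writeTime=$wt")
}
```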
Value Members
- implicit def toCassandraTableScanFunctions[T](rdd: CassandraTableScanRDD[T]): CassandraTableScanRDDFunctions[T]
- implicit def toCassandraTableScanRDDPairFunctions[K, V](rdd: CassandraTableScanRDD[(K, V)]): CassandraTableScanPairRDDFunctions[K, V]
- implicit def toDataFrameFunctions(dataFrame: DataFrame): DataFrameFunctions
- implicit def toNamedColumnRef(columnName: String): ColumnName
- implicit def toPairRDDFunctions[K, V](rdd: RDD[(K, V)]): PairRDDFunctions[K, V]
- implicit def toRDDFunctions[T](rdd: RDD[T]): RDDFunctions[T]
- implicit def toSparkContextFunctions(sc: SparkContext): SparkContextFunctions
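All of these implicit conversions come into scope with a single wildcard import; as a sketch of what each one enables:

```scala
import com.datastax.spark.connector._

// SparkContext -> SparkContextFunctions:
//   sc.cassandraTable("test", "words")
// RDD[T] -> RDDFunctions[T]:
//   rdd.saveToCassandra("test", "words")
// String -> ColumnName, so a plain string works wherever a ColumnRef is expected:
//   SomeColumns("word", "count")
```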
- object AllColumns extends ColumnSelector with Product with Serializable
- object BatchSize
- object CassandraRow extends Serializable
- object CassandraRowMetadata extends Serializable
- object CollectionAppend extends CollectionBehavior with Product with Serializable
- object CollectionOverwrite extends CollectionBehavior with Product with Serializable
- object CollectionPrepend extends CollectionBehavior with Product with Serializable
- object CollectionRemove extends CollectionBehavior with Product with Serializable
- object DocUtil
- object GettableData extends Serializable
- object PartitionKeyColumns extends ColumnSelector with Product with Serializable
- object PrimaryKeyColumns extends ColumnSelector with Product with Serializable
- object RowCountRef extends ColumnRef with Product with Serializable
References a row count value returned from SELECT count(*)
- object SomeColumns extends Serializable
- object TupleValue extends Serializable
- object UDTValue extends Serializable