package connector
The root package of the Cassandra connector for Apache Spark. Offers handy implicit conversions that add Cassandra-specific methods to SparkContext and RDD.
Call the cassandraTable method on the SparkContext object to create a CassandraRDD exposing Cassandra tables as Spark RDDs.
Call the RDDFunctions saveToCassandra function on any RDD to save a distributed collection to a Cassandra table.
Example:

```
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE test.words (word text PRIMARY KEY, count int);
INSERT INTO test.words(word, count) VALUES ('and', 50);
```

```scala
import com.datastax.spark.connector._

val sparkMasterHost = "127.0.0.1"
val cassandraHost = "127.0.0.1"
val keyspace = "test"
val table = "words"

// Tell Spark the address of one Cassandra node:
val conf = new SparkConf(true).set("spark.cassandra.connection.host", cassandraHost)

// Connect to the Spark cluster:
val sc = new SparkContext("spark://" + sparkMasterHost + ":7077", "example", conf)

// Read the table and print its contents:
val rdd = sc.cassandraTable(keyspace, table)
rdd.toArray().foreach(println)

// Write two rows to the table:
val col = sc.parallelize(Seq(("of", 1200), ("the", 863)))
col.saveToCassandra(keyspace, table)

sc.stop()
```
Type Members
- sealed trait BatchSize extends AnyRef
- case class BytesInBatch(batchSize: Int) extends BatchSize with Product with Serializable
- final class CassandraRow extends ScalaGettableData with Serializable
Represents a single row fetched from Cassandra. Offers getters to read individual fields by column name or column index. The getters try to convert the value to the desired type whenever possible. Most of the column types can be converted to a String. For nullable columns, use the getXXXOption getters, which convert nulls to None values; otherwise a NullPointerException would be thrown. All getters throw an exception if the column name/index is not found. Column indexes start at 0.
If the value cannot be converted to the desired type, com.datastax.spark.connector.types.TypeConversionException is thrown.
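A minimal sketch of these getters in use, assuming the test.words table from the example above and an existing SparkContext sc configured for the connector:

```scala
import com.datastax.spark.connector._

// Fetch one row from the test.words table (sc is assumed to exist):
val row = sc.cassandraTable("test", "words").first()

// Read by column name; throws ColumnNotFoundException if the name is unknown:
val word: String = row.getString("word")

// Read by column index; indexes start at 0:
val count: Int = row.getInt(1)

// For nullable columns, prefer the Option getters; null becomes None:
val maybeCount: Option[Int] = row.getIntOption("count")
```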
Recommended getters for Cassandra types:
- ascii: getString, getStringOption
- bigint: getLong, getLongOption
- blob: getBytes, getBytesOption
- boolean: getBool, getBoolOption
- counter: getLong, getLongOption
- decimal: getDecimal, getDecimalOption
- double: getDouble, getDoubleOption
- float: getFloat, getFloatOption
- inet: getInet, getInetOption
- int: getInt, getIntOption
- text: getString, getStringOption
- timestamp: getDate, getDateOption
- timeuuid: getUUID, getUUIDOption
- uuid: getUUID, getUUIDOption
- varchar: getString, getStringOption
- varint: getVarInt, getVarIntOption
- list: getList[T]
- set: getSet[T]
- map: getMap[K, V]
Collection getters getList, getSet and getMap require you to explicitly pass an appropriate item type:

```scala
row.getList[String]("a_list")
row.getList[Int]("a_list")
row.getMap[Int, String]("a_map")
```
Generic get can automatically convert collections to other collection types. Supported containers:
- scala.collection.immutable.List
- scala.collection.immutable.Set
- scala.collection.immutable.TreeSet
- scala.collection.immutable.Vector
- scala.collection.immutable.Map
- scala.collection.immutable.TreeMap
- scala.collection.Iterable
- scala.collection.IndexedSeq
- java.util.ArrayList
- java.util.HashSet
- java.util.HashMap
Example:

```scala
row.get[List[Int]]("a_list")
row.get[Vector[Int]]("a_list")
row.get[java.util.ArrayList[Int]]("a_list")
row.get[TreeMap[Int, String]]("a_map")
```
Timestamps can be converted to other date types by using generic get. Supported date types: java.util.Date, java.sql.Date, org.joda.time.DateTime
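A short sketch of these date conversions; row is assumed to be a CassandraRow holding a timestamp column, and the column name created_at is hypothetical:

```scala
// Recommended getter for the timestamp type:
val asUtilDate: java.util.Date = row.getDate("created_at")

// Generic get can target the other supported date types:
val asSqlDate: java.sql.Date = row.get[java.sql.Date]("created_at")
val asJoda: org.joda.time.DateTime = row.get[org.joda.time.DateTime]("created_at")
```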
- case class CassandraRowMetadata(columnNames: IndexedSeq[String], resultSetColumnNames: Option[IndexedSeq[String]] = None, codecs: IndexedSeq[TypeCodec[AnyRef]] = null) extends Product with Serializable
Data shared by all CassandraRows.
- columnNames: row column names
- resultSetColumnNames: column names from the Java driver row result set, without connector aliases
- codecs: cached Java driver codecs to avoid registry lookups
- final class CassandraTableScanPairRDDFunctions[K, V] extends Serializable
- final class CassandraTableScanRDDFunctions[R] extends Serializable
- sealed trait CollectionBehavior extends AnyRef
Insert behaviors for Collections.
- case class CollectionColumnName(columnName: String, alias: Option[String] = None, collectionBehavior: CollectionBehavior = CollectionOverwrite) extends ColumnRef with Product with Serializable
References a collection column by name with insert instructions
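A sketch of choosing a collection behavior when saving. The table test.collections and its list column lst are hypothetical, the append syntax comes from the ColumnNameFunctions implicits, and sc is assumed to be an existing SparkContext:

```scala
import com.datastax.spark.connector._

// Hypothetical table: CREATE TABLE test.collections (key text PRIMARY KEY, lst list<int>);
val rdd = sc.parallelize(Seq(("k1", Vector(1, 2, 3))))

// Default behavior (CollectionOverwrite) replaces the stored list:
rdd.saveToCassandra("test", "collections", SomeColumns("key", "lst"))

// CollectionAppend: add elements to the existing list instead:
rdd.saveToCassandra("test", "collections", SomeColumns("key", "lst" append))
```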
- case class ColumnName(columnName: String, alias: Option[String] = None) extends ColumnRef with Product with Serializable
References a column by name.
- implicit final class ColumnNameFunctions extends AnyVal
- class ColumnNotFoundException extends Exception
Thrown when the requested column does not exist in the result set.
- sealed trait ColumnRef extends AnyRef
A column that can be selected from a CQL result set by name.
- sealed trait ColumnSelector extends AnyRef
- class DataFrameFunctions extends Serializable
Provides Cassandra-specific methods on org.apache.spark.sql.DataFrame
- case class FunctionCallRef(columnName: String, actualParams: Seq[Either[ColumnRef, String]] = Seq.empty, alias: Option[String] = None) extends ColumnRef with Product with Serializable
References a function call.
- trait GettableByIndexData extends Serializable
- trait GettableData extends GettableByIndexData
- class PairRDDFunctions[K, V] extends Serializable
- class RDDFunctions[T] extends WritableToCassandra[T] with Serializable
Provides Cassandra-specific methods on RDD
- case class RowsInBatch(batchSize: Int) extends BatchSize with Product with Serializable
- trait ScalaGettableByIndexData extends GettableByIndexData
- trait ScalaGettableData extends ScalaGettableByIndexData with GettableData
- case class SomeColumns(columns: ColumnRef*) extends ColumnSelector with Product with Serializable
- class SparkContextFunctions extends Serializable
Provides Cassandra-specific methods on SparkContext
- case class TTL(columnName: String, alias: Option[String] = None) extends ColumnRef with Product with Serializable
References TTL of a column.
- final case class TupleValue(values: Any*) extends ScalaGettableByIndexData with Product with Serializable
- final case class UDTValue(columnNames: IndexedSeq[String], columnValues: IndexedSeq[AnyRef]) extends ScalaGettableData with Product with Serializable
- case class WriteTime(columnName: String, alias: Option[String] = None) extends ColumnRef with Product with Serializable
References write time of a column.
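A sketch of selecting TTL and write time next to an ordinary column. The .ttl and .writeTime syntax comes from the ColumnNameFunctions implicits, the table reuses the test.words example, and sc is assumed to be a configured SparkContext:

```scala
import com.datastax.spark.connector._

val rdd = sc.cassandraTable("test", "words")
  .select("word", "count", "count".ttl as "count_ttl", "count".writeTime as "count_wt")

rdd.foreach { row =>
  val word = row.getString("word")
  val ttl  = row.getIntOption("count_ttl")   // None when no TTL is set
  val wt   = row.getLongOption("count_wt")   // write time in microseconds since the epoch
  println(s"$word ttl=$ttl writeTime=$wt")
}
```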
Value Members
- implicit def toCassandraTableScanFunctions[T](rdd: CassandraTableScanRDD[T]): CassandraTableScanRDDFunctions[T]
- implicit def toCassandraTableScanRDDPairFunctions[K, V](rdd: CassandraTableScanRDD[(K, V)]): CassandraTableScanPairRDDFunctions[K, V]
- implicit def toDataFrameFunctions(dataFrame: DataFrame): DataFrameFunctions
- implicit def toNamedColumnRef(columnName: String): ColumnName
- implicit def toPairRDDFunctions[K, V](rdd: RDD[(K, V)]): PairRDDFunctions[K, V]
- implicit def toRDDFunctions[T](rdd: RDD[T]): RDDFunctions[T]
- implicit def toSparkContextFunctions(sc: SparkContext): SparkContextFunctions
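All of these implicit conversions come into scope with a single wildcard import; as a sketch of what each one enables:

```scala
import com.datastax.spark.connector._

// SparkContext -> SparkContextFunctions:
//   sc.cassandraTable("test", "words")
// RDD[T] -> RDDFunctions[T]:
//   rdd.saveToCassandra("test", "words")
// String -> ColumnName, so a plain string works wherever a ColumnRef is expected:
//   SomeColumns("word", "count")
```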
- object AllColumns extends ColumnSelector with Product with Serializable
- object BatchSize
- object CassandraRow extends Serializable
- object CassandraRowMetadata extends Serializable
- object CollectionAppend extends CollectionBehavior with Product with Serializable
- object CollectionOverwrite extends CollectionBehavior with Product with Serializable
- object CollectionPrepend extends CollectionBehavior with Product with Serializable
- object CollectionRemove extends CollectionBehavior with Product with Serializable
- object DocUtil
- object GettableData extends Serializable
- object PartitionKeyColumns extends ColumnSelector with Product with Serializable
- object PrimaryKeyColumns extends ColumnSelector with Product with Serializable
- object RowCountRef extends ColumnRef with Product with Serializable
References a row count value returned from SELECT count(*)
- object SomeColumns extends Serializable
- object TupleValue extends Serializable
- object UDTValue extends Serializable