The NodeGuardian holds the Spark StreamingContext, the SparkCassandraSettings loaded from config, and the demo data for a simple WordCount.
Stopping it stops the ActorSystem, the Spark StreamingContext, and the underlying Spark system.
The NodeGuardian actor is the root supervisor of this simple Akka application's ActorSystem node, which you might deploy across a cluster.
As an Akka supervisor actor, it would normally orchestrate its children and any fault-tolerance policies. For this simple demo, no policies are employed beyond those embedded in the Akka actor API.
Demo data for a simple but classic WordCount:
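As an illustration only (the actual values live in the demo source), the demo data and the WordCount it drives can be sketched in plain Scala, without the streaming machinery. The specific words below are hypothetical placeholders:

```scala
// Hypothetical stand-in for the demo's word set; the real values are defined in the demo source.
val data: Set[String] = Set("words ", "may ", "count ")

// The classic WordCount the streaming job performs, sketched on a static collection:
// split each entry into words, group identical words, and count occurrences.
val counts: Map[String, Int] =
  data.toSeq
    .flatMap(_.trim.split("\\s+"))
    .groupBy(identity)
    .map { case (word, occurrences) => word -> occurrences.size }
```

In the demo, the same computation runs over the DStream and each resulting (word, count) pair is saved as a row in Cassandra.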
The NodeGuardian spins up three child actors (not in this order):

1. Streamer: a simple Akka actor which extends com.datastax.spark.connector.streaming.TypedStreamingActor and ultimately implements a Spark Receiver. This simple receiver calls Receiver.pushBlock[T: ClassTag](data: T) when messages of type String (kept simple for the demo) are received. In practice this would typically be data in some custom envelope, a Serializable Scala case class.
2. Sender: a simple Akka actor which generates a pre-set number of random tuples based on the initial input data noted above, and sends each random tuple to the Streamer. The random messages are generated and sent to the stream every millisecond, with an initial wait of 2 milliseconds.
3. Verification: calls the following on the StreamingContext (ssc) to know when the expected number of entries, scale (the number of messages sent to the stream), has been streamed to Spark, computed, and saved to Cassandra:

Where data represents the 3 words we computed, we assert that the expected three columns were created:
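The Sender's core behavior (step 2 above) can be sketched without the actor and scheduler machinery: drawing scale random entries from data. The values of data and scale below are hypothetical placeholders, not the demo's actual configuration:

```scala
import scala.util.Random

// Hypothetical stand-ins for the demo's inputs: `data` is the word set,
// `scale` the pre-set number of messages the Sender streams.
val data: Vector[String] = Vector("words ", "may ", "count ")
val scale: Int           = 1000

// The Sender's core logic: pick one random entry from `data`
// for each of the `scale` messages sent to the Streamer.
val messages: Seq[String] =
  Seq.fill(scale)(data(Random.nextInt(data.size)))
```

In the actual demo, each such message is delivered to the Streamer on a millisecond schedule rather than built eagerly; Verification then waits until all scale entries have been streamed, computed, and saved to Cassandra.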