Package org.apache.spark.examples.snappydata.structuredstreaming

package structuredstreaming

Type Members

  1. case class User(id: Long, name: String, age: Int, _eventType: Int) extends Product with Serializable

Value Members

  1. object CDCExample

    An example explaining a CDC (change data capture) use case with the SnappyData streaming sink.

    For the CDC use case, the following two conditions must be met:

    1) The target table must be defined with key columns (for column tables)
       or primary keys (for row tables).
    2) The input dataset must have a numeric column named _eventType
       indicating the type of the event. The value of this column is mapped
       to an event type as follows:

       0 - insert
       1 - putInto
       2 - delete

    Based on the key values in the incoming dataset and the value of the
    _eventType column, the sink decides which operation needs to be performed
    for each record.

    To run this on your local machine, you need to first run a Netcat server:

    $ nc -lk 9999

    Example input data. Note that the last value of each CSV record indicates
    the _eventType:

    1,user1,23,0
    2,user2,45,0
    1,user1,23,2
    2,user2,46,1

    To run the example in local mode go to your SnappyData product distribution directory and execute the following command:

    bin/run-example snappydata.structuredstreaming.CDCExample
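
    A minimal sketch of the flow described above, in the spirit of this
    example rather than its shipped code: the snappysink format name and the
    tableName option follow the SnappyData sink API, while the table name,
    query name, checkpoint path and the naive CSV parsing are illustrative
    assumptions.

    import org.apache.spark.sql.{SnappySession, SparkSession}

    object CDCSketch {
      // Mirrors the User type above; _eventType is consumed by the sink.
      case class User(id: Long, name: String, age: Int, _eventType: Int)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("CDCSketch").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)
        import snappy.implicits._

        // Key columns are mandatory on the target table for the CDC use case.
        snappy.sql("CREATE TABLE users (id LONG, name STRING, age INT) " +
          "USING column OPTIONS (key_columns 'id')")

        // Parse lines such as "1,user1,23,0"; the last field is the _eventType.
        val users = snappy.readStream
          .format("socket").option("host", "localhost").option("port", "9999")
          .load().as[String]
          .map { line =>
            val f = line.split(",")
            User(f(0).toLong, f(1), f(2).toInt, f(3).toInt)
          }

        // The sink matches records against the table's key columns and applies
        // insert (0), putInto (1) or delete (2) based on _eventType.
        val query = users.writeStream
          .format("snappysink")
          .queryName("cdc_sketch")
          .option("tableName", "users")
          .option("checkpointLocation", "/tmp/cdc_sketch_checkpoint")
          .start()

        query.awaitTermination()
      }
    }

    With the Netcat server running, typing the sample lines above should
    insert, update and then delete the corresponding rows in the users table.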

  2. object CSVFileSourceExampleWithSnappySink extends internal.Logging

    An example of structured streaming depicting CSV file processing with Snappy sink.

    Example input data:

    Yin,31,Columbus,Ohio
    Michael,38,"San Jose",California

    Usage: CSVFileSourceExampleWithSnappySink [checkpoint-directory] [input-directory]

    [checkpoint-directory]
        Optional argument providing the checkpoint directory where the state
        of the streaming query will be stored. Note that this directory needs
        to be deleted manually to reset the state of the streaming query.
        Default: CSVFileSourceExampleWithSnappySink directory under the
        working directory.

    [input-directory]
        Optional argument pointing to the input directory where incoming CSV
        files should be dumped to get picked up for processing.
        Default: people.csv directory under resources.

    Example:

    $ bin/run-example snappydata.structuredstreaming.CSVFileSourceExampleWithSnappySink \
        "checkpoint_dir" "CSV_input_dir"
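
    A hedged sketch of what this example likely looks like, assuming the same
    snappysink format and tableName option as above; the people table, its
    schema and the directory names are illustrative.

    import org.apache.spark.sql.{SnappySession, SparkSession}
    import org.apache.spark.sql.types._

    object CSVFileSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("CSVFileSketch").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)

        // Illustrative sink table matching the sample records above.
        snappy.sql("CREATE TABLE people (name STRING, age INT, city STRING, " +
          "state STRING) USING column")

        // A streaming file source requires an explicit schema.
        val schema = StructType(Seq(
          StructField("name", StringType),
          StructField("age", IntegerType),
          StructField("city", StringType),
          StructField("state", StringType)))

        val people = snappy.readStream
          .schema(schema)
          .csv("CSV_input_dir")   // directory watched for new CSV files

        val query = people.writeStream
          .format("snappysink")
          .queryName("csv_file_sketch")
          .option("tableName", "people")
          .option("checkpointLocation", "checkpoint_dir")
          .start()

        query.awaitTermination()
      }
    }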

  3. object CSVKafkaSourceExampleWithSnappySink extends internal.Logging

    An example of structured streaming depicting processing of CSV data coming from a Kafka source using Snappy sink.

    Example input data:

    Key: USA, Value: Yin,31,Columbus,Ohio
    Key: USA, Value: John,44,"San Jose",California

    Usage: CSVKafkaSourceExampleWithSnappySink <kafka-brokers> <topics> [checkpoint-directory]

    <kafka-brokers>
        Mandatory argument providing a comma-separated list of Kafka brokers.

    <topics>
        Mandatory argument providing a comma-separated list of Kafka topics
        to subscribe to.

    [checkpoint-directory]
        Optional argument providing the checkpoint directory where the state
        of the streaming query will be stored. Note that this directory needs
        to be deleted manually to reset the state of the streaming query.
        Default: CSVKafkaSourceExampleWithSnappySink directory under the
        working directory.

    Example:

    $ bin/run-example snappydata.structuredstreaming.CSVKafkaSourceExampleWithSnappySink \
        "broker-1:9092,broker-2:9092" "topic1,topic2" "checkpoint_dir"
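
    A sketch of the Kafka variant under the same snappysink/tableName
    assumptions, additionally assuming the spark-sql-kafka connector is on
    the classpath; the Person type, table name and the naive comma split are
    illustrative.

    import org.apache.spark.sql.{SnappySession, SparkSession}

    object CSVKafkaSketch {
      case class Person(name: String, age: Int, city: String, state: String)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("CSVKafkaSketch").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)
        import snappy.implicits._

        snappy.sql("CREATE TABLE people (name STRING, age INT, city STRING, " +
          "state STRING) USING column")

        // Kafka records arrive as binary key/value pairs; the CSV payload is
        // in the value. Note: split(",") is a naive parse that would break on
        // quoted fields such as "San Jose"; a real implementation would use a
        // proper CSV parser.
        val people = snappy.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker-1:9092,broker-2:9092")
          .option("subscribe", "topic1,topic2")
          .load()
          .selectExpr("CAST(value AS STRING)").as[String]
          .map { line =>
            val f = line.split(",")
            Person(f(0), f(1).toInt, f(2), f(3))
          }

        val query = people.writeStream
          .format("snappysink")
          .queryName("csv_kafka_sketch")
          .option("tableName", "people")
          .option("checkpointLocation", "checkpoint_dir")
          .start()

        query.awaitTermination()
      }
    }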

  4. object JSONFileSourceExampleWithSnappySink extends SnappySQLJob with internal.Logging

    An example of structured streaming depicting JSON file processing with Snappy sink.

    This example can be run either in local mode (in which case the example runs collocated with Spark+SnappyData Store in the same JVM) or can be submitted as a job to an already running SnappyData cluster.

    Example input data:

    {"name":"Yin", "age":31, "address":{"city":"Columbus","state":"Ohio", "district" :"Cincinnati"}} {"name":"Michael", "age":38, "address":{"city":"San Jose", "state":"California", "lane" :"15"}}

    Running locally:

    Usage: JSONFileSourceExampleWithSnappySink [checkpoint-directory] [input-directory]

    [checkpoint-directory]
        Optional argument providing the checkpoint directory where the state
        of the streaming query will be stored. Note that this directory needs
        to be deleted manually to reset the state of the streaming query.
        Default: JSONFileSourceExampleWithSnappySink directory under the
        working directory.

    [input-directory]
        Optional argument pointing to the input directory where incoming JSON
        files should be dumped to get picked up for processing.
        Default: people.json directory under resources.

    Example:

    $ bin/run-example snappydata.structuredstreaming.JSONFileSourceExampleWithSnappySink \
        "checkpoint_dir" "JSON_input_dir"

    Submitting as a snappy job to an already running cluster:

    cd $SNAPPY_HOME
    bin/snappy-job.sh submit \
        --app-name JSONFileSourceExampleWithSnappySink \
        --class org.apache.spark.examples.snappydata.structuredstreaming.JSONFileSourceExampleWithSnappySink \
        --app-jar examples/jars/quickstart.jar \
        --conf checkpoint-directory=<checkpoint directory> \
        --conf input-directory=<input directory>

    Note that the checkpoint directory and input directory are mandatory
    options while submitting a snappy job.

    Check the status of your job using the job id:

    bin/snappy-job.sh status --lead [leadHost:port] --job-id [job-id]

    To stop the job:

    bin/snappy-job.sh stop --lead [leadHost:port] --job-id [job-id]

    The content of the sink table can be checked from snappy-sql using a
    select query:

    select * from people;

    Resetting the streaming query: to reset the progress of the streaming
    query, delete the checkpoint directory. When running this example as a
    snappy job, you will also need to clear the state from the state table
    using the following query:

    delete from app.snappysys_internal____sink_state_table where stream_query_id = 'query1';
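
    Since this object extends SnappySQLJob, a sketch of the job skeleton may
    help. It assumes the SnappySQLJob/SnappyJobValidation API from
    org.apache.spark.sql and the mandatory options noted above; the table
    name and schema are illustrative, and the query name query1 mirrors the
    stream_query_id used in the reset query above.

    import com.typesafe.config.Config
    import org.apache.spark.sql._
    import org.apache.spark.sql.types._

    object JSONFileJobSketch extends SnappySQLJob {

      // Reject the job up front when the mandatory options are missing.
      override def isValidJob(snappy: SnappySession, config: Config): SnappyJobValidation =
        if (config.hasPath("checkpoint-directory") && config.hasPath("input-directory")) {
          SnappyJobValid()
        } else {
          SnappyJobInvalid("checkpoint-directory and input-directory are mandatory")
        }

      override def runSnappyJob(snappy: SnappySession, jobConfig: Config): Any = {
        val checkpointDir = jobConfig.getString("checkpoint-directory")
        val inputDir = jobConfig.getString("input-directory")

        snappy.sql("CREATE TABLE IF NOT EXISTS people (name STRING, age INT, " +
          "city STRING) USING column")

        // A streaming file source requires an explicit schema; the nested
        // address struct is flattened before reaching the sink table.
        val schema = StructType(Seq(
          StructField("name", StringType),
          StructField("age", IntegerType),
          StructField("address", StructType(Seq(
            StructField("city", StringType),
            StructField("state", StringType))))))

        val query = snappy.readStream.schema(schema).json(inputDir)
          .selectExpr("name", "age", "address.city AS city")
          .writeStream
          .format("snappysink")
          .queryName("query1")   // matches stream_query_id in the reset query above
          .option("tableName", "people")
          .option("checkpointLocation", checkpointDir)
          .start()

        query.awaitTermination()
      }
    }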

  5. object JSONKafkaSourceExampleWithSnappySink extends internal.Logging

    An example of structured streaming depicting processing of JSON data coming from a Kafka source using Snappy sink.

    Example input data:

    key: {"country" : "USA"}, value: {"name":"Adam", "age":21, "address":{"city":"Columbus","state":"Ohio"}}
    key: {"country" : "England"}, value: {"name":"John", "age":44, "address":{"city":"London"}}
    key: {"country" : "USA"}, value: {"name":"Carol", "age":37, "address":{"city":"San Diego", "state":"California"}}

    Usage: JSONKafkaSourceExampleWithSnappySink <kafka-brokers> <topics> [checkpoint-directory]

    <kafka-brokers>
        Mandatory argument providing a comma-separated list of Kafka brokers.

    <topics>
        Mandatory argument providing a comma-separated list of Kafka topics
        to subscribe to.

    [checkpoint-directory]
        Optional argument providing the checkpoint directory where the state
        of the streaming query will be stored. Note that this directory needs
        to be deleted manually to reset the state of the streaming query.
        Default: JSONKafkaSourceExampleWithSnappySink directory under the
        working directory.

    Example:

    $ bin/run-example snappydata.structuredstreaming.JSONKafkaSourceExampleWithSnappySink \
        "broker-1:9092,broker-2:9092" "topic1,topic2" "checkpoint_dir"
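
    A sketch under the same assumptions as the previous Kafka example, using
    from_json to parse the record value; the flattening of the nested address
    struct and the people table are illustrative choices.

    import org.apache.spark.sql.{SnappySession, SparkSession}
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types._

    object JSONKafkaSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("JSONKafkaSketch").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)

        snappy.sql("CREATE TABLE people (name STRING, age INT, city STRING) USING column")

        val schema = StructType(Seq(
          StructField("name", StringType),
          StructField("age", IntegerType),
          StructField("address", StructType(Seq(
            StructField("city", StringType),
            StructField("state", StringType))))))

        // Parse the JSON payload carried in the Kafka record value and
        // flatten the nested address struct.
        val people = snappy.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker-1:9092,broker-2:9092")
          .option("subscribe", "topic1,topic2")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("v"))
          .selectExpr("v.name", "v.age", "v.address.city AS city")

        val query = people.writeStream
          .format("snappysink")
          .queryName("json_kafka_sketch")
          .option("tableName", "people")
          .option("checkpointLocation", "checkpoint_dir")
          .start()

        query.awaitTermination()
      }
    }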

  6. object SocketSourceExample

    An example showing usage of structured streaming with console sink.

    To run this example on your local machine, you need to first start a Netcat server:
    $ nc -lk 9999

    Sample input data:

    device1,45
    device2,67
    device3,35
    

    To run the example in local mode go to your SnappyData product distribution directory and run the following command:

    bin/run-example snappydata.structuredstreaming.SocketSourceExample
    

    For more details on streaming with SnappyData refer to: http://snappydatainc.github.io/snappydata/programming_guide/stream_processing_using_sql/#stream-processing-using-sql
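
    The console-sink variant needs only stock Spark APIs. The following
    sketch assumes the device1,45 input format shown above; the aggregation
    (average signal per device) is chosen for illustration.

    import org.apache.spark.sql.SparkSession

    object SocketConsoleSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("SocketConsoleSketch").getOrCreate()
        import spark.implicits._

        // Parse "device,signal" lines from the Netcat server started above.
        val readings = spark.readStream
          .format("socket").option("host", "localhost").option("port", "9999")
          .load().as[String]
          .map { line =>
            val f = line.split(",")
            (f(0), f(1).toInt)
          }.toDF("device", "signal")

        // Average signal per device, reprinted on the console every trigger.
        val query = readings
          .groupBy("device").avg("signal")
          .writeStream
          .outputMode("complete")
          .format("console")
          .start()

        query.awaitTermination()
      }
    }

    Typing the sample lines above into the Netcat session should print a
    running average per device to the console.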

  7. object SocketSourceExampleWithSnappySink extends internal.Logging

    An example showing usage of structured streaming with SnappyData.

    To run this on your local machine, you need to first start a Netcat server:

    $ nc -lk 9999

    Sample input data:

    device1,45
    device2,67
    device3,35
    
    To run the example in local mode go to your SnappyData product distribution directory and run the following command:
    bin/run-example snappydata.structuredstreaming.SocketSourceExampleWithSnappySink
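
    A sketch of the Snappy-sink variant of the socket example, under the same
    snappysink/tableName assumptions as the earlier sketches; the devices
    table and checkpoint path are illustrative.

    import org.apache.spark.sql.{SnappySession, SparkSession}

    object SocketSnappySinkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("SocketSnappySinkSketch").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)
        import snappy.implicits._

        snappy.sql("CREATE TABLE devices (device STRING, signal INT) USING column")

        // Parse "device,signal" lines from the Netcat server started above.
        val readings = snappy.readStream
          .format("socket").option("host", "localhost").option("port", "9999")
          .load().as[String]
          .map { line =>
            val f = line.split(",")
            (f(0), f(1).toInt)
          }.toDF("device", "signal")

        val query = readings.writeStream
          .format("snappysink")
          .queryName("socket_sketch")
          .option("tableName", "devices")
          .option("checkpointLocation", "/tmp/socket_sketch_checkpoint")
          .start()

        query.awaitTermination()
      }
    }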
    
