An example of structured streaming depicting CSV file processing with Snappy sink.
Example input data:
Yin,31,Columbus,Ohio
Michael,38,"San Jose",California
Usage: CSVFileSourceExampleWithSnappySink [checkpoint-directory] [input-directory]

[checkpoint-directory] Optional argument providing the checkpoint directory where the
                       state of the streaming query will be stored. Note that this
                       directory needs to be deleted manually to reset the state
                       of the streaming query.
                       Default: CSVFileSourceExampleWithSnappySink directory under the
                       working directory.
[input-directory]      Optional argument pointing to the input directory path where
                       incoming CSV files should be dumped to get picked up for
                       processing.
                       Default: people.csv directory under resources.

Example:
$ bin/run-example snappydata.structuredstreaming.CSVFileSourceExampleWithSnappySink \
    "checkpoint_dir" "CSV_input_dir"
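The query described above can be sketched as a small structured streaming program. This is a minimal, hypothetical sketch, not the bundled example's exact source: the schema, the table name `people`, and the `snappysink` option names are assumptions based on the sample input and the SnappyData sink.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CsvToSnappySinkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CSVFileSourceExampleWithSnappySink")
      .getOrCreate()

    // Schema for the sample records: name, age, city, state
    val schema = StructType(Seq(
      StructField("name", StringType),
      StructField("age", IntegerType),
      StructField("city", StringType),
      StructField("state", StringType)))

    // Stream CSV files as they are dropped into the input directory
    val people = spark.readStream
      .schema(schema)
      .csv("CSV_input_dir") // the [input-directory] argument

    val query = people.writeStream
      .format("snappysink")                           // SnappyData streaming sink
      .queryName("query1")                            // identifies the sink's progress state
      .option("tableName", "people")                  // target SnappyData table (assumed name)
      .option("checkpointLocation", "checkpoint_dir") // the [checkpoint-directory] argument
      .start()

    query.awaitTermination()
  }
}
```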
An example of structured streaming depicting processing of CSV data coming from a Kafka source using the Snappy sink.
Example input data:
Key: USA, Value: Yin,31,Columbus,Ohio
Key: USA, Value: John,44,"San Jose",California
Usage: CSVKafkaSourceExampleWithSnappySink <kafka-brokers> <topics> [checkpoint-directory]

<kafka-brokers>        Compulsory argument providing a comma-separated list of Kafka brokers.
<topics>               Compulsory argument providing a comma-separated list of Kafka topics
                       to subscribe to.
[checkpoint-directory] Optional argument providing the checkpoint directory where the state
                       of the streaming query will be stored. Note that this directory needs
                       to be deleted manually to reset the state of the streaming query.
                       Default: CSVKafkaSourceExampleWithSnappySink directory under the
                       working directory.

Example:
$ bin/run-example snappydata.structuredstreaming.CSVKafkaSourceExampleWithSnappySink \
    "broker-1:9092,broker-2:9092" "topic1,topic2" "checkpoint_dir"
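Reading such CSV values from Kafka might look like the following sketch. It is illustrative only: the broker and topic strings come from the example command line above, while the column names and the naive comma split (which would mishandle quoted fields like "San Jose") are simplifying assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

object CsvKafkaToSnappySinkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CSVKafkaSourceExampleWithSnappySink")
      .getOrCreate()

    // Subscribe to the given topics on the given brokers
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092,broker-2:9092") // <kafka-brokers>
      .option("subscribe", "topic1,topic2")                             // <topics>
      .load()

    // Naive comma split of the message value; a real CSV parser would be
    // needed to handle quoted fields such as "San Jose".
    val parts = split(col("value").cast("string"), ",")
    val people = raw.select(
      parts.getItem(0).as("name"),
      parts.getItem(1).cast("int").as("age"),
      parts.getItem(2).as("city"),
      parts.getItem(3).as("state"))

    people.writeStream
      .format("snappysink")
      .queryName("query1")
      .option("tableName", "people") // assumed target table name
      .option("checkpointLocation", "checkpoint_dir")
      .start()
      .awaitTermination()
  }
}
```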
An example of structured streaming depicting JSON file processing with Snappy sink.
This example can be run either in local mode (in which case the example runs collocated with Spark+SnappyData Store in the same JVM) or can be submitted as a job to an already running SnappyData cluster.
Example input data:
{"name":"Yin", "age":31, "address":{"city":"Columbus","state":"Ohio", "district" :"Cincinnati"}}
{"name":"Michael", "age":38, "address":{"city":"San Jose", "state":"California", "lane" :"15"}}
Running locally:
Usage: JSONFileSourceExampleWithSnappySink [checkpoint-directory] [input-directory]

[checkpoint-directory] Optional argument providing the checkpoint directory where the
                       state of the streaming query will be stored. Note that this
                       directory needs to be deleted manually to reset the state
                       of the streaming query.
                       Default: JSONFileSourceExampleWithSnappySink directory under the
                       working directory.
[input-directory]      Optional argument pointing to the input directory path where
                       incoming JSON files should be dumped to get picked up for
                       processing.
                       Default: people.json directory under resources.

Example:
$ bin/run-example snappydata.structuredstreaming.JSONFileSourceExampleWithSnappySink \
    "checkpoint_dir" "JSON_input_dir"
Submitting as a snappy job to an already running cluster:

cd $SNAPPY_HOME
bin/snappy-job.sh submit \
    --app-name JSONFileSourceExampleWithSnappySink \
    --class org.apache.spark.examples.snappydata.structuredstreaming.JSONFileSourceExampleWithSnappySink \
    --app-jar examples/jars/quickstart.jar \
    --conf checkpoint-directory=<checkpoint directory> \
    --conf input-directory=<input directory>

Note that the checkpoint directory and input directory are mandatory options when
submitting a snappy job.

Check the status of your job:
bin/snappy-job.sh status --lead [leadHost:port] --job-id [job-id]

To stop the job:
bin/snappy-job.sh stop --lead [leadHost:port] --job-id [job-id]

The content of the sink table can be checked from snappy-sql using a select query:
select * from people;

Resetting the streaming query:
To reset the streaming query's progress, delete the checkpoint directory. When running
this example as a snappy job, you will also need to clear the state from the state
table using the following query:
delete from app.snappysys_internal____sink_state_table where stream_query_id = 'query1';
An example of structured streaming depicting processing of JSON coming from kafka source using snappy sink.
Example input data:
key: {"country" : "USA"}, value: {"name":"Adam", "age":21, "address":{"city":"Columbus","state":"Ohio"}}
key: {"country" : "England"}, value: {"name":"John", "age":44, "address":{"city":"London"}}
key: {"country" : "USA"}, value: {"name":"Carol", "age":37, "address":{"city":"San Diego", "state":"California"}}
Usage: JSONKafkaSourceExampleWithSnappySink <kafka-brokers> <topics> [checkpoint-directory]

<kafka-brokers>        Compulsory argument providing a comma-separated list of Kafka brokers.
<topics>               Compulsory argument providing a comma-separated list of Kafka topics
                       to subscribe to.
[checkpoint-directory] Optional argument providing the checkpoint directory where the state
                       of the streaming query will be stored. Note that this directory needs
                       to be deleted manually to reset the state of the streaming query.
                       Default: JSONKafkaSourceExampleWithSnappySink directory under the
                       working directory.

Example:
$ bin/run-example snappydata.structuredstreaming.JSONKafkaSourceExampleWithSnappySink \
    "broker-1:9092,broker-2:9092" "topic1,topic2" "checkpoint_dir"
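Parsing the JSON message values from Kafka might look like the sketch below. It is a hypothetical outline under stated assumptions: the schema is inferred from the sample records above (with "state" optional), and the flattened column names and target table `people` are illustrative, not taken from the actual example source.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object JsonKafkaToSnappySinkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JSONKafkaSourceExampleWithSnappySink")
      .getOrCreate()

    // Schema for the sample JSON values; "state" may be absent (e.g. London)
    val schema = StructType(Seq(
      StructField("name", StringType),
      StructField("age", IntegerType),
      StructField("address", StructType(Seq(
        StructField("city", StringType),
        StructField("state", StringType))))))

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092,broker-2:9092") // <kafka-brokers>
      .option("subscribe", "topic1,topic2")                             // <topics>
      .load()

    // Parse each message value as JSON and flatten the nested address
    val people = raw
      .select(from_json(col("value").cast("string"), schema).as("person"))
      .select(
        col("person.name"),
        col("person.age"),
        col("person.address.city").as("city"),
        col("person.address.state").as("state"))

    people.writeStream
      .format("snappysink")
      .queryName("query1")
      .option("tableName", "people") // assumed target table name
      .option("checkpointLocation", "checkpoint_dir")
      .start()
      .awaitTermination()
  }
}
```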
An example showing usage of structured streaming with console sink.
To run this example on your local machine, you need to first start a Netcat server:
$ nc -lk 9999
Sample input data:
device1,45
device2,67
device3,35
To run the example in local mode go to your SnappyData product distribution directory and run the following command:
bin/run-example snappydata.structuredstreaming.SocketSourceExample
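A socket-to-console query of this shape can be sketched as follows. The split into `device` and `signal` columns is an assumption based on the sample input; the socket and console sinks themselves are standard Spark structured streaming features.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

object SocketToConsoleSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SocketSourceExample")
      .master("local[*]")
      .getOrCreate()

    // Lines like "device1,45" arriving from the Netcat server on port 9999
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val parts = split(col("value"), ",")
    val readings = lines.select(
      parts.getItem(0).as("device"),
      parts.getItem(1).cast("int").as("signal"))

    // Print each micro-batch to stdout
    readings.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```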
For more details on streaming with SnappyData refer to: http://snappydatainc.github.io/snappydata/programming_guide/stream_processing_using_sql/#stream-processing-using-sql
An example showing usage of structured streaming with SnappyData.
To run this on your local machine, you need to first start a Netcat server:
$ nc -lk 9999
Sample input data:
device1,45
device2,67
device3,35

To run the example in local mode go to your SnappyData product distribution directory and run the following command:
bin/run-example snappydata.structuredstreaming.SocketSourceExampleWithSnappySink
An example explaining CDC (change data capture) use case with SnappyData streaming sink.
For the CDC use case, the following two conditions must be met:
1) The target table must be defined with key columns (for column tables) or primary
   keys (for row tables).
2) The input dataset must have a numeric column named _eventType indicating the type
   of the event. The value of this column is mapped to an event type in the following
   manner:
   0 - insert
   1 - putInto
   2 - delete
Based on the key values in the incoming dataset and the value of the _eventType column,
the sink decides which operation needs to be performed for each record.
To run this on your local machine, you need to first run a Netcat server:
$ nc -lk 9999
Example input data. Note that the last value in each CSV record indicates the _eventType:
1,user1,23,0
2,user2,45,0
1,user1,23,2
2,user2,46,1
To run the example in local mode go to your SnappyData product distribution directory and execute the following command:
bin/run-example snappydata.structuredstreaming.CDCExample
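The CDC flow described above can be sketched as follows. This is a hedged outline, not the example's actual source: the table name `users`, its columns, and the DDL are assumptions chosen to match the sample records, while the key-column requirement and the _eventType mapping (0 = insert, 1 = putInto, 2 = delete) come from the description above.

```scala
import org.apache.spark.sql.{SnappySession, SparkSession}
import org.apache.spark.sql.functions.{col, split}

object CDCSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CDCExample")
      .master("local[*]")
      .getOrCreate()
    val snappy = new SnappySession(spark.sparkContext)

    // Column table with a key column so the sink can match incoming records
    snappy.sql("CREATE TABLE IF NOT EXISTS users (id LONG, name STRING, age INT) " +
      "USING column OPTIONS (key_columns 'id')")

    // Lines like "1,user1,23,0" from Netcat: id,name,age,_eventType
    val lines = snappy.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val parts = split(col("value"), ",")
    val changes = lines.select(
      parts.getItem(0).cast("long").as("id"),
      parts.getItem(1).as("name"),
      parts.getItem(2).cast("int").as("age"),
      parts.getItem(3).cast("int").as("_eventType")) // 0=insert, 1=putInto, 2=delete

    // The sink inspects _eventType and the key column to insert, put or delete
    changes.writeStream
      .format("snappysink")
      .queryName("query1")
      .option("tableName", "users")
      .option("checkpointLocation", "checkpoint_dir")
      .start()
      .awaitTermination()
  }
}
```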