Package org.apache.spark.examples.snappydata.structuredstreaming

package structuredstreaming

Type Members

  1. case class User(id: Long, name: String, age: Int, _eventType: Int) extends Product with Serializable

Value Members

  1. object CDCExample

    An example explaining a CDC (change data capture) use case with the SnappyData streaming sink.

    For the CDC use case, the following two conditions must be met:

    1) The target table must be defined with key columns (for column tables)
       or primary keys (for row tables).
    2) The input dataset must have a numeric column named _eventType
       indicating the type of the event. The value of this column is mapped
       to an event type as follows:

       0 - insert
       1 - putInto
       2 - delete

    Based on the key values in the incoming dataset and the value of the
    _eventType column, the sink decides which operation needs to be performed
    for each record.

    To run this on your local machine, you need to first run a Netcat server:

    $ nc -lk 9999

    Example input data. Note that the last value of each CSV record indicates
    the _eventType:

    1,user1,23,0
    2,user2,45,0
    1,user1,23,2
    2,user2,46,1

    To run the example in local mode go to your SnappyData product distribution directory and execute the following command:

    bin/run-example snappydata.structuredstreaming.CDCExample
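
    A minimal sketch of the flow described above, in the spirit of this
    example rather than its shipped code: the snappysink format name and the
    tableName option follow the SnappyData sink API, while the table name,
    query name, checkpoint path and the naive CSV parsing are illustrative
    assumptions.

    import org.apache.spark.sql.{SnappySession, SparkSession}

    object CDCSketch {
      // Mirrors the User type above; _eventType is consumed by the sink.
      case class User(id: Long, name: String, age: Int, _eventType: Int)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("CDCSketch").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)
        import snappy.implicits._

        // Key columns are mandatory on the target table for the CDC use case.
        snappy.sql("CREATE TABLE users (id LONG, name STRING, age INT) " +
          "USING column OPTIONS (key_columns 'id')")

        // Parse lines such as "1,user1,23,0"; the last field is the _eventType.
        val users = snappy.readStream
          .format("socket").option("host", "localhost").option("port", "9999")
          .load().as[String]
          .map { line =>
            val f = line.split(",")
            User(f(0).toLong, f(1), f(2).toInt, f(3).toInt)
          }

        // The sink matches records against the table's key columns and applies
        // insert (0), putInto (1) or delete (2) based on _eventType.
        val query = users.writeStream
          .format("snappysink")
          .queryName("cdc_sketch")
          .option("tableName", "users")
          .option("checkpointLocation", "/tmp/cdc_sketch_checkpoint")
          .start()

        query.awaitTermination()
      }
    }

    With the Netcat server running, typing the sample lines above should
    insert, update and then delete the corresponding rows in the users table.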

  2. object CSVFileSourceExampleWithSnappySink extends internal.Logging

    An example of structured streaming depicting CSV file processing with Snappy sink.

    Example input data:

    Yin,31,Columbus,Ohio
    Michael,38,"San Jose",California

    Usage: CSVFileSourceExampleWithSnappySink [checkpoint-directory] [input-directory]

    [checkpoint-directory]
        Optional argument providing the checkpoint directory where the state
        of the streaming query will be stored. Note that this directory needs
        to be deleted manually to reset the state of the streaming query.
        Default: CSVFileSourceExampleWithSnappySink directory under the
        working directory.

    [input-directory]
        Optional argument pointing to the input directory where incoming CSV
        files should be dumped to get picked up for processing.
        Default: people.csv directory under resources.

    Example:

    $ bin/run-example snappydata.structuredstreaming.CSVFileSourceExampleWithSnappySink \
        "checkpoint_dir" "CSV_input_dir"
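
    A hedged sketch of what this example likely looks like, assuming the same
    snappysink format and tableName option as above; the people table, its
    schema and the directory names are illustrative.

    import org.apache.spark.sql.{SnappySession, SparkSession}
    import org.apache.spark.sql.types._

    object CSVFileSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("CSVFileSketch").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)

        // Illustrative sink table matching the sample records above.
        snappy.sql("CREATE TABLE people (name STRING, age INT, city STRING, " +
          "state STRING) USING column")

        // A streaming file source requires an explicit schema.
        val schema = StructType(Seq(
          StructField("name", StringType),
          StructField("age", IntegerType),
          StructField("city", StringType),
          StructField("state", StringType)))

        val people = snappy.readStream
          .schema(schema)
          .csv("CSV_input_dir")   // directory watched for new CSV files

        val query = people.writeStream
          .format("snappysink")
          .queryName("csv_file_sketch")
          .option("tableName", "people")
          .option("checkpointLocation", "checkpoint_dir")
          .start()

        query.awaitTermination()
      }
    }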

  3. object CSVKafkaSourceExampleWithSnappySink extends internal.Logging

    An example of structured streaming depicting processing of CSV data coming from a Kafka source using Snappy sink.

    Example input data:

    Key: USA, Value: Yin,31,Columbus,Ohio
    Key: USA, Value: John,44,"San Jose",California

    Usage: CSVKafkaSourceExampleWithSnappySink <kafka-brokers> <topics> [checkpoint-directory]

    <kafka-brokers>
        Mandatory argument providing a comma-separated list of Kafka brokers.

    <topics>
        Mandatory argument providing a comma-separated list of Kafka topics
        to subscribe to.

    [checkpoint-directory]
        Optional argument providing the checkpoint directory where the state
        of the streaming query will be stored. Note that this directory needs
        to be deleted manually to reset the state of the streaming query.
        Default: CSVKafkaSourceExampleWithSnappySink directory under the
        working directory.

    Example:

    $ bin/run-example snappydata.structuredstreaming.CSVKafkaSourceExampleWithSnappySink \
        "broker-1:9092,broker-2:9092" "topic1,topic2" "checkpoint_dir"
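
    A sketch of the Kafka variant under the same snappysink/tableName
    assumptions, additionally assuming the spark-sql-kafka connector is on
    the classpath; the Person type, table name and the naive comma split are
    illustrative.

    import org.apache.spark.sql.{SnappySession, SparkSession}

    object CSVKafkaSketch {
      case class Person(name: String, age: Int, city: String, state: String)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("CSVKafkaSketch").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)
        import snappy.implicits._

        snappy.sql("CREATE TABLE people (name STRING, age INT, city STRING, " +
          "state STRING) USING column")

        // Kafka records arrive as binary key/value pairs; the CSV payload is
        // in the value. Note: split(",") is a naive parse that would break on
        // quoted fields such as "San Jose"; a real implementation would use a
        // proper CSV parser.
        val people = snappy.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker-1:9092,broker-2:9092")
          .option("subscribe", "topic1,topic2")
          .load()
          .selectExpr("CAST(value AS STRING)").as[String]
          .map { line =>
            val f = line.split(",")
            Person(f(0), f(1).toInt, f(2), f(3))
          }

        val query = people.writeStream
          .format("snappysink")
          .queryName("csv_kafka_sketch")
          .option("tableName", "people")
          .option("checkpointLocation", "checkpoint_dir")
          .start()

        query.awaitTermination()
      }
    }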

  4. object JSONFileSourceExampleWithSnappySink extends SnappySQLJob with internal.Logging

    An example of structured streaming depicting JSON file processing with Snappy sink.

    This example can be run either in local mode (in which case the example runs collocated with Spark+SnappyData Store in the same JVM) or can be submitted as a job to an already running SnappyData cluster.

    Example input data:

    {"name":"Yin", "age":31, "address":{"city":"Columbus","state":"Ohio", "district" :"Cincinnati"}} {"name":"Michael", "age":38, "address":{"city":"San Jose", "state":"California", "lane" :"15"}}

    Running locally:

    Usage: JSONFileSourceExampleWithSnappySink [checkpoint-directory] [input-directory]

    [checkpoint-directory]
        Optional argument providing the checkpoint directory where the state
        of the streaming query will be stored. Note that this directory needs
        to be deleted manually to reset the state of the streaming query.
        Default: JSONFileSourceExampleWithSnappySink directory under the
        working directory.

    [input-directory]
        Optional argument pointing to the input directory where incoming JSON
        files should be dumped to get picked up for processing.
        Default: people.json directory under resources.

    Example:

    $ bin/run-example snappydata.structuredstreaming.JSONFileSourceExampleWithSnappySink \
        "checkpoint_dir" "JSON_input_dir"

    Submitting as a snappy job to an already running cluster:

    cd $SNAPPY_HOME
    bin/snappy-job.sh submit \
        --app-name JSONFileSourceExampleWithSnappySink \
        --class org.apache.spark.examples.snappydata.structuredstreaming.JSONFileSourceExampleWithSnappySink \
        --app-jar examples/jars/quickstart.jar \
        --conf checkpoint-directory=<checkpoint directory> \
        --conf input-directory=<input directory>

    Note that the checkpoint directory and input directory are mandatory
    options while submitting a snappy job.

    Check the status of your job using the job id:

    bin/snappy-job.sh status --lead [leadHost:port] --job-id [job-id]

    To stop the job:

    bin/snappy-job.sh stop --lead [leadHost:port] --job-id [job-id]

    The content of the sink table can be checked from snappy-sql using a
    select query:

    select * from people;

    Resetting the streaming query: to reset the progress of the streaming
    query, delete the checkpoint directory. When running this example as a
    snappy job, you will also need to clear the state from the state table
    using the following query:

    delete from app.snappysys_internal____sink_state_table where stream_query_id = 'query1';
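
    Since this object extends SnappySQLJob, a sketch of the job skeleton may
    help. It assumes the SnappySQLJob/SnappyJobValidation API from
    org.apache.spark.sql and the mandatory options noted above; the table
    name and schema are illustrative, and the query name query1 mirrors the
    stream_query_id used in the reset query above.

    import com.typesafe.config.Config
    import org.apache.spark.sql._
    import org.apache.spark.sql.types._

    object JSONFileJobSketch extends SnappySQLJob {

      // Reject the job up front when the mandatory options are missing.
      override def isValidJob(snappy: SnappySession, config: Config): SnappyJobValidation =
        if (config.hasPath("checkpoint-directory") && config.hasPath("input-directory")) {
          SnappyJobValid()
        } else {
          SnappyJobInvalid("checkpoint-directory and input-directory are mandatory")
        }

      override def runSnappyJob(snappy: SnappySession, jobConfig: Config): Any = {
        val checkpointDir = jobConfig.getString("checkpoint-directory")
        val inputDir = jobConfig.getString("input-directory")

        snappy.sql("CREATE TABLE IF NOT EXISTS people (name STRING, age INT, " +
          "city STRING) USING column")

        // A streaming file source requires an explicit schema; the nested
        // address struct is flattened before reaching the sink table.
        val schema = StructType(Seq(
          StructField("name", StringType),
          StructField("age", IntegerType),
          StructField("address", StructType(Seq(
            StructField("city", StringType),
            StructField("state", StringType))))))

        val query = snappy.readStream.schema(schema).json(inputDir)
          .selectExpr("name", "age", "address.city AS city")
          .writeStream
          .format("snappysink")
          .queryName("query1")   // matches stream_query_id in the reset query above
          .option("tableName", "people")
          .option("checkpointLocation", checkpointDir)
          .start()

        query.awaitTermination()
      }
    }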

  5. object JSONKafkaSourceExampleWithSnappySink extends internal.Logging

    An example of structured streaming depicting processing of JSON data coming from a Kafka source using Snappy sink.

    Example input data:

    key: {"country" : "USA"}, value: {"name":"Adam", "age":21, "address":{"city":"Columbus","state":"Ohio"}}
    key: {"country" : "England"}, value: {"name":"John", "age":44, "address":{"city":"London"}}
    key: {"country" : "USA"}, value: {"name":"Carol", "age":37, "address":{"city":"San Diego", "state":"California"}}

    Usage: JSONKafkaSourceExampleWithSnappySink <kafka-brokers> <topics> [checkpoint-directory]

    <kafka-brokers>
        Mandatory argument providing a comma-separated list of Kafka brokers.

    <topics>
        Mandatory argument providing a comma-separated list of Kafka topics
        to subscribe to.

    [checkpoint-directory]
        Optional argument providing the checkpoint directory where the state
        of the streaming query will be stored. Note that this directory needs
        to be deleted manually to reset the state of the streaming query.
        Default: JSONKafkaSourceExampleWithSnappySink directory under the
        working directory.

    Example:

    $ bin/run-example snappydata.structuredstreaming.JSONKafkaSourceExampleWithSnappySink \
        "broker-1:9092,broker-2:9092" "topic1,topic2" "checkpoint_dir"
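
    A sketch under the same assumptions as the previous Kafka example, using
    from_json to parse the record value; the flattening of the nested address
    struct and the people table are illustrative choices.

    import org.apache.spark.sql.{SnappySession, SparkSession}
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types._

    object JSONKafkaSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("JSONKafkaSketch").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)

        snappy.sql("CREATE TABLE people (name STRING, age INT, city STRING) USING column")

        val schema = StructType(Seq(
          StructField("name", StringType),
          StructField("age", IntegerType),
          StructField("address", StructType(Seq(
            StructField("city", StringType),
            StructField("state", StringType))))))

        // Parse the JSON payload carried in the Kafka record value and
        // flatten the nested address struct.
        val people = snappy.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker-1:9092,broker-2:9092")
          .option("subscribe", "topic1,topic2")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("v"))
          .selectExpr("v.name", "v.age", "v.address.city AS city")

        val query = people.writeStream
          .format("snappysink")
          .queryName("json_kafka_sketch")
          .option("tableName", "people")
          .option("checkpointLocation", "checkpoint_dir")
          .start()

        query.awaitTermination()
      }
    }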

  6. object SocketSourceExample

    An example showing usage of structured streaming with console sink.

    To run this example on your local machine, you need to first start a Netcat server:
    $ nc -lk 9999

    Sample input data:

    device1,45
    device2,67
    device3,35
    

    To run the example in local mode go to your SnappyData product distribution directory and run the following command:

    bin/run-example snappydata.structuredstreaming.SocketSourceExample
    

    For more details on streaming with SnappyData refer to: http://snappydatainc.github.io/snappydata/programming_guide/stream_processing_using_sql/#stream-processing-using-sql
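
    The console-sink variant needs only stock Spark APIs. The following
    sketch assumes the device1,45 input format shown above; the aggregation
    (average signal per device) is chosen for illustration.

    import org.apache.spark.sql.SparkSession

    object SocketConsoleSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("SocketConsoleSketch").getOrCreate()
        import spark.implicits._

        // Parse "device,signal" lines from the Netcat server started above.
        val readings = spark.readStream
          .format("socket").option("host", "localhost").option("port", "9999")
          .load().as[String]
          .map { line =>
            val f = line.split(",")
            (f(0), f(1).toInt)
          }.toDF("device", "signal")

        // Average signal per device, reprinted on the console every trigger.
        val query = readings
          .groupBy("device").avg("signal")
          .writeStream
          .outputMode("complete")
          .format("console")
          .start()

        query.awaitTermination()
      }
    }

    Typing the sample lines above into the Netcat session should print a
    running average per device to the console.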

  7. object SocketSourceExampleWithSnappySink extends internal.Logging

    An example showing usage of structured streaming with SnappyData.

    To run this on your local machine, you need to first start a Netcat server:

    $ nc -lk 9999

    Sample input data:

    device1,45
    device2,67
    device3,35
    
    To run the example in local mode go to your SnappyData product distribution directory and run the following command:
    bin/run-example snappydata.structuredstreaming.SocketSourceExampleWithSnappySink
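
    A sketch of the Snappy-sink variant of the socket example, under the same
    snappysink/tableName assumptions as the earlier sketches; the devices
    table and checkpoint path are illustrative.

    import org.apache.spark.sql.{SnappySession, SparkSession}

    object SocketSnappySinkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]").appName("SocketSnappySinkSketch").getOrCreate()
        val snappy = new SnappySession(spark.sparkContext)
        import snappy.implicits._

        snappy.sql("CREATE TABLE devices (device STRING, signal INT) USING column")

        // Parse "device,signal" lines from the Netcat server started above.
        val readings = snappy.readStream
          .format("socket").option("host", "localhost").option("port", "9999")
          .load().as[String]
          .map { line =>
            val f = line.split(",")
            (f(0), f(1).toInt)
          }.toDF("device", "signal")

        val query = readings.writeStream
          .format("snappysink")
          .queryName("socket_sketch")
          .option("tableName", "devices")
          .option("checkpointLocation", "/tmp/socket_sketch_checkpoint")
          .start()

        query.awaitTermination()
      }
    }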
    
