io.gearpump.streaming

source

package source

Type Members

  1. trait DataSource extends Serializable

    Interface for implementing a custom source that reads data into the system. A DataSource could be a message queue such as Kafka, or simply a data generator.

    An example:

    class GenStringSource extends DataSource {

      // nothing to set up for a generated source
      def open(context: TaskContext, startTime: Option[TimeStamp]): Unit = {}

      // return batchSize copies of the same message
      def read(batchSize: Int): List[Message] = {
        List.fill(batchSize)(Message("message"))
      }

      // nothing to release
      def close(): Unit = {}
    }

    Subclasses must be serializable.

  2. class DataSourceTask extends Task

    General task that runs any DataSource. See DataSourceProcessor for its usage.

    DataSourceTask calls:

    • DataSource.open in onStart, passing in the TaskContext and the application start time
    • DataSource.read in each onNext, which reads a batch of messages whose size is defined by gearpump.source.read.batch.size
    • DataSource.close in onStop
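
The call sequence above can be sketched without Gearpump on the classpath. Everything in this block is a simplified stand-in written for illustration: `SimpleSource`, `RecordingSource`, `SimpleSourceTask` and `run` are hypothetical names, the stand-in `open` omits the TaskContext, and the batch size is passed directly instead of being read from gearpump.source.read.batch.size.

```scala
// Stand-in types so the sketch compiles on its own.
// They are simplified assumptions, not the real Gearpump definitions.
object LifecycleSketch {
  type TimeStamp = Long
  final case class Message(msg: Any, timestamp: TimeStamp = 0L)

  // Hypothetical, reduced version of DataSource (no TaskContext).
  trait SimpleSource extends Serializable {
    def open(startTime: Option[TimeStamp]): Unit
    def read(batchSize: Int): List[Message]
    def close(): Unit
  }

  // Source that records which lifecycle methods were called, in order.
  class RecordingSource extends SimpleSource {
    val calls = scala.collection.mutable.ListBuffer.empty[String]
    def open(startTime: Option[TimeStamp]): Unit = { calls += "open"; () }
    def read(batchSize: Int): List[Message] = {
      calls += s"read($batchSize)"
      List.fill(batchSize)(Message("message"))
    }
    def close(): Unit = { calls += "close"; () }
  }

  // Hypothetical driver mirroring the onStart / onNext / onStop sequence.
  class SimpleSourceTask(source: SimpleSource, batchSize: Int) {
    def onStart(startTime: Option[TimeStamp]): Unit = source.open(startTime)
    def onNext(): List[Message] = source.read(batchSize)
    def onStop(): Unit = source.close()
  }

  // Runs one open, `rounds` reads and one close; returns the recorded
  // calls and the total number of messages read.
  def run(batchSize: Int, rounds: Int): (List[String], Int) = {
    val source = new RecordingSource
    val task = new SimpleSourceTask(source, batchSize)
    task.onStart(Some(0L))
    val total = (1 to rounds).map(_ => task.onNext().size).sum
    task.onStop()
    (source.calls.toList, total)
  }
}
```

Calling `LifecycleSketch.run(3, 2)` opens the source once, reads two batches of three messages, and closes it once.
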
  3. class DefaultTimeStampFilter extends TimeStampFilter

    Default TimeStampFilter that filters out messages with smaller timestamps.
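
The description does not say what the timestamps are compared against. A minimal sketch, assuming a caller-supplied threshold and that messages with a timestamp equal to the threshold are kept, might look like the following. `TimeStampFilterSketch`, `filter` and `keepFrom` are hypothetical names, and `Message` is a simplified stand-in, not the Gearpump class.

```scala
object TimeStampFilterSketch {
  type TimeStamp = Long
  // Simplified stand-in for a timestamped message.
  final case class Message(msg: Any, timestamp: TimeStamp)

  // Assumption: a message passes only if its timestamp is not smaller
  // than the threshold; otherwise it is dropped (None).
  def filter(msg: Message, threshold: TimeStamp): Option[Message] =
    Option(msg).filter(_.timestamp >= threshold)

  // Applies the filter to a batch, keeping the surviving messages in order.
  def keepFrom(messages: List[Message], threshold: TimeStamp): List[Message] =
    messages.flatMap(m => filter(m, threshold))
}
```

Returning an `Option` lets a caller handle a single message and a batch the same way, since `flatMap` discards the dropped entries.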

Value Members

  1. object DataSourceConfig

  2. object DataSourceProcessor

    Utility that helps users create a DAG starting with a DataSourceTask. The user should pass in a DataSource.

    Here is an example that builds a DAG reading from a Kafka source, followed by a word count:

    val source = new KafkaSource()
    // wrap the source in a processor that runs DataSourceTask
    val sourceProcessor = DataSourceProcessor(source, 1)
    val split = Processor[Split](1)
    val sum = Processor[Sum](1)
    // chain the processors: source, then split, then sum
    val dag = sourceProcessor ~> split ~> sum
  3. object DataSourceTask
