interface for implementing a custom source, through which data is read into the system.
general task that runs any DataSource; see DataSourceProcessor for its usage
DataSourceTask calls
- DataSource.open in onStart, passing in the TaskContext and the application start time
- DataSource.read in each onNext, which reads a batch of messages whose size is defined by gearpump.source.read.batch.size
- DataSource.close in onStop
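The lifecycle above can be sketched roughly as follows. This is a minimal illustration, not Gearpump's actual code: Message, TaskContext, the DataSource trait, SketchSourceTask and CountingSource are simplified stand-ins defined here, the method signatures are assumptions, and the batch size is passed as a constructor argument rather than read from gearpump.source.read.batch.size.

```scala
import scala.collection.mutable.ListBuffer

object DataSourceLifecycleSketch {
  // Stand-ins for illustration only; these are NOT Gearpump's real types.
  final case class Message(msg: Any, timestamp: Long)
  final case class TaskContext(taskName: String)

  // Assumed shape of the source interface (signatures are hypothetical).
  trait DataSource extends java.io.Serializable {
    def open(context: TaskContext, startTime: Long): Unit
    def read(batchSize: Int): List[Message]
    def close(): Unit
  }

  // Hypothetical task that drives a DataSource through open / read / close.
  class SketchSourceTask(context: TaskContext, source: DataSource, batchSize: Int) {
    val emitted: ListBuffer[Message] = ListBuffer.empty[Message]

    // onStart opens the source with the task context and the start time.
    def onStart(startTime: Long): Unit = source.open(context, startTime)

    // Each onNext reads one batch of at most batchSize messages.
    def onNext(): Unit = { emitted ++= source.read(batchSize) }

    // onStop releases the source.
    def onStop(): Unit = source.close()
  }

  // Toy source that counts upward from the start time and records each call.
  class CountingSource extends DataSource {
    val calls: ListBuffer[String] = ListBuffer.empty[String]
    private var next = 0L

    def open(context: TaskContext, startTime: Long): Unit = {
      calls += "open"
      next = startTime
    }

    def read(batchSize: Int): List[Message] = {
      calls += "read"
      val batch = List.tabulate(batchSize)(i => Message(s"m${next + i}", next + i))
      next += batchSize
      batch
    }

    def close(): Unit = { calls += "close" }
  }
}
```

Driving the sketch with onStart, a few onNext calls and onStop shows the source being opened once, read once per onNext, and closed once.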
default TimeStampFilter that filters out messages with smaller timestamps
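A minimal sketch of such a filter, assuming "smaller" means smaller than some threshold timestamp (for example the application start time) and that a message whose timestamp equals the threshold is kept. The Message type and the filter signature here are hypothetical stand-ins, not Gearpump's own definitions.

```scala
object TimeStampFilterSketch {
  // Stand-in message type for illustration only.
  final case class Message(msg: Any, timestamp: Long)

  // Keeps the message only if its timestamp is not smaller than the threshold;
  // returns None for a filtered-out (or null) message.
  def filter(msg: Message, threshold: Long): Option[Message] =
    Option(msg).filter(_.timestamp >= threshold)
}
```

Returning an Option lets the caller drop filtered messages without a separate boolean check.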
utility that helps the user create a DAG starting with DataSourceTask; the user should pass in a DataSource
here is an example that builds a DAG reading from a Kafka source, followed by word count
    val source = new KafkaSource()
    val sourceProcessor = DataSourceProcessor(source, 1)
    val split = Processor[Split](1)
    val sum = Processor[Sum](1)
    val dag = sourceProcessor ~> split ~> sum
interface for implementing a custom source, through which data is read into the system. a DataSource could be a message queue like Kafka or simply a data generation source.
an example would be like
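A minimal sketch of what such a source might look like, assuming the interface exposes open, read and close as described for DataSourceTask. Message, TaskContext and the DataSource trait below are simplified stand-ins defined only for this illustration, and GenStringSource is a hypothetical data generation source; the real signatures may differ.

```scala
object GenStringSourceSketch {
  // Stand-ins for illustration only; these are NOT Gearpump's real types.
  final case class Message(msg: Any, timestamp: Long = 0L)
  final case class TaskContext(taskName: String)

  // Assumed shape of the interface; extending Serializable makes every
  // subclass serializable by construction.
  trait DataSource extends java.io.Serializable {
    def open(context: TaskContext, startTime: Long): Unit
    def read(batchSize: Int): List[Message]
    def close(): Unit
  }

  // Hypothetical source that generates the same string for every message.
  class GenStringSource extends DataSource {
    // Nothing to set up for a pure generator.
    def open(context: TaskContext, startTime: Long): Unit = {}

    // Produces exactly batchSize messages per call.
    def read(batchSize: Int): List[Message] =
      List.fill(batchSize)(Message("message"))

    // Nothing to release.
    def close(): Unit = {}
  }
}
```

A source backed by a message queue would instead connect in open, poll in read, and disconnect in close.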
subclasses are required to be serializable