case class ConcurrentStream[A](kafkaData: Observable[A], asyncScheduler: Scheduler, kafkaFacade: KafkaFacade, minCommitFrequency: Int, awaitJobTimeout: FiniteDuration = 10.seconds, retryDuration: FiniteDuration = 50.milliseconds)(implicit hasOffset: HasOffset[A]) extends StrictLogging with AutoCloseable with Product with Serializable
Kafka consumer access is enforced to be single-threaded, and for good reason.
Suppose a consumer were to read ten messages from Kafka and send off ten async requests.
If the tenth request happened to come back first and commit its offset, what about the other nine, which might still fail?
On the flip side, if we were to block on each async call for every message, that would be a performance killer, and unnecessary if the calls are idempotent.
To enable async handling/commits, we just need to ensure we cater for this case:
```
msg1 -------------->+
                    |
msg2 ----->+        |
           |     !bang!
ok <-------+        |
                    |
    <- onFailure ---+
```
We shouldn't commit the offset for msg2, even though it succeeded first.
The way we handle this is by having the futures drive a ConcurrentSubject of offsets zipped with the messages we receive from Kafka.
```
msg1 --------------> ???
msg2 --------------> ???
msg3 --------------> ???
msg4 --------------> ???
msg5 --------------> ???
msg6 --------------> ???
...

some mixed order - just ensuring we do get either a failure or a success for each result:

msg6 <--------- ???
msg2 <--------- ???
msg5 <--------- ???
msg1 <--------- ???   // here we can commit up to offset 2, as 1 and 2 have returned
```
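To make that commit rule concrete, here is a minimal, hypothetical sketch (not part of this API) of tracking a contiguous watermark: completed offsets are buffered, and only the highest contiguous offset becomes committable.

```scala
import scala.collection.mutable

// Hypothetical helper illustrating the "commit only a contiguous prefix" rule.
final class ContiguousWatermark(firstOffset: Long) {
  private val completed    = mutable.SortedSet.empty[Long]
  private var nextExpected = firstOffset

  /** Mark `offset` as done; return the new highest committable offset, if it advanced. */
  def complete(offset: Long): Option[Long] = {
    completed += offset
    val before = nextExpected
    while (completed(nextExpected)) { // drain the contiguous prefix
      completed -= nextExpected
      nextExpected += 1
    }
    if (nextExpected > before) Some(nextExpected - 1) else None
  }
}

// e.g. starting at offset 1, completing 6, 2, 5 then 1 (as in the diagram above):
//   complete(6) == None, complete(2) == None, complete(5) == None, complete(1) == Some(2)
```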
- A
the messages in the kafka feed (typically AckableRecords)
- kafkaData
the data coming from kafka
- asyncScheduler
the scheduler to use in running tasks
- kafkaFacade
our means of committing offsets to kafka
- minCommitFrequency
how frequently we'll try to commit offsets to kafka. Set to 0 to commit as soon as tasks complete successfully
- awaitJobTimeout
the amount of time to wait for the last job to complete when trying to commit the last position back to kafka
- retryDuration
the "poll time" when checking for the result of the final task
Instance Constructors
- new ConcurrentStream(kafkaData: Observable[A], asyncScheduler: Scheduler, kafkaFacade: KafkaFacade, minCommitFrequency: Int, awaitJobTimeout: FiniteDuration = 10.seconds, retryDuration: FiniteDuration = 50.milliseconds)(implicit hasOffset: HasOffset[A])
Value Members
- def as[B](implicit ev: =:=[A, AckBytes], decoder: ByteArrayDecoder[B], newHasOffset: HasOffset[B]): ConcurrentStream[B]
Converts the kafkaData to a 'B' type, provided there is a 'HasOffset' for B so we know what to commit back to Kafka
- val asyncScheduler: Scheduler
- val awaitJobTimeout: FiniteDuration
- def close(): Unit
- Definition Classes
- ConcurrentStream → AutoCloseable
- def compute[B, C](parallelism: Int)(compute: (B) => Task[C])(implicit ev: =:=[A, AckBytes], decoder: ByteArrayDecoder[B]): Observable[ComputeResult[B, C]]
Convenience method which combines the 'decode' and 'loadBalance' functions
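A hypothetical usage sketch (`stream`, the decoder and the job below are assumptions, not part of this API):

```scala
import monix.eval.Task

// Assumes stream: ConcurrentStream[AckBytes] and an implicit
// ByteArrayDecoder[String] in scope.
val wordCounts = stream.compute[String, Int](parallelism = 4) { text =>
  Task(text.split("\\s+").length) // the async job run for each decoded record
}
// wordCounts: Observable[ComputeResult[String, Int]]
```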
- def decode[B](implicit ev: =:=[A, AckBytes], decoder: ByteArrayDecoder[B]): ConcurrentStream[AckableRecord[B]]
Convenience method for 'map' which uses a decoder to B. This maps the inner kafkaData to the B type, still wrapped in an AckableRecord.
- returns
a ConcurrentStream of type 'B'
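For instance, a hypothetical sketch (`UserEvent` and its decoder are assumptions):

```scala
// Assumes byteStream: ConcurrentStream[AckBytes] and an implicit
// ByteArrayDecoder[UserEvent] in scope.
val decoded: ConcurrentStream[AckableRecord[UserEvent]] = byteStream.decode[UserEvent]
```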
- implicit val hasOffset: HasOffset[A]
- val kafkaData: Observable[A]
- val kafkaFacade: KafkaFacade
- def loadBalance[B](parallelism: Int)(runJobOnNext: (A) => Task[B]): Observable[(ZipOffset, A, B)]
Exposes a means to execute a task on each input from the kafka data.
The tasks can be run in parallel, but the kafka offsets are only committed when *all* the tasks have completed successfully.
Tasks in error will cause the Observable to fail, so if you want to continue consuming from Kafka after failures you will need to ensure the Tasks have adequate error handling/retry logic built in (see the sketch after this member's parameters).
Example:
Consider that we kick off async processes after having consumed Kafka messages A, B, C, D and E:
```
A ---- start job on A -------------------+
                                         |
B ---- start job on B ---------------+   |
                                     |   |
C ---- start job on C -----------+   |   |
                                 |   |   |
D ---- start job on D ---+---+   |   |   |
                             |   |   |   |
E ---- start job on E ----+  |   |   |   |
                          |  |   |   |   |
1: (c) <------------------+--+---+   |   |
                          |  |       |   |
2: (b) <------------------+--+-------+   |
                          |  |           |
3: (a) <------------------+--+-----------+
                          |  |
4: (e) <---- !BANG! ------+  |
                             |
5: (d) <---------------------+
```
So, we've kicked off 5 jobs based on the first 5 messages, all on different threads.
1: At this point, we've received a successful response from the third message (C). We DON'T commit the offset back to kafka, because the tasks from A or B may yet fail, in which case we expect the Observable stream to fail with their error, and thus have messages A, B, etc. replayed upon reconnect.
We DO, however, emit a tuple message of (2, C, c) -- e.g. the local offset index '2', the input message C and the 'c' result from the task.
The results are emitted in the order in which the tasks complete, not necessarily their input order. If the downstream systems need to reconstruct the original order, they can either broadcast to many from the original kafka stream or use ContiguousOrdering to put the messages back in kafka-received order.
2: Next we receive a successful response from the second message (B). Just as before, we're still waiting on result A, so we DON'T commit the offset, but DO emit a record (1, B, b).
3: Finally we get our first response back from message A. We've now received responses for messages A, B and C, and so can commit the 'C' offset to Kafka (not just A's). If we die and reconnect now, we should start from message C. We also emit the message (0, A, a).
4: We get the error response (e) from E, so we end the stream in error. Had we received the response from message 'D' instead, we wouldn't have tried to commit that offset to kafka, as it was the next index after 'E'. Too bad, so sad - we error our stream, and presumably our app dies or at least we close our kafka connection.
5: Our stream has errored, so this response is simply ignored.
- parallelism
the parallelism to use when executing the jobs
- runJobOnNext
our job logic - the tasks to execute on each kafka input message
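A hypothetical usage sketch of the retry advice above (`stream` and `publishDownstream` are assumptions, not part of this API):

```scala
import monix.eval.Task

// Bake retries into each Task so a transient failure doesn't error the whole stream.
val results = stream.loadBalance(parallelism = 8) { record =>
  Task(publishDownstream(record))   // publishDownstream: a hypothetical side effect
    .onErrorRestart(maxRetries = 3) // Monix: re-run the task up to 3 times on error
}
// results: Observable of (ZipOffset, A, B) - local offset, input and job result
```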
- val logger: Logger
- Attributes
- protected
- Definition Classes
- StrictLogging
- def map[B](f: (A) => B)(implicit newHasOffset: HasOffset[B]): ConcurrentStream[B]
Maps the kafkaData to the B type, but still wrapped in an AckableRecord
- val minCommitFrequency: Int
- def productElementNames: Iterator[String]
- Definition Classes
- Product
- val retryDuration: FiniteDuration