ConcurrentStream

Companion object ConcurrentStream

case class ConcurrentStream[A](kafkaData: Observable[A], asyncScheduler: Scheduler, kafkaFacade: KafkaFacade, minCommitFrequency: Int, awaitJobTimeout: FiniteDuration = 10.seconds, retryDuration: FiniteDuration = 50.milliseconds)(implicit hasOffset: HasOffset[A]) extends StrictLogging with AutoCloseable with Product with Serializable

Kafka enforces single-threaded access to a consumer, and the following scenario shows why.

Suppose a consumer were to read ten messages from Kafka and send off ten async requests.

If the tenth request happened to come back first and commit its offset, then what about the other nine which might fail?

On the flip-side, if we were to block on each async call for every message, that would be a performance killer, and unnecessary if the calls are idempotent.

To enable async handling/commits, we just need to ensure we cater for this case:

msg1 -------------->+
                    |
msg2 ----->+        |
           |      !bang!
 ok  <-----+        |
                    |
     <- onFailure --+

We shouldn't commit the offset for msg2, even though it succeeded first.

The way we handle this is by having the futures drive a ConcurrentSubject of offsets zipped with the messages we receive from Kafka.

msg1 --------------> ???
msg2 --------------> ???
msg3 --------------> ???
msg4 --------------> ???
msg5 --------------> ???
msg6 --------------> ???

... some mixed order - just ensuring we do get either a failure or a success for each result

msg6 <--------- ???
msg2 <--------- ???
msg5 <--------- ???
msg1 <--------- ??? // here we can commit up to offset 2 as 1 and 2 have returned
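
The commit rule in the diagram above can be sketched in isolation: remember which indices have completed, and only report a committable position once the contiguous prefix advances. This is a minimal illustration using only the standard library. `CommitTracker` and `CommitTrackerDemo` are hypothetical names, not part of this class, and the real implementation drives a ConcurrentSubject rather than folding over a list.

```scala
import scala.collection.immutable.SortedSet

// Hypothetical helper (not the library's implementation): tracks completed
// indices and reports the highest index that is safe to commit.
final case class CommitTracker(nextExpected: Long, pending: SortedSet[Long] = SortedSet.empty[Long]) {

  // Record a successful completion. Returns the new state and, if the
  // contiguous prefix advanced, the highest index that may now be committed.
  def complete(index: Long): (CommitTracker, Option[Long]) =
    if (index < nextExpected) (this, None) // already covered by an earlier commit
    else {
      var next = nextExpected
      var rest = pending + index
      // Consume every completed index that directly follows the last one
      while (rest.headOption.contains(next)) {
        rest = rest - next
        next += 1
      }
      val committable = if (next > nextExpected) Some(next - 1) else None
      (CommitTracker(next, rest), committable)
    }
}

object CommitTrackerDemo {
  // Feeds completions in arrival order and collects what could be committed after each one
  def run(completions: List[Long], firstIndex: Long): List[Option[Long]] =
    completions
      .foldLeft((CommitTracker(firstIndex), List.empty[Option[Long]])) {
        case ((tracker, acc), index) =>
          val (updated, commit) = tracker.complete(index)
          (updated, acc :+ commit)
      }
      ._2
}
```

Replaying the arrival order from the diagram (6, 2, 5, 1, starting at message 1) yields nothing to commit for the first three results and offset 2 once msg1 returns.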

A: the messages in the Kafka feed (typically AckableRecords)
kafkaData: the data coming from Kafka
asyncScheduler: the scheduler used to run tasks
kafkaFacade: our means of committing offsets to Kafka
minCommitFrequency: how frequently we'll try to commit offsets to Kafka. Set to 0 to commit as soon as tasks complete successfully
awaitJobTimeout: the amount of time to wait for the last job to complete when trying to commit the last position back to Kafka
retryDuration: the "poll time" when checking for the result of the final task

Linear Supertypes

Serializable, Product, Equals, AutoCloseable, StrictLogging, AnyRef, Any


new ConcurrentStream(kafkaData: Observable[A], asyncScheduler: Scheduler, kafkaFacade: KafkaFacade, minCommitFrequency: Int, awaitJobTimeout: FiniteDuration = 10.seconds, retryDuration: FiniteDuration = 50.milliseconds)(implicit hasOffset: HasOffset[A])
kafkaData
the data coming from Kafka
asyncScheduler
the scheduler used to run tasks
kafkaFacade
our means of committing offsets to Kafka
minCommitFrequency
how frequently we'll try to commit offsets to Kafka. Set to 0 to commit as soon as tasks complete successfully
awaitJobTimeout
the amount of time to wait for the last job to complete when trying to commit the last position back to Kafka
retryDuration
the "poll time" when checking for the result of the final task

Value Members

final def !=(arg0: Any): Boolean
Definition Classes
AnyRef → Any
final def ##(): Int
Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean
Definition Classes
AnyRef → Any
def as[B](implicit ev: =:=[A, AckBytes], decoder: ByteArrayDecoder[B], newHasOffset: HasOffset[B]): ConcurrentStream[B]
Converts the kafkaData to a 'B' type, provided there is a 'HasOffset' for B so we know what to commit back to Kafka
final def asInstanceOf[T0]: T0
Definition Classes
Any
val asyncScheduler: Scheduler
val awaitJobTimeout: FiniteDuration
def clone(): AnyRef
Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.CloneNotSupportedException]) @native()
def close(): Unit
Definition Classes
ConcurrentStream → AutoCloseable
def compute[B, C](parallelism: Int)(compute: (B) => Task[C])(implicit ev: =:=[A, AckBytes], decoder: ByteArrayDecoder[B]): Observable[ComputeResult[B, C]]
Convenience method which combines the 'decode' and 'loadBalance' functions
def decode[B](implicit ev: =:=[A, AckBytes], decoder: ByteArrayDecoder[B]): ConcurrentStream[AckableRecord[B]]
Convenience method for 'map' which uses a decoder to B. This maps the inner kafkaData to the B type, but still wrapped in an AckableRecord
returns
a ConcurrentStream of type 'B'
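
To illustrate why decoding keeps the record wrapper, here is a small self-contained sketch of the idea: the payload is transformed while the offset stays attached, and an implicit type class proves the result can still be committed. `Rec`, `OffsetOf`, `BytesDecoder` and `DecodeSketch` are hypothetical stand-ins invented for this example. They are not the library's AckableRecord, HasOffset or ByteArrayDecoder, whose actual definitions are not shown on this page.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for a record that carries its Kafka offset
final case class Rec[A](offset: Long, value: A) {
  // Transform the payload while keeping the offset attached
  def map[B](f: A => B): Rec[B] = Rec(offset, f(value))
}

// Hypothetical stand-in for an "I know my offset" type class
trait OffsetOf[A] { def offsetOf(a: A): Long }
object OffsetOf {
  // Any Rec[A] knows its own offset, whatever the payload type is
  implicit def forRec[A]: OffsetOf[Rec[A]] = new OffsetOf[Rec[A]] {
    def offsetOf(r: Rec[A]): Long = r.offset
  }
}

// Hypothetical stand-in for a byte-array decoder
trait BytesDecoder[B] { def decode(bytes: Array[Byte]): B }
object BytesDecoder {
  implicit val utf8: BytesDecoder[String] = new BytesDecoder[String] {
    def decode(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)
  }
}

object DecodeSketch {
  // Decode each payload; the implicit OffsetOf shows the result is still committable
  def decodeAll[B](in: List[Rec[Array[Byte]]])(implicit d: BytesDecoder[B], o: OffsetOf[Rec[B]]): List[Rec[B]] =
    in.map(r => r.map(bytes => d.decode(bytes)))

  // Reads the offsets back out through the type class
  def offsets[A](in: List[A])(implicit o: OffsetOf[A]): List[Long] = in.map(o.offsetOf)
}
```

Because the wrapper survives the mapping, no separate bookkeeping is needed to know which offset a decoded value came from.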
final def eq(arg0: AnyRef): Boolean
Definition Classes
AnyRef
def finalize(): Unit
Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.Throwable])
final def getClass(): Class[_ <: AnyRef]
Definition Classes
AnyRef → Any
Annotations
@native()
implicit val hasOffset: HasOffset[A]
final def isInstanceOf[T0]: Boolean
Definition Classes
Any
val kafkaData: Observable[A]
val kafkaFacade: KafkaFacade
def loadBalance[B](parallelism: Int)(runJobOnNext: (A) => Task[B]): Observable[(ZipOffset, A, B)]
Exposes a means to execute a task on each input from the kafka data.
The tasks can be run in parallel, but the kafka offsets are only committed when *all* the tasks have completed successfully.
Tasks in error will cause the Observable to fail, so if you want to continue consuming from Kafka after failures you will need to ensure the Tasks have adequate error handling/retry logic built in.
Example:
Suppose we kick off async processes after having consumed Kafka messages A, B, C, D and E:
```
A ---- start job on A -------------------+
                                         |
B ---- start job on B ---------------+   |
                                     |   |
C ---- start job on C -----------+   |   |
                                 |   |   |
D ---- start job on D ---+---+   |   |   |
                             |   |   |   |
E ---- start job on E ----+  |   |   |   |
                          |  |   |   |   |
1: (c) <------------------+--+---+   |   |
                          |  |       |   |
2: (b) <------------------+--+-------+   |
                          |  |           |
3: (a) <------------------+--+-----------+
                          |  |
4: (e) <---- !BANG! ------+  |
                             |
5: (d) <---------------------+
```
So, we've kicked off 5 jobs based on the first 5 messages, all on different threads.
1: At this point we have received a successful response from the third message (C). We DON'T commit the offset back to Kafka, because the tasks from A or B may yet fail, in which case we expect the Observable stream to fail with their error, and thus have messages A, B, etc. replayed upon reconnect.
We DO however emit a tuple message of (2, C, c), i.e. the local offset index '2', the input message C and the 'c' result from the task.
The results are emitted in the order of the first tasks to complete, not necessarily their input order. If the downstream systems need to reconstruct the original order, they can either broadcast to many from the original kafka stream or use ContiguousOrdering to put the messages back in kafka received order.
2: Next we receive a successful response from the second message (B). Just as before, we're still waiting on result A so DON'T commit the offset, but DO emit a record (1,B,b)
3: Finally we get our first response back from message A. We've now received messages A,B and C and so can commit the 'C' offset to Kafka (not just A). If we die and reconnect now, we should start from message C. We also emit message (0,A,a)
4: We get the error response from E (e), so we end the stream in error. If we had received the response from message 'D' instead, we would have tried to commit that offset to Kafka, as it is the next index after 'C'. Too bad, so sad: we error our stream, and presumably our app dies or at least we close our Kafka connection.
5: Our stream is errored, so this response is simply ignored.
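Since results are emitted in completion order, a downstream consumer that needs input order has to buffer by local index. The sketch below shows one way that could look, using only the standard library. `Resequencer` is a hypothetical name, and this is not the library's ContiguousOrdering, whose API is not described here.

```scala
// Hypothetical re-sequencer: buffers (index, value) pairs and releases
// values only once every earlier index has been seen.
final case class Resequencer[A](next: Long = 0L, buffer: Map[Long, A] = Map.empty[Long, A]) {

  // Returns the new state plus any values that are now ready, in input order
  def push(index: Long, value: A): (Resequencer[A], Vector[A]) = {
    var pending  = buffer.updated(index, value)
    var expected = next
    val ready    = Vector.newBuilder[A]
    // Drain the buffer for as long as the next expected index is present
    while (pending.contains(expected)) {
      ready += pending(expected)
      pending = pending - expected
      expected += 1
    }
    (Resequencer(expected, pending), ready.result())
  }
}

object ResequencerDemo {
  // Pushes results in arrival order and collects what is released after each push
  def run[A](arrivals: List[(Long, A)]): List[Vector[A]] =
    arrivals
      .foldLeft((Resequencer[A](), List.empty[Vector[A]])) {
        case ((state, acc), (index, value)) =>
          val (updated, released) = state.push(index, value)
          (updated, acc :+ released)
      }
      ._2
}
```

With the arrivals from steps 1 to 3, (2, c), (1, b) and (0, a), nothing is released until 'a' arrives, at which point a, b and c come out together in input order.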
parallelism
the parallelism to use when executing the jobs
runJobOnNext
our job logic - the tasks to execute on each kafka input message
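Because a failed task fails the whole Observable, the job passed as runJobOnNext usually needs its own retry policy. The following is a deliberately simple, synchronous sketch of bounded retries using `scala.util.Try`. `Retry.retry` is a hypothetical helper; in real code the equivalent would be expressed with the Task type's own error-handling combinators rather than a blocking function.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Try}

object Retry {
  // Runs `job` up to `attempts` times, returning the first success
  // or the last failure once the attempts are exhausted.
  @tailrec
  def retry[A](attempts: Int)(job: () => A): Try[A] =
    Try(job()) match {
      case Failure(_) if attempts > 1 => retry(attempts - 1)(job)
      case result                     => result
    }
}
```

A job wrapped this way only surfaces an error to the stream after every attempt has failed, so transient failures no longer end consumption.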
val logger: Logger
Attributes
protected
Definition Classes
StrictLogging
def map[B](f: (A) => B)(implicit newHasOffset: HasOffset[B]): ConcurrentStream[B]
Maps the kafkaData to the B type, but still wrapped in an AckableRecord
val minCommitFrequency: Int
final def ne(arg0: AnyRef): Boolean
Definition Classes
AnyRef
final def notify(): Unit
Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit
Definition Classes
AnyRef
Annotations
@native()
def productElementNames: Iterator[String]
Definition Classes
Product
val retryDuration: FiniteDuration
final def synchronized[T0](arg0: => T0): T0
Definition Classes
AnyRef
final def wait(): Unit
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.InterruptedException])
final def wait(arg0: Long, arg1: Int): Unit
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.InterruptedException])
final def wait(arg0: Long): Unit
Definition Classes
AnyRef
Annotations
@throws(classOf[java.lang.InterruptedException]) @native()
