
case class ConcurrentStream[A](kafkaData: Observable[A], asyncScheduler: Scheduler, kafkaFacade: KafkaFacade, minCommitFrequency: Int, awaitJobTimeout: FiniteDuration = 10.seconds, retryDuration: FiniteDuration = 50.milliseconds)(implicit hasOffset: HasOffset[A]) extends StrictLogging with AutoCloseable with Product with Serializable

Kafka enforces single-threaded access to a consumer, and it is easy to see why.

Suppose a consumer were to read ten messages from Kafka and send off ten asynchronous requests.

If the tenth request happened to come back first and commit its offset, what about the other nine, which might still fail?

On the flip side, blocking on the async call for every message would be a performance killer, and unnecessary if the calls are idempotent.

To enable async handling and commits, we just need to cater for this case:

msg1 -------------->+
                    |
msg2 ----->+        |
           |      !bang!
 ok  <-----+        |
                    |
     <- onFailure --+

We shouldn't commit the offset for msg2, even though it succeeded first: msg1 failed, so on reconnect we need Kafka to replay from msg1 onwards.

The way we handle this is by having the futures drive a ConcurrentSubject of offsets zipped with the messages we receive from Kafka.

msg1 --------------> ???
msg2 --------------> ???
msg3 --------------> ???
msg4 --------------> ???
msg5 --------------> ???
msg6 --------------> ???

... some mixed order - just ensuring we do get either a failure or a success for each result

msg6 <--------- ???
msg2 <--------- ???
msg5 <--------- ???
msg1 <--------- ??? // here we can commit up to offset 2 as 1 and 2 have returned
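The commit rule above amounts to remembering which local indices have completed and only advancing the commit point across an unbroken run of them. Below is a minimal, stdlib-only sketch of that bookkeeping, assuming 1-based message indices as in the diagram. `ContiguousTracker` is a hypothetical name, not this library's internal type; the real implementation drives a ConcurrentSubject rather than a mutable set.

```scala
import scala.collection.mutable

// Hypothetical helper (not part of this library): tracks completed message
// indices and reports the highest index that can safely be committed.
final class ContiguousTracker(firstIndex: Long) {
  // Indices that have completed but are not yet part of the contiguous run
  private val pending = mutable.Set.empty[Long]
  // The next index we are waiting on before the commit point can advance
  private var nextExpected: Long = firstIndex

  /** Records a successful completion; returns Some(index) if the commit point advanced. */
  def complete(index: Long): Option[Long] = {
    pending += index
    val before = nextExpected
    // Consume the unbroken run of completed indices starting at nextExpected
    while (pending.contains(nextExpected)) {
      pending -= nextExpected
      nextExpected += 1
    }
    if (nextExpected > before) Some(nextExpected - 1) else None
  }
}

// Replay the completion order from the diagram: msg6, msg2, msg5, msg1
val tracker = new ContiguousTracker(firstIndex = 1)
val commits = List(6L, 2L, 5L, 1L).map(tracker.complete)
println(commits) // List(None, None, None, Some(2))
```

Replaying the diagram's completion order only yields a commit point once msg1 returns, and that point is 2 rather than 6 because msg3 and msg4 are still outstanding.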

A

the messages in the Kafka feed (typically AckableRecords)

kafkaData

the data coming from Kafka

asyncScheduler

the scheduler to use in running tasks

kafkaFacade

our means of committing offsets to Kafka

minCommitFrequency

how frequently we'll try to commit offsets to Kafka. Set to 0 to commit as soon as tasks complete successfully

awaitJobTimeout

the amount of time to wait for the last job to complete when trying to commit the last position back to Kafka

retryDuration

the "poll time" when checking for the result of the final task
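To make the interplay of awaitJobTimeout and retryDuration concrete, here is a hedged sketch of a poll-until-done loop using the same defaults. `awaitLastJob` and its `isDone` callback are hypothetical; this is not the library's close() implementation, only an illustration of a poll interval bounded by an overall timeout.

```scala
import scala.concurrent.duration._

// Hypothetical sketch: check `isDone` every `retryDuration` until it returns
// true or `awaitJobTimeout` has elapsed. Returns whether the job finished.
def awaitLastJob(isDone: () => Boolean,
                 awaitJobTimeout: FiniteDuration = 10.seconds,
                 retryDuration: FiniteDuration = 50.milliseconds): Boolean = {
  val deadline = awaitJobTimeout.fromNow
  var done = isDone()
  while (!done && deadline.hasTimeLeft()) {
    Thread.sleep(retryDuration.toMillis) // the "poll time" between checks
    done = isDone()
  }
  done
}

// A stand-in "last job" that reports completion on its third check
var checks = 0
val finished = awaitLastJob(() => { checks += 1; checks >= 3 }, 1.second, 10.millis)
println(finished) // true
```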

Linear Supertypes
Serializable, Product, Equals, AutoCloseable, StrictLogging, AnyRef, Any

Instance Constructors

  1. new ConcurrentStream(kafkaData: Observable[A], asyncScheduler: Scheduler, kafkaFacade: KafkaFacade, minCommitFrequency: Int, awaitJobTimeout: FiniteDuration = 10.seconds, retryDuration: FiniteDuration = 50.milliseconds)(implicit hasOffset: HasOffset[A])

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def as[B](implicit ev: =:=[A, AckBytes], decoder: ByteArrayDecoder[B], newHasOffset: HasOffset[B]): ConcurrentStream[B]

    Decodes the kafkaData to a 'B' type using the implicit ByteArrayDecoder, provided there is also a 'HasOffset' for B so we know what to commit back to Kafka

  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. val asyncScheduler: Scheduler
  7. val awaitJobTimeout: FiniteDuration
  8. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  9. def close(): Unit
    Definition Classes
    ConcurrentStream → AutoCloseable
  10. def compute[B, C](parallelism: Int)(compute: (B) => Task[C])(implicit ev: =:=[A, AckBytes], decoder: ByteArrayDecoder[B]): Observable[ComputeResult[B, C]]

    Convenience method which combines the 'decode' and 'loadBalance' functions

  11. def decode[B](implicit ev: =:=[A, AckBytes], decoder: ByteArrayDecoder[B]): ConcurrentStream[AckableRecord[B]]

    Convenience method for 'map' which uses a decoder to B. This maps the inner kafkaData to the B type, still wrapped in an AckableRecord

    returns

    a ConcurrentStream of AckableRecord[B]

  12. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  14. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  15. implicit val hasOffset: HasOffset[A]
  16. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  17. val kafkaData: Observable[A]
  18. val kafkaFacade: KafkaFacade
  19. def loadBalance[B](parallelism: Int)(runJobOnNext: (A) => Task[B]): Observable[(ZipOffset, A, B)]

    Exposes a means to execute a task on each input from the kafka data.

    The tasks can be run in parallel, but an offset is only committed to Kafka once the task for that message *and* the tasks for all earlier messages have completed successfully.

    Tasks in error will cause the Observable to fail, so if you want to continue consuming from Kafka after failures you will need to ensure the Tasks have adequate error handling/retry logic built in.
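    Because a failed task fails the whole stream, a job that should survive transient errors needs its own retry logic. The jobs here are Tasks; to stay self-contained, the sketch below uses the standard library's Future instead, and `withRetries` is a hypothetical helper rather than part of this library.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

// Hypothetical helper: re-runs `job` up to `retries` extra times before letting
// the failure propagate (which, in loadBalance, would fail the stream).
def withRetries[T](retries: Int)(job: () => Future[T])(implicit ec: ExecutionContext): Future[T] =
  job().recoverWith {
    case _ if retries > 0 => withRetries(retries - 1)(job)
  }

// A flaky job that fails on its first two attempts and then succeeds
var attempts = 0
val flaky = () => Future {
  attempts += 1
  if (attempts < 3) throw new RuntimeException("boom") else "ok"
}
val result = Await.result(withRetries(retries = 2)(flaky), 5.seconds)
println(result) // ok
```

    Retries should only wrap jobs that are safe to re-run, which is the same idempotency assumption the async commit scheme relies on.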

    Example:

    Consider that we kick off async processes after having consumed Kafka messages A, B, C, D and E:

    A ---- start job on A -------------------+
                                             |
    B ---- start job on B ---------------+   |
                                         |   |
    C ---- start job on C -----------+   |   |
                                     |   |   |
    D ---- start job on D -------+   |   |   |
                                 |   |   |   |
    E ---- start job on E ----+  |   |   |   |
                              |  |   |   |   |
    1: (c) <------------------+--+---+   |   |
                              |  |       |   |
    2: (b) <------------------+--+-------+   |
                              |  |           |
    3: (a) <------------------+--+-----------+
                              |  |
    4: (e) <---- !BANG! ------+  |
                                 |
    5: (d) <---------------------+

    So, we've kicked off 5 jobs based on the first 5 messages, all on different threads.

    1: At this point we receive a successful response from the third message (C). We DON'T commit the offset back to Kafka, because the tasks for A or B may yet fail, in which case we expect the Observable stream to fail with their error, and thus have messages A, B, etc. replayed upon reconnect.

    We DO, however, emit a tuple of (2, C, c), i.e. the local offset index '2', the input message C and the result 'c' from the task.

    The results are emitted in the order in which their tasks complete, not necessarily in input order. If the downstream systems need to reconstruct the original order, they can either broadcast to many from the original Kafka stream or use ContiguousOrdering to put the messages back into the order they were received from Kafka.

    2: Next we receive a successful response from the second message (B). Just as before, we're still waiting on result A, so we DON'T commit the offset, but DO emit a record (1,B,b).

    3: Finally we get our first response back from message A. We've now received results for A, B and C, and so can commit the 'C' offset to Kafka (not just A). If we die and reconnect now, we should start from message C. We also emit message (0,A,a).

    4: We get the error response from E (e), so we end the stream in error. Had we instead received a successful response from message 'D', we would have tried to commit that offset to Kafka, as D is the next index after 'C'. Too bad, so sad - we error our stream and presumably our app dies, or at least we close our Kafka connection.

    5: Our stream is errored, so this response is simply ignored.
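    The walkthrough above can be replayed as a small deterministic simulation. This is a hypothetical model of the bookkeeping only (plain Scala, no Kafka, no Observable): successes are emitted in completion order as (localIndex, input, result), the commit point advances only over an unbroken run of successes, and the first failure ends the stream so later results are dropped.

```scala
import scala.collection.mutable

// Hypothetical simulation (not the library's code) of loadBalance's bookkeeping.
final case class Outcome(emitted: List[(Int, String, String)],
                         commits: List[Int],
                         error: Option[String])

def simulate(inputs: Vector[String], completions: List[(Int, Either[String, String])]): Outcome = {
  val done = mutable.Set.empty[Int]
  var next = 0 // lowest local index not yet known to have succeeded
  val emitted = List.newBuilder[(Int, String, String)]
  val commits = List.newBuilder[Int]

  @annotation.tailrec
  def loop(rest: List[(Int, Either[String, String])]): Option[String] = rest match {
    case Nil => None
    case (_, Left(err)) :: _ => Some(err) // stream errors; remaining results are ignored
    case (idx, Right(result)) :: tail =>
      emitted += ((idx, inputs(idx), result)) // emitted in completion order
      done += idx
      val before = next
      while (done.contains(next)) next += 1
      if (next > before) commits += (next - 1) // commit the highest contiguous index
      loop(tail)
  }

  val error = loop(completions)
  Outcome(emitted.result(), commits.result(), error)
}

// Completion order from the example: C, B, A, then E fails, then D arrives too late
val run = simulate(
  Vector("A", "B", "C", "D", "E"),
  List(2 -> Right("c"), 1 -> Right("b"), 0 -> Right("a"), 4 -> Left("boom"), 3 -> Right("d"))
)
println(run.commits) // List(2)
```

    With that completion order the only commit is at index 2 (message C), and D's result is never emitted. If D had succeeded instead of E failing, the commit point would move on to index 3.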

    parallelism

    the parallelism to use when executing the jobs

    runJobOnNext

    our job logic - the tasks to execute on each Kafka input message
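    Step 1 mentions ContiguousOrdering for restoring the original Kafka order. The idea behind it can be sketched with a buffer keyed on the local offset index that releases results only once every earlier index has arrived. `Reorder` is a hypothetical name and this is not ContiguousOrdering's actual API.

```scala
import scala.collection.mutable

// Hypothetical reorder buffer: accepts (index, value) pairs in any order and
// releases values strictly in index order, holding back anything after a gap.
final class Reorder[T](firstIndex: Int = 0) {
  private val buffer = mutable.Map.empty[Int, T]
  private var next = firstIndex

  /** Adds one out-of-order result and returns whatever is now releasable, in order. */
  def push(index: Int, value: T): List[T] = {
    buffer.update(index, value)
    val released = List.newBuilder[T]
    while (buffer.contains(next)) {
      released += buffer.remove(next).get
      next += 1
    }
    released.result()
  }
}

val reorder = new Reorder[String]()
val batches = List(2 -> "c", 1 -> "b", 0 -> "a", 3 -> "d").map { case (i, v) => reorder.push(i, v) }
println(batches) // List(List(), List(), List(a, b, c), List(d))
```

    Feeding in the completion order c, b, a, d releases nothing until a arrives, then a, b and c together, then d.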

  20. val logger: Logger
    Attributes
    protected
    Definition Classes
    StrictLogging
  21. def map[B](f: (A) => B)(implicit newHasOffset: HasOffset[B]): ConcurrentStream[B]

    Maps the kafkaData to the B type. An implicit 'HasOffset' for B is required so that we still know which offsets to commit back to Kafka
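    The implicit HasOffset[B] is what lets the mapped stream keep committing: whatever `f` produces must still be able to report which Kafka offset it came from. A hypothetical illustration of that type-class constraint follows; `OffsetOf`, `Raw`, `Enriched` and `mapWithOffsets` are invented for the sketch, and HasOffset's real members may differ.

```scala
// Hypothetical stand-in for a HasOffset-style type class
trait OffsetOf[A] {
  def offsetOf(value: A): Long
}

// Hypothetical record types: the mapped type keeps the source offset
final case class Raw(offset: Long, bytes: Array[Byte])
final case class Enriched(offset: Long, text: String)

object OffsetOf {
  implicit val rawOffset: OffsetOf[Raw] = (r: Raw) => r.offset
  implicit val enrichedOffset: OffsetOf[Enriched] = (e: Enriched) => e.offset
}

// Mapping only compiles if the target type can report its offset
def mapWithOffsets[A, B](items: List[A])(f: A => B)(implicit ev: OffsetOf[B]): List[(B, Long)] =
  items.map { a =>
    val b = f(a)
    (b, ev.offsetOf(b))
  }

val raws = List(Raw(41L, "hello".getBytes("UTF-8")), Raw(42L, "world".getBytes("UTF-8")))
val mapped = mapWithOffsets(raws)(r => Enriched(r.offset, new String(r.bytes, "UTF-8")))
println(mapped.map(_._2)) // List(41, 42)
```

    Mapping to a type with no OffsetOf instance in scope, such as plain String, would fail to compile, which mirrors the guarantee the HasOffset constraint on map provides.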

  22. val minCommitFrequency: Int
  23. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  24. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  25. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  26. def productElementNames: Iterator[String]
    Definition Classes
    Product
  27. val retryDuration: FiniteDuration
  28. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  29. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  30. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  31. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
