A Fiber that can be used to cancel the underlying consumer, or wait for it to complete. If you're using stream, or any other provided stream in KafkaConsumer, these will be automatically interrupted when the underlying consumer has been cancelled or when it finishes with an exception.
Whenever cancel is invoked, an attempt will be made to stop the underlying consumer. The cancel operation will not wait for the consumer to shut down. If you also want to wait for the shutdown to complete, you can use join. Note that join is guaranteed to complete after consumer shutdown, even when the consumer is cancelled with cancel.
This Fiber instance is usually only required if the consumer needs to be cancelled due to some external condition, or when an external process needs to be cancelled whenever the consumer has shut down. Most of the time, when you're only using the streams provided by KafkaConsumer, there is no need to use this.
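As an illustration, the fiber might be used as in the following sketch to trigger a shutdown from some external condition and then wait for it to complete. The delay stands in for that condition, the key and value types are placeholders, and join is assumed to behave as on a cats-effect 2-style Fiber where it yields F[Unit]; adjust to your library version.

  import cats.effect.IO
  import scala.concurrent.duration._

  // Sketch: cancel the consumer after some external condition (here simply
  // a delay) and then wait for the shutdown to actually complete.
  def stopConsumer(consumer: KafkaConsumer[IO, String, String]): IO[Unit] =
    for {
      _ <- IO.sleep(30.seconds)   // placeholder for an external condition
      _ <- consumer.fiber.cancel  // requests shutdown; does not wait for it
      _ <- consumer.fiber.join    // completes only after the consumer has shut down
    } yield ()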
Stream where the elements themselves are Streams which continually request records for a single partition. These Streams will have to be processed in parallel, using parJoin or parJoinUnbounded. Note that when using parJoin(n) with n smaller than the number of currently assigned partitions, some assigned partitions will not be processed. For that reason, prefer parJoinUnbounded, where the actual limit will be the number of assigned partitions.
If you do not want to process all partitions in parallel, then you can use stream instead, where records for all partitions are in a single Stream.
Note that you have to first use subscribe to subscribe the consumer before using this Stream. If you have not subscribed, a NotSubscribedException will be raised in the Stream.
Stream where the elements are Kafka messages and where ordering is guaranteed per topic-partition. Parallelism can be achieved at the record level, using for example parEvalMap. For partition-level parallelism, use partitionedStream, where all partitions need to be processed in parallel.

The Stream works by continually making requests for records on every assigned partition, waiting for records to come back on all partitions, or up to ConsumerSettings#fetchTimeout. Records can be processed as soon as they are received, without waiting on other partition requests, but a second request for the same partition will wait for outstanding fetches to complete or time out before being sent.
Note that you have to first use subscribe to subscribe the consumer before using this Stream. If you have not subscribed, a NotSubscribedException will be raised in the Stream.
Subscribes the consumer to the topics matching the specified Regex. Note that you have to use one of the subscribe functions before you can use any of the provided Streams, or a NotSubscribedException will be raised in the Streams.
the regex to which matching topics should be subscribed
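For example, subscribing with a Regex before consuming might look like this sketch; the topic pattern and types are illustrative.

  import cats.effect.IO
  import fs2.Stream

  // Sketch: subscribe to all topics matching the pattern, then consume.
  def matchingTopicsStream(
    consumer: KafkaConsumer[IO, String, String]
  ): Stream[IO, CommittableMessage[IO, String, String]] =
    Stream.eval(consumer.subscribe("events-.*".r)) >> consumer.stream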
Subscribes the consumer to the specified topics. Note that you have to use one of the subscribe functions to subscribe to one or more topics before using any of the provided Streams, or a NotSubscribedException will be raised in the Streams.
the topics to which the consumer should subscribe
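A sketch of subscribing to a fixed set of topics; the NonEmptyList overload and the topic names are assumptions, so check your version for the exact signature.

  import cats.data.NonEmptyList
  import cats.effect.IO

  // Sketch: subscribe to two topics before using any of the provided streams.
  def subscribeToTopics(consumer: KafkaConsumer[IO, String, String]): IO[Unit] =
    consumer.subscribe(NonEmptyList.of("orders", "payments"))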
KafkaConsumer represents a consumer of Kafka messages, with the ability to subscribe to topics, start a single top-level stream, and optionally control it via the provided fiber instance. The following top-level streams are provided.

- stream provides a single stream of messages, where the order of records is guaranteed per topic-partition.
- partitionedStream provides a stream with elements as streams that continually request records for a single partition. Order is guaranteed per topic-partition, but all assigned partitions will have to be processed in parallel.

For the streams, records are wrapped in CommittableMessages which provide CommittableOffsets with the ability to commit record offsets to Kafka. For performance reasons, offsets are usually committed in batches using CommittableOffsetBatch. Provided Sinks, like commitBatch or commitBatchWithin, are available for batch committing offsets. If you are not committing offsets to Kafka, you can simply discard the CommittableOffset and only make use of the record.

While it's technically possible to start more than one stream from a single KafkaConsumer, it is generally not recommended, as there is no guarantee which stream will receive which records, and there might be an overlap, in terms of duplicate messages, between the two streams. If a first stream completes, possibly with an error, there is no guarantee it has processed all of the messages it received, and a second stream from the same KafkaConsumer might not be able to pick up where the first one left off. Therefore, only create a single top-level stream per KafkaConsumer, and if you want to start a new stream when the first one finishes, let the KafkaConsumer shut down and create a new one.
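Putting this together, a complete single-stream consumer with batched commits might look like the following sketch. It is written against a recent fs2-kafka-style API; names such as KafkaConsumer.stream, subscribeTo, offset, and commitBatchWithin differ between library versions (older versions use consumerStream, CommittableMessage, and committableOffset instead), so treat this as an outline rather than exact code.

  import cats.effect.{ExitCode, IO, IOApp}
  import fs2.kafka._
  import scala.concurrent.duration._

  // Sketch: a single top-level stream per KafkaConsumer, with offsets
  // committed in batches of at most 500 or at least every 15 seconds.
  object ConsumerExample extends IOApp {
    def run(args: List[String]): IO[ExitCode] = {
      val settings =
        ConsumerSettings[IO, String, String]
          .withBootstrapServers("localhost:9092")
          .withGroupId("example-group")
          .withAutoOffsetReset(AutoOffsetReset.Earliest)

      KafkaConsumer
        .stream(settings)
        .evalTap(_.subscribeTo("example-topic"))
        .flatMap(_.stream)
        .evalMap { committable =>
          // process the record, then pass its offset along for committing
          IO(println(committable.record.value)).as(committable.offset)
        }
        .through(commitBatchWithin(500, 15.seconds))
        .compile
        .drain
        .as(ExitCode.Success)
    }
  }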