java.lang.Object
    org.apache.kafka.clients.consumer.KafkaConsumer

public class KafkaConsumer
extends java.lang.Object
implements Consumer, java.io.Closeable
A Kafka client that consumes records from a Kafka cluster.
The consumer is thread safe and should generally be shared among all threads for best performance.
Internally the consumer runs on a single thread and multiplexes I/O over TCP connections to each of the brokers it needs to communicate with. Failure to close the consumer after use will leak these resources.

Usage Examples

The examples below assume a user-implemented process() method that handles a batch of consumed records and returns, per partition, the offset of the latest message it processed. A sample implementation of such a process() method:
private Map<TopicPartition, Long> process(Map<String, ConsumerRecords> records) {
    Map<TopicPartition, Long> processedOffsets = new HashMap<TopicPartition, Long>();
    for (Entry<String, ConsumerRecords> recordMetadata : records.entrySet()) {
        List<ConsumerRecord> recordsPerTopic = recordMetadata.getValue().records();
        for (int i = 0; i < recordsPerTopic.size(); i++) {
            ConsumerRecord record = recordsPerTopic.get(i);
            // process record
            try {
                processedOffsets.put(record.topicAndPartition(), record.offset());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    return processedOffsets;
}
This example demonstrates how the consumer can be used to leverage Kafka's group management functionality for automatic consumer load balancing and failover. It assumes that the offsets are stored in Kafka and are automatically committed periodically, at the interval controlled by the auto.commit.interval.ms config.
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
props.put("group.id", "test");
props.put("session.timeout.ms", "1000");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "10000");
KafkaConsumer consumer = new KafkaConsumer(props);
consumer.subscribe("foo", "bar");
boolean isRunning = true;
while (isRunning) {
    Map<String, ConsumerRecords> records = consumer.poll(100);
    process(records);
}
consumer.close();
This example demonstrates how the consumer can be used to leverage Kafka's group management functionality for automatic consumer load balancing and failover. This example assumes that the offsets are stored in Kafka and are manually committed using the commit(boolean) API. This example also demonstrates rewinding the consumer's offsets if processing of the consumed messages fails. Note that this method of rewinding offsets using seek(offsets) is only useful for rewinding the offsets of the current consumer instance; as such, it will not trigger a rebalance or affect the fetch offsets of the other consumer instances.
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
props.put("group.id", "test");
props.put("session.timeout.ms", "1000");
props.put("enable.auto.commit", "false");
KafkaConsumer consumer = new KafkaConsumer(props);
consumer.subscribe("foo", "bar");
int commitInterval = 100;
int numRecords = 0;
boolean isRunning = true;
Map<TopicPartition, Long> consumedOffsets = new HashMap<TopicPartition, Long>();
while (isRunning) {
    Map<String, ConsumerRecords> records = consumer.poll(100);
    try {
        Map<TopicPartition, Long> lastConsumedOffsets = process(records);
        consumedOffsets.putAll(lastConsumedOffsets);
        numRecords += records.size();
        // commit offsets for all partitions of topics foo, bar owned by this consumer instance
        if (numRecords % commitInterval == 0)
            consumer.commit(false);
    } catch (Exception e) {
        try {
            // rewind consumer's offsets for failed partitions
            // assume failedPartitions() returns the list of partitions for which the processing of the last batch of messages failed
            List<TopicPartition> failedPartitions = failedPartitions();
            Map<TopicPartition, Long> offsetsToRewindTo = new HashMap<TopicPartition, Long>();
            for (TopicPartition failedPartition : failedPartitions) {
                // rewind to the last consumed offset for the failed partition. Since process() failed for this partition, the consumed offset
                // should still be pointing to the last successfully processed offset and hence is the right offset to rewind consumption to.
                offsetsToRewindTo.put(failedPartition, consumedOffsets.get(failedPartition));
            }
            // seek to new offsets only for partitions that failed the last process()
            consumer.seek(offsetsToRewindTo);
        } catch (Exception rewindException) {
            break; // rewind failed
        }
    }
}
consumer.close();
This example demonstrates how to rewind the offsets of the entire consumer group. It is assumed that the user has chosen to use Kafka's group management functionality for automatic consumer load balancing and failover. This example also assumes that the offsets are stored in Kafka. If group management is used, the right place to systematically rewind offsets for every consumer instance is inside the ConsumerRebalanceCallback. The onPartitionsAssigned callback is invoked after the consumer is assigned a new set of partitions on rebalance and before the consumption restarts post rebalance. This is the right place to supply the newly rewound offsets to the consumer.

If you foresee ever needing to reset the consumer's offsets while using group management, it is recommended that you always configure the consumer with a ConsumerRebalanceCallback that has a flag controlling whether the offset rewind logic is applied. This method of rewinding offsets is useful if you notice an issue with your message processing after successful consumption and offset commit, and you would like to rewind the offsets for the entire consumer group as part of rolling out a fix to your processing logic. In this case, you would configure each of your consumer instances with the offset rewind configuration flag turned on and bounce each consumer instance in a rolling restart fashion. Each restart triggers a rebalance, and eventually all consumer instances will have rewound the offsets for the partitions they own, effectively rewinding the offsets for the entire consumer group.
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
props.put("group.id", "test");
props.put("session.timeout.ms", "1000");
props.put("enable.auto.commit", "false");
KafkaConsumer consumer = new KafkaConsumer(props,
    new ConsumerRebalanceCallback() {
        boolean rewindOffsets = true; // should be retrieved from external application config
        public void onPartitionsAssigned(Consumer consumer, Collection<TopicPartition> partitions) {
            Map<TopicPartition, Long> latestCommittedOffsets = consumer.committed(partitions);
            if (rewindOffsets) {
                Map<TopicPartition, Long> newOffsets = rewindOffsets(latestCommittedOffsets, 100);
                consumer.seek(newOffsets);
            }
        }
        public void onPartitionsRevoked(Consumer consumer, Collection<TopicPartition> partitions) {
            consumer.commit(true);
        }
        // this API rewinds every partition back by numberOfMessagesToRewindBackTo messages
        private Map<TopicPartition, Long> rewindOffsets(Map<TopicPartition, Long> currentOffsets,
                                                        long numberOfMessagesToRewindBackTo) {
            Map<TopicPartition, Long> newOffsets = new HashMap<TopicPartition, Long>();
            for (Map.Entry<TopicPartition, Long> offset : currentOffsets.entrySet())
                newOffsets.put(offset.getKey(), offset.getValue() - numberOfMessagesToRewindBackTo);
            return newOffsets;
        }
    });
consumer.subscribe("foo", "bar");
int commitInterval = 100;
int numRecords = 0;
boolean isRunning = true;
Map<TopicPartition, Long> consumedOffsets = new HashMap<TopicPartition, Long>();
while (isRunning) {
    Map<String, ConsumerRecords> records = consumer.poll(100);
    Map<TopicPartition, Long> lastConsumedOffsets = process(records);
    consumedOffsets.putAll(lastConsumedOffsets);
    numRecords += records.size();
    // commit offsets for all partitions of topics foo, bar synchronously, owned by this consumer instance
    if (numRecords % commitInterval == 0)
        consumer.commit(consumedOffsets, true);
}
consumer.commit(true);
consumer.close();
This example demonstrates how the consumer can be used to leverage Kafka's group management functionality along with custom offset storage. In this example, the assumption made is that the user chooses to store the consumer offsets outside Kafka. This requires the user to plug in logic for retrieving the offsets from a custom store and to provide the offsets to the consumer in the ConsumerRebalanceCallback callback. The onPartitionsAssigned callback is invoked after the consumer is assigned a new set of partitions on rebalance and before the consumption restarts post rebalance. This is the right place to supply offsets from a custom store to the consumer.

Similarly, the user would also be required to plug in logic for storing the consumer's offsets to a custom store. The onPartitionsRevoked callback is invoked right after the consumer has stopped fetching data and before the partition ownership changes. This is the right place to commit the offsets for the current set of partitions owned by the consumer.
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
props.put("group.id", "test");
props.put("session.timeout.ms", "1000");
props.put("enable.auto.commit", "false"); // since enable.auto.commit only applies to Kafka based offset storage
KafkaConsumer consumer = new KafkaConsumer(props,
    new ConsumerRebalanceCallback() {
        public void onPartitionsAssigned(Consumer consumer, Collection<TopicPartition> partitions) {
            Map<TopicPartition, Long> lastCommittedOffsets = getLastCommittedOffsetsFromCustomStore(partitions);
            consumer.seek(lastCommittedOffsets);
        }
        public void onPartitionsRevoked(Consumer consumer, Collection<TopicPartition> partitions) {
            Map<TopicPartition, Long> offsets = getLastConsumedOffsets(partitions);
            commitOffsetsToCustomStore(offsets);
        }
        // following APIs should be implemented by the user for custom offset management
        private Map<TopicPartition, Long> getLastCommittedOffsetsFromCustomStore(Collection<TopicPartition> partitions) {
            return null;
        }
        private Map<TopicPartition, Long> getLastConsumedOffsets(Collection<TopicPartition> partitions) { return null; }
        private void commitOffsetsToCustomStore(Map<TopicPartition, Long> offsets) {}
    });
consumer.subscribe("foo", "bar");
int commitInterval = 100;
int numRecords = 0;
boolean isRunning = true;
Map<TopicPartition, Long> consumedOffsets = new HashMap<TopicPartition, Long>();
while (isRunning) {
    Map<String, ConsumerRecords> records = consumer.poll(100);
    Map<TopicPartition, Long> lastConsumedOffsets = process(records);
    consumedOffsets.putAll(lastConsumedOffsets);
    numRecords += records.size();
    // commit offsets for all partitions of topics foo, bar, owned by this consumer instance, to the custom store
    // (commitOffsetsToCustomStore() here refers to the user-implemented custom offset storage logic)
    if (numRecords % commitInterval == 0)
        commitOffsetsToCustomStore(consumedOffsets);
}
// commit the final set of consumed offsets to the custom store before shutting down
commitOffsetsToCustomStore(consumedOffsets);
consumer.close();
This example demonstrates how the consumer can be used to subscribe to specific partitions of certain topics and consume up to the latest available message for each of those partitions before shutting down. When used to subscribe to specific partitions, the user foregoes the group management functionality and instead relies on manually configuring the consumer instances to subscribe to a set of partitions. This example assumes that the user chooses to use Kafka based offset storage. The user still has to specify a group.id to use Kafka based offset management. However, session.timeout.ms is not required since the Kafka consumer only does automatic failover when group management is used.
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "10000");
KafkaConsumer consumer = new KafkaConsumer(props);
// subscribe to some partitions of topic foo
TopicPartition partition0 = new TopicPartition("foo", 0);
TopicPartition partition1 = new TopicPartition("foo", 1);
TopicPartition[] partitions = new TopicPartition[2];
partitions[0] = partition0;
partitions[1] = partition1;
consumer.subscribe(partitions);
// find the last committed offsets for partitions 0,1 of topic foo
Map<TopicPartition, Long> lastCommittedOffsets = consumer.committed(Arrays.asList(partitions));
// seek to the last committed offsets to avoid duplicates
consumer.seek(lastCommittedOffsets);
// find the offsets of the latest available messages to know where to stop consumption
Map<TopicPartition, Long> latestAvailableOffsets = consumer.offsetsBeforeTime(-2, Arrays.asList(partitions));
boolean isRunning = true;
Map<TopicPartition, Long> consumedOffsets = new HashMap<TopicPartition, Long>();
while (isRunning) {
    Map<String, ConsumerRecords> records = consumer.poll(100);
    Map<TopicPartition, Long> lastConsumedOffsets = process(records);
    consumedOffsets.putAll(lastConsumedOffsets);
    // stop once every subscribed partition has been consumed up to its latest available offset
    boolean caughtUp = true;
    for (TopicPartition partition : partitions) {
        Long consumed = consumedOffsets.get(partition);
        if (consumed == null || consumed < latestAvailableOffsets.get(partition))
            caughtUp = false;
    }
    isRunning = !caughtUp;
}
consumer.commit(true);
consumer.close();
This example demonstrates how the consumer can be used to subscribe to specific partitions of certain topics and consume up to the latest available message for each of those partitions before shutting down. When used to subscribe to specific partitions, the user foregoes the group management functionality and instead relies on manually configuring the consumer instances to subscribe to a set of partitions. This example assumes that the user chooses to use custom offset storage.
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
KafkaConsumer consumer = new KafkaConsumer(props);
// subscribe to some partitions of topic foo
TopicPartition partition0 = new TopicPartition("foo", 0);
TopicPartition partition1 = new TopicPartition("foo", 1);
TopicPartition[] partitions = new TopicPartition[2];
partitions[0] = partition0;
partitions[1] = partition1;
consumer.subscribe(partitions);
// find the last committed offsets in the custom store (user-implemented)
Map<TopicPartition, Long> lastCommittedOffsets = getLastCommittedOffsetsFromCustomStore();
// seek to the last committed offsets to avoid duplicates
consumer.seek(lastCommittedOffsets);
// find the offsets of the latest available messages to know where to stop consumption
Map<TopicPartition, Long> latestAvailableOffsets = consumer.offsetsBeforeTime(-2, Arrays.asList(partitions));
boolean isRunning = true;
Map<TopicPartition, Long> consumedOffsets = new HashMap<TopicPartition, Long>();
while (isRunning) {
    Map<String, ConsumerRecords> records = consumer.poll(100);
    Map<TopicPartition, Long> lastConsumedOffsets = process(records);
    consumedOffsets.putAll(lastConsumedOffsets);
    // commit offsets for partitions 0,1 of topic foo to the custom store (user-implemented)
    commitOffsetsToCustomStore(consumedOffsets);
    // stop once every subscribed partition has been consumed up to its latest available offset
    boolean caughtUp = true;
    for (TopicPartition partition : partitions) {
        Long consumed = consumedOffsets.get(partition);
        if (consumed == null || consumed < latestAvailableOffsets.get(partition))
            caughtUp = false;
    }
    isRunning = !caughtUp;
}
commitOffsetsToCustomStore(consumedOffsets);
consumer.close();
Constructor Summary

KafkaConsumer(java.util.Map<java.lang.String,java.lang.Object> configs)
    A consumer is instantiated by providing a set of key-value pairs as configuration.

KafkaConsumer(java.util.Map<java.lang.String,java.lang.Object> configs, ConsumerRebalanceCallback callback)
    A consumer is instantiated by providing a set of key-value pairs as configuration and a ConsumerRebalanceCallback implementation.

KafkaConsumer(java.util.Properties properties)
    A consumer is instantiated by providing a Properties object as configuration.

KafkaConsumer(java.util.Properties properties, ConsumerRebalanceCallback callback)
    A consumer is instantiated by providing a Properties object as configuration and a ConsumerRebalanceCallback implementation.
Method Summary

void close()
    Close this consumer.

OffsetMetadata commit(boolean sync)
    Commits offsets returned on the last poll() for the subscribed list of topics and partitions.

OffsetMetadata commit(java.util.Map<TopicPartition,java.lang.Long> offsets, boolean sync)
    Commits the specified offsets for the specified list of topics and partitions to Kafka.

java.util.Map<TopicPartition,java.lang.Long> committed(java.util.Collection<TopicPartition> partitions)
    Fetches the last committed offsets of partitions that the consumer currently consumes.

java.util.Map<java.lang.String,? extends Metric> metrics()
    Return a map of metrics maintained by the consumer.

java.util.Map<TopicPartition,java.lang.Long> offsetsBeforeTime(long timestamp, java.util.Collection<TopicPartition> partitions)
    Fetches offsets before a certain timestamp.

java.util.Map<java.lang.String,ConsumerRecords> poll(long timeout)
    Fetches data for the topics or partitions specified using one of the subscribe APIs.

java.util.Map<TopicPartition,java.lang.Long> position(java.util.Collection<TopicPartition> partitions)
    Returns the fetch position of the next message for the specified topic partition to be used on the next poll().

void seek(java.util.Map<TopicPartition,java.lang.Long> offsets)
    Overrides the fetch offsets that the consumer will use on the next poll(timeout).

void subscribe(java.lang.String... topics)
    Incrementally subscribes to the given list of topics and uses the consumer's group management functionality.

void subscribe(TopicPartition... partitions)
    Incrementally subscribes to a specific topic partition and does not use the consumer's group management functionality.

void unsubscribe(java.lang.String... topics)
    Unsubscribe from the specific topics.

void unsubscribe(TopicPartition... partitions)
    Unsubscribe from the specific topic partitions.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

public KafkaConsumer(java.util.Map<java.lang.String,java.lang.Object> configs)
    A consumer is instantiated by providing a set of key-value pairs as configuration.
    Valid configuration strings are documented at ConsumerConfig.
    Parameters:
        configs - The consumer configs

public KafkaConsumer(java.util.Map<java.lang.String,java.lang.Object> configs, ConsumerRebalanceCallback callback)
    A consumer is instantiated by providing a set of key-value pairs as configuration and a ConsumerRebalanceCallback implementation.
    Valid configuration strings are documented at ConsumerConfig.
    Parameters:
        configs - The consumer configs
        callback - A callback interface that the user can implement to manage customized offsets on the start and end of every rebalance operation.

public KafkaConsumer(java.util.Properties properties)
    A consumer is instantiated by providing a Properties object as configuration.
    Valid configuration strings are documented at ConsumerConfig.
    Parameters:
        properties - The consumer configuration properties

public KafkaConsumer(java.util.Properties properties, ConsumerRebalanceCallback callback)
    A consumer is instantiated by providing a Properties object as configuration and a ConsumerRebalanceCallback implementation.
    Valid configuration strings are documented at ConsumerConfig.
    Parameters:
        properties - The consumer configuration properties
        callback - A callback interface that the user can implement to manage customized offsets on the start and end of every rebalance operation.
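A minimal sketch of the Map-based constructor, as an alternative to the Properties-based constructor shown in the usage examples above (the broker address and config values are placeholders; only configuration keys already used in the examples appear here):

Map<String, Object> configs = new HashMap<String, Object>();
configs.put("metadata.broker.list", "localhost:9092"); // placeholder broker list
configs.put("group.id", "test");
configs.put("enable.auto.commit", "true");
configs.put("auto.commit.interval.ms", "10000");
KafkaConsumer consumer = new KafkaConsumer(configs);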
Method Detail
public void subscribe(java.lang.String... topics)
    Incrementally subscribes to the given list of topics and uses the consumer's group management functionality.
    As part of group management, the consumer will keep track of the list of consumers that belong to a particular group and will trigger a rebalance operation if the group membership or the subscribed topics change (for example, when partitions are added to a subscribed topic, a subscribed topic is created or deleted, or a consumer instance joins or leaves the group).
    Specified by:
        subscribe in interface Consumer
    Parameters:
        topics - A variable list of topics that the consumer wants to subscribe to

public void subscribe(TopicPartition... partitions)
    Incrementally subscribes to a specific topic partition and does not use the consumer's group management functionality.
    Specified by:
        subscribe in interface Consumer
    Parameters:
        partitions - Partitions to incrementally subscribe to

public void unsubscribe(java.lang.String... topics)
    Unsubscribe from the specific topics. Messages for these topics will not be returned from the next poll() onwards.
    Specified by:
        unsubscribe in interface Consumer
    Parameters:
        topics - Topics to unsubscribe from

public void unsubscribe(TopicPartition... partitions)
    Unsubscribe from the specific topic partitions. Messages for these partitions will not be returned from the next poll() onwards.
    Specified by:
        unsubscribe in interface Consumer
    Parameters:
        partitions - Partitions to unsubscribe from
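To illustrate the incremental nature of the subscribe and unsubscribe calls described above, a minimal sketch (topic names are placeholders; the consumer is assumed to be configured as in the earlier usage examples):

consumer.subscribe("foo", "bar");   // consume topics foo and bar
// ... poll() and process for a while ...
consumer.unsubscribe("bar");        // messages for bar stop being returned from the next poll() onwards
consumer.subscribe("baz");          // incrementally add topic baz to the existing subscription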
public java.util.Map<java.lang.String,ConsumerRecords> poll(long timeout)
    Fetches data for the topics or partitions specified using one of the subscribe APIs.
    The offset used for fetching the data is governed by whether or not seek(offsets) is used. If seek(offsets) is used, it will use the specified offsets on startup and on every rebalance, to consume data from that offset sequentially on every poll. If not, it will use the last checkpointed offset using commit(offsets, sync) for the subscribed list of partitions.
    Specified by:
        poll in interface Consumer
    Parameters:
        timeout - The time, in milliseconds, spent waiting in poll if data is not available. If 0, waits indefinitely. Must not be negative.
public OffsetMetadata commit(java.util.Map<TopicPartition,java.lang.Long> offsets, boolean sync)
    Commits the specified offsets for the specified list of topics and partitions to Kafka.
    This commits offsets only to Kafka. The offsets committed using this API will be used on the first fetch after every rebalance and also on startup. As such, if you need to store offsets in anything other than Kafka, this API should not be used.
    Specified by:
        commit in interface Consumer
    Parameters:
        offsets - The list of offsets per partition that should be committed to Kafka.
        sync - If true, commit will block until the consumer receives an acknowledgment
    Returns:
        An OffsetMetadata object that contains the partition, offset and a corresponding error code. Returns null if the sync flag is set to false.

public OffsetMetadata commit(boolean sync)
    Commits offsets returned on the last poll() for the subscribed list of topics and partitions.
    This commits offsets only to Kafka. The offsets committed using this API will be used on the first fetch after every rebalance and also on startup. As such, if you need to store offsets in anything other than Kafka, this API should not be used.
    Specified by:
        commit in interface Consumer
    Parameters:
        sync - If true, commit will block until the consumer receives an acknowledgment
    Returns:
        An OffsetMetadata object that contains the partition, offset and a corresponding error code. Returns null if the sync flag is set to false.

public void seek(java.util.Map<TopicPartition,java.lang.Long> offsets)
    Overrides the fetch offsets that the consumer will use on the next poll(timeout). If this API is invoked for the same partition more than once, the latest offset will be used on the next poll(). Note that you may lose data if this API is arbitrarily used in the middle of consumption, to reset the fetch offsets.
    Specified by:
        seek in interface Consumer
    Parameters:
        offsets - The map of fetch positions per topic and partition
public java.util.Map<TopicPartition,java.lang.Long> position(java.util.Collection<TopicPartition> partitions)
    Returns the fetch position of the next message for the specified topic partition to be used on the next poll().
    Specified by:
        position in interface Consumer
    Parameters:
        partitions - Partitions for which the fetch position will be returned
    Returns:
        The position from which data will be fetched for the specified partitions on the next poll()
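A minimal sketch of using position() to log where the consumer will fetch from next (the partitions collection is assumed to hold the TopicPartitions this consumer has subscribed to):

Map<TopicPartition, Long> fetchPositions = consumer.position(partitions);
// print the offset that the next poll() will fetch from, per partition
for (Map.Entry<TopicPartition, Long> entry : fetchPositions.entrySet())
    System.out.println("Next fetch offset for " + entry.getKey() + " is " + entry.getValue());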
public java.util.Map<TopicPartition,java.lang.Long> committed(java.util.Collection<TopicPartition> partitions)
    Fetches the last committed offsets of partitions that the consumer currently consumes. This API can be used in conjunction with seek(offsets) to rewind consumption of data.
    Specified by:
        committed in interface Consumer
    Parameters:
        partitions - The list of partitions to return the last committed offset for
    Returns:
        The list of offsets committed on the last commit(sync)
public java.util.Map<TopicPartition,java.lang.Long> offsetsBeforeTime(long timestamp, java.util.Collection<TopicPartition> partitions)
    Fetches offsets before a certain timestamp.
    Specified by:
        offsetsBeforeTime in interface Consumer
    Parameters:
        partitions - The list of partitions for which the offsets are returned
        timestamp - The unix timestamp. Value -1 indicates earliest available timestamp. Value -2 indicates latest available timestamp.
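A minimal sketch of combining this API with seek(offsets), using the timestamp values described above, to replay the subscribed partitions from the earliest available offsets (the partitions collection is assumed to hold the TopicPartitions this consumer has subscribed to):

// fetch the earliest available offsets for the subscribed partitions (timestamp -1, per the description above)
Map<TopicPartition, Long> earliestOffsets = consumer.offsetsBeforeTime(-1, partitions);
// rewind so that the next poll() starts fetching from the earliest available offsets
consumer.seek(earliestOffsets);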
public java.util.Map<java.lang.String,? extends Metric> metrics()
    Return a map of metrics maintained by the consumer.
    Specified by:
        metrics in interface Consumer
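A minimal sketch of inspecting the returned map; only the metric names (the map keys) are printed, since the accessors on the Metric values are not covered by this page:

Map<String, ? extends Metric> consumerMetrics = consumer.metrics();
// print the name of every metric the consumer maintains
for (String metricName : consumerMetrics.keySet())
    System.out.println(metricName);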
public void close()
    Close this consumer.
    Specified by:
        close in interface java.io.Closeable
        close in interface Consumer