org.apache.kafka.clients.consumer
Class KafkaConsumer

java.lang.Object
  extended by org.apache.kafka.clients.consumer.KafkaConsumer
All Implemented Interfaces:
java.io.Closeable, Consumer

public class KafkaConsumer
extends java.lang.Object
implements Consumer

A Kafka client that consumes records from a Kafka cluster.

The consumer is not thread safe and should not be shared among threads; each thread that consumes should create its own consumer instance.

The consumer is single threaded and multiplexes I/O over TCP connections to each of the brokers it needs to communicate with. Failure to close the consumer after use will leak these resources.
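
Because an unclosed consumer leaks its TCP connections, it is good practice to guarantee that close() runs even when processing fails. Below is a minimal sketch of this pattern; the topic name and poll timeout are placeholders.

 Properties props = new Properties();
 props.put("metadata.broker.list", "localhost:9092");
 props.put("group.id", "test");
 KafkaConsumer consumer = new KafkaConsumer(props);
 try {
     consumer.subscribe("foo");
     Map<String, ConsumerRecords> records = consumer.poll(100);
     // hand the records off to application code here
 } finally {
     // always release the consumer's TCP connections, even if processing throws
     consumer.close();
 }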

Usage Examples

The consumer APIs offer flexibility to cover a variety of consumption use cases. Following are some examples to demonstrate the correct use of the available APIs. Each of the examples assumes the presence of a user-implemented process() method that processes a given batch of messages and returns the offset of the latest processed message per partition. Note that process() is not part of the consumer API and is only used as a convenience method to demonstrate the different use cases of the consumer APIs. Here is a sample implementation of such a process() method.
 private Map<TopicPartition, Long> process(Map<String, ConsumerRecords> records) {
     Map<TopicPartition, Long> processedOffsets = new HashMap<TopicPartition, Long>();
     for(Entry<String, ConsumerRecords> recordMetadata : records.entrySet()) {
          List<ConsumerRecord> recordsPerTopic = recordMetadata.getValue().records();
          for(ConsumerRecord record : recordsPerTopic) {
               try {
                    // process the record, then remember its offset per partition
                    processedOffsets.put(record.topicAndPartition(), record.offset());
               } catch (Exception e) {
                    e.printStackTrace();
               }
          }
     }
     return processedOffsets;
 }
 
 

This example demonstrates how the consumer can be used to leverage Kafka's group management functionality for automatic consumer load balancing and failover. This example assumes that the offsets are stored in Kafka and are automatically committed periodically, as controlled by the auto.commit.interval.ms config.

 Properties props = new Properties();
 props.put("metadata.broker.list", "localhost:9092");
 props.put("group.id", "test");
 props.put("session.timeout.ms", "1000");
 props.put("enable.auto.commit", "true");
 props.put("auto.commit.interval.ms", "10000");
 KafkaConsumer consumer = new KafkaConsumer(props);
 consumer.subscribe("foo", "bar");
 boolean isRunning = true;
 while(isRunning) {
   Map<String, ConsumerRecords> records = consumer.poll(100);
   process(records);
 }
 consumer.close();
 
 
This example demonstrates how the consumer can be used to leverage Kafka's group management functionality for automatic consumer load balancing and failover. This example assumes that the offsets are stored in Kafka and are manually committed using the commit(boolean) API. This example also demonstrates rewinding the consumer's offsets if processing of the consumed messages fails. Note that this method of rewinding offsets using seek(offsets) is only useful for rewinding the offsets of the current consumer instance. As such, this will not trigger a rebalance or affect the fetch offsets for the other consumer instances.
 Properties props = new Properties();
 props.put("metadata.broker.list", "localhost:9092");
 props.put("group.id", "test");
 props.put("session.timeout.ms", "1000");
 props.put("enable.auto.commit", "false");
 KafkaConsumer consumer = new KafkaConsumer(props);
 consumer.subscribe("foo", "bar");
 int commitInterval = 100;
 int numRecords = 0;
 boolean isRunning = true;
 Map<TopicPartition, Long> consumedOffsets = new HashMap<TopicPartition, Long>();
 while(isRunning) {
     Map<String, ConsumerRecords> records = consumer.poll(100);
     try {
         Map<TopicPartition, Long> lastConsumedOffsets = process(records);
         consumedOffsets.putAll(lastConsumedOffsets);
         // count the records consumed in this batch (records.size() would only count topics)
         for(ConsumerRecords recordsPerTopic : records.values())
             numRecords += recordsPerTopic.records().size();
         // commit offsets for all partitions of topics foo, bar owned by this consumer
         // instance (asynchronously, since sync == false)
         if(numRecords >= commitInterval) {
             consumer.commit(false);
             numRecords = 0;
         }
     } catch(Exception e) {
         try {
             // rewind consumer's offsets for failed partitions
             // assume failedPartitions() returns the list of partitions for which the processing of the last batch of messages failed
             List<TopicPartition> failedPartitions = failedPartitions();   
             Map<TopicPartition, Long> offsetsToRewindTo = new HashMap<TopicPartition, Long>();
             for(TopicPartition failedPartition : failedPartitions) {
                 // rewind to the last consumed offset for the failed partition. Since process() failed for this partition, the consumed offset
                 // should still be pointing to the last successfully processed offset and hence is the right offset to rewind consumption to.
                 offsetsToRewindTo.put(failedPartition, consumedOffsets.get(failedPartition));
             }
             // seek to new offsets only for partitions that failed the last process()
             consumer.seek(offsetsToRewindTo);
          } catch(Exception e2) { break; } // rewind failed (e2: the outer catch already binds e)
     }
 }         
 consumer.close();
 
 

This example demonstrates how to rewind the offsets of the entire consumer group. It is assumed that the user has chosen to use Kafka's group management functionality for automatic consumer load balancing and failover, and that the offsets are stored in Kafka. If group management is used, the right place to systematically rewind offsets for every consumer instance is inside the ConsumerRebalanceCallback. The onPartitionsAssigned callback is invoked after the consumer is assigned a new set of partitions on rebalance and before consumption restarts post rebalance, so it is the right place to supply the newly rewound offsets to the consumer. If you foresee ever needing to reset the consumer's offsets in the presence of group management, it is recommended to always configure the consumer with a ConsumerRebalanceCallback and guard the offset rewind logic behind a flag.

This method of rewinding offsets is useful if you notice an issue with your message processing after successful consumption and offset commit, and would like to rewind the offsets for the entire consumer group as part of rolling out a fix to your processing logic. In that case, you would configure each of your consumer instances with the offset rewind flag turned on and bounce each instance in a rolling restart fashion. Each restart triggers a rebalance, and eventually every consumer instance will have rewound the offsets for the partitions it owns, effectively rewinding the offsets for the entire consumer group.

 Properties props = new Properties();
 props.put("metadata.broker.list", "localhost:9092");
 props.put("group.id", "test");
 props.put("session.timeout.ms", "1000");
 props.put("enable.auto.commit", "false");
 KafkaConsumer consumer = new KafkaConsumer(props,
                                            new ConsumerRebalanceCallback() {
                                                boolean rewindOffsets = true;  // should be retrieved from external application config
                                                public void onPartitionsAssigned(Consumer consumer, Collection<TopicPartition> partitions) {
                                                    Map<TopicPartition, Long> latestCommittedOffsets = consumer.committed(partitions);
                                                     if(rewindOffsets) {
                                                         Map<TopicPartition, Long> newOffsets = rewindOffsets(latestCommittedOffsets, 100);
                                                         consumer.seek(newOffsets);
                                                     }
                                                 }
                                                public void onPartitionsRevoked(Consumer consumer, Collection<TopicPartition> partitions) {
                                                    consumer.commit(true);
                                                }
                                                // this API rewinds every partition back by numberOfMessagesToRewindBackTo messages 
                                                private Map<TopicPartition, Long> rewindOffsets(Map<TopicPartition, Long> currentOffsets,
                                                                                                long numberOfMessagesToRewindBackTo) {
                                                    Map<TopicPartition, Long> newOffsets = new HashMap<TopicPartition, Long>();
                                                    for(Map.Entry<TopicPartition, Long> offset : currentOffsets.entrySet()) 
                                                        newOffsets.put(offset.getKey(), offset.getValue() - numberOfMessagesToRewindBackTo);
                                                    return newOffsets;
                                                }
                                            });
 consumer.subscribe("foo", "bar");
 int commitInterval = 100;
 int numRecords = 0;
 boolean isRunning = true;
 Map<TopicPartition, Long> consumedOffsets = new HashMap<TopicPartition, Long>();
 while(isRunning) {
     Map<String, ConsumerRecords> records = consumer.poll(100);
     Map<TopicPartition, Long> lastConsumedOffsets = process(records);
     consumedOffsets.putAll(lastConsumedOffsets);
     // count the records consumed in this batch (records.size() would only count topics)
     for(ConsumerRecords recordsPerTopic : records.values())
         numRecords += recordsPerTopic.records().size();
     // synchronously commit offsets for all partitions of topics foo, bar owned by this consumer instance
     if(numRecords >= commitInterval) {
         consumer.commit(consumedOffsets, true);
         numRecords = 0;
     }
 }
 consumer.commit(true);
 consumer.close();
 
 
This example demonstrates how the consumer can be used to leverage Kafka's group management functionality along with custom offset storage. In this example, it is assumed that the user chooses to store the consumer offsets outside Kafka. This requires the user to plug in logic for retrieving the offsets from a custom store and to provide the offsets to the consumer in the ConsumerRebalanceCallback callback. The onPartitionsAssigned callback is invoked after the consumer is assigned a new set of partitions on rebalance and before consumption restarts post rebalance. This is the right place to supply offsets from a custom store to the consumer.

Similarly, the user would also be required to plug in logic for storing the consumer's offsets to a custom store. The onPartitionsRevoked callback is invoked right after the consumer has stopped fetching data and before the partition ownership changes. This is the right place to commit the offsets for the current set of partitions owned by the consumer.

 Properties props = new Properties();
 props.put("metadata.broker.list", "localhost:9092");
 props.put("group.id", "test");
 props.put("session.timeout.ms", "1000");
 props.put("enable.auto.commit", "false"); // since enable.auto.commit only applies to Kafka based offset storage
 KafkaConsumer consumer = new KafkaConsumer(props,
                                            new ConsumerRebalanceCallback() {
                                                public void onPartitionsAssigned(Consumer consumer, Collection<TopicPartition> partitions) {
                                                    Map<TopicPartition, Long> lastCommittedOffsets = getLastCommittedOffsetsFromCustomStore(partitions);
                                                    consumer.seek(lastCommittedOffsets);
                                                }
                                                public void onPartitionsRevoked(Consumer consumer, Collection<TopicPartition> partitions) {
                                                    Map<TopicPartition, Long> offsets = getLastConsumedOffsets(partitions);
                                                    commitOffsetsToCustomStore(offsets); 
                                                }
                                                // following APIs should be implemented by the user for custom offset management
                                                private Map<TopicPartition, Long> getLastCommittedOffsetsFromCustomStore(Collection<TopicPartition> partitions) {
                                                    return null;
                                                }
                                                private Map<TopicPartition, Long> getLastConsumedOffsets(Collection<TopicPartition> partitions) { return null; }
                                                private void commitOffsetsToCustomStore(Map<TopicPartition, Long> offsets) {}
                                            });
 consumer.subscribe("foo", "bar");
 int commitInterval = 100;
 int numRecords = 0;
 boolean isRunning = true;
 Map<TopicPartition, Long> consumedOffsets = new HashMap<TopicPartition, Long>();
 while(isRunning) {
     Map<String, ConsumerRecords> records = consumer.poll(100);
     Map<TopicPartition, Long> lastConsumedOffsets = process(records);
     consumedOffsets.putAll(lastConsumedOffsets);
     // count the records consumed in this batch (records.size() would only count topics)
     for(ConsumerRecords recordsPerTopic : records.values())
         numRecords += recordsPerTopic.records().size();
     // commit offsets for all partitions of topics foo, bar owned by this consumer instance to the custom store
     if(numRecords >= commitInterval) {
         commitOffsetsToCustomStore(consumedOffsets);
         numRecords = 0;
     }
 }
 commitOffsetsToCustomStore(consumedOffsets); // final commit goes to the custom store, not Kafka
 consumer.close();
 
 
This example demonstrates how the consumer can be used to subscribe to specific partitions of certain topics and consume up to the latest available message for each of those partitions before shutting down. When subscribing to specific partitions, the user foregoes group management and instead manually configures each consumer instance with the set of partitions it should consume. This example assumes that the user chooses to use Kafka based offset storage. A group.id must still be specified to use Kafka based offset management; however, session.timeout.ms is not required, since the Kafka consumer only performs automatic failover when group management is used.
 Properties props = new Properties();
 props.put("metadata.broker.list", "localhost:9092");
 props.put("group.id", "test");
 props.put("enable.auto.commit", "true");
 props.put("auto.commit.interval.ms", "10000");
 KafkaConsumer consumer = new KafkaConsumer(props);
 // subscribe to some partitions of topic foo
 TopicPartition partition0 = new TopicPartition("foo", 0);
 TopicPartition partition1 = new TopicPartition("foo", 1);
 TopicPartition[] partitions = new TopicPartition[2];
 partitions[0] = partition0;
 partitions[1] = partition1;
 consumer.subscribe(partitions);
 // find the last committed offsets for partitions 0,1 of topic foo
 Map<TopicPartition, Long> lastCommittedOffsets = consumer.committed(Arrays.asList(partitions));
 // seek to the last committed offsets to avoid duplicates
 consumer.seek(lastCommittedOffsets);        
 // find the offsets of the latest available messages to know where to stop consumption
 Map<TopicPartition, Long> latestAvailableOffsets = consumer.offsetsBeforeTime(-1, Arrays.asList(partitions));
 boolean isRunning = true;
 Map<TopicPartition, Long> consumedOffsets = new HashMap<TopicPartition, Long>();
 while(isRunning) {
     Map<String, ConsumerRecords> records = consumer.poll(100);
     Map<TopicPartition, Long> lastConsumedOffsets = process(records);
     consumedOffsets.putAll(lastConsumedOffsets);
     // stop only once every partition has been consumed up to its latest available offset
     boolean caughtUp = true;
     for(TopicPartition partition : partitions) {
         Long consumed = consumedOffsets.get(partition);
         if(consumed == null || consumed < latestAvailableOffsets.get(partition))
             caughtUp = false;
     }
     isRunning = !caughtUp;
 }
 consumer.commit(true);
 consumer.close();
 
 
This example demonstrates how the consumer can be used to subscribe to specific partitions of certain topics and consume up to the latest available message for each of those partitions before shutting down. When subscribing to specific partitions, the user foregoes group management and instead manually configures each consumer instance with the set of partitions it should consume. This example assumes that the user chooses to use custom offset storage.
 Properties props = new Properties();
 props.put("metadata.broker.list", "localhost:9092");
 KafkaConsumer consumer = new KafkaConsumer(props);
 // subscribe to some partitions of topic foo
 TopicPartition partition0 = new TopicPartition("foo", 0);
 TopicPartition partition1 = new TopicPartition("foo", 1);
 TopicPartition[] partitions = new TopicPartition[2];
 partitions[0] = partition0;
 partitions[1] = partition1;
 consumer.subscribe(partitions);
 // assume getLastCommittedOffsetsFromCustomStore() retrieves the last committed offsets from the user's custom store
 Map<TopicPartition, Long> lastCommittedOffsets = getLastCommittedOffsetsFromCustomStore();
 // seek to the last committed offsets to avoid duplicates
 consumer.seek(lastCommittedOffsets);        
 // find the offsets of the latest available messages to know where to stop consumption
 Map<TopicPartition, Long> latestAvailableOffsets = consumer.offsetsBeforeTime(-1, Arrays.asList(partitions));
 boolean isRunning = true;
 Map<TopicPartition, Long> consumedOffsets = new HashMap<TopicPartition, Long>();
 while(isRunning) {
     Map<String, ConsumerRecords> records = consumer.poll(100);
     Map<TopicPartition, Long> lastConsumedOffsets = process(records);
     consumedOffsets.putAll(lastConsumedOffsets);
     // commit offsets for partitions 0,1 for topic foo to custom store
     commitOffsetsToCustomStore(consumedOffsets);
     // stop only once every partition has been consumed up to its latest available offset
     boolean caughtUp = true;
     for(TopicPartition partition : partitions) {
         Long consumed = consumedOffsets.get(partition);
         if(consumed == null || consumed < latestAvailableOffsets.get(partition))
             caughtUp = false;
     }
     isRunning = !caughtUp;
 }      
 commitOffsetsToCustomStore(consumedOffsets);   
 consumer.close();
 
 


Constructor Summary
KafkaConsumer(java.util.Map<java.lang.String,java.lang.Object> configs)
          A consumer is instantiated by providing a set of key-value pairs as configuration.
KafkaConsumer(java.util.Map<java.lang.String,java.lang.Object> configs, ConsumerRebalanceCallback callback)
          A consumer is instantiated by providing a set of key-value pairs as configuration and a ConsumerRebalanceCallback implementation
KafkaConsumer(java.util.Properties properties)
          A consumer is instantiated by providing a Properties object as configuration.
KafkaConsumer(java.util.Properties properties, ConsumerRebalanceCallback callback)
          A consumer is instantiated by providing a Properties object as configuration and a ConsumerRebalanceCallback implementation.
 
Method Summary
 void close()
          Close this consumer
 OffsetMetadata commit(boolean sync)
          Commits offsets returned on the last poll() for the subscribed list of topics and partitions.
 OffsetMetadata commit(java.util.Map<TopicPartition,java.lang.Long> offsets, boolean sync)
          Commits the specified offsets for the specified list of topics and partitions to Kafka.
 java.util.Map<TopicPartition,java.lang.Long> committed(java.util.Collection<TopicPartition> partitions)
          Fetches the last committed offsets of partitions that the consumer currently consumes.
 java.util.Map<java.lang.String,? extends Metric> metrics()
          Return a map of metrics maintained by the consumer
 java.util.Map<TopicPartition,java.lang.Long> offsetsBeforeTime(long timestamp, java.util.Collection<TopicPartition> partitions)
          Fetches offsets before a certain timestamp.
 java.util.Map<java.lang.String,ConsumerRecords> poll(long timeout)
          Fetches data for the topics or partitions specified using one of the subscribe APIs.
 java.util.Map<TopicPartition,java.lang.Long> position(java.util.Collection<TopicPartition> partitions)
          Returns the fetch position of the next message for each of the specified topic partitions, to be used on the next poll()
 void seek(java.util.Map<TopicPartition,java.lang.Long> offsets)
          Overrides the fetch offsets that the consumer will use on the next poll(timeout).
 void subscribe(java.lang.String... topics)
          Incrementally subscribes to the given list of topics and uses the consumer's group management functionality
 void subscribe(TopicPartition... partitions)
          Incrementally subscribes to the given topic partitions and does not use the consumer's group management functionality.
 void unsubscribe(java.lang.String... topics)
          Unsubscribe from the specific topics.
 void unsubscribe(TopicPartition... partitions)
          Unsubscribe from the specific topic partitions.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KafkaConsumer

public KafkaConsumer(java.util.Map<java.lang.String,java.lang.Object> configs)
A consumer is instantiated by providing a set of key-value pairs as configuration. Valid configuration strings are documented at ConsumerConfig. Values can be either strings or Objects of the appropriate type (for example a numeric configuration would accept either the string "42" or the integer 42).

Parameters:
configs - The consumer configs
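
For illustration, a minimal sketch of Map-based construction; the broker address and group name are placeholders, and the numeric setting is deliberately passed as an Integer to show that non-String values of the appropriate type are accepted.

 Map<String, Object> configs = new HashMap<String, Object>();
 configs.put("metadata.broker.list", "localhost:9092");
 configs.put("group.id", "test");
 configs.put("session.timeout.ms", 1000); // a numeric config accepts the Integer 1000 or the String "1000"
 KafkaConsumer consumer = new KafkaConsumer(configs);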

KafkaConsumer

public KafkaConsumer(java.util.Map<java.lang.String,java.lang.Object> configs,
                     ConsumerRebalanceCallback callback)
A consumer is instantiated by providing a set of key-value pairs as configuration and a ConsumerRebalanceCallback implementation.

Valid configuration strings are documented at ConsumerConfig

Parameters:
configs - The consumer configs
callback - A callback interface that the user can implement to manage customized offsets on the start and end of every rebalance operation.

KafkaConsumer

public KafkaConsumer(java.util.Properties properties)
A consumer is instantiated by providing a Properties object as configuration. Valid configuration strings are documented at ConsumerConfig

Parameters:
properties - The consumer configuration properties

KafkaConsumer

public KafkaConsumer(java.util.Properties properties,
                     ConsumerRebalanceCallback callback)
A consumer is instantiated by providing a Properties object as configuration and a ConsumerRebalanceCallback implementation.

Valid configuration strings are documented at ConsumerConfig

Parameters:
properties - The consumer configuration properties
callback - A callback interface that the user can implement to manage customized offsets on the start and end of every rebalance operation.
Method Detail

subscribe

public void subscribe(java.lang.String... topics)
Incrementally subscribes to the given list of topics and uses the consumer's group management functionality.

As part of group management, the consumer will keep track of the list of consumers that belong to a particular group and will trigger a rebalance operation if one of the following events occurs:
- The number of partitions changes for any of the subscribed topics
- A topic is created or deleted
- An existing member of the consumer group dies
- A new member joins the consumer group

Specified by:
subscribe in interface Consumer
Parameters:
topics - A variable list of topics that the consumer wants to subscribe to
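
As a minimal sketch of the incremental behavior (topic names are placeholders, and consumer is assumed to be an already constructed KafkaConsumer): each call adds to the current subscription rather than replacing it.

 consumer.subscribe("foo");
 consumer.subscribe("bar");  // the consumer is now subscribed to both foo and bar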

subscribe

public void subscribe(TopicPartition... partitions)
Incrementally subscribes to the given topic partitions and does not use the consumer's group management functionality. As such, there will be no rebalance operation triggered when group membership or cluster and topic metadata change.

Specified by:
subscribe in interface Consumer
Parameters:
partitions - Partitions to incrementally subscribe to

unsubscribe

public void unsubscribe(java.lang.String... topics)
Unsubscribe from the specific topics. This will trigger a rebalance operation, and messages for these topics will not be returned from the next poll() onwards.

Specified by:
unsubscribe in interface Consumer
Parameters:
topics - Topics to unsubscribe from

unsubscribe

public void unsubscribe(TopicPartition... partitions)
Unsubscribe from the specific topic partitions. Messages for these partitions will not be returned from the next poll() onwards.

Specified by:
unsubscribe in interface Consumer
Parameters:
partitions - Partitions to unsubscribe from
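
Neither unsubscribe variant appears in the usage examples above; as a minimal sketch (topic and partition names are placeholders), the two calls correspond to the two subscription styles and would not normally be mixed on one instance:

 // for a topic-subscribed consumer: triggers a rebalance, no more messages for topic foo
 consumer.unsubscribe("foo");
 // for a partition-subscribed consumer: no more messages for partition 0 of topic bar
 consumer.unsubscribe(new TopicPartition("bar", 0));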

poll

public java.util.Map<java.lang.String,ConsumerRecords> poll(long timeout)
Fetches data for the topics or partitions specified using one of the subscribe APIs. It is an error to not have subscribed to any topics or partitions before polling for data.

The offset used for fetching the data is governed by whether or not seek(offsets) is used. If seek(offsets) is used, the consumer will fetch from the specified offsets on startup and after every rebalance, and will consume data sequentially from there on every poll. Otherwise, it will use the last offsets checkpointed via commit(offsets, sync) for the subscribed list of partitions.

Specified by:
poll in interface Consumer
Parameters:
timeout - The time, in milliseconds, spent waiting in poll if data is not available. If 0, waits indefinitely. Must not be negative.
Returns:
map of topic to records since the last fetch for the subscribed list of topics and partitions

commit

public OffsetMetadata commit(java.util.Map<TopicPartition,java.lang.Long> offsets,
                             boolean sync)
Commits the specified offsets for the specified list of topics and partitions to Kafka.

This commits offsets only to Kafka. The offsets committed using this API will be used on the first fetch after every rebalance and also on startup. As such, if you need to store offsets in anything other than Kafka, this API should not be used.

Specified by:
commit in interface Consumer
Parameters:
offsets - The list of offsets per partition that should be committed to Kafka.
sync - If true, commit will block until the consumer receives an acknowledgment
Returns:
An OffsetMetadata object that contains the partition, offset and a corresponding error code. Returns null if the sync flag is set to false.

commit

public OffsetMetadata commit(boolean sync)
Commits offsets returned on the last poll() for the subscribed list of topics and partitions.

This commits offsets only to Kafka. The offsets committed using this API will be used on the first fetch after every rebalance and also on startup. As such, if you need to store offsets in anything other than Kafka, this API should not be used.

Specified by:
commit in interface Consumer
Parameters:
sync - If true, commit will block until the consumer receives an acknowledgment
Returns:
An OffsetMetadata object that contains the partition, offset and a corresponding error code. Returns null if the sync flag is set to false.

seek

public void seek(java.util.Map<TopicPartition,java.lang.Long> offsets)
Overrides the fetch offsets that the consumer will use on the next poll(timeout). If this API is invoked for the same partition more than once, the latest offset will be used on the next poll(). Note that you may lose data if this API is used arbitrarily in the middle of consumption to reset the fetch offsets.

Specified by:
seek in interface Consumer
Parameters:
offsets - The map of fetch positions per topic and partition

position

public java.util.Map<TopicPartition,java.lang.Long> position(java.util.Collection<TopicPartition> partitions)
Returns the fetch position of the next message for each of the specified topic partitions, to be used on the next poll()

Specified by:
position in interface Consumer
Parameters:
partitions - Partitions for which the fetch position will be returned
Returns:
The position from which data will be fetched for the specified partition on the next poll()
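
As a minimal sketch (reusing the partitions array from the examples above, on a consumer that has already subscribed and polled), position() can be used to inspect where the next fetch will start:

 Map<TopicPartition, Long> positions = consumer.position(Arrays.asList(partitions));
 for(Map.Entry<TopicPartition, Long> entry : positions.entrySet())
     System.out.println("next fetch offset for " + entry.getKey() + " is " + entry.getValue());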

committed

public java.util.Map<TopicPartition,java.lang.Long> committed(java.util.Collection<TopicPartition> partitions)
Fetches the last committed offsets of partitions that the consumer currently consumes. This API is only relevant if Kafka based offset storage is used. This API can be used in conjunction with seek(offsets) to rewind consumption of data.

Specified by:
committed in interface Consumer
Parameters:
partitions - The list of partitions to return the last committed offset for
Returns:
The last committed offset for each of the specified partitions, as of the last commit(sync)

offsetsBeforeTime

public java.util.Map<TopicPartition,java.lang.Long> offsetsBeforeTime(long timestamp,
                                                                      java.util.Collection<TopicPartition> partitions)
Fetches offsets before a certain timestamp. Note that the offsets returned are approximately computed and do not correspond to the exact message at the given timestamp. As such, if the consumer is rewound to offsets returned by this API, there may be duplicate messages returned by the consumer.

Specified by:
offsetsBeforeTime in interface Consumer
Parameters:
timestamp - The unix timestamp. Value -1 indicates latest available timestamp. Value -2 indicates earliest available timestamp.
partitions - The list of partitions for which the offsets are returned
Returns:
The offsets per partition before the specified timestamp.
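
For illustration, a sketch that rewinds consumption to roughly one hour ago (the one-hour window is an arbitrary choice, and partitions is the array from the examples above); since the returned offsets are approximate, some already processed messages may be consumed again.

 long oneHourAgo = System.currentTimeMillis() - 60 * 60 * 1000L;
 Map<TopicPartition, Long> offsets = consumer.offsetsBeforeTime(oneHourAgo, Arrays.asList(partitions));
 consumer.seek(offsets);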

metrics

public java.util.Map<java.lang.String,? extends Metric> metrics()
Description copied from interface: Consumer
Return a map of metrics maintained by the consumer

Specified by:
metrics in interface Consumer
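
As a minimal sketch, the returned map can simply be dumped for inspection (how readable each value is depends on Metric's toString()):

 for(Map.Entry<String, ? extends Metric> entry : consumer.metrics().entrySet())
     System.out.println(entry.getKey() + " : " + entry.getValue());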

close

public void close()
Description copied from interface: Consumer
Close this consumer

Specified by:
close in interface java.io.Closeable
Specified by:
close in interface Consumer