Kafka Producer Retries: Everything You Need To Know For Reliable Streaming

When producing messages to Kafka, failures can occur for various reasons such as network issues, broker unavailability, or transient errors. Kafka provides a robust retry mechanism to handle such scenarios and ensure reliable message delivery. The retry mechanism is fully configurable and requires no additional application logic. In this blog, we will explore Kafka producer retries, their configuration, and best practices.

What are Kafka Producer Retries?

Kafka producer retries allow the producer to automatically resend failed messages. If a message send attempt fails due to a transient error, the producer retries the operation based on its configuration. This mechanism improves reliability by reducing the likelihood of message loss due to temporary issues.

How Kafka Producer Retries Work

When a producer sends a message to a Kafka broker, the broker may fail to acknowledge the message due to errors such as:

  • Network issues – Temporary disconnections or high latency.
  • Broker unavailability – The leader for a partition is unavailable.
  • Request timeouts – The broker does not respond within the configured timeout.

When the error is retriable, the producer’s retry mechanism kicks in and reattempts the send, helping to ensure successful delivery without manual intervention. Non-retriable errors (such as a record that is too large) are reported to the application immediately.
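The interplay of the three settings can be illustrated with a simplified sketch. This is not Kafka's actual implementation — the real producer batches records and tracks per-batch deadlines — but it captures how retries, retry.backoff.ms, and delivery.timeout.ms interact:

```java
import java.util.function.Supplier;

public class RetrySketch {
    // Simplified illustration of a producer-style retry loop:
    // retry a failing operation up to `retries` extra times,
    // waiting `retryBackoffMs` between attempts, and give up
    // once `deliveryTimeoutMs` would be exceeded.
    public static boolean sendWithRetries(Supplier<Boolean> attempt,
                                          int retries,
                                          long retryBackoffMs,
                                          long deliveryTimeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + deliveryTimeoutMs;
        for (int i = 0; i <= retries; i++) {
            if (attempt.get()) {
                return true; // "broker" acknowledged the send
            }
            if (System.currentTimeMillis() + retryBackoffMs > deadline) {
                break; // the delivery timeout would be exceeded
            }
            Thread.sleep(retryBackoffMs); // back off before retrying
        }
        return false; // delivery failed within the configured limits
    }
}
```

Note that both limits apply: the loop stops at whichever comes first, the retry count or the delivery deadline — the same relationship the real producer enforces.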

Key Configuration Properties for Retries

There are four main configuration properties that drive producer retries –

1. retries

This property defines the number of retry attempts the producer will make for a failed send.

  • Default: 2147483647 (maximum integer value, effectively unlimited retries).
  • Example –
props.put("retries", 3); // Retry up to 3 times


2. delivery.timeout.ms

Specifies the maximum amount of time the producer will attempt to deliver a message, including retries. If delivery does not succeed within this window, the producer gives up and reports an error. Note that the window for this timeout starts after the call to send() returns. Its value should be at least linger.ms + request.timeout.ms; the producer rejects configurations that violate this.

  • Default: 120000 (2 minutes).
  • Example –
props.put("delivery.timeout.ms", 30000); // 30 seconds
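The constraint between these timeouts can be expressed as a simple sanity check (the values below are the producer defaults: linger.ms of 0 and request.timeout.ms of 30000):

```java
public class DeliveryTimeoutCheck {
    // The producer requires delivery.timeout.ms to be at least
    // linger.ms + request.timeout.ms; otherwise it rejects the
    // configuration at startup.
    public static boolean isValid(long deliveryTimeoutMs, long lingerMs, long requestTimeoutMs) {
        return deliveryTimeoutMs >= lingerMs + requestTimeoutMs;
    }
}
```

For example, a delivery.timeout.ms of 20000 with the default request.timeout.ms of 30000 would be rejected, because a single request attempt could outlive the overall delivery deadline.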


3. retry.backoff.ms

Defines the time interval between retry attempts. This helps avoid overwhelming the broker with rapid retry requests.

  • Default: 100ms.
  • Example –
props.put("retry.backoff.ms", 500); // 500 milliseconds


4. acks

Controls the acknowledgment level required for a successful send. Proper configuration of acks complements retries for ensuring reliability.

  • Default: all
  • Possible values are –
    • 0 – the producer does not wait for any acknowledgement from the broker; the message is considered sent as soon as it is written to the socket.
    • 1 – the send is considered successful once the leader broker has written the message to its local log, without waiting for followers.
    • all – the send is considered successful only once all in-sync replicas have acknowledged the message. This is the strongest available guarantee and is recommended for preventing data loss.
  • Example –
props.put("acks", "all");


Retry Mechanism in Action

Here is an example of configuring retries in a Kafka producer:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class KafkaProducerWithRetries {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");
        props.put("retries", 3);
        props.put("retry.backoff.ms", 500);
        props.put("delivery.timeout.ms", 30000);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key1", "value1");

        try {
            // send message synchronously
            producer.send(record).get();
            System.out.println("Message sent successfully");
        } catch (Exception e) {
            System.err.println("Error sending message: " + e.getMessage());
        } finally {
            producer.close();
        }
    }
}

Best Practices for Configuring Retries

  1. Set Reasonable Retry Limits
    • Avoid setting excessively high retry counts to prevent unnecessary delays and resource usage. Use delivery.timeout.ms to cap the total retry duration.
  2. Use acks=all for Durability
    • Combine retries with acks=all to ensure that messages are only considered successfully sent when fully replicated to all in-sync replicas.
  3. Monitor and Tune Backoff Intervals
    • Adjust retry.backoff.ms based on your system’s latency and broker performance to avoid excessive retries in a short period.
  4. Monitor Producer Metrics
    • Use Kafka’s producer metrics (e.g., record-retry-rate) to track retry behavior and detect potential issues.

Common Pitfalls and How to Avoid Them

1. Infinite Retry Loops

  • Ensure delivery.timeout.ms is appropriately configured to prevent retry loops from running indefinitely.

2. Overloading the Broker

  • Set a reasonable retry.backoff.ms to prevent overwhelming the broker with retry requests.

3. Ignoring Exception Handling

  • Always handle exceptions from the send method to log errors and take corrective actions as needed.

4. Duplicate Messages on Retry

  • A retry can produce a duplicate if the broker received the message but its acknowledgment was lost. Enable enable.idempotence=true so that retried sends do not create duplicates.
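As a sketch, here is a producer configuration with idempotence enabled (the broker address is a placeholder). When enable.idempotence=true, the producer requires acks=all, retries greater than zero, and max.in.flight.requests.per.connection of at most 5:

```java
import java.util.Properties;

public class IdempotentProducerConfig {
    // Builds a producer configuration with idempotence enabled, so
    // that retried sends cannot write duplicate records to a partition.
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("enable.idempotence", "true");
        props.put("acks", "all");                         // required when idempotence is on
        props.put("retries", "2147483647");               // default; must be greater than 0
        props.put("max.in.flight.requests.per.connection", "5"); // must be at most 5
        return props;
    }
}
```

With these settings, a retry after a lost acknowledgment is deduplicated by the broker using the producer's sequence numbers, so delivery remains exactly-once per partition.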

Conclusion

Kafka producer retries are a powerful feature for enhancing message delivery reliability. By configuring properties like retries, retry.backoff.ms, and delivery.timeout.ms, you can handle transient failures gracefully while maintaining high throughput and durability. Combine retries with robust error handling and monitoring practices to build resilient Kafka-based applications. Happy streaming!
