When producing messages to Kafka, failures can occur for various reasons, such as network issues, broker unavailability, or transient errors. Kafka provides a robust retry mechanism to handle these scenarios and ensure reliable message delivery. The retry mechanism is fully configurable and requires no additional application logic. In this blog, we will explore Kafka producer retries, their configuration, and best practices.
What are Kafka Producer Retries?
Kafka producer retries allow the producer to automatically resend failed messages. If a message send attempt fails due to a transient error, the producer retries the operation based on its configuration. This mechanism improves reliability by reducing the likelihood of message loss due to temporary issues.
How Kafka Producer Retries Work
When a producer sends a message to a Kafka broker, the broker may fail to acknowledge the message due to errors such as:
- Network issues – Temporary disconnections or high latency.
- Broker unavailability – The leader for a partition is unavailable.
- Request timeouts – The broker does not respond within the configured timeout.
The producer’s retry mechanism kicks in by reattempting the message send, helping to ensure successful delivery without manual intervention.
Key Configuration Properties for Retries
Four configuration properties primarily drive producer retries:
1. retries
This property defines the number of retry attempts the producer will make for a failed send.
- Default: 2147483647 (the maximum integer value, effectively unlimited retries)
- Example:
props.put("retries", 3); // Retry up to 3 times
2. delivery.timeout.ms
Specifies the maximum amount of time the producer will attempt to deliver a message, including retries. If delivery does not succeed within this window, the producer gives up. Note that the timeout window starts after the call to send() returns.
- Default: 120000 (2 minutes)
- Example:
props.put("delivery.timeout.ms", 30000); // 30 seconds
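A useful rule of thumb: delivery.timeout.ms should be at least request.timeout.ms + linger.ms, since a single attempt can take that long before the first retry is even possible; the Kafka producer rejects configurations that violate this. Below is a minimal sketch of that consistency check using only java.util.Properties — the helper name isDeliveryTimeoutConsistent is hypothetical, invented here for illustration, not a Kafka API.

```java
import java.util.Properties;

public class TimeoutCheck {
    // Kafka's producer requires delivery.timeout.ms >= request.timeout.ms + linger.ms.
    // This hypothetical helper mirrors that validation, using Kafka's defaults
    // (120000, 30000, 0) when a property is absent.
    static boolean isDeliveryTimeoutConsistent(Properties props) {
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms", "120000"));
        long request = Long.parseLong(props.getProperty("request.timeout.ms", "30000"));
        long linger = Long.parseLong(props.getProperty("linger.ms", "0"));
        return delivery >= request + linger;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("delivery.timeout.ms", "30000");
        props.put("request.timeout.ms", "30000");
        props.put("linger.ms", "5");
        // 30000 < 30000 + 5, so the real producer would reject this configuration
        System.out.println(isDeliveryTimeoutConsistent(props)); // prints false
    }
}
```

If you lower delivery.timeout.ms to 30 seconds as in the example above, remember to lower request.timeout.ms accordingly.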
3. retry.backoff.ms
Defines the time interval between retry attempts. This helps avoid overwhelming the broker with rapid retry requests.
- Default: 100 ms
- Example:
props.put("retry.backoff.ms", 500); // 500 milliseconds
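Together, retries and retry.backoff.ms put a floor on how long a failing send can take. A quick back-of-the-envelope sketch (the helper minBackoffMillis is hypothetical, for illustration only; it assumes a fixed backoff, whereas newer clients can grow the backoff exponentially up to retry.backoff.max.ms):

```java
public class BackoffBudget {
    // Minimum total time (ms) spent waiting between attempts, assuming a fixed
    // backoff of backoffMs before each of `retries` retry attempts.
    static long minBackoffMillis(int retries, long backoffMs) {
        return retries * backoffMs;
    }

    public static void main(String[] args) {
        // With retries=3 and retry.backoff.ms=500, at least 1500 ms is spent backing off,
        // on top of the time each attempt itself takes.
        System.out.println(minBackoffMillis(3, 500)); // prints 1500
    }
}
```

This is why the backoff must be weighed against delivery.timeout.ms: if the total backoff plus per-attempt time exceeds the delivery timeout, later retries will never run.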
4. acks
Controls the acknowledgment level required for a successful send. Proper configuration of acks complements retries in ensuring reliability.
- Default: all
- Possible values:
0 – the producer does not wait for any acknowledgement from the broker.
1 – the send is considered successful once the leader broker acknowledges receipt of the message.
all – the send is considered successful only when all in-sync replicas acknowledge receipt of the message. This is the strongest available guarantee and is recommended for ensuring zero data loss.
- Example:
props.put("acks", "all");
Retry Mechanism in Action
Here is an example of configuring retries in a Kafka producer:
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;
public class KafkaProducerWithRetries {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // retry-related configuration
        props.put("acks", "all");
        props.put("retries", 3);
        props.put("retry.backoff.ms", 500);
        props.put("delivery.timeout.ms", 30000);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key1", "value1");

        try {
            // send synchronously; get() blocks until the send succeeds
            // or all retries are exhausted
            producer.send(record).get();
            System.out.println("Message sent successfully");
        } catch (Exception e) {
            System.err.println("Error sending message: " + e.getMessage());
        } finally {
            producer.close();
        }
    }
}
Best Practices for Configuring Retries
- Set Reasonable Retry Limits – Avoid setting excessively high retry counts to prevent unnecessary delays and resource usage. Use delivery.timeout.ms to cap the total retry duration.
- Use acks=all for Durability – Combine retries with acks=all to ensure that messages are only considered successfully sent when fully replicated to all in-sync replicas.
- Monitor and Tune Backoff Intervals – Adjust retry.backoff.ms based on your system's latency and broker performance to avoid excessive retries in a short period.
- Monitor Producer Metrics – Use Kafka's producer metrics (e.g., record-retry-rate) to track retry behavior and detect potential issues.
Common Pitfalls and How to Avoid Them
1. Infinite Retry Loops
- Ensure delivery.timeout.ms is appropriately configured to prevent retry loops from running indefinitely.
2. Overloading the Broker
- Set a reasonable retry.backoff.ms to prevent overwhelming the broker with retry requests.
3. Ignoring Exception Handling
- Always handle exceptions from the send method to log errors and take corrective action as needed.
4. Duplicate Messages on Retry
- Set enable.idempotence=true so that retries do not produce duplicate messages.
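An idempotent producer requires a few settings to line up: enable.idempotence=true only works with acks=all, a non-zero retries value, and max.in.flight.requests.per.connection no greater than 5. A minimal configuration sketch (the property names are real Kafka producer configs; the helper method and its name are invented here for illustration):

```java
import java.util.Properties;

public class IdempotentConfig {
    // Sketch of a retry-safe, idempotent producer configuration.
    // enable.idempotence=true requires acks=all and
    // max.in.flight.requests.per.connection <= 5.
    static Properties idempotentProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("enable.idempotence", "true");
        props.put("acks", "all");
        props.put("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        Properties props = idempotentProducerProps("localhost:9092");
        System.out.println(props.getProperty("enable.idempotence")); // prints true
    }
}
```

With these settings, a retried batch that the broker already wrote is deduplicated by producer ID and sequence number rather than appended a second time.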
Conclusion
Kafka producer retries are a powerful feature for enhancing message delivery reliability. By configuring properties like retries, retry.backoff.ms, and delivery.timeout.ms, you can handle transient failures gracefully while maintaining high throughput and durability. Combine retries with robust error handling and monitoring practices to build resilient Kafka-based applications. Happy streaming!