What happens to Kafka messages if the microservice crashes before the Kafka commit?
I am new to Kafka. I have a Java microservice with a Kafka Stream that consumes the messages a producer writes to a Kafka topic and processes them. The Kafka commit interval has been set using auto.commit.interval.ms. My question is: if the microservice crashes before the commit, what happens to the messages that were processed but not committed? Will there be duplicate records? And if duplication does happen, how can it be resolved?
Comments (3)
Kafka supports exactly-once semantics (it has to be enabled; it is not the default), which guarantees that the effect of processing each record is applied only once, even if the record is read again after a crash. Take a look at this section of Spring Kafka's docs for more details on the Spring support for that. Also, see this section for the support for transactions.
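If the service is a Kafka Streams application, exactly-once processing is switched on through configuration. Below is a minimal sketch of such a configuration built with plain java.util.Properties. The class name, application id and broker address are placeholders, and the value exactly_once_v2 assumes Kafka Streams 3.0 or newer with brokers at 2.5 or newer. The guarantee covers reads and writes inside Kafka, not side effects in external systems such as a database.

```java
import java.util.Properties;

class ExactlyOnceStreamsConfig {
    // Builds Kafka Streams settings with exactly-once processing enabled.
    // The keys are Kafka Streams configuration names; the values passed in
    // by the caller are placeholders for this sketch.
    static Properties build(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Assumption: Kafka Streams 3.0+; older versions use a different value.
        props.put("processing.guarantee", "exactly_once_v2");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("orders-service", "localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting Properties object would be passed to the KafkaStreams constructor together with the topology. Note that Kafka Streams takes its commit interval from commit.interval.ms rather than from the plain consumer's auto.commit.interval.ms.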
Kafka provides various delivery semantics. Which one to use depends on your use case.
If your concern is that the consumer service must not lose messages, you should go with at-least-once delivery semantics.
Now, answering your question on the basis of at-least-once delivery semantics: if your consumer service crashes before committing the offset, the messages will be delivered again once the service is back up and running. This is because the offset for the partition was not committed. The offset is committed only after the consumer has processed the message. In simple words, a committed offset says that everything up to it has been processed, and Kafka will not deliver those messages from that partition again to a consumer in the same group.
At-least-once delivery semantics are usually good enough for use cases where data duplication is not a big issue or where deduplication is possible on the consumer side. For example, if each message carries a unique key, a duplicate can be rejected when it is written to the database.
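To make the redelivery behavior concrete, here is a small in-memory simulation. It does not use the Kafka client at all: the Partition class, the runConsumer method and the crashAfter parameter are invented for illustration and only mimic the rule described above, namely that a restart resumes from the last committed offset.

```java
import java.util.ArrayList;
import java.util.List;

class RedeliveryDemo {
    // In-memory stand-in for one partition plus its committed offset.
    static final class Partition {
        final List<String> records;
        int committedOffset = 0; // where a restarted consumer resumes

        Partition(List<String> records) {
            this.records = records;
        }
    }

    // Processes records starting at the committed offset and commits once,
    // after all of them are processed. If crashAfter >= 0, the run stops
    // after that many records and before the commit (simulated crash).
    static List<String> runConsumer(Partition partition, int crashAfter) {
        List<String> processed = new ArrayList<>();
        int position = partition.committedOffset;
        while (position < partition.records.size()) {
            if (crashAfter >= 0 && processed.size() == crashAfter) {
                return processed; // crash: the offset was never committed
            }
            processed.add(partition.records.get(position));
            position++;
        }
        partition.committedOffset = position; // commit after processing
        return processed;
    }

    public static void main(String[] args) {
        Partition partition = new Partition(List.of("m0", "m1", "m2"));
        System.out.println("first run: " + runConsumer(partition, 2));
        System.out.println("after restart: " + runConsumer(partition, -1));
    }
}
```

In this sketch the first run processes m0 and m1 and then crashes, so the restarted run sees m0 and m1 again before reaching m2. Those two records are the duplicates the question asks about.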
There are three main types of delivery semantics:
At most once:
Offsets are committed as soon as the message is received by the consumer.
This is a bit risky: if processing goes wrong, the message is lost.
At least once:
Offsets are committed after the message has been processed, so this is usually the preferred option.
If processing goes wrong, the message is read again because its offset has not been committed.
The drawback is that a message may be processed more than once, so make sure your processing is idempotent, meaning that processing the same message again does not affect your system. (Yes, your application has to handle duplicates; Kafka won't help here.)
Exactly once:
Can be achieved for Kafka-to-Kafka communication using the Kafka Streams API.
It's not your case.
You can choose from the semantics above according to your requirements.
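As an illustration of idempotent processing under at-least-once delivery, here is a minimal sketch that deduplicates on a unique message id. The IdempotentHandler class and its in-memory map are hypothetical stand-ins. In a real service the map would more likely be a database table with a unique key constraint, so that the check survives a restart.

```java
import java.util.HashMap;
import java.util.Map;

class IdempotentHandler {
    // Stand-in for a table with a unique constraint on the message id.
    private final Map<String, String> store = new HashMap<>();

    // Applies the message once; returns false if this id was already handled,
    // so a redelivered message leaves the stored state unchanged.
    boolean handle(String messageId, String payload) {
        return store.putIfAbsent(messageId, payload) == null;
    }

    int size() {
        return store.size();
    }

    public static void main(String[] args) {
        IdempotentHandler handler = new IdempotentHandler();
        System.out.println(handler.handle("order-1", "created")); // first delivery
        System.out.println(handler.handle("order-1", "created")); // redelivery
        System.out.println(handler.size());
    }
}
```

The second call returns false and the store still holds a single entry, which is the property that makes reprocessing after a crash harmless.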