Apache Kafka messages got archived - is it possible to retrieve the messages?

Posted on 2025-02-11 07:50:06


We are using Apache Kafka and we process more than 30 million messages per day. We have a retention policy of 30 days. However, our messages got archived (deleted) before the 30 days were up.

Is there a way we could retrieve the deleted messages?
Is it possible to reset the "start index" to an older index to retrieve the data through a query?

What other options do we have?

If we have "disk backup", could we use that for retrieving the data?

Thank You

Comments (2)

扬花落满肩 2025-02-18 07:50:06


I'm assuming your messages got deleted by the Kafka cluster here.

In general, no - if the records were deleted by time- or size-based retention policies, then they have been removed.

Theoretically, if you have access to backups, you might move the backed-up Kafka log-segment files into a broker's data directory, but the behaviour is undefined. Trying that on a fresh cluster with unlimited size/time retention policies (so nothing gets purged immediately) might work and let you consume the data again.
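
A rough sketch of that experiment, with hypothetical names throughout (topic "events", the backup and data-directory paths) - since the behaviour is undefined, treat it as a lab exercise rather than a recovery procedure:

    # On a fresh, throwaway cluster: create the topic with retention disabled,
    # so nothing gets purged while you experiment (-1 = unlimited).
    kafka-topics --bootstrap-server localhost:9092 --create --topic events \
      --partitions 1 --replication-factor 1 \
      --config retention.ms=-1 --config retention.bytes=-1

    # Stop the broker, copy the backed-up segment files into the matching
    # partition directory, then restart and try to consume.
    cp /backup/kafka-logs/events-0/*.{log,index,timeindex} \
       /var/lib/kafka/data/events-0/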

影子的影子 2025-02-18 07:50:06


In my experience, until Tiered Storage reaches general availability, there is no free/easy way to recover the data (via the Kafka consumer protocol).

For example, you can use some Kafka Connect sink connector to write to some external, more persistent storage. Then, would you want to write a job that scrapes that data? Sure, you could have a SQL database table of STRING topic, INT timestamp, BLOB key, BLOB value, and maybe track "consumer offsets" separately from that. If you use that design, then Kafka doesn't really seem useful, as you'd be reimplementing various parts of it when you could've just added more storage to the Kafka cluster.
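
As an illustration of that approach, here is a minimal sketch that registers a JDBC sink through the Kafka Connect REST API. It assumes the Confluent JDBC sink connector is installed on the Connect worker, and every name here (the connector name, topic, JDBC URL) is hypothetical:

    # Mirror the topic into a database table as records are produced,
    # so the data survives Kafka's retention window.
    curl -X POST http://localhost:8083/connectors \
      -H "Content-Type: application/json" \
      -d '{
        "name": "events-archive-sink",
        "config": {
          "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
          "tasks.max": "1",
          "topics": "events",
          "connection.url": "jdbc:postgresql://db:5432/archive",
          "auto.create": "true",
          "insert.mode": "insert"
        }
      }'

Of course, such a sink has to be running before the data expires; it cannot recover records that are already gone.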

Is it possible to reset the "start index" to an older index to retrieve the data through a query?

That is what auto.offset.reset=earliest will do, or kafka-consumer-groups --reset-offsets --to-earliest. Note that this only rewinds consumers to the earliest offset still retained on the brokers; it cannot bring back records that have already been deleted.
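
Spelled out as full commands, with placeholder group/topic names - keep in mind that "earliest" means the earliest offset the brokers still retain, not offset 0 of all time:

    # Rewind an existing consumer group to the earliest retained offsets.
    # Replace --execute with --dry-run to preview the change first.
    kafka-consumer-groups --bootstrap-server localhost:9092 \
      --group my-app --topic events \
      --reset-offsets --to-earliest --execute

    # Or read everything still on the brokers directly from the shell:
    kafka-console-consumer --bootstrap-server localhost:9092 \
      --topic events --from-beginning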

have "disk backup", could we use that

With caution, maybe. For example, you can copy old broker log segments onto a server, but there aren't any tools I know of that will retroactively discover the new "low watermark" of each topic (maybe the broker figures this out on restart; I haven't tested). You'd need to copy this data for each broker manually, I believe, since the replicas wouldn't know about the old segments (again, maybe after a full cluster restart they might).
Plus, the consumer offsets would already be way past that data, unless you stop all consumers and reset them.

I'm also not sure what happens if you have gaps in the segment files. E.g. your current oldest segment is N and you copy N-2 but not N-1... You might then run into an error, or the consumer will simply apply the auto.offset.reset policy and seek to the next available offset or to the very end of the topic.
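
To make the gap scenario concrete: segment files are named after the first offset they contain (zero-padded to 20 digits), so a missing segment is visible in the partition directory. The paths and offsets below are made up:

    # ls /var/lib/kafka/data/events-0/
    00000000000000000000.log       # restored from backup ("N-2")
    00000000000000000000.index
    00000000000300000000.log       # current oldest live segment ("N")
    00000000000300000000.index
    # The segment covering the offsets in between ("N-1") is absent,
    # so a consumer seeking into that range hits the gap described above.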
