Apache Kafka messages were archived - is it possible to retrieve them?
We are using Apache Kafka and we process more than 30 million messages per day. Our retention policy is 30 days. However, our messages were archived (removed) before the 30 days were up.
Is there any way to retrieve the deleted messages? Is it possible to reset the "starting index" to an older offset so the data can be retrieved with a query?
What other options do we have?
If we have a "disk backup", can we use it to retrieve the data?
Thanks
Comments (2)
I'm assuming your messages got deleted by the Kafka cluster here.
In general, no - if the records got deleted due to duration / size related policies, then they have been removed.
Theoretically, if you have access to backups you might move the Kafka data-log files into a broker's data directory, but the behaviour is undefined. Trying that with a fresh cluster with infinite size/time policies (so nothing gets purged instantly) might work and let you consume again.
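A rough sketch of that experiment, assuming a fresh single-broker test cluster; the topic name, broker address, and paths below are placeholders of my own, not from the original post:

```shell
# Sketch only: restoring backed-up log segments into a fresh test cluster.
# Topic name, broker address, and directories are assumptions.

# 1. Disable time- and size-based deletion on the topic so nothing is purged:
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic --alter \
  --add-config retention.ms=-1,retention.bytes=-1

# 2. Stop the broker, then copy the backed-up partition directories
#    (e.g. my-topic-0) into the broker's log.dirs location:
cp -r /backup/kafka-logs/my-topic-0 /tmp/kafka-logs/

# 3. Restart the broker and try to consume from the beginning:
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic my-topic --from-beginning
```

As the answer says, whether the broker actually serves the copied segments is undefined behaviour; treat this as an experiment on a throwaway cluster, not a recovery procedure.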
In my experience, until the general availability of Tiered Storage, there is no free/easy way to recover data (via the Kafka Consumer protocol).
For example, you can use some Kafka Connect sink connector to write to some external, more persistent storage. Then, would you want to write a job that scrapes that data? Sure, you could have a SQL database table of STRING topic, INT timestamp, BLOB key, BLOB value, and maybe track "consumer offsets" separately from that? If you use that design, then Kafka doesn't really seem useful, as you'd be reimplementing various parts of it when you could've just added more storage to the Kafka cluster.
As for re-reading from the start: that is what auto.offset.reset=earliest will do, or kafka-consumer-groups --reset-offsets --to-earliest.
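For completeness, the offset-reset route looks roughly like this; the group name, topic name, and broker address are placeholders:

```shell
# Preview what resetting the group to the earliest retained offsets would do:
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group my-consumer-group --topic my-topic \
  --reset-offsets --to-earliest --dry-run

# Apply it for real (the group must be inactive, i.e. all consumers stopped):
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group my-consumer-group --topic my-topic \
  --reset-offsets --to-earliest --execute
```

Note this only rewinds to the oldest offset the brokers still retain; it cannot bring back segments that were already deleted.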
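To make the hypothetical archive table above concrete, here is a minimal sketch using SQLite; the database file, table, and column names are my own invention, not from the answer:

```shell
# Hypothetical archive schema for records scraped out of Kafka by a sink job.
# Names are assumptions; any SQL database would do.
sqlite3 kafka_archive.db <<'SQL'
CREATE TABLE IF NOT EXISTS kafka_archive (
    topic       TEXT    NOT NULL,
    "partition" INTEGER NOT NULL,
    "offset"    INTEGER NOT NULL,  -- lets you track consumer progress yourself
    ts          INTEGER,           -- record timestamp, epoch millis
    key         BLOB,
    value       BLOB,
    PRIMARY KEY (topic, "partition", "offset")
);
SQL
sqlite3 kafka_archive.db '.schema kafka_archive'
```

As the answer notes, a job writing into such a table is effectively reimplementing Kafka's own log, so more broker storage is usually the simpler choice.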
With caution, maybe. For example - you can copy old broker log segments into a server, but then there aren't any tools I know of that will retroactively discover the new "low watermark" of each topic (maybe the broker finds this upon restart, I haven't tested). You'd need to copy this data for each broker manually, I believe, since the replicas wouldn't know about old segments (again, maybe after a full cluster restart, they might).
Plus, the consumer offsets would already be reading way past that data, unless you stop all consumers and reset them.
I'm also not sure what happens if you had gaps in the segment files. E.g. your current oldest segment is N, and you copy N-2, but not N-1... You might then run into an error, or the consumer will simply apply the auto.offset.reset policy and seek to the next available offset or to the very end of the topic.
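For reference, that fallback is controlled by the consumer's auto.offset.reset setting, e.g. in a consumer properties file:

```
# Consumer configuration (sketch): what to do when the requested offset no longer exists
auto.offset.reset=earliest   # or "latest"; "none" makes the consumer raise an error instead
```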