Structured Streaming query fails with "A file referenced in the transaction log cannot be found."

Posted 2025-01-21 00:21:54



I am streaming from a Delta table source and my queries keep failing with "A file referenced in the transaction log cannot be found." The weird part is that when I run FSCK REPAIR TABLE table_name DRY RUN to see which files are missing, it returns no results. Why would the streaming query think that a file referenced in the transaction log is missing when the FSCK dry run says there are none?
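One way to cross-check this (a sketch; `table_name` stands in for the actual table) is to compare the FSCK dry run against the table's operation history, which shows whether an OPTIMIZE or VACUUM ran recently:

```sql
-- Lists files referenced by the *current* table state that are missing from storage
FSCK REPAIR TABLE table_name DRY RUN;

-- Shows recent operations (OPTIMIZE, VACUUM, WRITE, ...) with timestamps
DESCRIBE HISTORY table_name;
```

If the history shows a VACUUM between the stream's last processed version and now, the "missing" files may belong to an older table version that the stream is still trying to read.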

I have also tried running: spark._jvm.com.databricks.sql.transaction.tahoe.DeltaLog.clearCache()


1 comment

通知家属抬走 2025-01-28 00:21:54


The issue ended up being that the tables were being optimized and then vacuumed, and the streaming job was still trying to read files referenced in the transaction log that the VACUUM had already deleted. The fix was to increase the vacuum retention period so that it is longer than how far behind the latest version the streaming reads get. The reason FSCK did not report anything is that the files were not actually missing from the perspective of the latest state of the table; only the older versions the stream was reading still referenced them.
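Under that diagnosis, a minimal sketch of the fix (`table_name` is a placeholder; the 30-day interval is an example value, not a recommendation from the thread) is to raise the deleted-file retention so vacuumed files outlive the stream's maximum lag:

```sql
-- Keep removed data files around for 30 days before VACUUM may delete them
ALTER TABLE table_name
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 30 days');

-- Or pass an explicit retention when vacuuming (720 hours = 30 days)
VACUUM table_name RETAIN 720 HOURS;
```

The retention only needs to exceed the worst-case gap between when a file version is superseded and when the slowest stream finishes reading it.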
