Structured streaming query fails with "A file referenced in the transaction log cannot be found"
I am streaming from a Delta table source and my queries keep failing with "A file referenced in the transaction log cannot be found". The weird part is that when I run FSCK REPAIR TABLE table_name DRY RUN to see which files are missing, it returns no results. Why would the streaming query think a file referenced in the transaction log is missing while FSCK REPAIR says there are none?
I have also tried running: spark._jvm.com.databricks.sql.transaction.tahoe.DeltaLog.clearCache()
The issue ended up being that the table was being optimized and then vacuumed, while the streaming job was still reading older versions from the transaction log whose data files had already been deleted by VACUUM. The fix was to increase the vacuum retention period so that it is greater than how far behind the latest version the streaming job reads. The reason FSCK did not report anything is that, from the perspective of the latest state of the table, the files were not actually missing.
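The mechanism can be sketched in plain Python (a minimal illustration, not a Delta API; the file names and the lag check are made up for the example):

```python
from datetime import timedelta

# After OPTIMIZE + VACUUM, only the compacted files remain on storage.
files_on_storage = {"part-c", "part-d"}

# The latest table version references only the compacted files...
latest_snapshot = {"part-c", "part-d"}
# ...but the stream is still replaying an older version that referenced
# the pre-OPTIMIZE files, which VACUUM has since deleted.
older_version_read_by_stream = {"part-a", "part-b"}

# FSCK validates the latest snapshot against storage: nothing is missing.
fsck_missing = latest_snapshot - files_on_storage
# The stream resolves the older version against storage: its files are gone.
stream_missing = older_version_read_by_stream - files_on_storage

assert fsck_missing == set()                       # FSCK dry run: no results
assert stream_missing == {"part-a", "part-b"}      # stream: file-not-found

# The fix, as a condition: deleted files must be retained for longer than
# the worst-case lag between the stream and the latest table version.
def vacuum_is_safe(retention: timedelta, max_stream_lag: timedelta) -> bool:
    """True if VACUUM cannot delete files an in-flight stream still needs."""
    return retention > max_stream_lag

assert vacuum_is_safe(timedelta(days=14), timedelta(days=2))
assert not vacuum_is_safe(timedelta(days=7), timedelta(days=10))
```

In practice the retention is raised on the table itself, e.g. via the delta.deletedFileRetentionDuration table property (default one week), set to comfortably exceed the streaming job's worst-case lag.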