Azure Data Lake Gen1: after `concurrentappend`, subsequent reads of the file return different content
We have an instance of Azure Data Lake Storage Gen1 and a system that appends incoming data to files there.
We want to guarantee to another team that, at midnight, the files for the day that has just ended become immutable. However, the other team complained that the files appear to change after midnight.
I enabled "Request logs" in Diagnostic settings and confirmed that the content of the files appears to change between reads, even though there is no write operation between those reads:
It appears that the first read of a file after a `concurrentappend` operation returns data that does not include the last appended content.
This seems to hold even when a lot of time passes between the write and the first read (e.g. 1 hour).
The second and subsequent reads correctly return all the data.
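For anyone trying to reproduce this, here is a minimal sketch of the kind of check that exposes the symptom: read the same file several times with no writes in between and compare content digests. It is not tied to any particular SDK. `read_fn` is a hypothetical placeholder for whatever client call returns the file's bytes from ADLS Gen1, and the helper names are made up for illustration.

```python
import hashlib
from typing import Callable, List


def read_digests(read_fn: Callable[[], bytes], attempts: int = 3) -> List[str]:
    """Call read_fn `attempts` times and return the SHA-256 digest of each result.

    read_fn is an assumed, user-supplied callable that returns the full file
    content as bytes (for example, a thin wrapper around an ADLS Gen1 read).
    """
    return [hashlib.sha256(read_fn()).hexdigest() for _ in range(attempts)]


def reads_are_stable(digests: List[str]) -> bool:
    """Return True when every read produced byte-identical content."""
    return len(set(digests)) <= 1


if __name__ == "__main__":
    # Simulated reader that mimics the reported symptom: the first read
    # misses the last appended record, later reads return everything.
    state = {"calls": 0}

    def simulated_read() -> bytes:
        state["calls"] += 1
        if state["calls"] == 1:
            return b"record-1\n"
        return b"record-1\nrecord-2\n"

    digests = read_digests(simulated_read, attempts=3)
    print("stable" if reads_are_stable(digests) else "content changed between reads")
```

Comparing digests rather than file lengths also catches a change that happens to leave the size unchanged, and it keeps the log of each attempt small enough to line up against the request logs.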
My question is: is this a bug in ADLS Gen1 or intended behavior? This answer says ADLS has read-after-write consistency, but what I'm observing seems to contradict that.