Is rename() without fsync() safe?
Is it safe to call rename(tmppath, path) without calling fsync(tmppath_fd) first?
I want the path to always point to a complete file.
I care mainly about ext4. Is rename() promised to be safe in all future Linux kernel versions?
A usage example in Python:
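import os

def store_atomically(path, data):
    # Write to a temporary file next to the target, then rename it into place.
    tmppath = path + ".tmp"
    with open(tmppath, "wb") as output:
        output.write(data)
        output.flush()  # Push Python's userspace buffer to the kernel.
        os.fsync(output.fileno())  # The needed fsync().
    os.rename(tmppath, path)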
Comments (3)
No.
Look at libeatmydata, and this presentation:
Eat My Data: How Everybody Gets File IO Wrong
http://www.oscon.com/oscon2008/public/schedule/detail/3172
by Stewart Smith from MySQL.
In case it is offline/no longer available, I keep a copy of it:
From ext4 documentation:
Judging by the wording "broken applications", the ext4 developers definitely consider this bad practice, but the approach is so widely used that ext4 itself was patched to cope with it.
So if your usage fits the pattern, you should be safe.
If not, I suggest you investigate further instead of inserting fsync here and there just to be safe. That might not be such a good idea, since fsync can be a major performance hit on ext3 (read).
On the other hand, flushing before rename is the correct way to do the replacement on non-journaling file systems. Maybe that's why ext4 at first expected this behavior from programs; the auto_da_alloc option was added later as a fix. Also, this ext3 patch for the writeback mode (metadata-only journaling, with no data ordering) tries to help careless programs by flushing asynchronously on rename, to lower the chance of data loss.
You can read more about the ext4 problem here.
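Since the answer above leans on the auto_da_alloc option, here is a minimal sketch of how one might check whether a given ext4 mount has it turned off. It assumes that a disabled option shows up as noauto_da_alloc in the options column of /proc/mounts, which is worth verifying on your kernel. The function name ext4_auto_da_alloc_disabled is made up for this illustration.

```python
def ext4_auto_da_alloc_disabled(mounts_text, mount_point):
    """Inspect text in /proc/mounts format for one mount point.

    Returns True if the mount is ext4 and lists noauto_da_alloc,
    False if it is ext4 without that option, and None if no ext4
    mount with that mount point is present.

    Assumption: the kernel reports the disabled option as
    "noauto_da_alloc" in the fourth (options) column.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # Skip blank or malformed lines.
        _device, mnt, fstype, options = fields[:4]
        if mnt == mount_point and fstype == "ext4":
            # A later entry for the same mount point shadows earlier ones.
            result = "noauto_da_alloc" in options.split(",")
    return result


if __name__ == "__main__":
    # Linux only: read the live mount table and check the root file system.
    with open("/proc/mounts") as mounts:
        print(ext4_auto_da_alloc_disabled(mounts.read(), "/"))
```

Note that /proc/mounts escapes spaces in mount points (as \040), so a mount point containing spaces would need the same escaping when passed in. A result of False only says the option was not disabled; it is not a durability guarantee by itself.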
If you only care about ext4 and not ext3, then I'd recommend calling fsync on the new file before doing the rename. fsync performance on ext4 seems to be much better than on ext3, without the very long delays. Or it might be due to the fact that writeback is the default mode (at least on my Linux system).
If you only care that the file is complete and not which file is named in the directory then you only need to fsync the new file. There's no need to fsync the directory too since it will point to either the new file with its complete data, or the old file.
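The distinction drawn above (fsync the new file always, fsync the directory only if you also care which name survives a crash) can be sketched as follows. This is an illustration under assumptions, not a vetted implementation: the function name store_durably and the sync_directory flag are invented here, and the directory fsync assumes a Linux-like system where a directory can be opened read-only and passed to fsync.

```python
import os


def store_durably(path, data, sync_directory=False):
    """Replace path with data via write-to-temp, fsync, rename.

    With sync_directory=False, a crash should leave path pointing at
    either the complete old file or the complete new file.
    With sync_directory=True, the containing directory is also fsynced,
    so the rename itself is pushed to disk before returning.
    """
    tmppath = path + ".tmp"
    with open(tmppath, "wb") as output:
        output.write(data)
        output.flush()  # Userspace buffer -> kernel.
        os.fsync(output.fileno())  # Kernel -> disk, before the rename.
    os.rename(tmppath, path)

    if sync_directory:
        # Assumption: opening a directory read-only and fsyncing it works
        # on this platform (true on Linux; not portable everywhere).
        dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dirfd)
        finally:
            os.close(dirfd)
```

For example, store_durably("config.json", b"{}") covers the "file is always complete" requirement from the question, while passing sync_directory=True additionally covers the case where the new contents must be the ones found after a crash.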