Is rename() without fsync() safe?

Posted 2024-12-04 21:46:14

Is it safe to call rename(tmppath, path) without calling fsync(tmppath_fd) first?

I want the path to always point to a complete file.
I care mainly about Ext4. Is rename() guaranteed to be safe in all future Linux kernel versions?

A usage example in Python:

import os

def store_atomically(path, data):
    tmppath = path + ".tmp"
    with open(tmppath, "wb") as output:
        output.write(data)
        output.flush()                # Push Python's buffer to the kernel.
        os.fsync(output.fileno())     # The needed fsync().
    os.rename(tmppath, path)          # Replace path with the finished file.

Comments (3)

怂人 2024-12-11 21:46:14

No.

Look at libeatmydata, and this presentation:

Eat My Data: How Everybody Gets File IO Wrong

http://www.oscon.com/oscon2008/public/schedule/detail/3172

by Stewart Smith of MySQL.

In case it is offline/no longer available, I keep a copy of it:

没有伤那来痛 2024-12-11 21:46:14

From ext4 documentation:

When mounting an ext4 filesystem, the following option are accepted:
(*) == default

auto_da_alloc(*)    Many broken applications don't use fsync() when 
noauto_da_alloc     replacing existing files via patterns such as
                    fd = open("foo.new")/write(fd,..)/close(fd)/
                    rename("foo.new", "foo"), or worse yet,
                    fd = open("foo", O_TRUNC)/write(fd,..)/close(fd).
                    If auto_da_alloc is enabled, ext4 will detect
                    the replace-via-rename and replace-via-truncate
                    patterns and force that any delayed allocation
                    blocks are allocated such that at the next
                    journal commit, in the default data=ordered
                    mode, the data blocks of the new file are forced
                    to disk before the rename() operation is
                    committed.  This provides roughly the same level
                    of guarantees as ext3, and avoids the
                    "zero-length" problem that can happen when a
                    system crashes before the delayed allocation
                    blocks are forced to disk.

Judging by the wording "broken applications", the ext4 developers clearly consider this bad practice, but the approach is so widely used that ext4 itself was patched to accommodate it.

So if your usage fits the pattern, you should be safe.
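Whether the quoted safety net applies depends on how the filesystem is mounted. The following is a minimal sketch, not a definitive check: the helper names are hypothetical, and it assumes the mount table text has the usual `/proc/mounts` layout (device, mount point, type, comma-separated options) and that `noauto_da_alloc` is listed there when it is set. It only reports whether a mount point is ext4, without `noauto_da_alloc`, and in `data=ordered` mode (either listed explicitly or left at the default).

```python
def mount_options(mounts_text, mountpoint):
    """Return (fstype, set_of_options) for mountpoint, or None if not listed."""
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mnt, fstype, opts = fields[:4]
        if mnt == mountpoint:
            # A later entry shadows an earlier one, so keep the last match.
            result = (fstype, set(opts.split(",")))
    return result


def rename_pattern_covered(mounts_text, mountpoint):
    """True if the mount looks like ext4 with auto_da_alloc and data=ordered."""
    entry = mount_options(mounts_text, mountpoint)
    if entry is None:
        return False
    fstype, opts = entry
    if fstype != "ext4" or "noauto_da_alloc" in opts:
        return False
    # No data= option means the default mode; otherwise require data=ordered.
    data_modes = {o for o in opts if o.startswith("data=")}
    return data_modes <= {"data=ordered"}


# Hypothetical mount table used only for illustration.
SAMPLE = (
    "/dev/sda1 / ext4 rw,relatime,data=ordered 0 0\n"
    "/dev/sdb1 /mnt ext4 rw,relatime,noauto_da_alloc 0 0\n"
    "/dev/sdc1 /wb ext4 rw,data=writeback 0 0\n"
)
print(rename_pattern_covered(SAMPLE, "/"))     # ext4, default options
print(rename_pattern_covered(SAMPLE, "/mnt"))  # noauto_da_alloc set
```

On a real system the text could be read from `/proc/mounts` instead of the sample string. A positive result says nothing about other filesystems or future kernels, which is why the explicit fsync() remains the portable choice.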

If not, I suggest you investigate further instead of inserting fsync here and there just to be safe. That might not be such a good idea, since fsync can be a major performance hit on ext3.

On the other hand, flushing before rename is the correct way to do the replacement on non-journaling file systems. Maybe that's why ext4 at first expected this behavior from programs; the auto_da_alloc option was added later as a fix. There is also an ext3 patch for writeback mode (data=writeback, which journals metadata only and does not order data writes) that tries to help careless programs by flushing asynchronously on rename, lowering the chance of data loss.

You can read more about the ext4 problem here.

醉态萌生 2024-12-11 21:46:14

If you only care about ext4 and not ext3, then I'd recommend calling fsync on the new file before doing the rename. fsync performance on ext4 seems to be much better than on ext3, without the very long delays. Or that might be because writeback is the default mode (at least on my Linux system).

If you only care that the file is complete and not which file is named in the directory then you only need to fsync the new file. There's no need to fsync the directory too since it will point to either the new file with its complete data, or the old file.
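For the other case, where it also matters which file the name refers to after a crash, the containing directory can be synced once the rename has been done. A minimal sketch, assuming a POSIX system where a directory can be opened read-only and passed to fsync(); the helper name `fsync_parent_dir` is hypothetical and not part of the question's code.

```python
import os


def fsync_parent_dir(path):
    """fsync the directory that contains path, so a completed rename is durable."""
    dirpath = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        # Always release the descriptor, even if fsync raises.
        os.close(dir_fd)
```

It would be called right after `os.rename(tmppath, path)`. As the answer notes, this step is unnecessary when the only requirement is that the name never points at an incomplete file.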
