When do O_SYNC writes become visible in the page cache (mmap'd file)?

Posted 2024-11-29 11:02:15

I have a file mmap'd read-only/shared, with multiple threads/processes reading the data concurrently. A single writer is allowed to modify the data at any time (using a mutex in a separate shared memory region). Changes are performed using a write() on the underlying file. The overall setup is part of a database that is intended to be transactionally consistent.

A number of arbitrary data pages will be written out in any order, and then fdatasync() is called. Nothing in the file points to these altered pages until a root page is written. The root page is written using a second file descriptor that was opened with O_SYNC, so the write will not return until the root page has been written successfully. All of the pages being written are part of the mmap region, so they will eventually become visible to all of the readers.
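
A minimal C sketch of the commit sequence described above. It assumes fixed 4096-byte pages, a root page at offset 0, and two descriptors on the same file, the second one opened with O_SYNC. `commit_txn`, `write_page`, `DB_PAGE_SIZE` and `ROOT_OFFSET` are hypothetical names for illustration, not part of any library or of the poster's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout: fixed-size pages, root page at offset 0. */
#define DB_PAGE_SIZE 4096
#define ROOT_OFFSET  0

/* Write one full page at offset off, retrying on EINTR and short writes. */
static int write_page(int fd, const void *buf, off_t off)
{
    const char *p = buf;
    size_t left = DB_PAGE_SIZE;
    while (left > 0) {
        ssize_t n = pwrite(fd, p, left, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        off += n;
        left -= (size_t)n;
    }
    return 0;
}

/* Returns 0 on success, -1 on failure (the caller must then roll back). */
int commit_txn(int data_fd, int sync_fd,
               const void *const *pages, const off_t *offsets, size_t npages,
               const void *root)
{
    /* 1. Data pages, in any order; nothing in the file points at them yet. */
    for (size_t i = 0; i < npages; i++)
        if (write_page(data_fd, pages[i], offsets[i]) != 0)
            return -1;

    /* 2. Make the data pages durable before the root can reference them. */
    if (fdatasync(data_fd) != 0)
        return -1;

    /* 3. Root page through the O_SYNC descriptor: pwrite() returns only
     *    after the data has been transferred to the storage device. */
    return write_page(sync_fd, root, ROOT_OFFSET);
}
```

The O_SYNC descriptor could be obtained with `open(path, O_WRONLY | O_SYNC)`. Nothing in this sketch hides the pages from readers while `commit_txn` is running; it only orders the writes and reports failure.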

The question is: does the final O_SYNC write become visible immediately, as soon as the kernel copies the user buffer into the page cache? Or does it become visible only after the synchronous write completes? I've read through the kernel code a bit but haven't followed it all the way; it looks to me like the user data is copied immediately to the page cache, then a write is scheduled, and then it waits for the write to complete. In the meantime, the written data is already present in the page cache and so is immediately visible to the reader processes. This is undesirable because if the physical write actually fails, the transaction must be rolled back, and readers should never be allowed to see anything that was written by an unsuccessful transaction.

Does anyone know for certain how O_SYNC writes interact with the page cache? I suppose that, just to be safe, I could wrap accesses to the root page with a mutex, but that adds a layer of overhead that would be better avoided.

Comments (2)

梦罢 2024-12-06 11:02:15

Under the formal POSIX standard, updates to MAP_SHARED regions can appear at any time. The Synchronised I/O definition specifies that the write will only return once the data has landed on physical media, but doesn't talk about the data seen by other processes.

In practice on Linux, it works as you have described - the page cache is the staging area from where device writes are dispatched, and a MAP_SHARED mapping is a view of the page cache.
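
A small Linux test that illustrates this, offered as a sketch under assumptions: a writable /tmp and the hypothetical helper name `write_visible_before_sync`. It maps a file read-only/shared, writes one byte with a plain `pwrite()` and no sync call at all, and then checks whether the mapping already shows the new byte.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the byte is visible through the mapping before any sync,
 * 0 if it is not, -1 on setup error. */
int write_visible_before_sync(void)
{
    char path[] = "/tmp/pagecache_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file stays alive until fd is closed */

    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    int result = -1;
    if (ftruncate(fd, (off_t)pg) == 0) {
        /* Reader's view: a read-only shared mapping of the file. */
        void *p = mmap(NULL, pg, PROT_READ, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            const volatile char *map = p;
            const char byte = 'X';
            /* Writer's path: an ordinary write() to the file, no fsync. */
            if (pwrite(fd, &byte, 1, 0) == 1)
                result = (map[0] == 'X');
            munmap(p, pg);
        }
    }
    close(fd);
    return result;
}
```

On Linux this should return 1, because `pwrite()` fills the same page-cache page that the mapping exposes. O_SYNC only makes the call wait for the device afterwards; it does not delay the copy into the page cache.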

As an alternative, you could put a copy of the root page into a shared anonymous region. The reading processes would use that copy, and the writing process would update it after it has synched the root page to disk. You will still need synchronisation though, because you can't atomically update an entire page.
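
One way to realise this suggestion, sketched in C under stated assumptions: a fixed 4096-byte root page, a process-shared pthread mutex as the synchronisation, and the region created before `fork()` so that reader processes inherit it. `struct shared_root` and the `shared_root_*` functions are hypothetical names.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>

#define ROOT_SIZE 4096 /* hypothetical root page size */

struct shared_root {
    pthread_mutex_t lock;          /* process-shared mutex */
    unsigned char page[ROOT_SIZE]; /* last root page known to be on disk */
};

/* Create before fork(); children inherit the shared anonymous region. */
struct shared_root *shared_root_create(void)
{
    void *p = mmap(NULL, sizeof(struct shared_root), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    struct shared_root *sr = p;
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    int rc = pthread_mutex_init(&sr->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(p, sizeof *sr);
        return NULL;
    }
    return sr;
}

/* Writer: call only after the O_SYNC write of the root page succeeded. */
void shared_root_publish(struct shared_root *sr, const void *root)
{
    pthread_mutex_lock(&sr->lock);
    memcpy(sr->page, root, ROOT_SIZE);
    pthread_mutex_unlock(&sr->lock);
}

/* Reader: take a consistent snapshot of the published root page. */
void shared_root_read(struct shared_root *sr, void *out)
{
    pthread_mutex_lock(&sr->lock);
    memcpy(out, sr->page, ROOT_SIZE);
    pthread_mutex_unlock(&sr->lock);
}
```

Because the writer publishes only after the synchronous write has returned success, readers that consult this copy never see a root from a failed transaction. A mutex is the simplest choice here; a seqlock or double buffering could avoid blocking readers, at the cost of trickier code.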

独木成林 2024-12-06 11:02:15

You should use msync(2) for mmapped files. Mixing write() and mmapped access is asking for trouble.
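
A minimal sketch of this approach, assuming the writer keeps its own PROT_READ | PROT_WRITE, MAP_SHARED mapping of the file (the readers' mappings can stay read-only). `store_and_msync` is a hypothetical helper: it stores through the mapping and then calls msync() with MS_SYNC on the page-aligned range it touched. Stores through the mapping land in the same page cache that the readers see, so msync() controls when the data is durable, not when it becomes visible.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy len bytes into a writable MAP_SHARED mapping at byte offset off,
 * then block until that range has been written back to the file.
 * Returns 0 on success, -1 on error (errno set by msync). */
int store_and_msync(char *map, size_t off, const void *src, size_t len)
{
    memcpy(map + off, src, len);

    /* msync() requires a page-aligned start address. */
    uintptr_t pg = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t first = (uintptr_t)(map + off) & ~(pg - 1);
    uintptr_t last = (uintptr_t)(map + off + len);
    return msync((void *)first, (size_t)(last - first), MS_SYNC);
}
```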
