Memory-mapped files and atomic writes of a single block

Posted 2024-09-24 10:47:48

If I read and write a single file using normal IO APIs, writes are guaranteed to be atomic on a per-block basis. That is, if my write only modifies a single block, the operating system guarantees that either the whole block is written, or nothing at all.

How do I achieve the same effect on a memory mapped file?

Memory mapped files are simply byte arrays, so if I modify the byte array, the operating system has no way of knowing when I consider a write "done", so it might (even if that is unlikely) swap out the memory just in the middle of my block-writing operation, and in effect I write half a block.

I'd need some sort of "enter/leave critical section", or some method of "pinning" a file's pages in memory while I'm writing to them. Does something like that exist? If so, is it portable across common POSIX systems & Windows?
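For context, a minimal Python sketch of the scenario being asked about (the block size and file name are illustrative): a store into the mapping is a plain byte-array write with no "done" marker, and `flush()` (which wraps `msync` on POSIX and `FlushViewOfFile` on Windows) is the closest thing to explicitly requesting write-back — but nothing stops the kernel from writing the dirty page out *before* the flush, mid-modification.

```python
import mmap

BLOCK = 4096  # illustrative size, not necessarily the filesystem block size

def write_block(path: str, data: bytes) -> bytes:
    """Map a one-block file, modify it in place, flush, and read it back."""
    with open(path, "wb") as f:
        f.truncate(BLOCK)                  # pre-size the file for mapping
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), BLOCK) as m:
            m[0 : len(data)] = data        # plain store: no "done" marker
            m.flush()                      # msync: request write-back now
    with open(path, "rb") as f:
        return f.read(len(data))
```

The kernel is free to write the page back between the store and the `flush()` call, which is exactly the torn-write risk the question describes.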

2 Comments

苏璃陌 2024-10-01 10:47:48

The technique of keeping a journal seems to be the only way. I don't know how this works with multiple apps writing to the same file. The Cassandra project has a good article on how to get performance with a journal. The key thing is to make sure that the journal only records positive actions (my first approach was to write the pre-image of each write to the journal, allowing you to roll back, but it got overly complicated).

So basically your memory-mapped file has a transactionId in the header; if your header fits into one block you know it won't get corrupted, though many people seem to write it twice with a checksum: [header[cksum]] [header[cksum]]. If the first checksum fails, use the second.
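A sketch of that doubled header, under assumed field sizes (an 8-byte little-endian transaction id checksummed with CRC32; the real layout is up to you): the same header+checksum slot is written twice, and the reader accepts the first slot whose checksum verifies.

```python
import struct
import zlib

HDR_FMT = "<Q"                       # assumed: 8-byte LE transaction id
HDR_SIZE = struct.calcsize(HDR_FMT)
SLOT = HDR_SIZE + 4                  # header body + 4-byte CRC32

def pack_header(txn_id: int) -> bytes:
    """Build one [header[cksum]] slot."""
    body = struct.pack(HDR_FMT, txn_id)
    return body + struct.pack("<I", zlib.crc32(body))

def write_double_header(buf: bytearray, txn_id: int) -> None:
    """Write the slot twice at the start of the mapped region."""
    slot = pack_header(txn_id)
    buf[0:SLOT] = slot
    buf[SLOT : 2 * SLOT] = slot

def read_header(buf: bytearray) -> int:
    """Return the txn id from the first slot whose checksum verifies."""
    for off in (0, SLOT):
        body = bytes(buf[off : off + HDR_SIZE])
        (crc,) = struct.unpack("<I", bytes(buf[off + HDR_SIZE : off + SLOT]))
        if zlib.crc32(body) == crc:
            return struct.unpack(HDR_FMT, body)[0]
    raise ValueError("both header copies corrupt")
```

A torn write can corrupt at most the copy being updated at that moment, so the other copy still verifies.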

The journal looks something like this:

[beginTxn[txnid]] [offset, length, data...] [commitTxn[txnid]]

You just keep appending journal records until the journal gets too big, then roll it over at some point. When you start your program, check whether the file's transaction id matches the last transaction id of the journal; if not, play back all the transactions in the journal to sync up.
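The append/replay cycle above can be sketched as follows, under an assumed record framing (single `B`/`C` tag bytes for beginTxn/commitTxn and little-endian fields — illustrative, not the answer's actual format). Replay stops at the first record without a matching commit marker, which is how a torn tail append gets discarded.

```python
import struct

def append_txn(journal, txn_id: int, offset: int, data: bytes) -> None:
    """Append one [beginTxn][offset, length, data][commitTxn] record."""
    journal.write(struct.pack("<cQ", b"B", txn_id))              # beginTxn
    journal.write(struct.pack("<QI", offset, len(data)) + data)  # payload
    journal.write(struct.pack("<cQ", b"C", txn_id))              # commitTxn

def replay(raw: bytes, buf: bytearray, last_txn: int) -> int:
    """Re-apply committed records newer than last_txn; return newest id."""
    pos = 0
    while pos + 9 <= len(raw):
        tag, txn = struct.unpack_from("<cQ", raw, pos)
        pos += 9
        if tag != b"B":
            break                                # unknown data: stop
        offset, length = struct.unpack_from("<QI", raw, pos)
        pos += 12
        data = raw[pos : pos + length]
        pos += length
        if pos + 9 > len(raw):
            break                                # torn record at the tail
        tag2, txn2 = struct.unpack_from("<cQ", raw, pos)
        pos += 9
        if tag2 != b"C" or txn2 != txn:
            break                                # no commit marker: discard
        if txn > last_txn:                       # skip already-applied txns
            buf[offset : offset + length] = data
            last_txn = txn
    return last_txn
```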

摇划花蜜的午后 2024-10-01 10:47:48

If I read and write a single file using normal IO APIs, writes are guaranteed to be atomic on a per-block basis. That is, if my write only modifies a single block, the operating system guarantees that either the whole block is written, or nothing at all.

In the general case, the OS does not guarantee "writes of a block" done with "normal IO APIs" are atomic:

  • Blocks are more of a filesystem concept - a filesystem's block size may actually map to multiple disk sectors...
  • Assuming you meant sectors: how do you know your write maps to exactly one sector? Nothing says the I/O was well aligned to a sector boundary once it has gone through the indirection of a filesystem.
  • Nothing says your disk HAS to implement sector atomicity. A "real disk" usually does, but it's not a mandatory or guaranteed property. Sadly, your program can't "check" for this property unless it's an NVMe disk and you have access to the raw device, or you're sending raw commands with atomicity guarantees to a raw device.

Further, you're usually concerned with durability over multiple sectors (e.g. if power loss happens, was the data I sent before this sector definitely on stable storage?). If any buffering is going on, your write may still only be in RAM or the disk cache, unless you first issued a flush command, or opened the file/device with flags requesting cache bypass and those flags were actually honoured.
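As a sketch of the durability (not atomicity) side of this: the most portable way to push a write past the OS cache is an explicit flush with `fsync` after the write; `O_SYNC`/`O_DIRECT` are the open-time cache-bypass flags alluded to above, but they are POSIX-specific, so only `fsync` is shown here. Note the same caveat as above applies: the drive's own volatile cache is only bypassed if the device actually honours the flush.

```python
import os

def write_durable(path: str, data: bytes) -> None:
    """Write data, then ask the OS to push it to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)   # flush file data to the device; the drive's own
                       # volatile cache may still hold it unless the
                       # device honours the flush command
    finally:
        os.close(fd)
```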
