How to use mmap to create a checkpoint file that is only synced to disk manually

Posted on 2024-09-08 03:00:29

I need the fastest way to periodically sync a file with memory.

What I think I would like is to have an mmap'd file, which is only sync'd to disk manually. I'm not sure how to prevent any automatic syncing from happening.

The file must not be modified except at the times I manually specify. The point is to have a checkpoint file which keeps a snapshot of the state in memory. I would like to avoid copying as much as possible, since this will need to be called fairly frequently and speed is important.

Comments (5)

疑心病 2024-09-15 03:00:29

Anything you write to memory within a MAP_SHARED mapping of a file is considered written to the file at that time, as surely as if you had used write(). In this sense msync() is completely analogous to fsync(): it merely ensures that changes you have already made to the file are actually pushed out to permanent storage. You can't change this; it's how mmap() is defined to work.

In general, the safe way to do this is to write a complete consistent copy of the data to a temporary file, sync the temporary file, then atomically rename it over the prior checkpoint file. This is the only way to ensure that a crash between checkpoints doesn't leave you with an inconsistent file. Any solution that does less copying is going to require both a more complicated transaction-log style file format, and be more intrusive to the rest of your application (requiring specific hooks to be invoked in each place that the in-memory state is changed).
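
The write, sync, rename sequence described above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the function name write_checkpoint and the ".tmp" suffix are assumptions made for the example, and interrupted writes (EINTR) are simply treated as failures.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes from `buf` by writing
 * a temporary file, syncing it, and renaming it over the old checkpoint.
 * Returns 0 on success, -1 on failure. */
int write_checkpoint(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);  /* assumed naming scheme */
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may write fewer bytes than requested, so loop until done. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        left -= (size_t)w;
    }

    /* Push the temporary file's contents to storage before it takes over
     * the checkpoint name. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Readers see either the old checkpoint or the new one, never a mix. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

A call such as write_checkpoint("state.ckpt", state, state_len) would then be made at each checkpoint. If the rename itself must survive a power failure, the containing directory generally needs an fsync() as well; that step is left out here to keep the sketch short.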

爱格式化 2024-09-15 03:00:29

You could mmap() the file as copy on write so that any updates you do in memory are not written back to the file, then when you want to sync, you could:

A) Make a new memory mapping that is not copy on write and copy just the pages you modified into it.

Or

B) Open the file (a regular file open) with direct I/O, which requires reads and writes whose size and alignment are multiples of the block size, and write only the pages you modified. Direct I/O would be nice and fast because you're writing whole pages (the memory page size is a multiple of the disk block size) and there's no buffering. This method has the benefit of not using extra address space, in case your mmap() is large and there's no room to mmap() another huge file.

After the sync, your copy-on-write mmap() has the same contents as your disk file, but the kernel still has the pages you needed to sync marked as private (not shared with the file). So you can then close and recreate the mmap() (still copy on write). That way, if there's memory pressure, the kernel can simply discard your pages instead of paging them out to swap space.

Of course, you'd have to keep track of which pages you had modified yourself, because I can't think of how you'd get access to where the OS keeps that info. (Wouldn't that be a handy syscall?)

-- edit --

Actually, see Can the dirtiness of pages of a mmap be found from userspace? for ideas on how to see which pages are dirty.
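
A rough sketch of option B with application-side dirty tracking might look like the following. Everything named here (struct ckpt, ckpt_open, ckpt_store, ckpt_sync, ckpt_close) is hypothetical, and the sketch uses ordinary buffered pwrite() rather than direct I/O to stay short. It also assumes every in-memory update goes through ckpt_store so the dirty flags stay accurate.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical state: a private (copy-on-write) view of the file plus one
 * dirty flag per page, maintained by the application itself. */
struct ckpt {
    int fd;
    uint8_t *map;
    size_t len, page, npages;
    uint8_t *dirty;
};

int ckpt_open(struct ckpt *c, const char *path)
{
    struct stat st;
    memset(c, 0, sizeof *c);
    c->fd = open(path, O_RDWR);
    if (c->fd < 0)
        return -1;
    if (fstat(c->fd, &st) != 0 || st.st_size == 0) {
        close(c->fd);
        return -1;
    }
    c->len = (size_t)st.st_size;
    c->page = (size_t)sysconf(_SC_PAGESIZE);
    c->npages = (c->len + c->page - 1) / c->page;
    c->dirty = calloc(c->npages, 1);
    /* MAP_PRIVATE: stores into the mapping are never written to the file. */
    c->map = mmap(NULL, c->len, PROT_READ | PROT_WRITE, MAP_PRIVATE, c->fd, 0);
    if (c->dirty == NULL || c->map == MAP_FAILED) {
        if (c->map != MAP_FAILED)
            munmap(c->map, c->len);
        free(c->dirty);
        close(c->fd);
        return -1;
    }
    return 0;
}

/* Copy n bytes into the mapping at offset off and mark the touched pages. */
int ckpt_store(struct ckpt *c, size_t off, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (off > c->len || n > c->len - off)
        return -1;
    memcpy(c->map + off, src, n);
    for (size_t p = off / c->page; p <= (off + n - 1) / c->page; p++)
        c->dirty[p] = 1;
    return 0;
}

/* Checkpoint: write only the dirty pages back to the file, then flush. */
int ckpt_sync(struct ckpt *c)
{
    for (size_t p = 0; p < c->npages; p++) {
        if (!c->dirty[p])
            continue;
        size_t off = p * c->page;
        size_t n = c->len - off < c->page ? c->len - off : c->page;
        if (pwrite(c->fd, c->map + off, n, (off_t)off) != (ssize_t)n)
            return -1;
        c->dirty[p] = 0;
    }
    return fsync(c->fd);
}

void ckpt_close(struct ckpt *c)
{
    munmap(c->map, c->len);
    free(c->dirty);
    close(c->fd);
}
```

Note that this updates the checkpoint file in place, so a crash in the middle of ckpt_sync can leave a mix of old and new pages on disk. It reduces copying but does not by itself give the crash consistency of the rename-based approach.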

别把无礼当个性 2024-09-15 03:00:29

mmap can't be used for this purpose. There's no way to prevent data from being written to disk. In practice, using mlock() to make the memory unswappable might have a side effect of preventing it from getting written to disk except when you ask for it to be written, but there's no guarantee. Certainly if another process opens the file, it's going to see the copy cached in memory (with your latest changes), not the copy on physical disk. In many ways, what you should do depends on whether you're trying to do synchronization with other processes or just for safety in case of crash or power failure.
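
The visibility point can be checked with a small experiment. The sketch below is only an illustration (the function name shared_store_visible and the scratch file path passed to it are made up): it stores into a MAP_SHARED mapping and then reads the file through a second descriptor without ever calling msync(). On Linux, where mappings and read() share the page cache, the read is expected to see the new bytes; other systems may behave differently.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo. Returns 1 if a read() through a second descriptor sees
 * a store made through a MAP_SHARED mapping, 0 if it does not, and -1 on
 * error. The scratch file at `path` is created and removed again. */
int shared_store_visible(const char *path)
{
    static const char before[] = "old-state";
    static const char after[]  = "new-state";
    size_t n = sizeof before - 1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, before, n) != (ssize_t)n) {
        close(fd);
        unlink(path);
        return -1;
    }

    char *map = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* A plain memory store; note that msync() is never called. */
    memcpy(map, after, n);

    /* Read the file back the way another process would. */
    char seen[sizeof after] = {0};
    int rfd = open(path, O_RDONLY);
    ssize_t r = rfd < 0 ? -1 : pread(rfd, seen, n, 0);

    if (rfd >= 0)
        close(rfd);
    munmap(map, n);
    close(fd);
    unlink(path);

    if (r != (ssize_t)n)
        return -1;
    return memcmp(seen, after, n) == 0;
}
```

This says nothing about when the bytes reach the physical disk; it only shows that other readers of the file do not wait for msync().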

If your data size is small, you might try a number of other methods for atomic syncing to disk. One way is to encode the entire dataset in a file name: create an empty file with that name, then delete the old file. If 2 files exist at startup (because of a crash at an extremely unlikely moment), delete the older one and resume from the newer one. write() may also be atomic if your data size is smaller than a filesystem block, page size, or disk block, but I don't know of any guarantee to that effect offhand. You'd have to do some research.

Another very standard approach works as long as your data isn't so big that 2 copies won't fit on disk: just create a second copy with a temporary name, then rename() it over the old one. rename() replaces the destination atomically (both names must be on the same filesystem, otherwise the call fails). This is probably the best approach unless you have a reason not to do it that way.

难以启齿的温柔 2024-09-15 03:00:29

As the other respondents have suggested, I don't think there's a portable way to do what you want without copying. If you're looking to do this in a special-purpose environment where you can control the OS etc, you may be able to do it under Linux with the btrfs filesystem.

btrfs supports a reflink operation, which is essentially a copy-on-write filesystem copy (on Linux it is exposed through the FICLONE ioctl and `cp --reflink` rather than a dedicated reflink() system call). You could reflink your file to a temporary on start-up, mmap() the temporary, then msync() and reflink the temporary back to the original to checkpoint.

陌上青苔 2024-09-15 03:00:29

I strongly suspect that no OS actually takes advantage of this, but it would be possible for an OS to optimize code like the following:

    int fd = open("file", O_RDWR | O_SYNC | O_DIRECT);
    size_t length = get_length(fd);  // helper assumed to return the file size
    uint8_t *map_addr = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

    ...

    // This represents all of the changes that could possibly happen before you
    // want to update the on-disk file.
    change_various_data(map_addr);

    if (is_time_to_update()) {
        write(fd, map_addr, length);
        lseek(fd, 0, SEEK_SET);
        // you could have just used pwrite here and not seeked
    }

The reason an OS could possibly take advantage of this is that until you write to a particular page (and no one else has either), the OS would probably just use the actual file's page at that location as the backing store for that page.

Then, when you wrote to some set of those pages, the OS would copy-on-write those pages for your process, but still keep the unwritten pages backed by the original file.

Then, upon calling write the OS could notice that the write was block aligned both in memory and on disk, and then it could notice that some of the source memory pages were already synched up with those exact file system pages that they were being written to and only write out the pages which had changed.

All of that being said, it wouldn't surprise me if this optimization isn't done by any OS, and this type of code ends up being really slow and causes lots of disk writing when you call 'write'. It would be cool if it was taken advantage of.
