Absolute fastest way to store a 32-bit integer to disk?
I have a very latency-sensitive routine that generates integers sequentially, but it needs to store the last generated one to disk in case of a crash or restart.
Currently I seek to the beginning of the file, write out the integer, and then flush each time a new int is generated. The flush is required so the write at least hits the battery-backed controller cache.
The seek is quite costly, so I was thinking about just appending 4 bytes; if recovery is needed, I would seek to the end and read the last 4 bytes. This obviously assumes that there isn't too much other disk activity happening, so the write head should ideally stay at the end of the file.
The number won't typically go higher than 10,000,000, so 40 MB isn't so bad.
Any advice on how to achieve minimum latency without sacrificing integrity?
C or C++ on Linux 2.6+
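A minimal sketch of the append-and-recover scheme described in the question. It assumes Linux/POSIX, native byte order, and hypothetical helper names (`open_append_log`, `append_value`, `recover_last_value`). Each append grows the file, so `fdatasync()` still has to persist the new file size, not just the 4 data bytes, before the value can be read back after a crash.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open (or create) the checkpoint log in append mode. */
int open_append_log(const char *path) {
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Append one 4-byte value (native byte order), then flush the data plus the
   size change needed to read it back, using fdatasync(). */
int append_value(int fd, uint32_t value) {
    if (write(fd, &value, sizeof value) != (ssize_t)sizeof value)
        return -1;
    return fdatasync(fd);
}

/* Recovery: read the last complete 4-byte record; a torn partial record at
   the end of the file is ignored. Returns 0 on success, -1 if none exists. */
int recover_last_value(const char *path, uint32_t *out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    int rc = -1;
    if (fstat(fd, &st) == 0) {
        off_t rec = (off_t)sizeof *out;
        off_t usable = st.st_size - st.st_size % rec;
        if (usable >= rec &&
            pread(fd, out, sizeof *out, usable - rec) == (ssize_t)sizeof *out)
            rc = 0;
    }
    close(fd);
    return rc;
}
```

At 10,000,000 values this produces the 40 MB file mentioned above. Recovery needs one `fstat()` and one `pread()`.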
Comments (8)
Why does your application have to wait for the write to complete at all?
Write your data asynchronously, or perhaps from another thread.
You don't really have much low-level control over the hard drive. As long as you write so little data at a time, you're going to incur a lot of expensive seeks. But since you're only using it as "checkpoints" to recover from in case of a crash, there seems to be no reason why the write couldn't occur asynchronously.
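Here is one way to take the write off the latency-sensitive path, sketched in C with POSIX threads and C11 atomics. The `checkpointer` struct, its functions, and the 100-microsecond polling interval are hypothetical choices for illustration. The producer does a single atomic store per value. A background thread overwrites offset 0 with `pwrite()` and `fdatasync()` whenever the value changes. As a result, after a crash the on-disk checkpoint can lag the in-memory value by whatever was in flight.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical checkpointer: the hot thread only publishes the newest value;
   a background thread persists it with pwrite() + fdatasync(). */
struct checkpointer {
    int fd;
    _Atomic uint32_t latest;  /* most recently generated value */
    atomic_bool stop;         /* set once the producer is done */
    pthread_t thread;
};

static void *checkpoint_writer(void *arg) {
    struct checkpointer *cp = arg;
    uint32_t persisted = 0;
    bool have_persisted = false;
    const struct timespec interval = {0, 100000};  /* 100 us, arbitrary */
    for (;;) {
        bool stopping = atomic_load(&cp->stop);  /* read before `latest` */
        uint32_t v = atomic_load(&cp->latest);
        if (!have_persisted || v != persisted) {
            if (pwrite(cp->fd, &v, sizeof v, 0) != (ssize_t)sizeof v ||
                fdatasync(cp->fd) != 0)
                break;  /* this sketch simply gives up on I/O errors */
            persisted = v;
            have_persisted = true;
        } else if (stopping) {
            break;  /* the final published value is on disk */
        } else {
            nanosleep(&interval, NULL);
        }
    }
    return NULL;
}

int checkpointer_start(struct checkpointer *cp, const char *path) {
    cp->fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (cp->fd < 0)
        return -1;
    atomic_init(&cp->latest, 0);
    atomic_init(&cp->stop, false);
    if (pthread_create(&cp->thread, NULL, checkpoint_writer, cp) != 0) {
        close(cp->fd);
        return -1;
    }
    return 0;
}

/* Hot path: one atomic store, no system call. */
void checkpointer_publish(struct checkpointer *cp, uint32_t value) {
    atomic_store(&cp->latest, value);
}

/* Signal the writer, wait until the last value is persisted, clean up. */
void checkpointer_stop(struct checkpointer *cp) {
    atomic_store(&cp->stop, true);
    pthread_join(cp->thread, NULL);
    close(cp->fd);
}
```

Call `checkpointer_start()` once, `checkpointer_publish()` for every generated integer, and `checkpointer_stop()` at shutdown. Polling keeps the sketch short; a condition variable would avoid the idle wakeups.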
Storing an int only takes one block on disc, regardless of block size. So you have to sync one block to disc, and it takes as long as it takes, and there is nothing you can do to make it faster.
Whatever else you do, fdatasync() will be the killer, time-wise. It will sync one block into your (battery-backed RAID) controller.
Unless you have some kind of non-volatile ram, all (sensible) methods are going to be exactly equivalent because they all require one block to be sync'd.
Doing a seek system call is not going to make any difference, as that has no effect on hardware. In any case, you can avoid it by using pwrite().
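The `pwrite()` point can be sketched as follows. The helper names are hypothetical and native byte order is assumed. `pwrite()` takes the file offset as an argument, so no separate `lseek()` call is needed, and `fdatasync()` remains the expensive step.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Overwrite the 4-byte value at offset 0 without a separate lseek(), then
   force the data (not timestamps) toward stable storage. */
int store_value_pwrite(int fd, uint32_t value) {
    if (pwrite(fd, &value, sizeof value, 0) != (ssize_t)sizeof value)
        return -1;
    return fdatasync(fd);
}

/* Read the stored value back, e.g. during recovery. */
int load_value_pread(int fd, uint32_t *out) {
    return pread(fd, out, sizeof *out, 0) == (ssize_t)sizeof *out ? 0 : -1;
}
```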
Consider what "appending 4 bytes" means. Disks don't store files, or even bytes. They store clusters, and a fixed number of them. The notion of a file is created by the OS. It allocates some clusters to file system tables, to keep track of where a file is precisely located. Now, appending 4 bytes means at least writing the 4 bytes to a cluster. But that also means determining which cluster. What's the existing file size? Do we need a new cluster? If not, we need to read the last cluster, patch the 4 bytes in the correct position, and write back the cluster, then update the file size in the file system. If we do append a new cluster, we can write the 4 bytes followed by zeroes (don't need old value) but we need to do a whole lot of bookkeeping to add a cluster to a file.
So, the absolute fastest way cannot ever be to append 4 bytes. You must overwrite 4 existing bytes, preferably in a sector that you already have in memory. Others have already pointed out that you can achieve this with mmap/msync.
Obviously, given current SSD and developer prices, and your 40 MB limit, you'll be using an SSD. It pays for itself if you save an hour. Therefore seek times are irrelevant; SSDs don't have physical heads.
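This sketch illustrates the "overwrite, don't append" idea. It assumes a hypothetical 4096-byte checkpoint file and a hypothetical helper `open_preallocated`. The file is extended once with real zero bytes (rather than `ftruncate()`, which would leave a hole) and fsync'd. After that, stores such as `pwrite(fd, &value, 4, 0)` followed by `fdatasync()` only overwrite bytes that already exist, and the file size never changes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

#define CKPT_FILE_SIZE 4096  /* hypothetical: one typical filesystem block */

/* Open the checkpoint file and, if it is shorter than CKPT_FILE_SIZE, extend it
   with real zero bytes and fsync() once, so the blocks and the final size are
   durable before the hot path starts. Existing bytes are preserved. */
int open_preallocated(const char *path) {
    static const char zeros[CKPT_FILE_SIZE];  /* zero-initialized */
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size < CKPT_FILE_SIZE) {
        size_t missing = (size_t)(CKPT_FILE_SIZE - st.st_size);
        if (pwrite(fd, zeros, missing, st.st_size) != (ssize_t)missing ||
            fsync(fd) != 0) {
            close(fd);
            return -1;
        }
    }
    return fd;
}
```

Because later stores never change the file size, `fdatasync()` in principle has only the data block to write, with no size update.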
There are a lot of people here talking about mmap() as if that will fix something, but your syscall overhead is basically zero compared to the disk write overhead. Remember that appending or writing to a file requires you to update the inode (mtime, filesize) anyway, so that means a disk seek.
I suggest you consider storing the integer somewhere other than a disk. For example:
write it to some NVRAM that you control (e.g. on an embedded system). (If your RAID controller has NVRAM for writing, it might do this for you. But if you're asking this question, it probably doesn't.)
write it to free bytes in the system CMOS memory (e.g. on PC hardware).
write it to another machine on the network (if it's a fast network) and get them to acknowledge.
redesign your application so you can get away with syncing after every n transactions, instead of after every transaction. That will be about n times faster than doing it every time.
redesign your application so that if the integer is lost, the changes from your most recent transaction are also lost. Then the fact that you've technically lost an integer update doesn't matter; when you reboot, it'll be as if you never incremented it, so you can just resume from there.
You didn't explain why you need this behaviour; to be honest, if your app needs this, it sounds like your application is probably not designed very well. For example, some people suggested using a database because they do this sort of thing all the time; true, but databases do it by being slow (ie. syncing the disk every time), unless you create a transaction first, in which case the disk only needs to get synced when you do 'commit transaction'. But if you absolutely must have a sync after every integer, you'd be constantly committing transactions, and a database couldn't save you from that; there's no magical way a database could guarantee not to lose data unless it does at least fdatasync().
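The suggestion to sync only after every n transactions could look like the sketch below. `batched_store` and `sync_every` are hypothetical names. After a crash, the recovered value may be up to `sync_every - 1` values behind, so the application has to tolerate that gap, as the last redesign suggestion in the list describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical batching helper: every value goes to the page cache, but only
   every `sync_every`-th value pays for fdatasync(). */
struct batched_store {
    int fd;
    unsigned sync_every;  /* n: values per durable sync */
    unsigned pending;     /* values written since the last sync */
};

int batched_store_put(struct batched_store *bs, uint32_t value) {
    if (pwrite(bs->fd, &value, sizeof value, 0) != (ssize_t)sizeof value)
        return -1;
    if (++bs->pending < bs->sync_every)
        return 0;  /* cheap path: no sync this time */
    bs->pending = 0;
    return fdatasync(bs->fd);
}

/* Force the most recent value out, e.g. at shutdown. */
int batched_store_flush(struct batched_store *bs) {
    bs->pending = 0;
    return fdatasync(bs->fd);
}
```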
I would think the fastest/easiest way to do this would be with mmap/msync: mmap one page of the file into memory and store the value on that page. Any time the value changes, call msync(2) to force the page back to disk. This way you need only one system call per store.
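Here is a minimal sketch of the mmap/msync approach, assuming Linux, a hypothetical file path, and hypothetical helper names. The first page of the file is mapped with `MAP_SHARED`, and each store writes the value into that page. Then `msync(..., MS_SYNC)` blocks until the page has been written back.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first page of the checkpoint file; returns NULL on failure. */
uint32_t *map_counter_page(const char *path, long *page_size_out) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)page) != 0) {  /* make sure the page exists */
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *page_size_out = page;
    return (uint32_t *)p;
}

/* Store the value in the mapped page, then write that page back synchronously:
   one system call per store. */
int store_and_msync(uint32_t *slot, long page_size, uint32_t value) {
    *slot = value;
    return msync(slot, (size_t)page_size, MS_SYNC);
}
```

Release the mapping with `munmap(slot, page_size)` when done. Leaving out the `msync()` call gives the "just write to the mapped address" variant from the next answer, with durability left to kernel writeback.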
If I read correctly, how about using a memory-mapped file? Just write your number to the mapped address and it appears in the file. This assumes that the OS writes the cache to disk robustly when needed, but you might find it worth a try.
*mappedNumber can now contain your integer.
Measure.
How much control do you have over the hardware? If anything less than full, you'll get no guarantees.
On Linux I'd probably try making a kernel driver that would do its writes with the highest priority, possibly even without using a file system.
But, theoretically... if it is enough for you to hit the controller cache, data will hit it every time you flush anything to disk. This means that regardless of whether there is a physical seek inside the drive, the data will already be there. And because you'll never know what other applications will do, or how fast the disk rotates, your seeks will be random even if you keep the logical file handle at the beginning or end of the file.
And you can always ask your user to use a flash drive.
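In the spirit of "Measure.", here is a small timing harness sketch. It times `pwrite()` + `fdatasync()` at offset 0 with `clock_gettime(CLOCK_MONOTONIC)` and reports both the mean and the worst single store, since the tail matters for a latency-sensitive routine. The function name and the choice of variant being timed are assumptions. The append or mmap/msync variants can be swapped into the loop to compare them on the actual hardware and controller.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in nanoseconds. */
static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Times `iterations` durable stores (pwrite() + fdatasync() at offset 0).
   Returns the mean latency in nanoseconds and stores the worst single store
   in *max_ns; returns -1 on an I/O error. */
int64_t measure_store_latency(int fd, int iterations, int64_t *max_ns) {
    int64_t total = 0, worst = 0;
    for (int i = 0; i < iterations; i++) {
        uint32_t v = (uint32_t)i;
        int64_t t0 = now_ns();
        if (pwrite(fd, &v, sizeof v, 0) != (ssize_t)sizeof v || fdatasync(fd) != 0)
            return -1;
        int64_t dt = now_ns() - t0;
        total += dt;
        if (dt > worst)
            worst = dt;
    }
    *max_ns = worst;
    return iterations > 0 ? total / iterations : 0;
}
```

Run it against a file on the same device and controller as production. Numbers from a tmpfs or a laptop SSD say little about a battery-backed RAID controller.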
The fastest way to write a file is to map that file into memory and treat it as a char array.
You don't need to sync the file if you don't care about OS crashes (Linux never crashed on me in production). All your writes go to that file mapping, bypassing the kernel; in other words, real zero-copy (you can't do that with sockets on standard hardware yet). You may need to keep a header in that file that contains the number of records written, in case your application crashes while writing a record into memory: write a record, and only after that increment the record counter.
Resizing this file requires an ftruncate()/mremap() sequence, which may take a bit too long. You may want to minimize resizing by growing the file by a factor, just as std::vector<> grows its capacity by a constant factor (typically 1.5 or 2, depending on the implementation) when push_back() overflows it. Depending on your throughput and latency requirements, certain optimizations can be applied.
The kernel is going to write the file mapping to disk asynchronously (as if there were another thread in your application dedicated to writing to disk). If necessary, you can force the writes to disk with msync(). This is only necessary, however, if you'd like to survive an OS crash. But surviving an OS crash requires sophisticated application design anyway, so in practice surviving an application crash is good enough.
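This sketch shows the header-plus-records layout described above. It assumes a hypothetical fixed capacity of 1024 four-byte records and hypothetical names (`record_log`, `log_open`, `log_append`, `log_last`). Growing the file with `ftruncate()`/`mremap()` is left out to keep it short. The record is written first, and the counter is then bumped with a C11 release store, so after an application crash the counter never covers an unwritten record. Without `msync()`, this protects only against application crashes, not OS crashes or power loss, which matches the trade-off described in the answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define LOG_CAPACITY 1024  /* hypothetical fixed capacity, in records */

/* Assumed file layout: a record counter header followed by the records. */
struct record_log {
    _Atomic uint64_t count;           /* number of fully written records */
    uint32_t records[LOG_CAPACITY];
};

/* Map the whole log file; a new file is zero-filled, so count starts at 0. */
struct record_log *log_open(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)sizeof(struct record_log)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct record_log), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}

/* Write the record first, then publish it by bumping the counter. */
int log_append(struct record_log *rl, uint32_t value) {
    uint64_t n = atomic_load_explicit(&rl->count, memory_order_relaxed);
    if (n >= LOG_CAPACITY)
        return -1;  /* full: a real version would grow the file here */
    rl->records[n] = value;
    atomic_store_explicit(&rl->count, n + 1, memory_order_release);
    return 0;
}

/* Recovery: the last published record is the latest value. */
int log_last(struct record_log *rl, uint32_t *out) {
    uint64_t n = atomic_load_explicit(&rl->count, memory_order_acquire);
    if (n == 0 || n > LOG_CAPACITY)
        return -1;
    *out = rl->records[n - 1];
    return 0;
}
```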