Buffered asynchronous file I/O on Linux
I am looking for the most efficient way to do asynchronous file I/O on Linux.
The POSIX glibc implementation uses threads in userland.
The native aio kernel API only works with unbuffered operations. Patches for the kernel to add support for buffered operations exist, but those are >3 years old and no one seems to care about integrating them into the mainline.
I found plenty of other ideas, concepts, and patches that would allow asynchronous I/O, though most of them are in articles that are also >3 years old. What of all this is really available in today's kernel? I've read about syslets, acalls, stuff with kernel threads and more things I don't even remember right now.
What is the most efficient way to do buffered asynchronous file input/output in today's kernel?
Unless you want to write your own IO thread pool, the glibc implementation is an acceptable solution. It actually works surprisingly well for something that runs entirely in userland.
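To make the "write your own IO thread pool" option concrete, here is a minimal sketch of the general idea: blocking buffered reads are handed to worker threads, and the caller gets a handle it can wait on later. This is not how glibc implements POSIX AIO; it only illustrates the same userland-threads approach using Python's standard library. The names submit_read, _read_at and demo are made up for this illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helper names; nothing here is taken from glibc.
_pool = ThreadPoolExecutor(max_workers=4)


def _read_at(path, offset, length):
    """Plain blocking, buffered read; runs on a worker thread."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread goes through the page cache like any ordinary read.
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


def submit_read(path, offset, length):
    """Queue a read on the pool and return a Future for its result."""
    return _pool.submit(_read_at, path, offset, length)


def demo():
    """Write a small temp file, read part of it back through the pool."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello buffered world")
        path = f.name
    try:
        future = submit_read(path, 6, 8)
        # The caller is free to do other work here before collecting.
        return future.result()
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

The pool size of 4 is an arbitrary choice for the sketch; a real pool would be sized to the storage device and workload.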
The kernel implementation does not work with buffered IO at all in my experience (though I've seen other people say the opposite!). Which is fine if you want to read huge amounts of data via DMA, but of course it sucks big time if you plan to take advantage of the buffer cache.
Also note that the kernel AIO calls may actually block. There is a limited-size command buffer, and large reads are broken up into several smaller ones. Once the queue is full, asynchronous commands run synchronously. Surprise. I ran into this problem a year or two ago and could not find an explanation. Asking around gave me the "yeah of course, that's how it works" answer.
From what I've understood, the "official" interest in supporting buffered aio is not terribly great either, even though several working solutions seem to have been available for years. Some of the arguments that I've read were along the lines of "you don't want to use the buffers anyway" and "nobody needs that" and "most people don't even use epoll yet". So, well... meh.
Being able to get an epoll signalled by a completed async operation was another issue until recently, but in the meantime this works really fine via eventfd.
Note that the glibc implementation will actually spawn threads on demand inside __aio_enqueue_request. It is probably no big deal, since spawning threads is not that terribly expensive any more, but one should be aware of it. If your understanding of starting an asynchronous operation is "returns immediately", then that assumption may not be true, because it may be spawning some threads first.
EDIT:
As a sidenote, under Windows there exists a situation very similar to the one in the glibc AIO implementation, where the "returns immediately" assumption of queuing an asynchronous operation is not true.
If all data that you wanted to read is in the buffer cache, Windows will decide to run the request synchronously instead, because it will finish immediately anyway. This is well-documented, and admittedly sounds great, too. Except that if there are a few megabytes to copy, or if another thread has page faults or does IO concurrently (thus competing for the lock), "immediately" can be a surprisingly long time -- I've seen "immediate" times of 2-5 milliseconds. Which is no problem in most situations, but for example under the constraint of a 16.66ms frame time, you probably don't want to risk blocking for 5ms at random times. Thus, the naive assumption of "can do async IO from my render thread no problem, because async doesn't block" is flawed.
The material seems old -- well, it is old -- because it's been around for long and, while by no means trivial, is well understood. A solution you can lift is published in W. Richard Stevens's superb and unparalleled book (read "bible"). The book is the rare treasure that is clear, concise, and complete: every page gives real and immediate value:
Advanced Programming in the UNIX Environment
Two others, also by Stevens, are the first two volumes of his Unix Network Programming collection:
Volume 1: The Sockets Networking API (with Fenner and Rudoff) and
Volume 2: Interprocess Communications
I can't imagine being without these three fundamental books; I'm dumbstruck when I find someone who hasn't heard of them.
Still more of Stevens's books, just as precious:
TCP/IP Illustrated, Vol. 1: The Protocols
(2021) If your Linux kernel is new enough (at least 5.1, but newer kernels bring improvements) then io_uring will be "the most efficient way to do asynchronous file input/output" *. That applies to both buffered and direct I/O!
In the Kernel Recipes 2019 video "Faster IO through io_uring", io_uring author Jens Axboe demonstrates buffered I/O via io_uring finishing in almost half the time of synchronous buffered I/O. As @Marenz noted, unless you want to use userspace threads, io_uring is the only way to do buffered asynchronous I/O because Linux AIO (aka libaio / io_submit()) doesn't have the ability to always do buffered asynchronous I/O...
Additionally, in the article "Modern storage is plenty fast.", Glauber Costa demonstrates how careful use of io_uring with asynchronous direct I/O can improve throughput compared to using io_uring for asynchronous buffered I/O on an Optane device. It required Glauber to have a userspace readahead implementation (without which buffered I/O was a clear winner) but the improvement was impressive.
* The context of this answer is clearly in relation to storage (after all, the word buffered was mentioned). For network I/O, io_uring has steadily improved in later kernels to the extent that it can trade blows with things like epoll(), and if it continues it will one day be either equal or better in all cases.
I don't think the Linux kernel implementation of asynchronous file I/O is really usable unless you also use O_DIRECT, sorry.
There's more information about the current state of the world here: https://github.com/littledan/linux-aio . It was updated in 2012 by someone who used to work at Google.