
AIO support on Linux

Posted 2024-12-13 10:38:58

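Does anyone know where I can get up-to-date information about the state of kernel support for AIO in recent Linux kernels? Googling turns up web pages that may well be outdated.

Edit:

More specifically, I'm interested in descriptors that are not files, such as pipes and sockets. Sources online say these are not supported. Is that still the case?

Edit 2: I'm looking for something similar to Windows OVERLAPPED IO.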


卷耳 2024-12-20 10:38:58


You don't need POSIX AIO (i.e. man aio) to use sockets and pipes asynchronously. According to man 3 aio it is not even possible. You should use non-blocking file descriptors instead, together with an event notification interface, such as select(), poll(), or epoll. epoll is Linux specific, but scales much better than the former two.

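To use file descriptors in non-blocking mode you have to set the O_NONBLOCK flag on every file descriptor, reading the current flags first so that the other status flags are preserved:

fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);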

Once a pipe or socket descriptor is in non-blocking mode, I/O operations like read() and write() never block; if the operation cannot be completed immediately, they fail with errno set to EAGAIN or EWOULDBLOCK instead. Some more specific operations, like connect(), have to be used differently in non-blocking mode; see the relevant man pages.
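As a minimal sketch of the two points above (C on Linux; `set_nonblocking` and `would_block` are hypothetical helper names, not library functions), the flag change and the EAGAIN/EWOULDBLOCK check might look like this:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch fd to non-blocking mode. F_GETFL first, so that other status
   flags (e.g. O_APPEND) are kept instead of being overwritten. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* True if a read()/write() result means "try again later" rather than a
   real error. POSIX allows EAGAIN and EWOULDBLOCK to differ, so check both. */
int would_block(ssize_t result)
{
    return result == -1 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

For example, after set_nonblocking() on the read end of an empty pipe whose write end is still open, read() returns -1 at once and would_block() reports true.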

To be able to use non-blocking file descriptors correctly, your application needs to be event-driven. Basically, in main(), you first initialize everything, then enter the event loop. The event loop repeatedly waits for events (using an event notification interface, e.g. epoll_wait()), then checks which events happened and responds to them.

Now, when you do, say, a read() and it fails with EWOULDBLOCK, you add the descriptor to the set of file descriptors watched for readability; when the event provider indicates readability, you try again.
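Putting the event loop and the retry rule together, here is a minimal single-descriptor sketch using epoll in its default level-triggered mode. It assumes a pipe or socket that you only want to read until EOF, and it switches the descriptor to non-blocking mode itself; `drain_with_epoll` is a hypothetical name, and a real loop would watch many descriptors and keep per-descriptor state:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Event loop over a single descriptor: wait with epoll_wait(), read until
   EAGAIN, go back to waiting; stop at EOF. Returns total bytes read, or -1. */
long drain_with_epoll(int fd)
{
    /* The inner loop relies on read() failing with EAGAIN instead of blocking. */
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(epfd);
        return -1;
    }

    long total = 0;
    for (;;) {
        struct epoll_event events[8];
        int n = epoll_wait(epfd, events, 8, -1);   /* sleep until something is ready */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        for (int i = 0; i < n; i++) {
            char buf[4096];
            ssize_t r;
            /* Readable: consume everything that is available right now. */
            while ((r = read(events[i].data.fd, buf, sizeof buf)) > 0)
                total += r;
            if (r == 0)                              /* EOF: done */
                goto out;
            if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR) {
                total = -1;
                goto out;
            }
            /* EAGAIN: nothing more for now, wait for the next readiness event. */
        }
    }
out:
    close(epfd);
    return total;
}
```

Because the loop reads until EAGAIN before going back to epoll_wait(), the same structure would also be correct if you switched to edge-triggered mode (EPOLLET).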

Similarly, if you try to write() and it fails with EWOULDBLOCK, you might want to buffer the data and try again when writability is indicated.
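For the write side, one way to sketch the buffering is a small pending-output record per descriptor. This assumes the descriptor is already non-blocking; `struct outbuf` and `flush_pending` are hypothetical names:

```c
#include <errno.h>
#include <fcntl.h>     /* fcntl(), O_NONBLOCK: fd must be in non-blocking mode */
#include <stddef.h>
#include <unistd.h>

/* Pending output: data that write() has not accepted yet. */
struct outbuf {
    const char *data;
    size_t len;   /* total bytes to send */
    size_t off;   /* bytes already written */
};

/* Write as much as the non-blocking fd accepts.
   Returns 1 when everything is flushed, 0 when the fd would block
   (watch it for EPOLLOUT and call again), or -1 on a real error. */
int flush_pending(int fd, struct outbuf *ob)
{
    while (ob->off < ob->len) {
        ssize_t n = write(fd, ob->data + ob->off, ob->len - ob->off);
        if (n > 0)
            ob->off += (size_t)n;
        else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;
        else if (n == -1 && errno == EINTR)
            continue;
        else
            return -1;
    }
    return 1;
}
```

When flush_pending() returns 0, you would add EPOLLOUT for that descriptor (e.g. with epoll_ctl() and EPOLL_CTL_MOD), call it again when epoll reports the descriptor writable, and drop EPOLLOUT once it returns 1, so the loop does not spin on a socket that is almost always writable.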


握住我的手 2024-12-20 10:38:58


There are two kinds of AIO under Linux.

One is kernel AIO. It is ugly and sometimes does not behave as documented (for example, under certain conditions it runs synchronously without you being able to do anything about it, under certain conditions it does not properly cancel in-flight requests, etc.). It does not work on pipes.
These are the io_* functions (io_setup(), io_submit(), io_getevents(), ...). Note that you must link with -laio, which on some systems (e.g. Debian/Ubuntu) you have to install separately.

The second is a pure userland implementation (glibc) which spawns threads on demand to handle requests. It is well documented, works reasonably well, and, according to the documentation, works with pretty much anything that is a file descriptor, including pipes.
These are the aio_* functions (aio_read(), aio_write(), ...). I would definitely recommend using these, even if they are an "uncool userland implementation"; they work nicely.

By the way, both can now use eventfd as a notification mechanism, though the kernel version was still undocumented the last time I looked (but the function is in the headers).

Or, as Ambroz Bizjak pointed out, skip AIO altogether; for what you describe, it's not strictly necessary.

EDIT:
On a different note, since you used the words "pipes" and "sockets", are you aware of vmsplice and splice? Those are probably the most efficient functions for sending data to and from sockets and pipes. Unfortunately, they are another of those ambiguously documented, hard-to-understand hacks with obscure pitfalls. Proceed at your own risk; you have been warned.

splice lets you transfer data from a socket (or any file descriptor) to a pipe, or the other way around. vmsplice lets you transfer data between application space and a pipe.
Ironically, vmsplice is ideally supposed to do exactly the same thing (remap pages, a.k.a. "play with VM") that, back in 2006, one particular person took as grounds for claiming that all BSD developers are idiots.
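A sketch of the usual splice() pattern: since every splice() call needs a pipe on one side, data moves from the source descriptor into an intermediate pipe and from there to the destination, without passing through a user-space buffer. `splice_relay` is a hypothetical name; it assumes blocking descriptors, stops at EOF or after `len` bytes, and keeps error handling minimal:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Move up to len bytes from in_fd to out_fd through a kernel pipe buffer.
   Returns the number of bytes moved, or -1 on error. */
long splice_relay(int in_fd, int out_fd, size_t len)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    long moved = 0;
    int failed = 0;
    while (!failed && (size_t)moved < len) {
        /* source -> intermediate pipe (at most one pipe's worth per call) */
        ssize_t in = splice(in_fd, NULL, p[1], NULL, len - (size_t)moved, SPLICE_F_MOVE);
        if (in == 0)
            break;                 /* EOF on in_fd */
        if (in < 0) {
            failed = 1;
            break;
        }
        /* intermediate pipe -> destination; splice may move fewer bytes */
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, (size_t)in, SPLICE_F_MOVE);
            if (out <= 0) {
                failed = 1;
                break;
            }
            in -= out;
            moved += out;
        }
    }
    close(p[0]);
    close(p[1]);
    return failed ? -1 : moved;
}
```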

So much for the good news. The bad news is that there is a "secret limit" to how much data you can move at once. As far as I remember it's 64 kB, which is the pipe's capacity (configurable: a pipe can be resized with fcntl(F_SETPIPE_SZ), up to the limit in /proc/sys/fs/pipe-max-size). If you have more data than that, you must therefore work in several chunks, presumably with several pipe buffers, filling one while the other is read, and reusing old pipe buffers once they are done.
And this is where it gets complicated. If you browse through the discussions on KernelTrap, you find that even the Grand Master is not 100% sure when it's safe to overwrite an old buffer when juggling several buffers.

Also, for vmsplice to really work (i.e. remap pages instead of copying), you need to use the SPLICE_F_GIFT flag, and at least to me it's not clear from the docs what becomes of that memory afterwards. Following the docs to the letter, you would have to leak the memory, since you are never allowed to touch it again. Of course that can't be it. Maybe I'm just stupid.

I eventually gave up on this and settled for epoll for readiness notification plus non-blocking sockets with plain write(). That combination may not be the top performer, but it is well documented and works as documented.
