Write IO breakups on Linux?

Posted on 2024-11-03 08:39:10

My application uses O_DIRECT to flush 2MB of data directly to a 3-way striped storage device (mounted as an LVM volume).

I am getting very poor write speed on this storage. iostat shows that the large requests are being broken into smaller ones.

avgrq-sz is below 20, and there is not much read activity on that drive.

It takes around 2 seconds to flush 2MB of contiguous memory blocks (using mlock to ensure that) from a sector-aligned buffer (using posix_memalign), whereas tests with dd and iozone rate the storage as capable of > 20Mbps of write speed.

I would appreciate any clues on how to investigate this issue further.

PS: If this is not the right forum for this query, I would appreciate pointers to one that could be helpful.

Thanks.

Comments (1)

誰ツ都不明白 2024-11-10 08:39:10

"Write IO breakups on Linux?"

The disk itself may have a maximum request size, there is a tradeoff between block size and latency (the bigger the request sent to the disk, the longer it will likely take to be consumed), and there can be constraints on how much vectored I/O a driver can consume in a single request. Given all of the above, the kernel is going to "break up" single requests that are too large when submitting them further down the stack.
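One way to see the limits mentioned above is to read the block-layer queue attributes that Linux exposes under /sys/block/<dev>/queue/. The following is a minimal sketch, not a definitive tool: the helper names read_queue_limit and print_queue_limits are made up for illustration, and the device name you pass (for example a dm-N node for the LVM volume, or one of the underlying disks) is something you have to look up on your own system.

```c
#include <stdio.h>

/* Read one numeric attribute from /sys/block/<dev>/queue/<attr>.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 * (Hypothetical helper, not part of any library.) */
long read_queue_limit(const char *dev, const char *attr)
{
    char path[256];
    long value = -1;

    int n = snprintf(path, sizeof path, "/sys/block/%s/queue/%s", dev, attr);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;              /* path would have been truncated */

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;              /* no such device or attribute */
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Print the attributes most relevant to request splitting:
 *   max_sectors_kb    - largest request the block layer will issue (KB)
 *   max_hw_sectors_kb - largest request the hardware/driver accepts (KB)
 *   max_segments      - maximum scatter/gather segments per request */
void print_queue_limits(const char *dev)
{
    static const char *attrs[] = {
        "max_sectors_kb", "max_hw_sectors_kb", "max_segments"
    };
    for (size_t i = 0; i < sizeof attrs / sizeof attrs[0]; i++)
        printf("%s %s = %ld\n", dev, attrs[i], read_queue_limit(dev, attrs[i]));
}
```

Calling print_queue_limits for both the striped volume and each underlying disk shows whether a 2MB request could ever reach the devices in one piece; a value of -1 means the attribute could not be read.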

"I would appreciate any clues on how to investigate this issue further."

Unfortunately, it's hard to say why avgrq-sz is so small (if it is in sectors, that is about 10 KBytes per I/O) without seeing the code that actually submits the I/O (maybe your program is submitting 10 KByte buffers?). We also don't know whether iozone and dd were using O_DIRECT during the questioner's tests. If they weren't, their I/O would have gone into the write-back cache and been streamed out later, which the kernel can do in a more optimal fashion.
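The "about 10 KBytes" figure can be checked with a little arithmetic. The sketch below assumes that iostat reports avgrq-sz in 512-byte sectors; the function names are illustrative only.

```c
/* Assumption: iostat's avgrq-sz is expressed in 512-byte sectors. */
#define IOSTAT_SECTOR_BYTES 512UL

/* Convert an avgrq-sz value (in sectors) to bytes per request. */
unsigned long avgrq_sz_to_bytes(unsigned long sectors)
{
    return sectors * IOSTAT_SECTOR_BYTES;
}

/* Number of requests needed to cover total_bytes when each request
 * carries at most request_bytes (rounded up). */
unsigned long requests_needed(unsigned long total_bytes,
                              unsigned long request_bytes)
{
    return (total_bytes + request_bytes - 1) / request_bytes;
}
```

With avgrq-sz at 20, avgrq_sz_to_bytes(20) gives 10240 bytes, and requests_needed(2UL * 1024 * 1024, 10240) gives 205, so under that assumption a single 2MB flush reaches the device as a couple of hundred small requests or more.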

Note: Using O_DIRECT is NOT a go-faster stripe. In the right circumstances O_DIRECT can lower overhead, BUT writing with O_DIRECT to disk increases the pressure on you to submit I/O in parallel (e.g. via AIO/io_uring or via multiple processes/threads) if you want to reach the highest possible throughput, because you have robbed the kernel of its best way of creating parallel submission to the device for you.
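As an illustration of the "multiple threads" option, here is a minimal sketch that splits one buffer into equal slices and writes each slice from its own thread with pwrite. It is not the questioner's code: open_direct, parallel_write, and MAX_WRITERS are hypothetical names. With an O_DIRECT descriptor the caller is assumed to supply a suitably aligned buffer (e.g. from posix_memalign) and a slice size that is a multiple of the device's logical block size.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_WRITERS 16         /* arbitrary cap for this sketch */

struct chunk_job {
    int fd;
    const char *buf;           /* start of this thread's slice */
    size_t len;                /* slice length in bytes */
    off_t offset;              /* file offset of the slice */
    int ok;                    /* set to 1 on success */
};

/* Open a file for direct (page-cache-bypassing) writes. */
int open_direct(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
}

/* Thread body: write one slice at its own offset. Any error or
 * zero-length write (including EINTR) is treated as failure. */
static void *write_chunk(void *arg)
{
    struct chunk_job *job = arg;
    size_t done = 0;

    job->ok = 1;
    while (done < job->len) {
        ssize_t n = pwrite(job->fd, job->buf + done, job->len - done,
                           job->offset + (off_t)done);
        if (n <= 0) {
            job->ok = 0;
            break;
        }
        done += (size_t)n;
    }
    return NULL;
}

/* Write total bytes from buf to fd starting at offset 0, using
 * nthreads concurrent writers. total must divide evenly by nthreads.
 * Returns 0 on success, -1 on bad arguments or any failed write. */
int parallel_write(int fd, const void *buf, size_t total, int nthreads)
{
    pthread_t tids[MAX_WRITERS];
    struct chunk_job jobs[MAX_WRITERS];
    int started = 0, rc = 0;

    if (nthreads < 1 || nthreads > MAX_WRITERS || total % (size_t)nthreads != 0)
        return -1;

    size_t slice = total / (size_t)nthreads;
    for (int i = 0; i < nthreads; i++) {
        size_t start = (size_t)i * slice;
        jobs[i] = (struct chunk_job){ fd, (const char *)buf + start,
                                      slice, (off_t)start, 0 };
        if (pthread_create(&tids[i], NULL, write_chunk, &jobs[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (!jobs[i].ok)
            rc = -1;
    }
    return rc;
}
```

For the 2MB case, something like parallel_write(fd, buf, 2 * 1024 * 1024, 4) on a descriptor from open_direct would keep four 512KB writes in flight at once. Whether that actually helps on a given stripe layout is something to measure rather than assume, and AIO or io_uring may be a better fit than threads in a real application.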
