How does the Linux buffer cache affect I/O write performance?
I'm copying large files (3 x 30G) between 2 filesystems on a Linux server (kernel 2.6.37, 16 cores, 32G RAM) and I'm getting poor performance. I suspect that the usage of the buffer cache is killing the I/O performance.
I've written a small C program to replicate the problem. The program writes 20G of zero bytes directly to a SAS disk (/dev/sda, no filesystem). It also supports the O_DIRECT flag.
When I run the program with O_DIRECT I get very steady and predictable performance:
/dev/sda: 100M current_rate=195.569950M/s avg_rate=195.569950M/s
/dev/sda: 200M current_rate=197.063362M/s avg_rate=196.313815M/s
/dev/sda: 300M current_rate=200.479145M/s avg_rate=197.682893M/s
/dev/sda: 400M current_rate=210.400076M/s avg_rate=200.715853M/s
...
/dev/sda: 20100M current_rate=206.102701M/s avg_rate=201.217154M/s
/dev/sda: 20200M current_rate=206.485716M/s avg_rate=201.242573M/s
/dev/sda: 20300M current_rate=197.683935M/s avg_rate=201.224729M/s
/dev/sda: 20400M current_rate=200.772976M/s avg_rate=201.222510M/s
Without O_DIRECT it is a different story:
/dev/sda: 100M current_rate=1323.171377M/s avg_rate=1323.171377M/s
/dev/sda: 200M current_rate=1348.181303M/s avg_rate=1335.559265M/s
/dev/sda: 300M current_rate=1351.223533M/s avg_rate=1340.740178M/s
/dev/sda: 400M current_rate=1349.564091M/s avg_rate=1342.935321M/s
...
/dev/sda: 20100M current_rate=67.203804M/s avg_rate=90.685743M/s
/dev/sda: 20200M current_rate=68.259013M/s avg_rate=90.538482M/s
/dev/sda: 20300M current_rate=64.882401M/s avg_rate=90.362464M/s
/dev/sda: 20400M current_rate=65.412577M/s avg_rate=90.193827M/s
I understand that the initial throughput is high because the data is cached and committed to disk later. However, I don't expect the overall performance using the buffer cache to be 50% lower than with O_DIRECT.
I also ran tests with dd and got similar results (using 10G here instead of 20G):
$ dd if=/dev/zero of=/dev/sdb bs=32K count=327680 oflag=direct
327680+0 records in
327680+0 records out
10737418240 bytes (11 GB) copied, 54.0547 s, 199 MB/s
$ dd if=/dev/zero of=/dev/sdb bs=32K count=327680
327680+0 records in
327680+0 records out
10737418240 bytes (11 GB) copied, 116.993 s, 91.8 MB/s
Are there any kernel tunings that could fix/minimize the problem?
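The tunings usually mentioned for this situation are the writeback knobs under `/proc/sys/vm/`. A sysctl fragment one could experiment with (the values below are illustrative starting points, not verified against this workload; `vm.dirty_bytes` and `vm.dirty_background_bytes` exist since kernel 2.6.29):

```
# Experimental writeback settings -- adjust per workload
# start background writeback once 256 MiB is dirty
vm.dirty_background_bytes = 268435456
# throttle writers once 1 GiB is dirty
vm.dirty_bytes = 1073741824
# write back dirty data older than 15 seconds
vm.dirty_expire_centisecs = 1500
```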
1 Answer
The buffer cache is quite efficient, even when buffering huge amounts of data.
Running your dd test on an enterprise SSD, I can easily do over 1GBps of 32KB writes through the buffer cache.
I find your results interesting, but I don't think your problem is "buffer cache too slow".
My first question would be: is it slow because you're CPU-limited or disk-limited? Check if you have one CPU core pegged at 100% during the test -- this might indicate that there's something wrong at the driver or block level, like an I/O elevator that's misbehaving. If you find a core pegged, run some profiles to see what that core is up to.
If you're disk-limited you might want to investigate what the I/Os look like at the device level (use blktrace?) and see if you can figure out if the resulting I/O pattern gives poor performance at the device level.
Also, you might want to consider using something like fio to run your tests, instead of inventing your own benchmark program -- it'll be easier for others to reproduce your results and to trust your program isn't at fault.
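For instance, a fio job file approximating the dd run from the question might look like this (the target device and values are taken from the question; treat the rest as a sketch of fio's job-file syntax):

```
; sequential 32 KiB buffered writes, 10 GiB total -- mirrors the dd test
[seqwrite]
rw=write
bs=32k
size=10g
direct=0
ioengine=sync
filename=/dev/sdb
```

Re-running the same job with `direct=1` reproduces the O_DIRECT comparison with no other changes.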