Why does a high rate of IO operations slow everything down on Linux?

Posted on 2024-10-04 20:45:54

Comments (4)

初心 2024-10-11 20:45:54

It could be a limitation of the current scheduler. Google "Galbraith's sched:autogroup patch" or "linux miracle patch" (yes really!). There's apparently a 200-line patch in the process of being refined and merged which adds group scheduling, about which Linus says:

I'm also very happy with just what it does to interactive performance.
Admittedly, my "testcase" is really trivial (reading email in a
web-browser, scrolling around a bit, while doing a "make -j64" on the
kernel at the same time), but it's a test-case that is very relevant
for me. And it is a huge improvement.

Before-and-after videos here.

南渊 2024-10-11 20:45:54

Because copying a large file (bigger than the available buffer cache) usually involves bringing it through the buffer cache, which generally causes less-recently-used pages to be thrown out; those pages must then be brought back in later.

Other processes which are doing only small amounts of occasional IO (say, just stat'ing a directory) then get their cached pages blown away and must do physical reads to bring those pages back in.

Hopefully this can get fixed by a copy command which can detect this kind of thing and advise the kernel accordingly (e.g. with posix_fadvise), so that a large one-off bulk transfer of a file which does not need to be subsequently read does not completely discard all clean pages from the buffer cache, which is what normally happens now.
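
A minimal sketch (not from the thread, assuming a POSIX/Linux environment) of the kind of copy this answer hopes for: after streaming a large file through the page cache, the program calls posix_fadvise with POSIX_FADV_DONTNEED so the kernel can reclaim those pages first instead of evicting other processes' cached data. The function name, buffer size and minimal error handling are illustrative choices, not part of any existing tool.

    /* Illustrative bulk copy that hints the kernel to drop the copied
     * pages from the page cache afterwards (POSIX/Linux assumed). */
    #include <fcntl.h>
    #include <unistd.h>

    int copy_with_dontneed(const char *src, const char *dst)
    {
        char buf[1 << 16];                  /* 64 KiB copy buffer */
        int in = open(src, O_RDONLY);
        int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        ssize_t n;

        if (in < 0 || out < 0)
            return -1;

        while ((n = read(in, buf, sizeof buf)) > 0)
            if (write(out, buf, (size_t)n) != n)  /* short writes ignored for brevity */
                return -1;

        /* Flush dirty pages to disk, then tell the kernel the cached pages
         * for both files will not be needed again, so they can be reclaimed
         * before anyone else's working set. */
        fsync(out);
        posix_fadvise(in, 0, 0, POSIX_FADV_DONTNEED);
        posix_fadvise(out, 0, 0, POSIX_FADV_DONTNEED);

        close(in);
        close(out);
        return n < 0 ? -1 : 0;
    }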

放手` 2024-10-11 20:45:54

A high rate of IO operations usually means a high rate of interrupts that must be serviced by the CPU, which takes CPU time.

In the case of cp, it also uses a considerable amount of the available memory bandwidth, as each block of data is copied to and from userspace. This will also tend to evict data required by other processes from the CPU's caches and TLB, which slows those processes down as they take cache misses.
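
To make the memory-bandwidth point concrete, here is a small sketch (an illustration, not code from cp) contrasting the two data paths: a plain read()/write() loop copies every block from the page cache into a userspace buffer and back again, while the Linux-specific copy_file_range(2), available on reasonably recent kernels and glibc, lets the kernel move the data without that extra round trip through userspace.

    /* Sketch of the two copy paths (Linux assumed; copy_file_range needs
     * a reasonably recent kernel and glibc). */
    #define _GNU_SOURCE
    #include <unistd.h>

    /* Path 1: every block crosses the kernel/user boundary twice, consuming
     * memory bandwidth and displacing other data from the CPU caches. */
    ssize_t copy_via_userspace(int in, int out)
    {
        char buf[1 << 16];
        ssize_t n, total = 0;

        while ((n = read(in, buf, sizeof buf)) > 0) {
            if (write(out, buf, (size_t)n) != n)  /* short writes ignored for brevity */
                return -1;
            total += n;
        }
        return n < 0 ? -1 : total;
    }

    /* Path 2: the kernel moves the data page-cache to page-cache, so it
     * never visits a userspace buffer at all. */
    ssize_t copy_in_kernel(int in, int out, size_t len)
    {
        ssize_t total = 0;

        while (len > 0) {
            ssize_t n = copy_file_range(in, NULL, out, NULL, len, 0);
            if (n < 0)
                return -1;
            if (n == 0)               /* end of input */
                break;
            len -= (size_t)n;
            total += n;
        }
        return total;
    }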

唠甜嗑 2024-10-11 20:45:54

Also, would you know a way to validate your hypothesis on Linux, e.g. by measuring the number of interrupts while doing IO-intensive operations?

Regarding interrupts, I'm guessing that caf's hypothesis is:

  • many interrupts per second;
  • interrupts are serviced by any/all CPUs;
  • therefore, interrupts flush the CPU caches.

The statistics you'd need to test that would be the number of interrupts per second per CPU.
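
One assumed way (not given in the thread) to collect that on Linux is to diff the cumulative counters the kernel already exports: /proc/stat gives a system-wide total on its intr line, and /proc/interrupts breaks the counts down per IRQ and per CPU. A small sketch for the system-wide rate:

    /* Sketch: estimate system-wide interrupts/second by sampling the first
     * field of the "intr" line in /proc/stat twice, one second apart.
     * For a per-CPU breakdown, diff the columns of /proc/interrupts instead. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static unsigned long long total_interrupts(void)
    {
        char line[4096];
        unsigned long long total = 0;
        FILE *f = fopen("/proc/stat", "r");

        if (!f)
            return 0;
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "intr ", 5) == 0) {
                /* The first number after "intr" is the sum of all interrupts. */
                sscanf(line + 5, "%llu", &total);
                break;
            }
        }
        fclose(f);
        return total;
    }

    int main(void)
    {
        unsigned long long before = total_interrupts();
        sleep(1);
        unsigned long long after = total_interrupts();
        printf("interrupts/second: %llu\n", after - before);
        return 0;
    }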

I don't know whether it's possible to tie interrupts to a single CPU: see http://www.google.com/#q=cpu+affinity+interrupt for further details.

Here's something I don't understand (this is the first time I've looked at this question): perfmon on my laptop (running Windows Vista) is showing 2000 interrupts/second (1000 on each core) when it's almost idle (doing nothing but displaying perfmon). I can't imagine which device is generating 2000 interrupts/second, and I would have thought that's enough to blow away the CPU caches (my guess is that the CPU quantum for a busy thread is something like 50 msec). It's also showing an average of 350 DPCs/sec.

Does high-end hardware suffer from similar issues?

One type of hardware difference might be the disk hardware and disk device driver, generating more or fewer interrupts and/or other contention.
