How do I run processes piped with bash on multiple cores?

Posted 2024-08-04 10:16:59

I have a simple bash script that pipes the output of one process to another. Namely:

dostuff | filterstuff

On my Linux system (openSUSE, if it matters, kernel 2.6.27) both of these processes run on a single core. Running different processes on different cores is the default policy, but it doesn't seem to kick in here.

Which component of the system is responsible for that, and what should I do to use multiple cores?

Note that there's no such problem on the 2.6.30 kernel.

Clarification: Having followed Dennis Williamson's advice, I confirmed with top that the piped processes are indeed always run on the same processor. The Linux scheduler, which usually does a really good job, doesn't do so this time.

I figure that something in bash prevents the OS from doing it. The thing is that I need a portable solution for both multi-core and single-core machines. The taskset solution proposed by Dennis Williamson won't work on single-core machines. Currently I'm using:

dostuff | taskset -c 0 filterstuff

but this seems like a dirty hack. Could anyone provide a better solution?


Comments (3)

装纯掩盖桑 2024-08-11 10:16:59

Suppose dostuff is running on one CPU. It writes data into a pipe, and that data will be in cache on that CPU. Because filterstuff is reading from that pipe, the scheduler decides to run it on the same CPU, so that its input data is already in cache.

If your kernel is built with CONFIG_SCHED_DEBUG=y,

# echo NO_SYNC_WAKEUPS > /sys/kernel/debug/sched_features

should disable this class of heuristics. (See /usr/src/linux/kernel/sched_features.h and /proc/sys/kernel/sched_* for other scheduler tunables.)

If that helps, and the problem still happens with a newer kernel, and it's really faster to run on separate CPUs than one CPU, please report the problem to the Linux Kernel Mailing List so that they can adjust their heuristics.
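To check whether the sched_features file mentioned above is available on a given machine before writing to it, something like the following read-only sketch may help. The helper name first_readable is made up for illustration, and the second candidate path is an assumption about where newer kernels expose the same file; adjust both lists to your system.

```shell
#!/bin/sh
# first_readable PATH...: print the first path that exists and is readable.
# (Hypothetical helper, not part of any standard tool.)
first_readable() {
  for f in "$@"; do
    if [ -r "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}

# The first path is the one from the answer; the second is an assumed
# location on newer kernels. Both need debugfs mounted and usually root.
if path=$(first_readable /sys/kernel/debug/sched_features \
                         /sys/kernel/debug/sched/features); then
  echo "scheduler features ($path):"
  cat "$path"
else
  echo "no readable sched_features file found" >&2
fi
```

If nothing is found, debugfs may not be mounted, the kernel may lack CONFIG_SCHED_DEBUG, or you may simply not be root.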

忆离笙 2024-08-11 10:16:59

Try setting the CPU (processor) affinity:

taskset -c 0 dostuff | taskset -c 1 filterstuff
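Since CPU 1 does not exist on a single-core machine, one way to keep the same script usable everywhere might be to pick the CPU numbers from the online core count. This is only a sketch: choose_cpus is a made-up helper, and it assumes getconf _NPROCESSORS_ONLN is available (it is on glibc-based Linux systems) and that CPUs 0 and 1 are the ones you want.

```shell
#!/bin/sh
# choose_cpus NCORES: print "P C", the CPU index for the producer and
# for the consumer. With fewer than two cores both share CPU 0.
# (Hypothetical helper for illustration.)
choose_cpus() {
  if [ "$1" -ge 2 ]; then
    echo "0 1"
  else
    echo "0 0"
  fi
}

ncores=$(getconf _NPROCESSORS_ONLN)
cpus=$(choose_cpus "${ncores:-1}")
producer_cpu=${cpus% *}
consumer_cpu=${cpus#* }
echo "producer on CPU $producer_cpu, consumer on CPU $consumer_cpu"

# Then, with dostuff and filterstuff being your own commands:
#   taskset -c "$producer_cpu" dostuff | taskset -c "$consumer_cpu" filterstuff
```

Hard-pinning like this takes the decision away from the scheduler entirely, so it is worth measuring whether it actually helps on your workload.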

Edit:

Try this experiment:

  • create a file called proctest with the following contents, then make it executable with chmod +x proctest:

    #!/bin/bash
    while true
    do
      ps
      sleep 2
    done  
    
  • start this running:

    ./proctest | grep bash
    
  • in another terminal, start top - make sure it's sorting by %CPU
  • let it settle for several seconds, then quit
  • issue the command ps u
  • start top -p with a list of the PIDs of the highest several processes, say 8 of them, from the list left on-screen by the exited top plus the ones for proctest and grep which were listed by ps - all separated by commas, like so (the order doesn't matter):

    top -p 1234,1255,1211,1212,1270,1275,1261,1250,16521,16522
    
  • add the processor field - press f then j then Space
  • set the sort to PID - press Shift+F then a then Space
  • optional: press Shift+H to turn on thread view
  • optional: press d and type .09 and press Enter to set a short delay time
  • now watch as processes move from processor to processor, you should see proctest and grep bounce around, sometimes on the same processor, sometimes on different ones
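If you would rather not drive top interactively, a rough non-interactive equivalent is to ask ps for the processor column directly. This sketch assumes the procps version of ps, whose psr output field reports the processor a process last ran on; cpu_of and watch_cpus are names invented here.

```shell
#!/bin/sh
# cpu_of PID: print the processor the process last ran on
# (procps "psr" column; the trailing "=" suppresses the header).
cpu_of() {
  ps -o psr= -p "$1" | tr -d ' '
}

# watch_cpus PIDLIST: print pid, processor, state and command name once
# per second for a comma-separated PID list; stop with Ctrl-C.
watch_cpus() {
  while true; do
    ps -o pid,psr,stat,comm -p "$1"
    sleep 1
  done
}

echo "this shell last ran on CPU $(cpu_of $$)"
# Example with the placeholder PIDs of proctest and grep from above:
#   watch_cpus 16521,16522
```

The stat column is included because it shows whether each process is running (R) or sleeping (S) at the moment of the sample, which is useful when judging whether the two ends of the pipe are ever runnable at the same time.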

盗梦空间 2024-08-11 10:16:59

The Linux scheduler is designed to give maximum throughput, not do what you imagine is best. If you're running processes which are connected with a pipe, in all likelihood, one of them is blocking the other, then they swap over. Running them on separate cores would achieve little or nothing, so it doesn't.

If you have two tasks which are both genuinely ready to run on the CPU, I'd expect to see them scheduled on different cores (at some point).

My guess is that dostuff runs until the pipe buffer becomes full, at which point it can't run any more, so filterstuff runs. But filterstuff runs for such a short time that dostuff doesn't get rescheduled until filterstuff has finished filtering the entire pipe buffer, at which point dostuff gets scheduled again.
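One rough way to test this guess is to compare the pipeline's wall-clock time with the CPU time it consumed. If (user + sys) / real stays at or below about 1, the two stages were effectively taking turns; if it is clearly above 1, they overlapped on different cores. The sketch below only does the arithmetic; cpu_ratio is a made-up helper, and the commented lines show one assumed way to feed it from bash's time keyword.

```shell
#!/bin/sh
# cpu_ratio REAL USER SYS: print (USER + SYS) / REAL with two decimals.
# (Hypothetical helper for illustration.)
cpu_ratio() {
  awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN {
    if (r == 0) { print "n/a"; exit }
    printf "%.2f\n", (u + s) / r
  }'
}

# Made-up timings: 2.0 s elapsed, 3.0 s user and 1.0 s system CPU time.
cpu_ratio 2.0 3.0 1.0

# In bash, the three inputs can be captured from the time keyword, e.g.:
#   TIMEFORMAT='%R %U %S'
#   read -r real user sys < <( { time (dostuff | filterstuff >/dev/null); } 2>&1 )
#   cpu_ratio "$real" "$user" "$sys"
```

A ratio well below 1 would suggest the pipeline spends most of its time waiting (on I/O, for instance), in which case spreading it across cores is unlikely to matter.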
