How can I control which core a process runs on?

I can understand how one can write a program that uses multiple processes or threads: fork() a new process and use IPC, or create multiple threads and use those sorts of communication mechanisms.

I also understand context switching. That is, with only one CPU, the operating system schedules time for each process (and there are tons of scheduling algorithms out there) and thereby we achieve running multiple processes simultaneously.

And now that we have multi-core processors (or multi-processor computers), we could have two processes running simultaneously on two separate cores.

My question is about the last scenario: how does the kernel control which core a process runs on? Which system calls (in Linux, or even Windows) schedule a process on a specific core?

The reason I'm asking: I'm working on a project for school where we are to explore a recent topic in computing - and I chose multi-core architectures. There seems to be a lot of material on how to program in that kind of environment (how to watch for deadlock or race conditions) but not much on controlling the individual cores themselves. I would love to be able to write a few demonstration programs and present some assembly instructions or C code to the effect of "See, I am running an infinite loop on the 2nd core, look at the spike in CPU utilization for that specific core".

Any code examples? Or tutorials?

edit: For clarification - many people have said that this is the purpose of the OS, and that one should let the OS take care of this. I completely agree! But then what I'm asking (or trying to get a feel for) is what the operating system actually does to do this. Not the scheduling algorithm, but more "once a core is chosen, what instructions must be executed to have that core start fetching instructions?"

9 Answers

櫻之舞 2024-07-22 21:53:14

As others have mentioned, processor affinity is Operating System specific. If you want to do this outside the confines of the operating system, you're in for a lot of fun, and by that I mean pain.

That said, others have mentioned SetProcessAffinityMask for Win32. Nobody has mentioned the Linux kernel way to set processor affinity, and so I shall. You need to use the sched_setaffinity(2) system call. Here's a nice tutorial on how.

The command-line wrapper for this system call is taskset(1). e.g.
taskset -c 2,3 perf stat awk 'BEGIN{for(i=0;i<100000000;i++){}}' restricts that perf stat of an awk busy-loop to running on either core 2 or 3 (still allowing it to migrate between cores, but only between those two).
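For the OP's "watch the spike on one core" demo, a minimal sketch (assuming a multi-core Linux box with taskset installed) is to pin a shell busy-loop to one core and watch the per-core load:

taskset -c 2 sh -c 'while :; do :; done' &
taskset -p $!   # print the affinity mask of the job we just started
# watch per-core load (press 1 in top, or use htop): core 2 pegs at 100%
kill %1

The second taskset call just reads the mask back, confirming the pinning took effect.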

孤独难免 2024-07-22 21:53:14

Normally the decision about which core an app will run on is made by the system. However, you can set the "affinity" for an application to a specific core to tell the OS to only run the app on that core. Normally this isn't a good idea, but there are some rare cases where it might make sense.

To do this in Windows, use Task Manager: right-click the process and choose "Set Affinity". You can do it programmatically in Windows using functions like SetThreadAffinityMask, SetProcessAffinityMask or SetThreadIdealProcessor.
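A minimal hedged sketch of the programmatic route (Win32 C; error handling trimmed, and the mask value 0x1, meaning "CPU 0 only", is just an example):

#include <windows.h>
#include <stdio.h>

int main(void) {
    /* Bit i of the mask allows CPU i, so 0x1 restricts us to CPU 0. */
    if (!SetProcessAffinityMask(GetCurrentProcess(), 0x1)) {
        printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    for (;;) { }  /* busy loop: CPU 0 should spike in Task Manager */
}

SetThreadAffinityMask works the same way per thread, while SetThreadIdealProcessor is only a hint that the scheduler may ignore.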

ETA:

If you are interested in how the OS actually does the scheduling, you might want to check out these links:

Wikipedia article on context switching

Wikipedia article on scheduling

Scheduling in the linux kernel

With most modern OS's, the OS schedules a thread to execute on a core for a short slice of time. When the time slice expires, or the thread does an IO operation that causes it to voluntarily yield the core, the OS will schedule another thread to run on the core (if there are any threads ready to run). Exactly which thread is scheduled depends on the OS's scheduling algorithm.

The implementation details of exactly how the context switch occurs are CPU & OS dependent. It generally will involve a switch to kernel mode, the OS saving the state of the previous thread, loading the state of the new thread, then switching back to user mode and resuming the newly loaded thread. The context switching article I linked to above has a bit more detail about this.
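As a rough pseudocode sketch, one iteration of that cycle looks something like the following (every name here is invented for illustration; no real kernel is this simple):

/* Runs on each core whenever its timer interrupt fires. */
void on_timer_interrupt(void) {            /* CPU has switched to kernel mode */
    save_registers(&current->ctx);         /* old thread's PC, SP, registers */
    current = pick_next_runnable_thread(); /* the scheduling algorithm's choice */
    switch_address_space(current->mm);     /* if it belongs to another process */
    restore_registers(&current->ctx);      /* drops back to user mode, and the
                                              core starts fetching the new
                                              thread's instructions */
}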

第几種人 2024-07-22 21:53:14

Nothing tells a core "now start running this process".

A core does not see processes; it only knows about executable code, the various privilege levels, and the associated limits on which instructions can be executed.

When the computer boots, for simplicity's sake only one core/processor is active and actually runs any code. Then, if the OS is multiprocessor-capable, it activates the other cores with some system-specific instructions; the other cores most likely pick up from exactly the same spot as the first core and run from there.

So what the scheduler does is look through the OS's internal structures (task/process/thread queues), pick one, and mark it as running on its core. Other scheduler instances running on other cores then won't touch that task until it is in a waiting state again (and not marked as pinned to a specific core). After the task is marked as running, the scheduler switches to userland, with the task resuming at the point where it was previously suspended.

Technically there is nothing whatsoever stopping cores from running the exact same code at the exact same time (and many unlocked functions do), but unless the code is written to expect that, it will probably piss all over itself.

Things get weirder with more exotic memory models (the above assumes the "usual" linear, single working-memory space), where cores don't necessarily all see the same memory and code may have to be fetched from another core's clutches; that is much more easily handled by simply keeping tasks pinned to cores (AFAIK the Sony PS3 architecture with its SPUs is like that).
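Since the question's edit asks what is actually executed to make a core start fetching instructions: on x86 the boot processor wakes the others with inter-processor interrupts sent through the local APIC. A heavily simplified, hypothetical sketch (the addresses and the 0x8000 startup page are assumptions; the real INIT-SIPI-SIPI protocol, with mandatory delays and retries, is specified in the Intel SDM):

#include <stdint.h>

/* Local APIC Interrupt Command Register, assuming the default
   physical mapping at 0xFEE00000. */
static volatile uint32_t *const ICR_LOW  = (uint32_t *)0xFEE00300;
static volatile uint32_t *const ICR_HIGH = (uint32_t *)0xFEE00310;

void start_ap(uint8_t apic_id) {
    *ICR_HIGH = (uint32_t)apic_id << 24;  /* which core to target */
    *ICR_LOW  = 0x00004500;               /* INIT IPI: reset the sleeping core */
    /* ... wait ~10 ms ... */
    *ICR_HIGH = (uint32_t)apic_id << 24;
    *ICR_LOW  = 0x00004608;               /* STARTUP IPI, vector 0x08: the core
                                             begins fetching, in real mode, at
                                             physical 0x08 << 12 = 0x8000 */
}

After boot, nothing like this happens again: the woken cores run the same scheduler code as the first one and simply pull tasks from the run queues, as described above.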

时光暖心i 2024-07-22 21:53:14

To find out the number of processors, instead of using /proc/cpuinfo, just run:

nproc

To run a process on a group of specific processors:

taskset --cpu-list 1,2 my_command 

will say that my_command can only run on CPU 1 or 2.

To run a program on 4 processors doing 4 different things, use parameterization. The argument to the program tells it to do something different:

for i in `seq 0 1 3`;
do 
  taskset --cpu-list $i my_command $i;
done

A good example of this is dealing with 8 million operations in an array, so that elements 0 to (2mil-1) go to processor 1, 2mil to (4mil-1) to processor 2, and so on.
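A hypothetical sketch of such a my_command (assumed to take the slice index as argv[1] and touch only its own 2-million-element range):

#include <stdio.h>
#include <stdlib.h>

#define TOTAL  8000000L
#define SLICES 4L

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s slice-index\n", argv[0]);
        return 1;
    }
    long slice = atol(argv[1]);   /* 0..3, supplied by the shell loop above */
    long per   = TOTAL / SLICES;
    long start = slice * per;
    long long sum = 0;
    for (long i = start; i < start + per; i++)
        sum += i;                 /* stand-in for the real per-element work */
    printf("slice %ld handled [%ld, %ld), sum=%lld\n",
           slice, start, start + per, sum);
    return 0;
}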

You can look at the load on each processor by installing htop with apt-get/yum and running it at the command line:

 htop

吃颗糖壮壮胆 2024-07-22 21:53:14

The OpenMPI project has a library to set the processor affinity on Linux in a portable way.

A while back, I used this in a project and it worked fine.

Caveat: I dimly remember there were some issues in finding out how the operating system numbers the cores. I used this on a system with 2 Xeon CPUs of 4 cores each.

A look at cat /proc/cpuinfo might help. On the box I used, it was pretty weird. The boiled-down output is at the end.

Evidently, the even-numbered cores are on the first CPU and the odd-numbered cores are on the second. However, if I remember correctly, there was an issue with the caches. On these Intel Xeon processors, two cores on each CPU share their L2 caches (I do not remember whether the processor has an L3 cache). I think that virtual processors 0 and 2 shared one L2 cache, 1 and 3 shared one, 4 and 6 shared one, and 5 and 7 shared one.

Because of this weirdness (1.5 years back I could not find any documentation on the processor numbering in Linux), I would be careful about doing this kind of low-level tuning. However, there clearly are some uses. If your code runs on only a few kinds of machines, then it might be worth doing this kind of tuning. Another application would be a domain-specific language like StreamIt, where the compiler could do this dirty work and compute a smart schedule.

processor       : 0
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4

processor       : 1
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 4

processor       : 2
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4

processor       : 3
physical id     : 1
siblings        : 4
core id         : 1
cpu cores       : 4

processor       : 4
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4

processor       : 5
physical id     : 1
siblings        : 4
core id         : 2
cpu cores       : 4

processor       : 6
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4

processor       : 7
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
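On reasonably modern Linux kernels, the sysfs topology files make this numbering easier to check than raw /proc/cpuinfo. A small sketch, assuming the standard topology files are present:

for c in /sys/devices/system/cpu/cpu[0-9]*; do
  echo "$c: package $(cat $c/topology/physical_package_id), core $(cat $c/topology/core_id)"
done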

偏爱自由 2024-07-22 21:53:14

Linux sched_setaffinity C minimal runnable example

In this example, we get the affinity, modify it, and check if it has taken effect with sched_getcpu().

main.c

#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void print_affinity() {
    cpu_set_t mask;
    long nproc, i;

    if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
        perror("sched_getaffinity");
        assert(false);
    }
    nproc = sysconf(_SC_NPROCESSORS_ONLN);
    printf("sched_getaffinity = ");
    for (i = 0; i < nproc; i++) {
        printf("%d ", CPU_ISSET(i, &mask));
    }
    printf("\n");
}

int main(void) {
    cpu_set_t mask;

    print_affinity();
    printf("sched_getcpu = %d\n", sched_getcpu());
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
        perror("sched_setaffinity");
        assert(false);
    }
    print_affinity();
    /* TODO is it guaranteed to have taken effect already? Always worked on my tests. */
    printf("sched_getcpu = %d\n", sched_getcpu());
    return EXIT_SUCCESS;
}

GitHub upstream.

Compile and run:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
./main.out

Sample output:

sched_getaffinity = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
sched_getcpu = 9
sched_getaffinity = 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
sched_getcpu = 0

Which means that:

  • initially, all of my 16 cores were enabled, and the process was randomly running on core 9 (the 10th one)
  • after we set the affinity to only the first core, the process was necessarily moved to core 0 (the first one)

It is also fun to run this program through taskset:

taskset -c 1,3 ./a.out

Which gives output of the form:

sched_getaffinity = 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 
sched_getcpu = 2
sched_getaffinity = 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
sched_getcpu = 0

and so we see that it limited the affinity from the start.

This works because the affinity is inherited by child processes, which taskset is forking: How to prevent inheriting CPU affinity by child forked process?

Tested in Ubuntu 16.04.

x86 bare metal

If you are that hardcore: What does multicore assembly language look like?

How Linux implements it

How does sched_setaffinity() work?

Python: os.sched_getaffinity and os.sched_setaffinity

See: How to find out the number of CPUs using python

穿透光 2024-07-22 21:53:14

As others have mentioned, it's controlled by the operating system. Depending on the OS, it may or may not provide you with system calls that allow you to affect what core a given process executes on. However, you should usually just let the OS do the default behavior. If you have a 4-core system with 37 processes running, and 34 of those processes are sleeping, it's going to schedule the remaining 3 active processes onto separate cores.

You'll likely only see a speed boost from playing with core affinities in very specialized multithreaded applications. For example, suppose you have a system with 2 dual-core processors. Suppose you have an application with 3 threads, two of which operate heavily on the same set of data, whereas the third thread uses a different set of data. In this case, you would benefit the most by having the two interacting threads on the same processor and the third thread on the other processor, since then the first two can share a cache. The OS has no idea what memory each thread needs to access, so it may not assign threads to cores appropriately.

If you're interested in how the operating system does this, read up on scheduling. The nitty-gritty details of multiprocessing on x86 can be found in the Intel 64 and IA-32 Architectures Software Developer's Manuals. Volume 3A, Chapters 7 and 8 contain relevant information, but bear in mind these manuals are extremely technical.
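A hedged sketch of that placement on Linux, using the GNU extension pthread_setaffinity_np (the CPU numbers 0, 2 and 1 are assumptions standing in for "two siblings that share a cache, plus a core elsewhere" — check your own topology first):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single CPU. */
static void pin_self(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *shared_data_worker(void *arg) {
    pin_self((int)(long)arg);     /* threads A and B: cache-sharing CPUs */
    /* ... heavy work on the shared data set ... */
    return NULL;
}

static void *independent_worker(void *arg) {
    pin_self((int)(long)arg);     /* thread C: a core on the other package */
    /* ... work on its own data set ... */
    return NULL;
}

int main(void) {
    pthread_t a, b, c;
    pthread_create(&a, NULL, shared_data_worker, (void *)0L);
    pthread_create(&b, NULL, shared_data_worker, (void *)2L);
    pthread_create(&c, NULL, independent_worker, (void *)1L);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_join(c, NULL);
    return 0;
}

Compile with -pthread.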

李白 2024-07-22 21:53:14

I don't know the assembly instructions.
But the Windows API function is SetProcessAffinityMask.
You can see an example of something I cobbled together a while ago to run Picasa on only one core.

捶死心动 2024-07-22 21:53:14

The OS knows how to do this; you do not have to. You could run into all sorts of issues if you specified which core to run on, some of which could actually slow the process down. Let the OS figure it out; you just need to start the new thread.

For example, if you told a process to start on core x, but core x was already under a heavy load, you would be worse off than if you had just let the OS handle it.
