How can I control which core a process runs on?

I can understand how one can write a program that uses multiple processes or threads: fork() a new process and use IPC, or create multiple threads and use those sorts of communication mechanisms.

I also understand context switching. That is, with only one CPU, the operating system schedules time for each process (and there are tons of scheduling algorithms out there) and thereby we achieve running multiple processes simultaneously.

And now that we have multi-core processors (or multi-processor computers), we could have two processes running simultaneously on two separate cores.

My question is about the last scenario: how does the kernel control which core a process runs on? Which system calls (in Linux, or even Windows) schedule a process on a specific core?

The reason I'm asking: I'm working on a project for school where we are to explore a recent topic in computing - and I chose multi-core architectures. There seems to be a lot of material on how to program in that kind of environment (how to watch for deadlock or race conditions) but not much on controlling the individual cores themselves. I would love to be able to write a few demonstration programs and present some assembly instructions or C code to the effect of "See, I am running an infinite loop on the 2nd core, look at the spike in CPU utilization for that specific core".

Any code examples? Or tutorials?

edit: For clarification - many people have said that this is the purpose of the OS, and that one should let the OS take care of this. I completely agree! But then what I'm asking (or trying to get a feel for) is what the operating system actually does to do this. Not the scheduling algorithm, but more "once a core is chosen, what instructions must be executed to have that core start fetching instructions?"

9 Answers

櫻之舞 2024-07-22 21:53:14

As others have mentioned, processor affinity is Operating System specific. If you want to do this outside the confines of the operating system, you're in for a lot of fun, and by that I mean pain.

That said, others have mentioned SetProcessAffinityMask for Win32. Nobody has mentioned the Linux kernel way to set processor affinity, and so I shall. You need to use the sched_setaffinity(2) system call. Here's a nice tutorial on how.

The command-line wrapper for this system call is taskset(1). e.g.
taskset -c 2,3 perf stat awk 'BEGIN{for(i=0;i<100000000;i++){}}' restricts that perf stat of an awk busy-loop to running on either core 2 or 3 (still allowing it to migrate between cores, but only between those two).
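For the OP's "watch the spike on one core" demo, a minimal sketch (assuming a multi-core Linux box with taskset installed) is to pin a shell busy-loop to one core and watch the per-core load:

taskset -c 2 sh -c 'while :; do :; done' &
taskset -p $!   # print the affinity mask of the job we just started
# watch per-core load (press 1 in top, or use htop): core 2 pegs at 100%
kill %1

The second taskset call just reads the mask back, confirming the pinning took effect.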

孤独难免 2024-07-22 21:53:14

Normally the decision about which core an app will run on is made by the system. However, you can set the "affinity" for an application to a specific core to tell the OS to only run the app on that core. Normally this isn't a good idea, but there are some rare cases where it might make sense.

To do this in Windows, use Task Manager: right-click the process and choose "Set Affinity". You can do it programmatically in Windows using functions like SetThreadAffinityMask, SetProcessAffinityMask or SetThreadIdealProcessor.
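A minimal hedged sketch of the programmatic route (Win32 C; error handling trimmed, and the mask value 0x1, meaning "CPU 0 only", is just an example):

#include <windows.h>
#include <stdio.h>

int main(void) {
    /* Bit i of the mask allows CPU i, so 0x1 restricts us to CPU 0. */
    if (!SetProcessAffinityMask(GetCurrentProcess(), 0x1)) {
        printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    for (;;) { }  /* busy loop: CPU 0 should spike in Task Manager */
}

SetThreadAffinityMask works the same way per thread, while SetThreadIdealProcessor is only a hint that the scheduler may ignore.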

ETA:

If you are interested in how the OS actually does the scheduling, you might want to check out these links:

Wikipedia article on context switching

Wikipedia article on scheduling

Scheduling in the linux kernel

With most modern OS's, the OS schedules a thread to execute on a core for a short slice of time. When the time slice expires, or the thread does an IO operation that causes it to voluntarily yield the core, the OS will schedule another thread to run on the core (if there are any threads ready to run). Exactly which thread is scheduled depends on the OS's scheduling algorithm.

The implementation details of exactly how the context switch occurs are CPU & OS dependent. It generally will involve a switch to kernel mode, the OS saving the state of the previous thread, loading the state of the new thread, then switching back to user mode and resuming the newly loaded thread. The context switching article I linked to above has a bit more detail about this.
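As a rough pseudocode sketch, one iteration of that cycle looks something like the following (every name here is invented for illustration; no real kernel is this simple):

/* Runs on each core whenever its timer interrupt fires. */
void on_timer_interrupt(void) {            /* CPU has switched to kernel mode */
    save_registers(&current->ctx);         /* old thread's PC, SP, registers */
    current = pick_next_runnable_thread(); /* the scheduling algorithm's choice */
    switch_address_space(current->mm);     /* if it belongs to another process */
    restore_registers(&current->ctx);      /* drops back to user mode, and the
                                              core starts fetching the new
                                              thread's instructions */
}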

第几種人 2024-07-22 21:53:14

Nothing tells a core "now start running this process".

A core does not see processes; it only knows about executable code, the various privilege levels, and the associated limits on which instructions can be executed.

When the computer boots, for simplicity's sake only one core/processor is active and actually runs any code. Then, if the OS is multiprocessor-capable, it activates the other cores with some system-specific instructions; the other cores most likely pick up from exactly the same spot as the first core and run from there.

So what the scheduler does is look through the OS's internal structures (task/process/thread queues), pick one, and mark it as running on its core. Other scheduler instances running on other cores then won't touch that task until it is in a waiting state again (and not marked as pinned to a specific core). After the task is marked as running, the scheduler switches to userland, with the task resuming at the point where it was previously suspended.

Technically there is nothing whatsoever stopping cores from running the exact same code at the exact same time (and many unlocked functions do), but unless the code is written to expect that, it will probably piss all over itself.

Things get weirder with more exotic memory models (the above assumes the "usual" linear, single working-memory space), where cores don't necessarily all see the same memory and code may have to be fetched from another core's clutches; that is much more easily handled by simply keeping tasks pinned to cores (AFAIK the Sony PS3 architecture with its SPUs is like that).
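Since the question's edit asks what is actually executed to make a core start fetching instructions: on x86 the boot processor wakes the others with inter-processor interrupts sent through the local APIC. A heavily simplified, hypothetical sketch (the addresses and the 0x8000 startup page are assumptions; the real INIT-SIPI-SIPI protocol, with mandatory delays and retries, is specified in the Intel SDM):

#include <stdint.h>

/* Local APIC Interrupt Command Register, assuming the default
   physical mapping at 0xFEE00000. */
static volatile uint32_t *const ICR_LOW  = (uint32_t *)0xFEE00300;
static volatile uint32_t *const ICR_HIGH = (uint32_t *)0xFEE00310;

void start_ap(uint8_t apic_id) {
    *ICR_HIGH = (uint32_t)apic_id << 24;  /* which core to target */
    *ICR_LOW  = 0x00004500;               /* INIT IPI: reset the sleeping core */
    /* ... wait ~10 ms ... */
    *ICR_HIGH = (uint32_t)apic_id << 24;
    *ICR_LOW  = 0x00004608;               /* STARTUP IPI, vector 0x08: the core
                                             begins fetching, in real mode, at
                                             physical 0x08 << 12 = 0x8000 */
}

After boot, nothing like this happens again: the woken cores run the same scheduler code as the first one and simply pull tasks from the run queues, as described above.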

时光暖心i 2024-07-22 21:53:14

To find out the number of processors, instead of using /proc/cpuinfo, just run:

nproc

To run a process on a group of specific processors:

taskset --cpu-list 1,2 my_command 

will say that my_command can only run on CPU 1 or 2.

To run a program on 4 processors doing 4 different things, use parameterization. The argument to the program tells it to do something different:

for i in `seq 0 1 3`;
do 
  taskset --cpu-list $i my_command $i;
done

A good example of this is dealing with 8 million operations in an array, so that elements 0 to (2mil-1) go to processor 1, 2mil to (4mil-1) to processor 2, and so on.
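A hypothetical sketch of such a my_command (assumed to take the slice index as argv[1] and touch only its own 2-million-element range):

#include <stdio.h>
#include <stdlib.h>

#define TOTAL  8000000L
#define SLICES 4L

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s slice-index\n", argv[0]);
        return 1;
    }
    long slice = atol(argv[1]);   /* 0..3, supplied by the shell loop above */
    long per   = TOTAL / SLICES;
    long start = slice * per;
    long long sum = 0;
    for (long i = start; i < start + per; i++)
        sum += i;                 /* stand-in for the real per-element work */
    printf("slice %ld handled [%ld, %ld), sum=%lld\n",
           slice, start, start + per, sum);
    return 0;
}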

You can look at the load on each processor by installing htop with apt-get/yum and running it at the command line:

 htop

吃颗糖壮壮胆 2024-07-22 21:53:14

The OpenMPI project has a library to set the processor affinity on Linux in a portable way.

A while back, I used this in a project and it worked fine.

Caveat: I dimly remember there were some issues in finding out how the operating system numbers the cores. I used this on a system with 2 Xeon CPUs of 4 cores each.

A look at cat /proc/cpuinfo might help. On the box I used, it was pretty weird. The boiled-down output is at the end.

Evidently, the even-numbered cores are on the first CPU and the odd-numbered cores are on the second. However, if I remember correctly, there was an issue with the caches. On these Intel Xeon processors, two cores on each CPU share their L2 caches (I do not remember whether the processor has an L3 cache). I think that virtual processors 0 and 2 shared one L2 cache, 1 and 3 shared one, 4 and 6 shared one, and 5 and 7 shared one.

Because of this weirdness (1.5 years back I could not find any documentation on the processor numbering in Linux), I would be careful about doing this kind of low-level tuning. However, there clearly are some uses. If your code runs on only a few kinds of machines, then it might be worth doing this kind of tuning. Another application would be a domain-specific language like StreamIt, where the compiler could do this dirty work and compute a smart schedule.

processor       : 0
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4

processor       : 1
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 4

processor       : 2
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4

processor       : 3
physical id     : 1
siblings        : 4
core id         : 1
cpu cores       : 4

processor       : 4
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4

processor       : 5
physical id     : 1
siblings        : 4
core id         : 2
cpu cores       : 4

processor       : 6
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4

processor       : 7
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
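On reasonably modern Linux kernels, the sysfs topology files make this numbering easier to check than raw /proc/cpuinfo. A small sketch, assuming the standard topology files are present:

for c in /sys/devices/system/cpu/cpu[0-9]*; do
  echo "$c: package $(cat $c/topology/physical_package_id), core $(cat $c/topology/core_id)"
done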

偏爱自由 2024-07-22 21:53:14

Linux sched_setaffinity C minimal runnable example

In this example, we get the affinity, modify it, and check if it has taken effect with sched_getcpu().

main.c

#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void print_affinity() {
    cpu_set_t mask;
    long nproc, i;

    if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
        perror("sched_getaffinity");
        assert(false);
    }
    nproc = sysconf(_SC_NPROCESSORS_ONLN);
    printf("sched_getaffinity = ");
    for (i = 0; i < nproc; i++) {
        printf("%d ", CPU_ISSET(i, &mask));
    }
    printf("\n");
}

int main(void) {
    cpu_set_t mask;

    print_affinity();
    printf("sched_getcpu = %d\n", sched_getcpu());
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
        perror("sched_setaffinity");
        assert(false);
    }
    print_affinity();
    /* TODO is it guaranteed to have taken effect already? Always worked on my tests. */
    printf("sched_getcpu = %d\n", sched_getcpu());
    return EXIT_SUCCESS;
}

GitHub upstream.

Compile and run:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
./main.out

Sample output:

sched_getaffinity = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
sched_getcpu = 9
sched_getaffinity = 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
sched_getcpu = 0

Which means that:

  • initially, all of my 16 cores were enabled, and the process was randomly running on core 9 (the 10th one)
  • after we set the affinity to only the first core, the process was necessarily moved to core 0 (the first one)

It is also fun to run this program through taskset:

taskset -c 1,3 ./a.out

Which gives output of the form:

sched_getaffinity = 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 
sched_getcpu = 2
sched_getaffinity = 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
sched_getcpu = 0

and so we see that it limited the affinity from the start.

This works because the affinity is inherited by child processes, which taskset is forking: How to prevent inheriting CPU affinity by child forked process?

Tested in Ubuntu 16.04.

x86 bare metal

If you are that hardcore: What does multicore assembly language look like?

How Linux implements it

How does sched_setaffinity() work?

Python: os.sched_getaffinity and os.sched_setaffinity

See: How to find out the number of CPUs using python

穿透光 2024-07-22 21:53:14

As others have mentioned, it's controlled by the operating system. Depending on the OS, it may or may not provide you with system calls that allow you to affect what core a given process executes on. However, you should usually just let the OS do the default behavior. If you have a 4-core system with 37 processes running, and 34 of those processes are sleeping, it's going to schedule the remaining 3 active processes onto separate cores.

You'll likely only see a speed boost from playing with core affinities in very specialized multithreaded applications. For example, suppose you have a system with 2 dual-core processors. Suppose you have an application with 3 threads, two of which operate heavily on the same set of data, whereas the third thread uses a different set of data. In this case, you would benefit the most by having the two interacting threads on the same processor and the third thread on the other processor, since then the first two can share a cache. The OS has no idea what memory each thread needs to access, so it may not assign threads to cores appropriately.

If you're interested in how the operating system does this, read up on scheduling. The nitty-gritty details of multiprocessing on x86 can be found in the Intel 64 and IA-32 Architectures Software Developer's Manuals. Volume 3A, Chapters 7 and 8 contain relevant information, but bear in mind these manuals are extremely technical.
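A hedged sketch of that placement on Linux, using the GNU extension pthread_setaffinity_np (the CPU numbers 0, 2 and 1 are assumptions standing in for "two siblings that share a cache, plus a core elsewhere" — check your own topology first):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single CPU. */
static void pin_self(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *shared_data_worker(void *arg) {
    pin_self((int)(long)arg);     /* threads A and B: cache-sharing CPUs */
    /* ... heavy work on the shared data set ... */
    return NULL;
}

static void *independent_worker(void *arg) {
    pin_self((int)(long)arg);     /* thread C: a core on the other package */
    /* ... work on its own data set ... */
    return NULL;
}

int main(void) {
    pthread_t a, b, c;
    pthread_create(&a, NULL, shared_data_worker, (void *)0L);
    pthread_create(&b, NULL, shared_data_worker, (void *)2L);
    pthread_create(&c, NULL, independent_worker, (void *)1L);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_join(c, NULL);
    return 0;
}

Compile with -pthread.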

李白 2024-07-22 21:53:14

I don't know the assembly instructions.
But the Windows API function is SetProcessAffinityMask.
You can see an example of something I cobbled together a while ago to run Picasa on only one core.

捶死心动 2024-07-22 21:53:14

The OS knows how to do this; you do not have to. You could run into all sorts of issues if you specified which core to run on, some of which could actually slow the process down. Let the OS figure it out; you just need to start the new thread.

For example, if you told a process to start on core x, but core x was already under a heavy load, you would be worse off than if you had just let the OS handle it.
