With modern OS schedulers, does it still make sense to manually lock processes to specific CPUs/cores?
I recently learned that people sometimes lock specific processes or threads to specific processors or cores, on the theory that this manual tuning distributes the load best. This is a bit counter-intuitive to me: I would think the OS scheduler could make a better decision than a human about how to spread the load. I could see it being true for older operating systems that perhaps weren't aware of issues like there being more latency between specific pairs of cores, or a cache shared between one pair of cores but not another. But I assume "modern" OSs like Linux, Solaris 10, OS X, and Vista have schedulers that know this information. Am I mistaken about their capabilities? Am I mistaken that it's a problem the OS can actually solve? I'm particularly interested in the answer for Solaris and Linux.
The practical consequence is whether I need to tell users of my (multithreaded) software how they might consider balancing it across the cores of their box.
Comments (7)
First of all, "lock" is not the right term for this; "affinity" is more suitable.
In most cases, you don't need to care about it. However, in some cases, manually setting CPU/process/thread affinity can be beneficial.
Operating systems are usually oblivious to the details of modern multicore architectures. For example, say we have a two-socket machine with quad-core processors that support SMT (HyperThreading). In this case, we have 2 processors, 8 cores, and 16 hardware threads, so the OS will see 16 logical processors. If an OS does not recognize this hierarchy, it is likely to lose some performance. The reasons are:
1. Caches: in our example, the two processors (installed in two different sockets) do not share any on-chip cache. Say an application has 4 busy threads that share a lot of data. If the OS schedules those threads across both processors, we may lose cache locality, resulting in a performance loss. However, if the threads do not share much data (they have distinct working sets), then spreading them over different physical processors would be better, because it increases the effective cache capacity. Trickier scenarios are also possible, and they are very hard for an OS to detect.
2. Resource conflict: consider the SMT (HyperThreading) case. SMT threads share many important CPU resources such as caches, TLBs, and execution units. Say there are only two busy threads. An OS may stupidly schedule these two threads on two logical processors that belong to the same physical core. In that case, the two threads contend for a significant amount of resources while other cores sit idle.
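To make the hierarchy above concrete, here is a minimal sketch of how one might find out which logical processors share a physical core on Linux. It assumes the sysfs layout `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`; the helper names (`parse_cpu_list`, `group_smt_siblings`, `one_cpu_per_core`) are made up for illustration and are not part of any library.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0,4" or "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def group_smt_siblings(sibling_lists):
    """Turn {cpu: sibling list text} into a sorted list of per-core CPU groups."""
    cores = {frozenset(parse_cpu_list(text)) for text in sibling_lists.values()}
    return sorted(sorted(core) for core in cores)


def one_cpu_per_core(groups):
    """Pick the first logical CPU of every physical core."""
    return [group[0] for group in groups]


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read the sibling list of every CPU that exposes one (assumed sysfs layout)."""
    result = {}
    for path in Path(root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(path.parts[-3][3:])  # directory name "cpu12" -> 12
        result[cpu] = path.read_text()
    return result


if __name__ == "__main__":
    groups = group_smt_siblings(read_sibling_lists())
    print("physical cores:", groups)
    print("one logical CPU per core:", one_cpu_per_core(groups))
```

With two busy threads, placing them on two different entries of `one_cpu_per_core(...)` avoids the situation in point 2, where both land on the same physical core.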
One good example is Windows 7. Windows 7 now supports a smart scheduling policy that takes SMT into account (related article), and it actually prevents case 2 above. Here is a snapshot of Task Manager in Windows 7 with 20% load on a Core i7 (quad-core with HyperThreading = 8 logical processors):
[Screenshot of Task Manager; source: egloos.com]
The CPU usage history is very interesting, isn't it? :) You can see that only one logical CPU in each pair is utilized, meaning Windows 7 avoids scheduling two threads on the same core simultaneously whenever possible. This policy reduces the negative effects of SMT, such as resource conflicts.
I'd say operating systems are not yet smart enough to fully understand modern multicore architectures, with their multiple cache levels, shared last-level caches, SMT, and even NUMA. So there can be good reasons to set CPU/process/thread affinity manually.
However, I won't say it is really needed. Try it only when you fully understand your workload patterns and your system architecture, and then measure whether the change actually helped.
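As a rough illustration of "try it and measure", the sketch below times a workload once with the default mask and once under a temporary affinity mask. It assumes Linux and Python's `os.sched_setaffinity`; `pinned_to`, `time_it`, `compare`, and the `busy_sum` stand-in workload are hypothetical names, not an established benchmark method.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def pinned_to(cpus):
    """Temporarily restrict the calling thread to `cpus`, then restore the old mask."""
    if not hasattr(os, "sched_setaffinity"):
        yield False  # platform without this API: the body runs unpinned
        return
    previous = os.sched_getaffinity(0)
    os.sched_setaffinity(0, cpus)
    try:
        yield True
    finally:
        os.sched_setaffinity(0, previous)


def time_it(workload, repeats=5):
    """Return the best wall-clock time of workload() over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def busy_sum(n=200_000):
    # Hypothetical stand-in; replace with a representative piece of the real workload.
    return sum(i * i for i in range(n))


def compare(workload, cpus):
    """Time the workload with the default mask, then with the candidate mask."""
    default = time_it(workload)
    with pinned_to(cpus):
        pinned = time_it(workload)
    return default, pinned


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    candidate = {min(os.sched_getaffinity(0))}
    default, pinned = compare(busy_sum, candidate)
    print(f"default mask: {default:.4f}s  pinned to {candidate}: {pinned:.4f}s")
```

A single-threaded toy like this will rarely show a difference; the point is only the shape of the experiment, which should be repeated with the real workload and realistic background load.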
For general-purpose applications, there is no reason to set the CPU affinity; you should just let the OS scheduler choose which CPU runs the process or thread. However, there are instances where it is necessary to set the CPU affinity. For example, in real-time systems the cost of migrating a thread from one core to another (which can happen at any time if the CPU affinity has not been set) can introduce unpredictable delays that cause tasks to miss their deadlines and preclude real-time guarantees.
You can take a look at this article about a multi-core aware implementation of real-time CORBA that, among other things, had to set the CPU affinity so that CPU migration could not result in missed deadlines.
The paper is: Real-Time Performance and Middleware for Multiprocessor and Multicore Linux Platforms (http://www.cs.wustl.edu/~lu/papers/rtcsa09-mc.pdf)
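The mechanism itself is small. Below is a sketch of threads pinning themselves to one CPU each so that they cannot be migrated. It assumes Linux, where an affinity call with pid 0 applies to the calling thread only; `pinned_worker` and `run_pinned_workers` are illustrative names, and Python is of course not a real-time runtime, so this shows the call pattern rather than anything the paper does.

```python
import os
import threading


def pinned_worker(cpu, results):
    """Pin the calling thread to one CPU, then record the mask it ended up with."""
    # Assumption: on Linux, pid 0 refers to the calling thread, not the whole process.
    os.sched_setaffinity(0, {cpu})
    results[cpu] = os.sched_getaffinity(0)
    # ... the time-critical loop would run here, with no migration to other CPUs ...


def run_pinned_workers(cpus):
    """Start one worker per CPU in `cpus` and return {cpu: mask seen by that worker}."""
    results = {}
    threads = [
        threading.Thread(target=pinned_worker, args=(cpu, results)) for cpu in cpus
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
    print(run_pinned_workers(allowed[:2]))
```

The same per-thread idea is what a C program would express with `pthread_setaffinity_np`.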
For applications designed with parallelism and multiple cores in mind, OS-default thread affinity is sometimes not enough. There are many approaches to parallelism, but so far all of them require the programmer's involvement and some knowledge, at least at some level, of the architecture onto which the solution will be mapped. This includes the machines, CPUs, and threads involved.
This is an actively researched subject, and there is an excellent course on MIT's OpenCourseWare that delves into these issues: http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-189January--IAP--2007/CourseHome/
Something many people here haven't mentioned is the idea of forbidding two processes from running on the same processor (socket). It might be worth helping the system by binding different heavily used processes to different processors. This can avoid contention if the scheduler is not clever enough to figure it out by itself.
But this is more a system admin task than one for programmers. I have seen optimizations like this on a few high-performance database servers.
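An admin-style script for this could look roughly like the sketch below. It assumes the CPU-to-socket mapping has already been collected (on Linux it can be read from `/sys/devices/system/cpu/cpuN/topology/physical_package_id`); `cpus_by_socket`, `plan_placement`, and `apply_placement` are hypothetical helper names, and the pids in the usage lines are made up.

```python
import os
from collections import defaultdict


def cpus_by_socket(package_ids):
    """Turn {cpu: socket id} into {socket id: sorted list of CPUs}."""
    sockets = defaultdict(list)
    for cpu, socket in sorted(package_ids.items()):
        sockets[socket].append(cpu)
    return dict(sockets)


def plan_placement(pids, package_ids):
    """Give each pid the CPUs of one socket, cycling through the sockets in order."""
    sockets = cpus_by_socket(package_ids)
    order = sorted(sockets)
    return {
        pid: set(sockets[order[i % len(order)]]) for i, pid in enumerate(pids)
    }


def apply_placement(plan):
    """Apply the plan on Linux; changing other users' processes needs privileges.

    Note: the call affects only the thread whose id equals the pid. Threads
    that already exist in that process keep their own masks.
    """
    for pid, cpus in plan.items():
        os.sched_setaffinity(pid, cpus)


if __name__ == "__main__":
    topology = {0: 0, 1: 0, 2: 1, 3: 1}  # made-up two-socket machine
    print(plan_placement([1111, 2222], topology))  # made-up pids
```

Keeping the planning step separate from `apply_placement` makes it possible to review the plan before touching any running process.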
Most modern operating systems will do an effective job of allocating work between cores. They also attempt to keep threads running on the same core, to get the cache benefits you mentioned.
In general, you should never set thread affinity unless you have a very good reason to. You don't have as much insight as the OS into the other work that threads on the system are doing. Kernels are constantly updated for new processor technology (from a single CPU per socket, to hyper-threading, to multiple cores per socket). Any attempt to set hard affinity may backfire on future platforms.
This article from MSDN Magazine, Using concurrency for scalability, gives a good overview of multithreading on Win32. Regarding CPU affinity, the article warns that it shouldn't be manipulated without a deep understanding of the problem. Based on this information, my answer to your question would be no, except for very specific, well-understood scenarios.
I am not even sure you can pin processes to a specific CPU on Linux. So, my answer is "NO": let the OS handle it, it's smarter than you most of the time.
Edit: It seems that on win32 you have some control over which CPU family this process is going to run on. Now I'm only waiting for someone to prove me wrong on linux/posix as well ...
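For what it's worth, Linux does allow this: there is the `sched_setaffinity(2)` system call and the `taskset` command-line utility, and Python wraps the former as `os.sched_setaffinity`. A minimal sketch, assuming a Linux machine (the function name `pin_current_process_to` is made up for illustration):

```python
import os


def pin_current_process_to(cpu):
    """Restrict the caller to a single CPU and return (old mask, new mask)."""
    old = os.sched_getaffinity(0)
    if cpu not in old:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(old)}")
    os.sched_setaffinity(0, {cpu})
    return old, os.sched_getaffinity(0)


if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    first = min(os.sched_getaffinity(0))
    old, new = pin_current_process_to(first)
    print("before:", sorted(old), "after:", sorted(new))
    os.sched_setaffinity(0, old)  # put the original mask back
```

That it can be done does not change the advice above; it only shows the knob exists.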