How to detect the number of physical processors/cores on Windows, Mac and Linux
I have a multithreaded C++ application that runs on Windows, Mac and a few Linux flavors.
To make a long story short: in order for it to run at maximum efficiency, I have to be able to instantiate a single thread per physical processor/core. Creating more threads than there are physical processors/cores degrades the performance of my program considerably. I can already correctly detect the number of logical processors/cores on all three of these platforms. To be able to detect the number of physical processors/cores correctly, I'll have to detect whether hyper-threading is supported AND active.
My question, therefore, is whether there is a way to detect if hyper-threading is supported and enabled. If so, how exactly?
EDIT: This is no longer 100% correct due to Intel's ongoing befuddlement.
The way I understand the question, you are asking how to detect the number of CPU cores vs. CPU threads, which is different from detecting the number of logical and physical cores in a system. CPU cores are often not considered physical cores by the OS unless they have their own package or die. So an OS will report that a Core 2 Duo, for example, has 1 physical and 2 logical CPUs, and an Intel P4 with hyper-threads will be reported in exactly the same way, even though 2 hyper-threads vs. 2 CPU cores is a very different thing performance-wise.
I struggled with this until I pieced together the solution below, which I believe works for both AMD and Intel processors. As far as I know (and I could be wrong), AMD does not yet have CPU threads, but it has provided a way to detect them that I assume will work on future AMD processors that may have CPU threads.
In short, here are the steps using the CPUID instruction:
It sounds difficult, but here is a (hopefully) platform-independent C++ program that does the trick:
I haven't actually tested this on Windows or OS X yet, but it should work, as the CPUID instruction is valid on i686 machines. Obviously, this won't work for PowerPC, but then those processors don't have hyper-threads either.
Here is the output on a few different Intel machines:
Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz:
Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz:
Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (w/ x2 physical CPU packages):
Intel(R) Pentium(R) 4 CPU 3.00GHz:
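As a rough illustration of the kind of CPUID decoding this answer describes, here is a minimal sketch. It is not the answer's original listing. It assumes the legacy fields that another answer on this page cites (CPUID.1.EBX[23:16] for logical processors per package, CPUID.4.EAX[31:26]+1 for cores per package on Intel) together with the HTT flag in CPUID.1.EDX[28]. The struct and function names are hypothetical, and these fields are known to be unreliable on newer Intel processors.

```cpp
#include <cstdint>

// Hypothetical container for the decoded values (names are illustrative).
struct LegacyTopology {
    bool htt_flag;             // CPUID.1:EDX[28], set for multi-core or HT parts
    unsigned logical_per_pkg;  // CPUID.1:EBX[23:16]
    unsigned cores_per_pkg;    // CPUID.4:EAX[31:26] + 1 (Intel)
    bool hyper_threads;        // more logical processors than cores
};

// Decode raw register values. Obtaining them (for example with __cpuid on
// MSVC or <cpuid.h> on GCC/Clang) is left to the caller.
inline LegacyTopology decode_legacy_topology(std::uint32_t leaf1_ebx,
                                             std::uint32_t leaf1_edx,
                                             std::uint32_t leaf4_eax) {
    LegacyTopology t{};
    t.htt_flag = ((leaf1_edx >> 28) & 1u) != 0;
    t.logical_per_pkg = (leaf1_ebx >> 16) & 0xFFu;
    t.cores_per_pkg = ((leaf4_eax >> 26) & 0x3Fu) + 1u;
    // Hyper-threads are assumed only when the flag is set AND the package
    // exposes more logical processors than cores.
    t.hyper_threads = t.htt_flag && t.logical_per_pkg > t.cores_per_pkg;
    return t;
}
```

Keeping the decoding separate from the CPUID call itself makes the bit arithmetic easy to check against known register dumps.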
Note that this gives the number of logical cores, not the number of physical cores as intended.
If you can use C++11 (thanks to alfC's comment beneath):
Otherwise, the Boost library may be an option for you. The code is the same as above, but with a different include: use <boost/thread.hpp> instead of <thread>.
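A minimal sketch of the C++11 route, assuming the call in question is std::thread::hardware_concurrency() (the Boost equivalent is boost::thread::hardware_concurrency()). The helper name is hypothetical. As noted above, this reports logical cores, not physical ones.

```cpp
#include <thread>

// Number of logical processors as reported by the standard library.
// hardware_concurrency() may return 0 when the value cannot be determined,
// so this hypothetical helper falls back to 1 in that case.
inline unsigned logical_core_count() {
    const unsigned n = std::thread::hardware_concurrency();
    return n != 0 ? n : 1u;
}
```

The fallback to 1 is a design choice so that callers can always size a thread pool from the result.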
A Windows-only solution is described here: GetLogicalProcessorInformation
For Linux, see the /proc/cpuinfo file. I am not running Linux now, so I can't give you more detail. You can count the physical and logical processor instances. If the logical count is twice the physical count, then you have HT enabled (true only for x86).
The current highest-voted answer using CPUID appears to be obsolete. It reports the wrong number of both logical and physical processors. This appears to be confirmed by this answer: cpuid-on-intel-i7-processors.
Specifically, using CPUID.1.EBX[23:16] to get the logical processors or CPUID.4.EAX[31:26]+1 to get the physical ones with Intel processors does not give the correct result on any Intel processor I have.
For Intel, CPUID.Bh should be used (see Intel_thread/Fcore and cache topology). The solution does not appear to be trivial. For AMD, a different solution is necessary.
Here is source code by Intel which reports the correct number of physical and logical cores as well as the correct number of sockets: https://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/. I tested this on an 80 logical core, 40 physical core, 4 socket Intel system.
Here is source code for AMD: http://developer.amd.com/resources/documentation-articles/articles-whitepapers/processor-and-core-enumeration-using-cpuid/. It gave the correct result on my single-socket Intel system but not on my four-socket system. I don't have an AMD system to test.
I have not dissected the source code yet to find a simple answer (if one exists) with CPUID. It seems that if the solution can change (as it seems to have), the best solution is to use a library or an OS call.
Edit:
Here is a solution for Intel processors with CPUID leaf 11 (Bh). The way to do this is to loop over the logical processors, get the x2APIC ID for each logical processor from CPUID, and count the number of x2APIC IDs where the least significant bit is zero. For systems without hyper-threading, the x2APIC ID will always be even. For systems with hyper-threading, each x2APIC ID will have an even and an odd version.
The threads must be bound for this to work. OpenMP by default does not bind threads. Setting
export OMP_PROC_BIND=true
will bind them, or they can be bound in code as shown at thread-affinity-with-windows-msvc-and-openmp. I tested this on my 4 core/8 HT system and it returned 4 both with hyper-threading enabled and with it disabled in the BIOS. I also tested it on a 4 socket system with each socket having 10 cores / 20 HT, and it returned 40 cores.
AMD processors or older Intel processors without CPUID leaf 11 have to do something different.
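The counting rule described in this answer can be sketched as follows. This is a minimal illustration under stated assumptions, not the answer's original code: the helper names are hypothetical, the CPUID wrapper relies on GCC/Clang's <cpuid.h> on x86, and gathering one ID per logical processor (by running the read on threads pinned to each processor) is not shown.

```cpp
#include <cstdint>
#include <vector>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
// Read the x2APIC ID of the logical processor the calling thread runs on.
// Returns false if CPUID leaf 0xB is not available.
inline bool current_x2apic_id(std::uint32_t& id) {
    if (static_cast<unsigned>(__get_cpuid_max(0, nullptr)) < 0xBu) return false;
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    __cpuid_count(0xB, 0, eax, ebx, ecx, edx);
    if (ebx == 0) return false;  // leaf reports no topology information
    id = edx;                    // CPUID.0Bh:EDX holds the x2APIC ID
    return true;
}
#endif

// Count the IDs whose least significant bit is zero. Under the rule in the
// answer above, each such ID corresponds to one physical core.
inline unsigned count_cores_from_x2apic_ids(
        const std::vector<std::uint32_t>& ids) {
    unsigned cores = 0;
    for (std::uint32_t id : ids) {
        if ((id & 1u) == 0) ++cores;
    }
    return cores;
}
```

With IDs 0 through 7 collected from a 4 core/8 HT machine, the function returns 4, and it also returns 4 for the IDs 0, 2, 4, 6 seen when hyper-threading is disabled.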
Due to the seemingly needless addition of even more threads per core, and these threads being unevenly distributed across cores, this is no longer valid.
Gathering ideas and concepts from some of the answers above, I have come up with this solution. Please critique.
For almost every OS, the standard "get core count" feature returns the logical core count. But in order to get the physical core count, we must first detect whether the CPU has hyper-threading.
We now have the logical core count. To get the intended result, we must first check whether hyper-threading is being used, or whether it's even available.
There is no Intel CPU with hyper-threading that will hyper-thread only one core (at least not from what I have read), which lets us find this in a really painless way. If hyper-threading is available, the logical processors will be exactly double the physical processors. Otherwise, the operating system will detect a logical processor for every single core, meaning the logical and the physical core counts will be identical.
To follow on from math's answer: as of Boost 1.56 there is a physical_concurrency function, which does exactly what you want.
From the documentation - http://www.boost.org/doc/libs/1_56_0/doc/html/thread/thread_management.html#thread.thread_management.thread.physical_concurrency
So an example would be
I know this is an old thread, but no one mentioned hwloc. The hwloc library is available on most Linux distributions and can also be compiled on Windows. The following code will return the number of physical processors (4 in the case of an i7 CPU).
It is not sufficient to test whether an Intel CPU has hyperthreading; you also need to test whether hyperthreading is enabled or disabled. There is no documented way to check this. An Intel guy came up with this trick to check whether hyperthreading is enabled: check the number of programmable performance counters using CPUID[0xa].eax[15:8] and assume that if the value is 8, HT is disabled, and if the value is 4, HT is enabled (https://software.intel.com/en-us/forums/intel-isa-extensions/topic/831551).
There is no problem on AMD chips: The CPUID reports 1 or 2 threads per core depending on whether simultaneous multithreading is disabled or enabled.
You also have to compare the thread count from the CPUID with the thread count reported by the operating system to see if there are multiple CPU chips.
I have made a function that implements all of this. It reports both the number of physical processors and the number of logical processors. I have tested it on Intel and AMD processors in Windows and Linux. It should work on Mac as well. I have published this code at
https://github.com/vectorclass/add-on/tree/master/physical_processors
On OS X, you can read these values from sysctl(3) (the C API, or the command-line utility of the same name). The man page should give you usage information. The following keys may be of interest:
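A minimal sketch of the sysctl route, assuming the keys hw.physicalcpu and hw.logicalcpu (these two names are an assumption of this sketch and are not taken from the text above). The helper names are hypothetical. On any platform other than macOS the functions simply return -1.

```cpp
#include <cstddef>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Query an integer sysctl key by name.
// Returns -1 on failure or when not built for macOS.
inline int sysctl_int(const char* name) {
#if defined(__APPLE__)
    int value = 0;
    std::size_t size = sizeof(value);
    if (sysctlbyname(name, &value, &size, nullptr, 0) == 0) return value;
#else
    (void)name;  // unused off macOS
#endif
    return -1;
}

// Assumed key names, see the note above the code.
inline int mac_physical_cores() { return sysctl_int("hw.physicalcpu"); }
inline int mac_logical_cores() { return sysctl_int("hw.logicalcpu"); }
```

The same values can be inspected from a shell with the sysctl command-line utility before wiring them into code.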
On Windows, there are GetLogicalProcessorInformation and GetLogicalProcessorInformationEx, available on Windows XP SP3 and later and on Windows 7 and later, respectively. The difference is that GetLogicalProcessorInformation doesn't support setups with more than 64 logical cores, which might be important for server setups, but you can always fall back to GetLogicalProcessorInformation if you're on XP. Example usage for GetLogicalProcessorInformationEx (source):
On Linux, parsing /proc/cpuinfo might help.
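For the Windows path, counting physical cores with GetLogicalProcessorInformationEx could look roughly like this. It is a minimal sketch with a hypothetical function name, not the example the answer linked to. It assumes a Windows SDK that targets Windows 7 or later, and on other platforms it returns -1.

```cpp
#include <vector>
#if defined(_WIN32)
#include <windows.h>
#endif

// Returns the number of physical cores, or -1 on failure or off Windows.
inline int windows_physical_cores() {
#if defined(_WIN32)
    DWORD len = 0;
    // The first call, with a null buffer, only reports the required size.
    GetLogicalProcessorInformationEx(RelationProcessorCore, nullptr, &len);
    if (GetLastError() != ERROR_INSUFFICIENT_BUFFER || len == 0) return -1;

    std::vector<unsigned char> buffer(len);
    auto* first = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(
        buffer.data());
    if (!GetLogicalProcessorInformationEx(RelationProcessorCore, first, &len))
        return -1;

    // Records are variable-sized, so step through them using each Size field.
    int cores = 0;
    for (DWORD offset = 0; offset < len;) {
        auto* info =
            reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(
                buffer.data() + offset);
        if (info->Relationship == RelationProcessorCore) ++cores;
        offset += info->Size;
    }
    return cores;
#else
    return -1;
#endif
}
```

Each RelationProcessorCore record describes one physical core, so counting the records gives the physical core count regardless of how many logical processors share a core.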
I don't know that all three expose the information in the same way, but if you can safely assume that the NT kernel will report device information according to the POSIX standard (which NT supposedly has support for), then you could work off that standard.
However, differences in device management are often cited as one of the stumbling blocks to cross-platform development. I would at best implement this as three strands of logic; I wouldn't try to write one piece of code to handle all platforms evenly.
OK, all that's assuming C++. For ASM, I presume you'll only be running on x86 or amd64 CPUs? You'll still need two branch paths, one for each architecture, and you'll need to test Intel separately from AMD (IIRC), but by and large you just check the CPUID. Is that what you're trying to find? The CPUID from ASM on Intel/AMD family CPUs?
OpenMP should do the trick:
Most compilers support OpenMP. If you are using a gcc-based compiler (*nix, MacOS), you need to compile using:
(you might also need to tell your compiler to use the stdc++ library):
As far as I know, OpenMP was designed to solve this kind of problem.
This is very easy to do in Python:
Maybe you could look at the psutil source code to see what is going on?
You may use the library libcpuid (also on GitHub - libcpuid).
As can be seen in its documentation page:
As can be seen, data.num_cores holds the number of physical cores of the CPU.