How to detect the number of physical processors/cores on Windows, Mac and Linux

Posted on 2024-09-02 14:46:29

I have a multithreaded C++ application that runs on Windows, Mac and a few Linux flavors.

To make a long story short: in order for it to run at maximum efficiency, I have to be able to instantiate a single thread per physical processor/core. Creating more threads than there are physical processors/cores degrades the performance of my program considerably. I can already detect the number of logical processors/cores correctly on all three of these platforms. To be able to detect the number of physical processors/cores correctly, I'll have to detect whether hyper-threading is supported AND active.

My question, therefore, is whether there is a way to detect if hyper-threading is supported and enabled. If so, how exactly?

盗心人 2024-09-09 14:46:29

EDIT: This is no longer 100% correct due to Intel's ongoing befuddlement.

The way I understand the question, you are asking how to detect the number of CPU cores vs. CPU threads, which is different from detecting the number of logical and physical cores in a system. CPU cores are often not considered physical cores by the OS unless they have their own package or die. So an OS will report that a Core 2 Duo, for example, has 1 physical and 2 logical CPUs, and an Intel P4 with hyper-threads will be reported exactly the same way, even though 2 hyper-threads vs. 2 CPU cores is a very different thing performance-wise.

I struggled with this until I pieced together the solution below, which I believe works for both AMD and Intel processors. As far as I know, and I could be wrong, AMD does not yet have CPU threads but they have provided a way to detect them that I assume will work on future AMD processors which may have CPU threads.

In short here are the steps using the CPUID instruction:

  1. Detect CPU vendor using CPUID function 0
  2. Check for HTT bit 28 in CPU features EDX from CPUID function 1
  3. Get the logical core count from EBX[23:16] from CPUID function 1
  4. Get actual non-threaded CPU core count
    1. If vendor == 'GenuineIntel' this is 1 plus EAX[31:26] from CPUID function 4
    2. If vendor == 'AuthenticAMD' this is 1 plus ECX[7:0] from CPUID function 0x80000008

It sounds difficult, but here is a (hopefully) platform-independent C++ program that does the trick:

#include <iostream>
#include <string>
#ifdef _WIN32
#include <intrin.h> // declares __cpuidex
#endif

using namespace std;


void cpuID(unsigned i, unsigned regs[4]) {
#ifdef _WIN32
  // __cpuidex sets ECX explicitly; CPUID function 4 needs ECX = 0
  __cpuidex((int *)regs, (int)i, 0);

#else
  asm volatile
    ("cpuid" : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
     : "a" (i), "c" (0));
  // ECX is set to zero for CPUID function 4
#endif
}


int main(int argc, char *argv[]) {
  unsigned regs[4];

  // Get vendor
  char vendor[12];
  cpuID(0, regs);
  ((unsigned *)vendor)[0] = regs[1]; // EBX
  ((unsigned *)vendor)[1] = regs[3]; // EDX
  ((unsigned *)vendor)[2] = regs[2]; // ECX
  string cpuVendor = string(vendor, 12);

  // Get CPU features
  cpuID(1, regs);
  unsigned cpuFeatures = regs[3]; // EDX

  // Logical core count per CPU
  cpuID(1, regs);
  unsigned logical = (regs[1] >> 16) & 0xff; // EBX[23:16]
  cout << " logical cpus: " << logical << endl;
  unsigned cores = logical;

  if (cpuVendor == "GenuineIntel") {
    // Get DCP cache info
    cpuID(4, regs);
    cores = ((regs[0] >> 26) & 0x3f) + 1; // EAX[31:26] + 1

  } else if (cpuVendor == "AuthenticAMD") {
    // Get NC: Number of CPU cores - 1
    cpuID(0x80000008, regs);
    cores = ((unsigned)(regs[2] & 0xff)) + 1; // ECX[7:0] + 1
  }

  cout << "    cpu cores: " << cores << endl;

  // Detect hyper-threads  
  bool hyperThreads = cpuFeatures & (1 << 28) && cores < logical;

  cout << "hyper-threads: " << (hyperThreads ? "true" : "false") << endl;

  return 0;
}

I haven't actually tested this on Windows or OS X yet, but it should work, as the CPUID instruction is valid on i686 machines. Obviously, this won't work for PowerPC, but then they don't have hyper-threads either.

Here is the output on a few different Intel machines:

Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz:

 logical cpus: 2
    cpu cores: 2
hyper-threads: false

Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz:

 logical cpus: 4
    cpu cores: 4
hyper-threads: false

Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (w/ x2 physical CPU packages):

 logical cpus: 16
    cpu cores: 8
hyper-threads: true

Intel(R) Pentium(R) 4 CPU 3.00GHz:

 logical cpus: 2
    cpu cores: 1
hyper-threads: true
∝单色的世界 2024-09-09 14:46:29

Note that this does not give the number of physical cores as intended, but the number of logical cores.

If you can use C++11 (thanks to alfC's comment beneath):

#include <iostream>
#include <thread>

int main() {
    std::cout << std::thread::hardware_concurrency() << std::endl;
    return 0;
}

Otherwise, maybe the Boost library is an option for you. The code is the same apart from the include and the namespace: include <boost/thread.hpp> instead of <thread> and call boost::thread::hardware_concurrency().
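
One detail worth handling when using this call: the C++ standard allows hardware_concurrency() to return 0 when the value cannot be determined. The following is a minimal sketch with a fallback; the function name logicalCpuCount is hypothetical and not part of the answer above, and the result is still a logical count, not a physical one.

```cpp
#include <thread>

// Returns the number of logical processors reported by the standard library.
// hardware_concurrency() may return 0 if the value is not computable,
// so fall back to 1 in that case (an assumption made for this sketch).
unsigned logicalCpuCount() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}
```

A caller can then size a thread pool with logicalCpuCount() without special-casing the zero result.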

素衣风尘叹 2024-09-09 14:46:29

A Windows-only solution is described here:

GetLogicalProcessorInformation

For Linux, use the /proc/cpuinfo file. I am not running Linux now, so I can't give you more detail. You can count the physical and logical processor instances. If the logical count is twice the physical count, then you have HT enabled (true only for x86).
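
To make the /proc/cpuinfo idea concrete, here is a rough sketch that counts distinct (physical id, core id) pairs. It assumes the x86 layout of /proc/cpuinfo, where each logical processor is a record separated by a blank line and carries "physical id" and "core id" fields; other architectures may omit these fields, in which case the function returns 0. The names trim and countPhysicalCores are hypothetical.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Removes leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Counts distinct (physical id, core id) pairs in /proc/cpuinfo-style text.
// Returns 0 if those fields are absent.
unsigned countPhysicalCores(std::istream& in) {
    std::set<std::pair<int, int>> cores;
    int physicalId = -1;
    int coreId = -1;
    // Stores the current record (if complete) and resets the state.
    auto flush = [&] {
        if (physicalId >= 0 && coreId >= 0) cores.insert({physicalId, coreId});
        physicalId = coreId = -1;
    };
    std::string line;
    while (std::getline(in, line)) {
        const auto colon = line.find(':');
        if (colon == std::string::npos) {  // a line without ':' ends a record
            flush();
            continue;
        }
        const std::string key = trim(line.substr(0, colon));
        const std::string value = trim(line.substr(colon + 1));
        if (value.empty()) continue;
        if (key == "physical id") physicalId = std::stoi(value);
        else if (key == "core id") coreId = std::stoi(value);
    }
    flush();  // the last record may not be followed by a blank line
    return static_cast<unsigned>(cores.size());
}
```

On Linux, this could be called with a std::ifstream opened on /proc/cpuinfo; comparing the result with the number of "processor" entries tells you whether several logical processors share a core.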

寒冷纷飞旳雪 2024-09-09 14:46:29

The current highest-voted answer using CPUID appears to be obsolete. It reports the wrong number of both logical and physical processors. This appears to be confirmed by this answer: cpuid-on-intel-i7-processors.

Specifically, using CPUID.1.EBX[23:16] to get the logical processors or CPUID.4.EAX[31:26]+1 to get the physical ones with Intel processors does not give the correct result on any Intel processor I have.

For Intel, CPUID.Bh should be used; see "Intel thread/core and cache topology". The solution does not appear to be trivial. For AMD a different solution is necessary.

Here is source code by Intel which reports the correct number of physical and logical cores as well as the correct number of sockets: https://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/. I tested this on an 80 logical core, 40 physical core, 4 socket Intel system.

Here is source code for AMD: http://developer.amd.com/resources/documentation-articles/articles-whitepapers/processor-and-core-enumeration-using-cpuid/. It gave the correct result on my single-socket Intel system but not on my four-socket system. I don't have an AMD system to test.

I have not dissected the source code yet to find a simple answer (if one exists) with CPUID. It seems that if the solution can change (as it seems to have), the best solution is to use a library or OS call.

Edit:

Here is a solution for Intel processors with CPUID leaf 11 (Bh). The way to do this is to loop over the logical processors, get the x2APIC ID for each logical processor from CPUID, and count the number of x2APIC IDs where the least significant bit is zero. For systems without hyper-threading the x2APIC ID will always be even. For systems with hyper-threading each x2APIC ID will have an even and an odd version.

// input:  eax = functionnumber, ecx = 0
// output: eax = output[0], ebx = output[1], ecx = output[2], edx = output[3]
//static inline void cpuid (int output[4], int functionnumber)  

int getNumCores(void) {
    //Assuming an Intel processor with CPUID leaf 11
    int cores = 0;
    #pragma omp parallel reduction(+:cores)
    {
        int regs[4];
        cpuid(regs,11);
        if(!(regs[3]&1)) cores++; 
    }
    return cores;
}

The threads must be bound for this to work. OpenMP by default does not bind threads. Setting export OMP_PROC_BIND=true will bind them or they can be bound in code as shown at thread-affinity-with-windows-msvc-and-openmp.

I tested this on my 4 core/8 HT system and it returned 4 whether or not hyper-threading was disabled in the BIOS. I also tested it on a 4 socket system with each socket having 10 cores / 20 HT and it returned 40 cores.

AMD processors or older Intel processors without CPUID leaf 11 have to do something different.
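
The least-significant-bit test above assumes at most two hardware threads per core. A slightly more general variant, sketched below, uses the shift width that CPUID leaf 0Bh reports in EAX[4:0] for subleaf 0: shifting an x2APIC ID right by that many bits yields an identifier shared by all threads of the same core. The sketch only covers the counting step and assumes the x2APIC IDs have already been collected on bound threads as described above; the function name and parameters are hypothetical.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Counts distinct cores given the x2APIC ID of every logical processor and
// the SMT shift width (CPUID leaf 0Bh, subleaf 0, EAX[4:0]).
// Returns 0 if the shift width is out of range.
unsigned countCoresFromApicIds(const std::vector<std::uint32_t>& apicIds,
                               unsigned smtShift) {
    if (smtShift >= 32) return 0;  // shifting by >= 32 bits is undefined
    std::set<std::uint32_t> coreIds;
    for (const std::uint32_t id : apicIds) {
        coreIds.insert(id >> smtShift);  // drop the thread-within-core bits
    }
    return static_cast<unsigned>(coreIds.size());
}
```

With a shift of 1, the IDs 0 through 7 (hyper-threading on) and the IDs 0, 2, 4, 6 (hyper-threading off) both collapse to four cores, which matches the behaviour reported for the 4 core/8 HT machine.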

触ぅ动初心 2024-09-09 14:46:29

Due to the seemingly needless addition of even more threads per core, and these threads being unevenly distributed on a per-core basis, this is no longer valid.

Gathering ideas and concepts from some of the answers above, I have come up with this solution. Please critique.

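//EDIT INCLUDES

#include <stdint.h> // uint32_t
#include <stdio.h>  // fprintf

#ifdef _WIN32
    #include <windows.h>
#elif defined(__APPLE__)
    #include <sys/param.h>
    #include <sys/sysctl.h>
    #include <unistd.h> // sysconf is used on macOS as well
#else
    #include <unistd.h>
#endif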

For almost every OS, the standard "Get core count" feature returns the logical core count. But in order to get the physical core count, we must first detect if the CPU has hyper threading or not.

uint32_t registers[4];
unsigned logicalcpucount;
unsigned physicalcpucount;
#ifdef _WIN32
SYSTEM_INFO systeminfo;
GetSystemInfo( &systeminfo );

logicalcpucount = systeminfo.dwNumberOfProcessors;

#else
logicalcpucount = sysconf( _SC_NPROCESSORS_ONLN );
#endif

We now have the logical core count. To get the intended result, we must first check whether hyper-threading is being used, or whether it is even available.

__asm__ __volatile__ ("cpuid " :
                      "=a" (registers[0]),
                      "=b" (registers[1]),
                      "=c" (registers[2]),
                      "=d" (registers[3])
                      : "a" (1), "c" (0));

unsigned CPUFeatureSet = registers[3];
bool hyperthreading = CPUFeatureSet & (1 << 28);

There is no Intel CPU with hyper-threading that will only hyper-thread one core (at least not from what I have read), which allows us to find this in a really painless way. If hyper-threading is available, the logical processors will be exactly double the physical processors. Otherwise, the operating system will detect a logical processor for every single core, meaning the logical and the physical core counts will be identical.

if (hyperthreading){
    physicalcpucount = logicalcpucount / 2;
} else {
    physicalcpucount = logicalcpucount;
}

fprintf (stdout, "LOGICAL: %u\n", logicalcpucount);   // %u: the counts are unsigned
fprintf (stdout, "PHYSICAL: %u\n", physicalcpucount);
黯淡〆 2024-09-09 14:46:29

To follow on from math's answer, as of Boost 1.56 there exists the physical_concurrency function, which does exactly what you want.

From the documentation - http://www.boost.org/doc/libs/1_56_0/doc/html/thread/thread_management.html#thread.thread_management.thread.physical_concurrency

The number of physical cores available on the current system. In contrast to hardware_concurrency() it does not return the number of virtual cores, but it counts only physical cores.

So an example would be:

    #include <iostream>
    #include <boost/thread.hpp>

    int main()
    {
        std::cout << boost::thread::physical_concurrency();
        return 0;
    }
烟燃烟灭 2024-09-09 14:46:29

I know this is an old thread, but no one mentioned hwloc. The hwloc library is available on most Linux distributions and can also be compiled on Windows. The following code will return the number of physical processors, which is 4 in the case of an i7 CPU.

#include <hwloc.h>
#ifdef _OPENMP
#include <omp.h> // omp_get_num_procs, used by the fallback below
#endif

int nPhysicalProcessorCount = 0;

hwloc_topology_t sTopology;

if (hwloc_topology_init(&sTopology) == 0 &&
    hwloc_topology_load(sTopology) == 0)
{
    nPhysicalProcessorCount =
        hwloc_get_nbobjs_by_type(sTopology, HWLOC_OBJ_CORE);

    hwloc_topology_destroy(sTopology);
}

if (nPhysicalProcessorCount < 1)
{
#ifdef _OPENMP
    nPhysicalProcessorCount = omp_get_num_procs();
#else
    nPhysicalProcessorCount = 1;
#endif
}
酒与心事 2024-09-09 14:46:29

It is not sufficient to test whether an Intel CPU has hyperthreading; you also need to test whether hyperthreading is enabled or disabled. There is no documented way to check this. An Intel guy came up with this trick to check if hyperthreading is enabled: check the number of programmable performance counters using CPUID[0xa].eax[15:8] and assume that if the value is 8, HT is disabled, and if the value is 4, HT is enabled (https://software.intel.com/en-us/forums/intel-isa-extensions/topic/831551).

There is no problem on AMD chips: The CPUID reports 1 or 2 threads per core depending on whether simultaneous multithreading is disabled or enabled.

You also have to compare the thread count from the CPUID with the thread count reported by the operating system to see if there are multiple CPU chips.

I have made a function that implements all of this. It reports both the number of physical processors and the number of logical processors. I have tested it on Intel and AMD processors in Windows and Linux. It should work on Mac as well. I have published this code at
https://github.com/vectorclass/add-on/tree/master/physical_processors
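
As an illustration of the performance-counter trick described above (not the code from the linked repository), the decoding step might look like the sketch below. It takes the EAX value returned by CPUID leaf 0xA as input; the names perfCountersPerThread, HtGuess and guessHyperThreading are hypothetical, and the 4-versus-8 rule is the undocumented heuristic quoted above, so any other value is reported as unknown.

```cpp
#include <cstdint>

// Extracts CPUID[0xA].EAX[15:8]: the number of programmable performance
// counters available to each logical processor.
unsigned perfCountersPerThread(std::uint32_t leafAEax) {
    return (leafAEax >> 8) & 0xffu;
}

enum class HtGuess { Enabled, Disabled, Unknown };

// Applies the heuristic: 4 counters suggests HT enabled, 8 suggests HT
// disabled. Anything else is treated as inconclusive.
HtGuess guessHyperThreading(std::uint32_t leafAEax) {
    switch (perfCountersPerThread(leafAEax)) {
        case 4:  return HtGuess::Enabled;
        case 8:  return HtGuess::Disabled;
        default: return HtGuess::Unknown;
    }
}
```

Treating every other counter value as Unknown keeps the caller from halving the logical count on CPUs where the heuristic does not apply.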

痞味浪人 2024-09-09 14:46:29

On OS X, you can read these values from sysctl(3) (the C API, or the command line utility of the same name). The man page should give you usage information. The following keys may be of interest:

$ sysctl hw
hw.ncpu: 24
hw.activecpu: 24
hw.physicalcpu: 12  <-- number of cores
hw.physicalcpu_max: 12
hw.logicalcpu: 24   <-- number of cores including hyper-threaded cores
hw.logicalcpu_max: 24
hw.packages: 2      <-- number of CPU packages
hw.ncpu = 24
hw.availcpu = 24
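
From C++, the same keys can be read with sysctlbyname. The following is a small sketch, assuming the keys are integer-valued as the output above suggests; the helper name readSysctlInt is hypothetical, and on platforms other than macOS it simply returns -1.

```cpp
#include <cstddef>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Returns the value of an integer sysctl key such as "hw.physicalcpu",
// or -1 if the key cannot be read (or the platform is not macOS).
int readSysctlInt(const char* key) {
#ifdef __APPLE__
    int value = 0;
    std::size_t size = sizeof(value);
    if (sysctlbyname(key, &value, &size, nullptr, 0) == 0) return value;
#else
    (void)key;  // unused outside macOS
#endif
    return -1;
}
```

For example, readSysctlInt("hw.physicalcpu") and readSysctlInt("hw.logicalcpu") would give the two counts shown in the listing above.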
心房的律动 2024-09-09 14:46:29

On Windows, GetLogicalProcessorInformation is available from Windows XP SP3 onward and GetLogicalProcessorInformationEx from Windows 7 onward. The difference is that GetLogicalProcessorInformation doesn't support setups with more than 64 logical cores, which might be important for server setups, but you can always fall back to GetLogicalProcessorInformation if you're on XP. Example usage for GetLogicalProcessorInformationEx (source):

PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX buffer = NULL;
PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX ptr = NULL;
BOOL rc;
DWORD length = 0;
DWORD offset = 0;
DWORD ncpus = 0;
DWORD prev_processor_info_size = 0;
for (;;) {
    rc = psutil_GetLogicalProcessorInformationEx(
            RelationAll, buffer, &length);
    if (rc == FALSE) {
        if (GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
            if (buffer) {
                free(buffer);
            }
            buffer = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX)malloc(length);
            if (NULL == buffer) {
                return NULL;
            }
        }
        else {
            goto return_none;
        }
    }
    else {
        break;
    }
}
ptr = buffer;
while (offset < length) {
    // Advance ptr by the size of the previous
    // SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX struct.
    ptr = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*)\
        (((char*)ptr) + prev_processor_info_size);

    if (ptr->Relationship == RelationProcessorCore) {
        ncpus += 1;
    }

    // When offset == length, we've reached the last processor
    // info struct in the buffer.
    offset += ptr->Size;
    prev_processor_info_size = ptr->Size;
}

free(buffer);
if (ncpus != 0) {
    return ncpus;
}
else {
    return NULL;
}

return_none:
if (buffer != NULL)
    free(buffer);
return NULL;

On Linux, parsing /proc/cpuinfo might help.
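
The Windows listing above is an excerpt: it calls a wrapper named psutil_GetLogicalProcessorInformationEx and returns from an enclosing function that is not shown. As a rough self-contained sketch of the same idea, the function below calls the Win32 API directly and counts RelationProcessorCore entries. It assumes a build targeting Windows 7 or later; the function name physicalCoreCountWindows is hypothetical, and on other platforms it returns 0.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif
#include <vector>

// Counts physical cores via GetLogicalProcessorInformationEx.
// Returns 0 on failure or on non-Windows platforms.
unsigned physicalCoreCountWindows() {
#ifdef _WIN32
    DWORD length = 0;
    // The first call is expected to fail and report the required buffer size.
    GetLogicalProcessorInformationEx(RelationProcessorCore, nullptr, &length);
    if (GetLastError() != ERROR_INSUFFICIENT_BUFFER || length == 0) return 0;

    std::vector<char> buffer(length);
    auto* info = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(
        buffer.data());
    if (!GetLogicalProcessorInformationEx(RelationProcessorCore, info, &length)) {
        return 0;
    }

    unsigned cores = 0;
    DWORD offset = 0;
    while (offset < length) {
        // Entries are variable-sized; Size gives the distance to the next one.
        auto* entry = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(
            buffer.data() + offset);
        if (entry->Size == 0) break;  // guard against a malformed entry
        if (entry->Relationship == RelationProcessorCore) ++cores;
        offset += entry->Size;
    }
    return cores;
#else
    return 0;
#endif
}
```

Using std::vector for the buffer avoids the manual malloc/free bookkeeping and the goto in the excerpt, at the cost of one extra API call to learn the size.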

究竟谁懂我的在乎 2024-09-09 14:46:29

I don't know that all three expose the information in the same way, but if you can safely assume that the NT kernel will report device information according to the POSIX standard (which NT supposedly has support for), then you could work off that standard.

However, differences in device management are often cited as one of the stumbling blocks to cross-platform development. I would at best implement this as three strands of logic; I wouldn't try to write one piece of code to handle all platforms evenly.

Ok, all that's assuming C++. For ASM, I presume you'll only be running on x86 or amd64 CPUs? You'll still need two branch paths, one for each architecture, and you'll need to test Intel separate from AMD (IIRC) but by and large you just check for the CPUID. Is that what you're trying to find? The CPUID from ASM on Intel/AMD family CPUs?

掩于岁月 2024-09-09 14:46:29

OpenMP should do the trick:

// test.cpp
#include <omp.h>
#include <iostream>

using namespace std;

int main(int argc, char** argv) {
  int nThreads = omp_get_max_threads();
  cout << "Can run as many as: " << nThreads << " threads." << endl;
}

Most compilers support OpenMP. If you are using a gcc-based compiler (*nix, MacOS), you need to compile using:

$ g++ -fopenmp -o test.o test.cpp

(you might also need to tell your compiler to use the stdc++ library):

$ g++ -fopenmp -o test.o -lstdc++ test.cpp

As far as I know, OpenMP was designed to solve this kind of problem.

随遇而安 2024-09-09 14:46:29

This is very easy to do in Python:

$ python -c "import psutil; print(psutil.cpu_count(logical=False))"
4

Maybe you could look at the psutil source code to see what is going on?

笨死的猪 2024-09-09 14:46:29

You may use the library libcpuid (also on GitHub: libcpuid).

As can be seen in its documentation page:

#include <stdio.h>
#include <libcpuid.h>

int main(void)
{
    struct cpu_raw_data_t raw;                                             // holds the raw CPUID data
    struct cpu_id_t data;                                                  // holds the decoded CPU information

    if (!cpuid_present()) {                                                // check for CPUID presence
        printf("Sorry, your CPU doesn't support CPUID!\n");
        return -1;
    }

    if (cpuid_get_raw_data(&raw) < 0) {                                    // obtain the raw CPUID data
        printf("Sorry, cannot get the CPUID raw data.\n");
        printf("Error: %s\n", cpuid_error());                              // cpuid_error() gives the last error description
        return -2;
    }

    if (cpu_identify(&raw, &data) < 0) {                                   // identify the CPU, using the given raw data.
        printf("Sorry, CPU identification failed.\n");
        printf("Error: %s\n", cpuid_error());
        return -3;
    }

    printf("Found: %s CPU\n", data.vendor_str);                            // print out the vendor string (e.g. `GenuineIntel')
    printf("Processor model is `%s'\n", data.cpu_codename);                // print out the CPU code name (e.g. `Pentium 4 (Northwood)')
    printf("The full brand string is `%s'\n", data.brand_str);             // print out the CPU brand string
    printf("The processor has %dK L1 cache and %dK L2 cache\n",
        data.l1_data_cache, data.l2_cache);                                // print out cache size information
    printf("The processor has %d cores and %d logical processors\n",
        data.num_cores, data.num_logical_cpus);                            // print out CPU cores information

    return 0;
}

As can be seen, data.num_cores holds the number of physical cores of the CPU.
