Linux: find out the hyper-threaded core ID
I spent this morning trying to find out how to determine which processor ID is the hyper-threaded core, but without luck.
I want to find this out and use set_affinity() to bind a process to either a hyper-threaded or a non-hyper-threaded thread in order to profile its performance.
I discovered a simple trick to do what I need.
If the first number is equal to the CPU number (0 in this example), then it's a real core; if not, it is a hyperthreading core.
Real core example:
Hyperthreading core example:
The output of the second example is exactly the same as the first one. However, we are checking cpu13, and the first number is 1, so CPU 13 is a hyperthreading core.
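The command and its output for the two examples above did not survive. As a rough sketch, assuming the numbers come from the Linux sysfs file /sys/devices/system/cpu/cpuN/topology/thread_siblings_list (a comma-separated cpulist such as 1,13 or 0-1), the same check could look like this in Python. parse_cpu_list, is_first_sibling and check_cpu are hypothetical helper names, not a library API.

```python
from pathlib import Path

# Assumption: Linux sysfs exposes each CPU's hyperthread siblings in this file.
SIBLINGS = "/sys/devices/system/cpu/cpu{}/topology/thread_siblings_list"


def parse_cpu_list(text: str) -> list[int]:
    """Parse a kernel cpulist such as "1,13" or "0-1,8-9" into sorted CPU IDs."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)


def is_first_sibling(cpu: int, siblings: list[int]) -> bool:
    """The trick from the answer: the CPU counts as 'real' if it is the first ID in its list."""
    return bool(siblings) and siblings[0] == cpu


def check_cpu(cpu: int) -> bool:
    # Hypothetical helper; raises FileNotFoundError if the sysfs file is absent.
    text = Path(SIBLINGS.format(cpu)).read_text()
    return is_first_sibling(cpu, parse_cpu_list(text))
```

For the cpu13 example above, whose list starts with 1, check_cpu(13) would return False. Which sibling counts as "first" is only a numbering convention; the answer on HT symmetry below explains that both siblings have equal access to the core.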
I'm surprised nobody has mentioned lscpu yet. Here's an example on a single-socket system with four physical cores and hyper-threading enabled:
The output explains how to interpret the table of IDs; logical CPU IDs with the same Core ID are siblings.
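The example table itself was lost. A minimal sketch of the same grouping in Python, assuming util-linux lscpu and its machine-readable --parse=CPU,CORE,SOCKET output (header lines start with #, data lines look like 0,0,0); sibling_groups and lscpu_sibling_groups are hypothetical names.

```python
import subprocess
from collections import defaultdict


def sibling_groups(parse_output: str) -> list[list[int]]:
    """Group logical CPUs that report the same (socket, core) pair."""
    groups: dict[tuple[int, int], list[int]] = defaultdict(list)
    for line in parse_output.splitlines():
        if not line or line.startswith("#"):
            continue  # lscpu prints its header as '#' comment lines
        cpu, core, socket = (int(field) for field in line.split(",")[:3])
        groups[(socket, core)].append(cpu)
    # Sort by (socket, core) so the groups come out in a stable order.
    return [sorted(cpus) for _, cpus in sorted(groups.items())]


def lscpu_sibling_groups() -> list[list[int]]:
    # Assumption: lscpu is on PATH and accepts --parse=CPU,CORE,SOCKET.
    result = subprocess.run(
        ["lscpu", "--parse=CPU,CORE,SOCKET"],
        capture_output=True, text=True, check=True,
    )
    return sibling_groups(result.stdout)
```

Keying on (socket, core) rather than the Core column alone is a defensive choice in case core IDs repeat across sockets on some systems.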
HT is symmetric (in terms of basic resources; the system mode may be asymmetric).
So, if HT is turned on, a large share of the physical core's resources is shared between the two threads. Some additional hardware is turned on to save the state of both threads. Both threads have symmetric access to the physical core.
There is a difference between an HT-disabled core and an HT-enabled core, but no difference between the first half and the second half of an HT-enabled core.
At any single moment, one HT thread may use more resources than the other, but this balancing is dynamic: if both threads want the same resource, the CPU balances them as it sees fit. The only thing you can do is execute rep nop or pause in one thread to let the CPU give more resources to the other thread.
Okay, you can actually measure performance without knowing which one is the HT sibling. Just profile while the only thread in the system is bound to CPU0, and repeat while it is bound to CPU1. I think the results will be almost the same (the OS can add noise if it routes some interrupts to CPU0, so try to lower the number of interrupts while testing, and try CPU2 and CPU3 if you have them).
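The profiling procedure above can be sketched in Python with os.sched_setaffinity (Linux only). This is only a sketch under assumptions: busy_work is a hypothetical stand-in workload, and pinning this process does not stop other tasks or interrupts from using the sibling thread, so repeat each run several times.

```python
import os
import time
from typing import Callable


def busy_work(n: int = 2_000_000) -> int:
    # Hypothetical stand-in workload: replace with the code you actually want to profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_on_cpu(cpu: int, work: Callable[[], object] = busy_work) -> float:
    """Run `work` with this process pinned to one logical CPU; return elapsed seconds."""
    saved = os.sched_getaffinity(0)        # current CPU mask (Linux only)
    os.sched_setaffinity(0, {cpu})         # pin the current process (pid 0) to `cpu`
    try:
        start = time.perf_counter()
        work()
        return time.perf_counter() - start
    finally:
        os.sched_setaffinity(0, saved)     # restore the original mask
```

For example, compare time_on_cpu(0) with time_on_cpu(1), and with CPU2 and CPU3 if the machine has them.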
PS
Agner (he is the x86 guru) recommends using even-numbered cores when you don't want to use HT but it is enabled in the BIOS:
PPS: About the new incarnation of HT (not the P4 one, but Nehalem and Sandy Bridge), based on Agner's research on microarchitecture:
...
...
PPPS: The Intel Optimization manual lists resource sharing in second-generation HT (page 93; this list is for Nehalem, but the Sandy Bridge section does not change it):
buffer, large-page ITLB // my comment: there are two sets of this hardware
buffers, small-page ITLB are statically allocated between two logical processors. // my comment: there is a single set of this hardware; it is statically split in half between the two HT virtual cores
cache hierarchy, fill buffers, both DTLB0 and STLB. // my comment: a single set, but not split in half; the CPU redivides these resources dynamically
between two logical processors to ensure fairness. // my comment: there is a single front end (instruction decoder), so the threads are decoded alternately: 1, 2, 1, 2
There is also a figure on page 112 (Figure 2-13) showing that both logical cores are symmetric.
threads to execute simultaneously on the logical processors in each physical processor
single thread is consuming the execution resources; higher level of resource utilization can lead to higher system throughput
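Agner's even-numbered-cores advice in the PS above picks one thread per core only when the kernel numbers siblings in adjacent pairs (0/1, 2/3, ...); on a machine that pairs CPU y with y+N, the even-numbered set puts two threads on some cores and leaves others idle. A small sketch, assuming you already have the sibling groups (for example from thread_siblings_list or lscpu), that picks one logical CPU per physical core instead; both function names are hypothetical.

```python
def one_cpu_per_core(groups: list[list[int]]) -> list[int]:
    """Pick the lowest logical CPU from each sibling group: one thread per physical core."""
    return sorted(min(g) for g in groups if g)


def even_numbered(cpus: list[int]) -> list[int]:
    """The even-numbered heuristic; it matches one_cpu_per_core only for 0/1, 2/3, ... pairing."""
    return [c for c in cpus if c % 2 == 0]


# Hypothetical layouts, for illustration only.
adjacent = [[0, 1], [2, 3], [4, 5], [6, 7]]   # siblings numbered y, y+1
offset = [[0, 4], [1, 5], [2, 6], [3, 7]]     # siblings numbered y, y+4
print(one_cpu_per_core(adjacent), even_numbered(list(range(8))))  # [0, 2, 4, 6] [0, 2, 4, 6]
print(one_cpu_per_core(offset))                                   # [0, 1, 2, 3]
```

The resulting list can be passed to os.sched_setaffinity(0, set(cpus)) to keep a process off the second thread of every core.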
There is a universal (Linux/Windows) and portable hardware topology detector (cores, HT, caches, south bridges, and disk/network connection locality): hwloc, from the OpenMPI project. You may want to use it, because Linux may use different HT core numbering rules, and we can't know in advance whether it will be the even/odd or the y and y+8 numbering rule.
Home page of hwloc:
http://www.open-mpi.org/projects/hwloc/
Download page:
http://www.open-mpi.org/software/hwloc/v1.10/
Description:
It has the lstopo command to get the hardware topology in graphical form, or in text form:
We can see physical cores as Core L#x, each having two logical cores, PU L#y and PU L#y+8.
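lstopo shows the layout directly. If you only need to know which of the two numbering rules your kernel follows, a sketch like the following could classify the sibling groups (gathered, for example, from thread_siblings_list or lscpu). The function and its return strings are made up for illustration.

```python
def numbering_rule(groups: list[list[int]]) -> str:
    """Classify how sibling IDs are laid out, given one group of logical CPUs per core."""
    pairs = [sorted(g) for g in groups if len(g) == 2]
    if not pairs or len(pairs) != len(groups):
        return "other"  # HT off, or not exactly two threads on every core
    offsets = {b - a for a, b in pairs}
    if offsets == {1} and all(a % 2 == 0 for a, _ in pairs):
        return "even/odd"  # 0/1, 2/3, ...
    if len(offsets) == 1:
        return f"y and y+{offsets.pop()}"  # e.g. 0/8, 1/9, ...
    return "other"
```

For instance, numbering_rule([[0, 8], [1, 9]]) returns "y and y+8", matching the PU L#y / PU L#y+8 layout in the lstopo output described above.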
A simple way to get the hyperthreading siblings of CPU cores in bash:
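The bash snippet for this answer did not survive. A rough Python stand-in, assuming the siblings are read from /sys/devices/system/cpu/cpu*/topology/thread_siblings_list and de-duplicated (similar to piping those files through sort -u); sibling_lists is a hypothetical helper name.

```python
import re
from pathlib import Path


def sibling_lists(root: str = "/sys/devices/system/cpu") -> list[str]:
    """Return each distinct thread_siblings_list string once (like `sort -u` in a shell)."""
    # Assumption: one topology/thread_siblings_list file per online CPU under `root`.
    files = Path(root).glob("cpu[0-9]*/topology/thread_siblings_list")
    distinct = {f.read_text().strip() for f in files}
    # Order groups by their first CPU ID rather than lexicographically ("10,22" vs "2,14").
    return sorted(distinct, key=lambda s: int(re.split(r"[,-]", s)[0]))
```

Each returned string, such as 1,13, names the logical CPUs that share one physical core; the root parameter only exists so the function can be pointed at a copy of the sysfs tree.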
There's also lscpu -e, which gives the relevant core and CPU info:
I tried to verify this information by comparing the core temperature with the load on the HT core.