Turning off Hyper-Threading on a 6-core Intel Xeon

Posted on 2024-09-25 20:14:16

We got a 12-core Mac Pro to do some Monte Carlo calculations. Its Intel Xeon processors have Hyper-Threading (HT) enabled, so in principle 24 processes should run in parallel to utilize them fully. However, our calculations run more efficiently as 12x100% than as 24x50%, so we tried to turn Hyper-Threading off via the Processor pane in System Preferences in order to get higher performance. One can also turn HT off with:

hwprefs -v cpu_ht=false

Then we ran some tests and here is what we got:

  1. 12 parallel tasks take the same time with or without HT, to our disappointment.
  2. 24 parallel tasks lose 20% if HT is off (not 50% as we expected).
  3. When HT is on, switching from 24 to 12 tasks decreases efficiency by 20% (also surprising).
  4. When HT is off, switching from 24 to 12 tasks doesn't change anything.

It seems that Hyper-Threading just decreases performance for our calculations and there is no way to avoid it. The program we use for the calculations is written in Fortran and compiled with gfortran. Is there a way to make it more efficient on this hardware?


Update: Our Monte Carlo calculations (MCC) are typically done in steps, to avoid data loss and for other reasons (it's not always possible to avoid such steps). In our case each step consists of many simulations with variable duration. Since each step is split between a number of parallel tasks, the tasks also have variable duration. Essentially, all faster tasks have to wait until the slowest one is done. This forces us to make bigger steps, which finish with less deviation in time due to averaging, so the processors do not waste their time waiting. This is our motivation for having 12*2.66 GHz instead of 24*1.33 GHz. If it were possible to really turn HT off, we would expect to gain about 10% in performance by switching from 24 tasks with HT to 12 tasks without HT. However, the tests show that we lose 20%. So my conclusion is that the calculation is about 30% less efficient than we expected.

For the tests I used quite large steps; usually the steps are shorter, so efficiency drops even further.
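The waiting effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual Fortran code: simulation durations are assumed to be exponentially distributed with mean 1, and the names `step_efficiency` and `mean_efficiency` are hypothetical. It estimates the fraction of core time spent working when all tasks in a step must wait for the slowest one.

```python
import random

def step_efficiency(n_tasks, sims_per_task, rng):
    """Fraction of core time spent working in one step ending in a barrier.

    Each task runs `sims_per_task` simulations whose durations are drawn
    from an exponential distribution with mean 1 (an assumed distribution).
    """
    task_times = [
        sum(rng.expovariate(1.0) for _ in range(sims_per_task))
        for _ in range(n_tasks)
    ]
    # Every task waits for the slowest one, so the step lasts max(task_times).
    return sum(task_times) / (n_tasks * max(task_times))

def mean_efficiency(n_tasks, sims_per_task, n_steps=100, seed=1):
    """Average step efficiency over `n_steps` independent steps."""
    rng = random.Random(seed)
    total = sum(step_efficiency(n_tasks, sims_per_task, rng)
                for _ in range(n_steps))
    return total / n_steps

if __name__ == "__main__":
    for sims in (10, 100, 1000):
        print(sims, round(mean_efficiency(12, sims), 3))
```

In this model, larger steps (more simulations per task) average out the variation between tasks, so less time is lost at the barrier; shorter steps lose more.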

There is one more reason: some of our calculations require 3-5 GB of memory, so you can probably see how economical it would be for us to have 12 fast tasks. We are working on implementing shared memory, but that is going to be a long-term project. Therefore we need to find out how to make the existing hardware/software as fast as possible.

Comments (4)

唱一曲作罢 2024-10-02 20:14:16

This is more of an extended comment than an answer:

I don't find your observations terrifically surprising. Hyper-threading is a poor man's approach to parallelisation; it allows you to have 2 pipelines of pending instructions on one CPU. But it doesn't provide extra floating-point or integer arithmetic units or more registers; when one pipeline is unable to feed the ALU (or whatever it's called these days), the other pipeline is activated within a clock cycle or two. This contrasts with the situation on a CPU without hyperthreading where, when the instruction pipeline stalls, it has to be flushed and refilled with instructions from another process before the CPU gets back up to speed.

The Wikipedia article on hyperthreading explains all this rather well.

If you are running loads in which pipeline stalls are perfectly synchronised and represent a major part of the total execution time of your program mix, then you might double the speed of a program by going from an unhyperthreaded processor to a hyperthreaded processor.

IF (that's a big if) you could write a program which never stalled in the instruction pipeline then hyperthreading would provide no benefit (in terms of execution acceleration) whatsoever. What you have measured is not a speedup due to HT (well, it is a speedup due to HT but you don't actually want that) but the failure of your threads to keep the pipeline moving.

What you have to do is actually decrease the speedup due to HT! Or, rather, you have to increase the execution rate of the 12 processes (one per core) by keeping the pipeline filled. Personally, I'd switch off hyperthreading while I optimised the program's execution on 12 cores.

Have fun.

禾厶谷欠 2024-10-02 20:14:16

I'm having a bit of difficulty understanding your description of the benchmarks.

Let's define 100% as the amount of work you manage to get done with 12 tasks and HT off. If you were able to get twice as much done in the same period of time, we would call it 200%. So, what are the numbers that you would put in the other three boxes?

Edit: Updated with your numbers.

             without HT     with HT
12 tasks     100%           100%
24 tasks     100%           125%

So, my understanding is that with HT disabled, there are gaps of time during which your threads are basically paused (such as when they are waiting for data from memory or from disk), so they don't actually run at 2.66 GHz, but a bit less. With hyperthreading enabled, the CPU switches tasks instead of pausing during these momentary gaps, so the total amount of processing power being used goes up.
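As a quick check, the percentages quoted in the question can be recomputed from the table above. This is a small sketch; the dictionary simply restates the table, and `relative_change` is a hypothetical helper name.

```python
# Throughput relative to 12 tasks with HT off, taken from the table above.
throughput = {
    (12, "off"): 100, (12, "on"): 100,
    (24, "off"): 100, (24, "on"): 125,
}

def relative_change(before, after):
    """Percentage change in throughput when going from `before` to `after`."""
    return 100.0 * (throughput[after] - throughput[before]) / throughput[before]

print(relative_change((24, "on"), (24, "off")))  # 24 tasks, HT on -> off: -20.0
print(relative_change((24, "on"), (12, "on")))   # HT on, 24 -> 12 tasks: -20.0
print(relative_change((12, "on"), (24, "on")))   # HT on, 12 -> 24 tasks: 25.0
```

Note that a 20% loss when going from 125% to 100% corresponds to a 25% gain in the opposite direction, because the baseline differs.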

温柔女人霸气范 2024-10-02 20:14:16

Well, that means that with HT on, switching from 12 tasks to 24 tasks increases efficiency by 20%! Good benchmarking!

On the other hand, if your program is written so that each thread can only work on a separate task (as opposed to being able to split a single task into smaller chunks and proceed concurrently), then for the purpose of reducing the latency for each task (from start to finish) you simply need to limit the number of threads to 12 in software. The hardware HT switch can remain in either position.
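Limiting the number of concurrent tasks in software might look like the following. This is a minimal sketch, assuming each task is a separate executable launched as its own process; the `run_all` helper and the placeholder commands are hypothetical and not part of the original program.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 12  # one worker per physical core (assumes a 12-core machine)

def run_all(commands, max_workers=MAX_WORKERS):
    """Run each command as a subprocess, at most `max_workers` at a time.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(cmd):
        return subprocess.run(cmd).returncode

    # Each thread only blocks on its subprocess, so the pool size caps the
    # number of simulation processes running concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))

if __name__ == "__main__":
    # Placeholder commands; replace with the real simulation executable.
    commands = [[sys.executable, "-c", "pass"] for _ in range(24)]
    print(run_all(commands))
```

With this arrangement the 24 tasks are still all executed, but never more than 12 at once, regardless of whether HT is on or off.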

缱绻入梦 2024-10-02 20:14:16

See this posting for an app in Xcode tools to enable / disable hyperthreading (and number of CPUs active). The setting does NOT persist across sleep or reboot: http://www.logicprohelp.com/forum/viewtopic.php?f=5&t=88835

(You run the Instruments app, cancel the initial screen, and then change the CPU Preferences).
