I use a simulation written in Python/NumPy/Cython.
Since I need to average over many simulation runs, I use the multiprocessing module to run the individual simulation runs in batches.
At the office I have an i7-920 workstation with HT. At home I have an i5-560 without it.
I thought I could run twice as many instances of the simulation in each batch at the office and cut my running time in half. Surprisingly, the run time of each individual instance doubled compared to the time it takes on my home workstation. That is, running 3 simulation instances in parallel at home takes, say, 8 minutes, while running 6 instances at the office takes about 15 minutes. Using 'cat /proc/cpuinfo' I verified 'siblings' = 8 and 'cpu cores' = 4, so HT is enabled.
I am not aware of any "conservation of total runtime" law (though from a scientific point of view it could be quite interesting :) ), and I am hoping someone here might shed some light on this conundrum.
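The /proc/cpuinfo check described above can also be scripted. The following is only a minimal sketch: the helper names and the SAMPLE text are made up for illustration, and it assumes an x86 Linux machine where /proc/cpuinfo exposes the 'siblings' and 'cpu cores' fields.

```python
def parse_cpuinfo(text):
    """Return (siblings, cpu_cores) from the first processor entry
    of /proc/cpuinfo-style text."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            if fields:
                break  # a blank line ends the first processor entry
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return int(fields["siblings"]), int(fields["cpu cores"])


def hyperthreading_enabled(text):
    """HT is on when a package exposes more logical CPUs than physical cores."""
    siblings, cores = parse_cpuinfo(text)
    return siblings > cores


# Illustrative excerpt using the values quoted in the question (assumption:
# real output contains many more fields per processor entry).
SAMPLE = "processor\t: 0\nsiblings\t: 8\ncpu cores\t: 4\n\nprocessor\t: 1\n"

if __name__ == "__main__":
    print("HT enabled:", hyperthreading_enabled(SAMPLE))
```

On a real machine the text would come from open("/proc/cpuinfo").read() instead of SAMPLE.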
Comments (4)
Hyperthreading may be good for some kinds of workload. Intense numeric computation is not one of them - when you want to do some number crunching, you had better turn hyperthreading off.
What hyperthreading gives you is "free context switching" between tasks, but the CPU still has only so many execution units.
In this case it can make things worse, because the OS can't know which processes are running on separate cores (where they'd get full performance) and which are on the same core, just on different "hyperthreads".
(Actually, I'd bet the Linux kernel provides a way to get fine control over that, but Python's multiprocessing module will just launch extra processes that use the default resource allocation.)
Bottom line: turn HT off if you can - at least you will make full use of the 4 cores.
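As for the "fine control" mentioned in the parenthesis: on Linux, one way to get it from Python is os.sched_setaffinity, combined with the sysfs topology files that list which logical CPUs share a core. The sketch below is an illustration under those assumptions, not a tested recipe. The function names and simulate() are hypothetical stand-ins, and it pins each pool worker to one logical CPU per physical core.

```python
import glob
import multiprocessing as mp
import os
import queue


def parse_cpu_list(spec):
    """Expand a sysfs CPU list such as '0,4' or '0-1' into a list of ints."""
    cpus = []
    for part in spec.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def one_cpu_per_core(sibling_specs, allowed=None):
    """Pick one usable logical CPU from each distinct group of HT siblings."""
    groups = {tuple(sorted(parse_cpu_list(s))) for s in sibling_specs}
    chosen = []
    for group in sorted(groups):
        usable = [c for c in group if allowed is None or c in allowed]
        if usable:
            chosen.append(usable[0])
    return chosen


def read_sibling_specs():
    """Read thread_siblings_list for every logical CPU (Linux sysfs)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    specs = []
    for path in glob.glob(pattern):
        with open(path) as fh:
            specs.append(fh.read())
    return specs


def pin_worker(cpu_queue):
    """Pool initializer: bind this worker to one logical CPU from the queue.
    If pinning fails, the worker simply stays unpinned."""
    try:
        os.sched_setaffinity(0, {cpu_queue.get(timeout=1)})
    except (OSError, queue.Empty):
        pass


def simulate(seed):
    """Hypothetical stand-in for one simulation run."""
    return sum((seed * i) % 7 for i in range(100_000))


def main():
    if not hasattr(os, "sched_setaffinity"):
        print("CPU pinning is not available on this platform")
        return
    cpus = one_cpu_per_core(read_sibling_specs(), os.sched_getaffinity(0))
    if not cpus:
        print("no CPU topology information found")
        return
    cpu_queue = mp.Queue()
    for cpu in cpus:
        cpu_queue.put(cpu)
    with mp.Pool(len(cpus), initializer=pin_worker, initargs=(cpu_queue,)) as pool:
        results = pool.map(simulate, range(len(cpus)))
    print("pinned workers to logical CPUs", cpus, "->", len(results), "runs done")


if __name__ == "__main__":
    main()
```

With HT left on, this runs one worker per physical core, which approximates the "turn HT off" advice without a trip to the BIOS.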
Maybe the context switches produce more overhead, since 6 heavily computing processes share only 4 real cores. If the processes compete for CPU resources, they may also use the CPU caches inefficiently.
If you use only 4 processes instead of 6, what's the result?
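To answer that empirically, a small timing harness along these lines might help. It is a rough sketch only: burn() is a hypothetical CPU-bound stand-in for the real simulation, and the task counts and work size are arbitrary. It also prints the throughput implied by the figures in the question: 3 instances in 8 minutes is 0.375 instances per minute, and 6 instances in 15 minutes is 0.4, so the two machines finish batches at nearly the same rate.

```python
import multiprocessing as mp
import time


def burn(n):
    """Hypothetical CPU-bound stand-in for one simulation instance."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_batch(n_procs, n_tasks, work):
    """Run n_tasks copies of burn(work) on n_procs workers; return seconds."""
    start = time.perf_counter()
    with mp.Pool(n_procs) as pool:
        pool.map(burn, [work] * n_tasks, chunksize=1)
    return time.perf_counter() - start


def instances_per_minute(n_instances, minutes):
    """Batch throughput: completed instances per minute of wall-clock time."""
    return n_instances / minutes


if __name__ == "__main__":
    # Figures quoted in the question.
    print("home  :", instances_per_minute(3, 8), "instances/min")
    print("office:", instances_per_minute(6, 15), "instances/min")
    # Same total work, different pool sizes (24 divides evenly by 4, 6 and 8).
    for n_procs in (4, 6, 8):
        elapsed = time_batch(n_procs, n_tasks=24, work=200_000)
        print(f"{n_procs} processes: {elapsed:.2f} s for 24 tasks")
```

Comparing total time for a fixed number of tasks, rather than time per instance, avoids the trap of running more instances at once and then reading the slower per-instance time as a regression.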
The others have pretty much given you insight into the problem. I just want to contribute by linking this article, which explains a bit more about how HT works and what the implications are for the performance of a multithreaded program: http://software.intel.com/en-us/articles/performance-insights-to-intel-hyper-threading-technology/
With my HP workstation (16 cores per CPU, 32 logical processors with hyper-threading), turning hyper-threading on even broke Python when I ran a numerical simulation; the error code was 0x000005.
This puzzled me for a long time, until I turned HT off and the simulation worked well!
Maybe you could measure and compare the run time with HT on and off.