I need a short C program that runs slower on a processor with Hyper-Threading than on one without it
I want to write a paper on compiler optimizations for Hyper-Threading. The first step would be to investigate why a processor with Hyper-Threading (simultaneous multithreading) can perform worse than a processor without this technology. To start, I need an application that runs better without Hyper-Threading, so I can run some hardware performance counters on it. Any suggestions on how or where I could find one?
So, to summarize: I know that the benefit of Hyper-Threading ranges from -10% to +30%. I need a C application that falls at the -10% end, i.e., one that takes a 10% performance penalty with Hyper-Threading.
Thanks.
Comments (2)
Probably the main drawback of hyperthreading is the effective halving of cache sizes. Each thread will be populating the cache, and so each, in effect, has half the cache.
To create a programme which runs worse with hyperthreading than without, create a single-threaded programme which performs a task that just fits inside the L1 cache. Then add a second thread, running on the other logical CPU of the same core, which shares the workload and works from "the other end" of the data, as it were. You will find performance falls through the floor; this is because both threads now must access the L2 cache.
Hyperthreading can dramatically improve or worsen performance. It is completely dependent on use. None of this -10%/+30% stuff - that's ridiculous.
I'm not familiar with compiler optimizations for HT, nor with the differences between the i7's HT and the P4's, as David pointed out. However, you can expect some general behaviors.
Context switching is very expensive. So if you have one core and run two threads on it simultaneously, switching back and forth between the two threads always costs you some performance. However, a thread does not use the core all the time. For example, when a thread reads or writes memory that is not in the cache, it simply waits for the access to complete, usually for more than 100 cycles, without using the core. There are many other cases where a thread has to stall like this, e.g., I/O operations, data dependencies, etc. This is where HT helps: it can set the waiting (or blocked) thread aside and execute the other thread instead.
Therefore, if the threads are really unlikely ever to block, the switching only adds overhead. Consider a heavily computation-bound application working on a small set of data.