Performance degrades when the number of threads exceeds 2 on a Xeon X5355

Posted on 2024-08-21 17:52:44

I have a strange problem, though it may not seem that strange to some of you.

I am writing an application that uses Boost threads, with Boost barriers to synchronize them. I have two machines on which to test the application.

Machine 1 is a Core 2 Duo (T8300) machine (Windows XP Professional, 4 GB RAM), where I get the following performance figures:

Number of threads: 1, TPS: 21

Number of threads: 2, TPS: 35 (66% improvement)

Increasing the number of threads further decreases the TPS, but that is understandable because the machine has only two cores.

Machine 2 has two quad-core Xeon X5355 CPUs (Windows Server 2003, 4 GB RAM), giving 8 effective cores.

Number of threads: 1, TPS: 21

Number of threads: 2, TPS: 27 (28% improvement)

Number of threads: 4, TPS: 25

Number of threads: 8, TPS: 24

As you can see, performance degrades beyond 2 threads, even though the machine has 8 cores. If the program had a bottleneck, I would expect it to degrade with 2 threads as well.

Any ideas or explanations? Does the OS play a role in performance? The Core 2 Duo (2.4 GHz) seems to scale better than the Xeon X5355 (2.66 GHz), even though the Xeon has the higher clock speed.

Thank you

-Zoolii

Comments (3)

夏の忆 2024-08-28 17:52:44

The clock speed and the operating system have less to do with it than the way your code is written. Things to check might include:

  • Are you actually spinning up more than two threads at one time?
  • Do you have unnecessary synchronization artifacts in your code?
  • Are you synchronizing your code at the appropriate places?
  • What are your shared resources, and how many of them are there? If each of your transactions relies on a single section of code, native library, file, database, or whatever, then it doesn't matter how many CPUs you've got.

One tool at your disposal when analyzing software bottlenecks is the simple thread dump. Taking a few dumps over the course of a run should expose the bottlenecks in your software. You may be able to take that output and use it to reevaluate your code.
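To illustrate the shared-resource point, here is a minimal sketch. It is not the asker's code: it uses `std::thread` and `std::mutex` in place of Boost so that it needs no external library, and the function names `count_with_shared_lock` and `count_with_local_totals` are made up for this example. Both functions compute the same total, but the first takes one lock for every unit of work, so the threads mostly wait on each other, while the second touches the shared state once per thread.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example: every increment takes the same lock, so the
// threads are serialized on it no matter how many cores are available.
std::uint64_t count_with_shared_lock(unsigned num_threads, std::uint64_t per_thread) {
    std::uint64_t total = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++total;
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}

// Hypothetical example: each thread accumulates into a private variable
// and takes the lock only once, to publish its result.
std::uint64_t count_with_local_totals(unsigned num_threads, std::uint64_t per_thread) {
    std::uint64_t total = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < per_thread; ++i) ++local;
            std::lock_guard<std::mutex> lock(m);
            total += local;
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

Timing the two functions with 1, 2, 4, and 8 threads on the target machine is one way to see how much a single contended lock costs there. The actual numbers depend on the hardware and compiler, so none are claimed here.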

旧城空念 2024-08-28 17:52:44

Adding more CPUs does not always equate to better performance; locking and contention can severely degrade it. Factors to consider are:

  • Is your algorithm suited to parallelisation?
  • Are there any inherently sequential portions of code?
  • Can you partition the work into coarse-grained 'chunks'? Coarse is usually better than fine-grained...
  • Can you alter your code to use less locking?
  • Synchronisation overheads can often be reduced by ensuring chunks of work are similarly sized.
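As a rough check on the "inherently sequential portions" point, Amdahl's law can be applied to the figures in the question. This is only a back-of-the-envelope sketch: it assumes a fixed parallel fraction p and no synchronization overhead, neither of which is confirmed for the asker's program.

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}
\quad\Longrightarrow\quad
p = \frac{N}{N - 1}\left(1 - \frac{1}{S(N)}\right)

% Machine 1, N = 2: S = 35/21 \approx 1.67, so p = 2\,(1 - 0.6) = 0.8
% Machine 2, N = 2: S = 27/21 \approx 1.29, so p = 2\,(1 - 0.778) \approx 0.44

% Prediction for machine 2 with p \approx 0.44:
S(4) = \frac{1}{0.556 + 0.111} = 1.5 \;(\approx 31.5\ \text{TPS}), \qquad
S(8) = \frac{1}{0.556 + 0.056} \approx 1.64 \;(\approx 34.4\ \text{TPS})
```

Under these assumptions the model predicts throughput that keeps rising slowly, yet the measured values fall to 25 and 24 TPS. A fixed sequential portion alone therefore does not account for the drop, which suggests a cost that grows with the number of threads, such as the locking and contention mentioned above.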

昇り龍 2024-08-28 17:52:44

Based on experience, it could be any of several things: that Intel's policy allows only 2 threads or dual-process operation on that processor; that only pthreads can be used with that version of the operating system; that the two processors were designed to conform to different laws with different provisions or allowances; that the application's own threading is not allowed; or that threads beyond some number n are being backed out by the processor, and the processing of the error messages reporting this slows the throughput of the two cores and may lead to the deactivation of cores 3 and 4.
