Multiple processors and performance
I'm facing a really strange problem with a .NET service.
I developed a multithreaded x64 Windows service.
I tested this service on an x64 server with 8 cores. The performance was great!
Now I've moved the service to a production server (x64, 32 cores). During testing I found that performance is at least 10 times worse than on the test server.
I've checked lots of performance counters trying to find a reason for this poor performance, but I couldn't pin it down.
Could it be a GC problem? Have you ever faced a problem like this?
Thank you in advance!
Alexandre
Answers (7)
This is a common problem that people are generally unaware of, because very few people have experience with many-CPU machines.
The basic problem is contention.
As the CPU count increases, contention increases in all shared data structures. For low CPU counts, contention is low, and having multiple CPUs improves performance. As the CPU count grows significantly larger, contention begins to drown out the performance gains; once the CPU count is large enough, contention actually starts reducing performance below what a smaller number of CPUs would deliver.
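One way to put numbers on that curve is Neil Gunther's Universal Scalability Law, a model brought in here rather than one the answer names: relative throughput on N CPUs is N / (1 + sigma*(N-1) + kappa*N*(N-1)), where sigma is the contended (serialized) fraction and kappa is the cost of keeping data coherent between CPUs. The coefficients below are invented for illustration, not measured from the poster's service.

```python
def usl_throughput(n: int, sigma: float, kappa: float) -> float:
    """Relative throughput on n CPUs under the Universal Scalability Law."""
    # sigma: contention (serialized fraction); kappa: cross-CPU coherency cost
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def peak_cpus(sigma: float, kappa: float) -> float:
    """CPU count at which the modelled throughput is highest."""
    return ((1 - sigma) / kappa) ** 0.5


# Hypothetical coefficients, chosen only to show the shape of the curve.
SIGMA, KAPPA = 0.05, 0.02

for n in (1, 8, 16, 32):
    print(n, round(usl_throughput(n, SIGMA, KAPPA), 2))
print("peak near", round(peak_cpus(SIGMA, KAPPA), 1), "CPUs")
```

With these made-up coefficients throughput peaks at about 7 CPUs, and 32 CPUs deliver less than half the throughput of 8, which is the "more cores, less work done" effect described above.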
You are basically facing one of the aspects of the scalability problem.
I'm not sure, however, where this problem lies: in your data structures, or in the operating system's data structures. The former you can address; lock-free data structures are an excellent, highly scalable approach. The latter is difficult, since it essentially requires avoiding certain OS functionality.
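Lock-free data structures are built on an optimistic compare-and-swap (CAS) retry loop: read the current value, compute the new one, publish it only if nobody changed it in the meantime, otherwise retry. In .NET the primitive is `Interlocked.CompareExchange`. Python has no hardware CAS, so in this sketch the hypothetical `AtomicInt.compare_and_set` is emulated with a tiny internal lock purely to show the shape of the loop; it is an illustration of the pattern, not a fast or truly lock-free counter.

```python
import threading


class AtomicInt:
    """Hypothetical stand-in for a hardware CAS cell (e.g. Interlocked in .NET)."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._guard = threading.Lock()  # emulation only; real CAS is one CPU instruction

    def load(self) -> int:
        return self._value

    def compare_and_set(self, expected: int, new: int) -> bool:
        # Write `new` only if the value is still what the caller last saw.
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def increment(cell: AtomicInt) -> int:
    """CAS-style increment: read, compute, try to publish, retry on conflict.

    Returns how many times this call had to retry.
    """
    retries = 0
    while True:
        seen = cell.load()
        if cell.compare_and_set(seen, seen + 1):
            return retries
        retries += 1  # another thread won the race; re-read and try again


counter = AtomicInt()


def worker() -> None:
    for _ in range(1000):
        increment(counter)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())
```

No thread ever holds a lock across the "read, compute" step, so a slow or preempted thread cannot stall the others; under heavy contention the cost shows up as retries instead.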
There are way too many variables to know why one machine is slower than the other. 32-core machines are usually more specialized, whereas an eight-core machine could just be a dual-processor quad-core box. Are there VMs or other things running at the same time? Usually, with that many cores, I/O bandwidth becomes the limiting factor (even if the CPUs still have plenty of headroom).
To start off, you should probably add lots of timers to your code (or use a profiler) to figure out which part of your code is taking the most time.
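A minimal sketch of the "add timers" approach, using Python's `time.perf_counter` (in a .NET service `System.Diagnostics.Stopwatch` plays the same role). The `SectionTimer` class and the section names are hypothetical.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulates wall-clock time per named code section, across threads."""

    def __init__(self) -> None:
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)
        self._lock = threading.Lock()

    @contextmanager
    def measure(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            with self._lock:  # keep the bookkeeping itself thread-safe
                self._totals[name] += elapsed
                self._counts[name] += 1

    def report(self) -> list:
        """(name, total seconds, calls), most expensive section first."""
        with self._lock:
            rows = [(n, t, self._counts[n]) for n, t in self._totals.items()]
        return sorted(rows, key=lambda row: row[1], reverse=True)


timer = SectionTimer()
with timer.measure("parse"):
    sum(range(10_000))
with timer.measure("io_wait"):
    time.sleep(0.05)

for name, total, calls in timer.report():
    print(f"{name:10s} {total:8.4f}s  {calls} call(s)")
```

The timer's own lock is shared state, so on a many-core box keep the measured sections coarse; otherwise the instrumentation can become a contention point itself.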
Performance troubleshooting 101: find the bottleneck (where in the code, and which subsystem: memory, disk, or CPU).
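A quick first cut at "which subsystem" is to compare the CPU time a thread consumed with the wall-clock time that elapsed. If CPU time is a small fraction, the thread spent most of its time blocked or not scheduled (disk, paging, locks, being suspended) rather than computing. This is a rough heuristic; `classify` and its 0.5 threshold are hypothetical choices.

```python
import time


def classify(fn, *args, threshold: float = 0.5):
    """Run fn(*args) and report whether the calling thread mostly computed or mostly waited."""
    wall0, cpu0 = time.perf_counter(), time.thread_time()
    fn(*args)
    wall = time.perf_counter() - wall0
    cpu = time.thread_time() - cpu0
    ratio = cpu / wall if wall > 0 else 1.0
    # A low CPU/wall ratio means the thread was blocked or waiting for a core.
    kind = "cpu-bound" if ratio >= threshold else "waiting"
    return kind, round(wall, 3), round(cpu, 3)


print(classify(time.sleep, 0.2))
print(classify(lambda: sum(i * i for i in range(1_000_000))))
```

On the 32-core box, a section that is CPU-bound on the 8-core machine but shows up as "waiting" is a good candidate for lock contention or I/O limits.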
There are so many factors here:
etc
Could it be down to differences in memory or disk? If either of those were the bottleneck, you wouldn't get the benefit of the additional processing power. I can't really tell without more details about your application and configuration.
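To put rough numbers on that, Amdahl's law (a standard model, not one the answer cites) gives the speedup S(N) on N processors when a fraction p of the work parallelizes and the rest is stuck behind a serial bottleneck such as a single disk. The value p = 0.9 is an arbitrary assumption:

```latex
\begin{aligned}
S(N)  &= \frac{1}{(1 - p) + p/N}, \qquad p = 0.9 \\
S(8)  &= \frac{1}{0.1 + 0.1125}   = \frac{1}{0.2125}   \approx 4.71 \\
S(32) &= \frac{1}{0.1 + 0.028125} = \frac{1}{0.128125} \approx 7.80
\end{aligned}
```

Quadrupling the core count buys only about 1.66x here. Amdahl's law never predicts a slowdown, though, so a 10x drop on the bigger machine points to something beyond a plain serial bottleneck, such as contention.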
With that many threads running concurrently, you need to be very careful to avoid threads fighting with each other over access to your data. Read up on non-blocking synchronization.
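Alongside non-blocking synchronization, a simpler way to stop threads fighting over data is not to share it during the hot phase: each worker accumulates a private partial result, and the partials are combined once at the end. This partition-then-merge sketch is a related technique, not the poster's code; `count_words` and the sample input are hypothetical.

```python
import threading
from collections import Counter


def count_words(chunks: list, n_threads: int = 4) -> Counter:
    """Word counts computed with no shared mutable state in the hot loop."""
    partials = [Counter() for _ in range(n_threads)]  # one private slot per thread

    def worker(slot: int) -> None:
        local = partials[slot]
        # Each thread takes every n-th chunk and writes only to its own Counter.
        for chunk in chunks[slot::n_threads]:
            local.update(chunk.split())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = Counter()
    for part in partials:  # single-threaded merge: the only place results meet
        total.update(part)
    return total


print(count_words(["a b", "b c", "c c"], n_threads=2))
```

In .NET the same shape is available through the `Parallel.For` overloads that take `localInit` and `localFinally` delegates. If the per-thread slots are adjacent elements of a value-type array, they can still share a cache line, so keep each thread's running total in a local variable until the merge.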
How many threads are you using? Using too many thread pool threads could cause thread starvation, which would make your program slower.
Some articles:
http://www2.sys-con.com/ITSG/virtualcd/Dotnet/archives/0112/gomez/index.html
http://codesith.blogspot.com/2007/03/thread-starvation-in-shared-thread-pool.html
(search for thread starvation in them)
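Thread starvation is easiest to see with a bounded pool whose workers block waiting for other work queued to the same pool. This sketch uses Python's `ThreadPoolExecutor` as a stand-in for the .NET thread pool; `starves`, the pool sizes, and the 0.5 s wait are hypothetical choices.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def starves(pool_size: int, outer_tasks: int = 2, wait_s: float = 0.5) -> bool:
    """True if any outer task gave up waiting for the inner task it queued.

    Assumes pool_size >= outer_tasks so every outer task can start.
    """
    # The barrier makes every outer task occupy a worker before any of them
    # queues its inner task, so the result does not depend on scheduling luck.
    all_running = threading.Barrier(outer_tasks)

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        def inner() -> str:
            return "done"

        def outer() -> bool:
            all_running.wait(timeout=5)
            fut = pool.submit(inner)      # queued to the same pool
            try:
                fut.result(timeout=wait_s)  # this worker blocks while it waits
                return False
            except FutureTimeout:
                return True               # no free worker ever picked up `inner`

        results = [pool.submit(outer) for _ in range(outer_tasks)]
        return any(f.result() for f in results)


print(starves(2), starves(4))
```

With two workers, both are stuck waiting for inner tasks that have no thread to run on; with four, the inner tasks get threads and finish. The .NET pool does add threads when it detects starvation, but only gradually, so in a service the same pattern tends to show up as long stalls rather than timeouts. Avoiding blocking waits on pool threads removes the problem.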
You could use a .NET profiler to find your bottlenecks; here is a good free one:
http://www.eqatec.com/tools/profiler
I agree with Blank; it's likely to be some form of contention. Unfortunately, it's likely to be very hard to track down. It could be in your application code, the framework, the OS, or some combination thereof. Your application code is the most likely culprit, since Microsoft has expended significant effort on making the CLR and the OS scale on 32-processor (32P) boxes.
The contention could be in some hot locks, but it could also be that some processor cache lines are sloshing back and forth between CPUs.
What's your metric for 10x worse? Throughput?
Have you tried booting the 32-processor box with fewer CPUs? Use the /NUMPROC=n switch in boot.ini, or the numproc setting in BCDEdit (bcdedit /set numproc n).
Do you achieve 100% CPU utilization? What's your context-switch rate? And how does this compare with the 8P box?
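If rebooting with /NUMPROC isn't convenient, a rough in-process version of the same experiment is to cap the number of worker threads and measure throughput (items per second) at each setting; throughput that falls as workers are added points to contention. In this hypothetical harness, `sample_work` stands in for one unit of the service's processing. CPython's GIL serializes pure-Python CPU work, so for real numbers run the same shape of experiment in the service's own runtime.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def throughput_by_workers(work, items: int, worker_counts) -> dict:
    """Items processed per second for each worker-thread count."""
    results = {}
    for n in worker_counts:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n) as pool:
            # list() forces every result, so the timer covers all items.
            list(pool.map(work, range(items)))
        elapsed = time.perf_counter() - start
        results[n] = items / elapsed
    return results


def sample_work(i: int) -> int:
    """Hypothetical unit of work; replace with a real request or message handler."""
    time.sleep(0.001)  # pretend part of each item is I/O
    return i


if __name__ == "__main__":
    for n, rate in throughput_by_workers(sample_work, 200, (1, 2, 4, 8)).items():
        print(f"{n:2d} workers: {rate:8.1f} items/s")
```

Plotting the rates against the worker count on both machines gives a direct answer to "what's your metric for 10x worse?" and shows where the curve turns down.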