Multithreaded resource contention
I'm profiling a multithreaded program running with different numbers of allowed threads. Here are the performance results of three runs of the same input work.
1 thread:
Total thread time: 60 minutes.
Total wall clock time: 60 minutes.
10 threads:
Total thread time: 80 minutes. (Worked 33% longer)
Total wall clock time: 18 minutes. 3.3 times speed up
20 threads:
Total thread time: 120 minutes. (Worked 100% longer)
Total wall clock time: 12 minutes. 5 times speed up
Since it takes more thread time to do the same work, I feel the threads must be contending for resources.
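The figures above can be turned into a few derived numbers: speedup, parallel efficiency, extra thread time, and the average number of threads busy at once. The Python sketch below only does that arithmetic. It assumes that "total thread time" is the sum of the time each thread spent in timed work; the `Run` type and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    threads: int
    thread_minutes: float  # total thread time, summed over all threads
    wall_minutes: float    # wall clock time for the whole run


# Measurements quoted in the question.
BASELINE = Run(threads=1, thread_minutes=60, wall_minutes=60)
RUNS = [BASELINE, Run(10, 80, 18), Run(20, 120, 12)]


def summarize(run, baseline=BASELINE):
    """Derive scaling figures for one run relative to the 1-thread run."""
    speedup = baseline.wall_minutes / run.wall_minutes
    return {
        "threads": run.threads,
        "speedup": speedup,
        # 1.0 would mean perfect linear scaling.
        "efficiency": speedup / run.threads,
        # Fraction of thread time beyond the single-threaded 60 minutes.
        "extra_thread_time": run.thread_minutes / baseline.thread_minutes - 1,
        # Average number of threads inside timed work at any moment.
        "avg_busy_threads": run.thread_minutes / run.wall_minutes,
    }


SUMMARY = [summarize(r) for r in RUNS]

if __name__ == "__main__":
    for row in SUMMARY:
        print(
            f"{row['threads']:>2} threads: speedup {row['speedup']:.2f}x, "
            f"efficiency {row['efficiency']:.0%}, "
            f"extra thread time {row['extra_thread_time']:.0%}, "
            f"avg busy threads {row['avg_busy_threads']:.1f}"
        )
```

If the assumption about how thread time is measured holds, the average number of busy threads comes out at about 4.4 for the 10-thread run and 10 for the 20-thread run. That would suggest part of the wall clock time is not covered by the thread time figure at all.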
I've already examined the four pillars (CPU, memory, disk I/O, network) on both the app machine and the database server. Memory was the original contended resource, but that's fixed now (more than 1 GB free at all times). CPU hovers between 30% and 70% on the 20-thread test, so there is plenty to spare. Disk I/O is practically none on the app machine, and minimal on the database server. The network is really great.
I've also profiled the code with Redgate and see no methods waiting on locks. It helps that the threads are not sharing instances. Now I'm checking more nuanced items like database connection establishment and pooling (if 20 threads attempt to connect to the same database, do they have to wait on each other?).
I'm trying to identify and address the resource contention, so that the 20-thread run would look like this:
20 threads:
Total thread time: 60 minutes. (Worked 0% longer)
Total wall clock time: 6 minutes. 10 times speed up
What are the most likely sources (other than the big 4) that I should be looking at to find that contention?
The code that each thread performs is approximately:
Run ~50 compiled LinqToSql queries
Run ILOG Rules
Call WCF Service which runs ~50 compiled LinqToSql queries, returns some data
Run more ILOG Rules
Call another WCF service which uses DevExpress to render a PDF, returns as binary data
Store PDF to network
Use LinqToSql to update/insert. DTC is involved: multiple databases, one server.
The WCF Services are running on the same machine and are stateless and able to handle multiple simultaneous requests.
The machine has 8 CPUs.
Comments (3)
What you describe is a wish for 100% scalability, that is, a 1:1 relation between the increase in threads and the decrease in wall-clock time... this is usually the goal, but it is hard to reach...
For example, you write that there is no memory contention because there is 1 GB free... this is IMHO a wrong assumption. Memory contention also means that if two threads try to allocate memory, one of them may have to wait for the other. Another point to keep in mind is the interruptions caused by the GC, which temporarily freezes all threads. The GC can be customized a bit via configuration (gcServer) - see http://blogs.msdn.com/b/clyon/archive/2004/09/08/226981.aspx
Another point is the WCF service being called... if it can't scale up (the PDF rendering, for example), then that is also a form of contention.
The list of possible sources of contention is "endless"... and they are hardly ever in the obvious areas you mentioned...
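The effect of a downstream service that cannot scale can be simulated in a few lines. The Python sketch below is only an illustration, not a model of the actual WCF services. The semaphore, the number of slots, and the sleep standing in for work are all assumptions. It shows how total thread time grows when workers queue for a shared resource, even though the amount of real work stays the same.

```python
import threading
import time


def run_simulation(workers=8, slots=2, hold_seconds=0.05):
    """Each worker needs one slot of a shared resource for hold_seconds."""
    # Stands in for a service that can only process `slots` requests at once.
    resource = threading.BoundedSemaphore(slots)
    elapsed = [0.0] * workers

    def worker(index):
        start = time.perf_counter()
        with resource:                # waiting here still counts as thread time
            time.sleep(hold_seconds)  # the actual "work"
        elapsed[index] = time.perf_counter() - start

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    wall_start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - wall_start

    return {
        "work_seconds": workers * hold_seconds,  # thread time with no waiting
        "thread_seconds": sum(elapsed),          # measured, waiting included
        "wall_seconds": wall,
    }


if __name__ == "__main__":
    for name, value in run_simulation().items():
        print(f"{name}: {value:.2f}")
```

With 8 workers and 2 slots, the measured thread time is well above the 0.4 seconds of real work, because most workers spend part of their time queued. No lock in the callers' own code is involved, so a profiler looking for lock waits in the client would see nothing unusual.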
EDIT - as per comments:
Some points to check:
What provider do you use? How is it configured?
Possible contention would show up in measurements somewhere inside the library you use...
Check the execution plans for all these queries... it may be that some of them take locks of some sort and thus create contention on the DB server side...
EDIT 2:
Threads
Are these threads from the ThreadPool? If so, then you won't scale :-(
EDIT 3:
ThreadPool threads are bad for long-running tasks, which is the case in your scenario... for details see
From http://www.yoda.arachsys.com/csharp/threads/printable.shtml
If you want extreme performance, it could be worth checking out CQRS and the real-world example known as LMAX.
Instead of measuring the total thread time, measure the time taken by each operation that performs I/O of some sort (database, disk, network, etc.).
I suspect you will find that these operations are the ones that take longer when you have more threads, because the contention is on the other end of that I/O. For example, your database might be serializing requests for data consistency.
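A minimal sketch of that kind of per-operation measurement, written in Python for brevity. `OperationTimer` and the operation names are hypothetical, and the `time.sleep` calls are placeholders for real database, service, and file operations.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager


class OperationTimer:
    """Accumulates elapsed time per named operation across threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    @contextmanager
    def measure(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            # The lock is held only for the bookkeeping, not the operation.
            with self._lock:
                self._totals[name] += elapsed
                self._counts[name] += 1

    def report(self):
        with self._lock:
            return {
                name: {
                    "calls": self._counts[name],
                    "total_s": self._totals[name],
                    "mean_s": self._totals[name] / self._counts[name],
                }
                for name in self._totals
            }


def demo(thread_count=4):
    """Runs placeholder operations on several threads and returns the report."""
    timer = OperationTimer()

    def work():
        with timer.measure("db_query"):
            time.sleep(0.01)   # placeholder for a database call
        with timer.measure("pdf_render"):
            time.sleep(0.02)   # placeholder for a service call

    threads = [threading.Thread(target=work) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timer.report()


if __name__ == "__main__":
    for name, stats in demo().items():
        print(name, stats)
```

Comparing the mean time per operation between the 1-thread and 20-thread runs would show which step slows down as concurrency rises.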
Yes, there's resource contention. All the threads have to read/write data over the same memory bus, directed to the same RAM modules, for example. It doesn't matter how much RAM is free; what matters is that the reads/writes are carried out by the same memory controller on the same RAM modules, and that the data is carried over the same bus.
If there's any kind of synchronization anywhere, then that too is a contended resource. If there's any I/O, that's a contended resource.
You're never going to see an N-fold speedup when going from 1 to N threads. It's not possible because ultimately, everything in the CPU is a shared resource on which there will be some degree of contention.
There are plenty of factors preventing you from getting the full linear speedup. You're assuming that the database, the server the database is running on, the network connecting it to the client, the client computer, the OS and drivers on both ends, the memory subsystem, disk I/O and everything in between are capable of just going 20 times faster when you go from 1 to 20 threads.
Two words: dream on.
Each of these bottlenecks only has to slow you down by a few percent for the overall result to look something like what you're seeing.
I'm sure you can tweak it to scale a bit better, but don't expect miracles.
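To put rough numbers on "a few percent", one common simplification is Amdahl's law. This model is swapped in here for illustration and is not something the figures above were derived from. It treats a fixed fraction of the work as unable to run in parallel, and it ignores the 8-CPU limit of the machine. The Python sketch below computes the serial fraction that would explain each measured speedup (the Karp-Flatt metric); the function names are made up for this example.

```python
def amdahl_speedup(serial_fraction, n):
    """Speedup predicted by Amdahl's law for n threads."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def implied_serial_fraction(speedup, n):
    """Karp-Flatt metric: serial fraction that would explain a measured speedup."""
    return (1.0 / speedup - 1.0 / n) / (1.0 - 1.0 / n)


# Measured speedups from the question: 60 minutes down to 18 and 12 minutes.
MEASURED = {10: 60 / 18, 20: 60 / 12}
IMPLIED = {n: implied_serial_fraction(s, n) for n, s in MEASURED.items()}

# Serial fraction that would still allow the hoped-for 10x at 20 threads.
TARGET = implied_serial_fraction(10.0, 20)

if __name__ == "__main__":
    for n, fraction in IMPLIED.items():
        print(f"{n} threads: implied serial fraction {fraction:.1%}")
    print(f"10x at 20 threads needs a serial fraction of about {TARGET:.1%}")
```

Under this model the measured runs imply that roughly 16% to 22% of the work is effectively serialized, while a 10x speedup at 20 threads leaves room for only about 5%.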
But one thing you might look for is cache line sharing. Do threads access data that is very close to the data used by other threads? How often can you avoid that occurring?