Suspending and resuming threads (Windows, C)
I'm currently developing a heavily multi-threaded application that processes a large number of small data batches.
The problem is that too many threads are being spawned, which slows the system down considerably. To avoid that, I keep a table of handles that limits the number of concurrent threads. I then call WaitForMultipleObjects on it, and when a slot is freed, I create a new thread with its own data batch to handle.
Now I have exactly as many threads as I want (typically one per core). Even then, the overhead incurred by multi-threading is very noticeable. The reason: the data batches are small, so I'm constantly creating new threads.
The first idea, which I'm currently implementing, is simply to regroup jobs into longer serial lists. When I create a new thread, it then has 128 or 512 data batches to handle before terminating. This works well, but somewhat destroys granularity.
I was asked to look at another approach: if the problem comes from "creating" threads too often, what about "pausing" them, loading a new data batch, and "resuming" the thread?
Unfortunately, I haven't had much success.
The problem is that when a thread is suspended, WaitForMultipleObjects does not detect it as available. In fact, I can't efficiently distinguish between an active thread and a suspended one.
So I have two questions:
How can I detect a suspended thread, so that I can load new data into it and resume it?
Is it a good idea? After all, is CreateThread really a resource hog?
Edit
After much testing, here are my findings on thread pooling and I/O completion ports, both of which were suggested in this thread.
Thread pooling was tested using the older API, QueueUserWorkItem.
The I/O completion port approach requires CreateIoCompletionPort, GetQueuedCompletionStatus and PostQueuedCompletionStatus.
1) Performance: creating many threads is very costly, and both thread pooling and I/O completion ports do a great job of avoiding that cost. I am now down to 8 jobs per batch, from 512 jobs per batch earlier, with no slowdown. That is a considerable improvement. Even at 1 job per batch, the performance impact is less than 5%. Truly remarkable.
From a performance standpoint, QueueUserWorkItem wins, albeit by such a small margin (about 1% better) that it is almost negligible.
2) Ease of use:
Starting threads: no question, QueueUserWorkItem is by far the easiest to set up. An I/O completion port is heavyweight in comparison.
Ending threads: a win for the I/O completion port.
For some unknown reason, MS provides no C function to tell when all jobs queued with QueueUserWorkItem have completed. It takes some nasty tricks to implement this basic but critical function. There is no excuse for such a missing feature.
3) Resource control: a big win for the I/O completion port, which allows fine-tuning the number of concurrent threads. There is no such control with QueueUserWorkItem, which will happily spend all CPU cycles from all available cores. That, in itself, could be a deal breaker for QueueUserWorkItem.
Note that the newer thread pool API seems to allow that control, but it is only available on Windows Vista and later.
4) Compatibility: a small win for the I/O completion port, which has been available since Windows NT4. QueueUserWorkItem has only existed since Windows 2000, which is good enough, however. The newer thread pool API is a no-go for Windows XP.
As you can guess, I'm pretty much torn between the two solutions. Both meet my needs.
For the general case, I suggest the I/O completion port, mostly for resource control.
On the other hand, QueueUserWorkItem is easier to set up. It is quite a pity that it loses most of this simplicity by leaving the programmer to handle end-of-jobs detection alone.
Comments (3)
Instead of implementing your own, consider using CreateThreadpool(). The OS will do the work for you, and you don't have to worry about getting it right.
Yes, there's a fair amount of overhead involved with CreateThread. One solution is to use the system thread pool through QueueUserWorkItem. Another is to just start a fixed set of threads and have them retrieve "job items" from a thread-safe queue.
If you want to also support Windows XP, you cannot use CreateThreadpool -- otherwise, if Vista and newer is sufficient, Windows thread pools are the easiest way.
If Windows XP support is needed, spawn a number of threads and assign them to an IO completion port, then have each thread block on GetQueuedCompletionStatus(). Completion ports let you post events to the port which will wake exactly one thread per event, and they are very efficient. They use a LIFO strategy on waking threads to keep caches warm, too.
In any case, you will never want to suspend a thread. Never ever. Block, wait, but don't suspend.
The reason is that with suspend you get the problem that you describe, plus you will create deadlocks, e.g. if your thread is within a critical section or mutex. Aside from a debugger, nobody should ever need to suspend a thread.