
Suspending and resuming threads (Windows, C)

Posted on 2024-10-20 20:17:19


I'm currently developing a heavily multithreaded application that processes lots of small data batches.

The problem is that too many threads are being spawned, which slows the system down considerably. To avoid that, I keep a table of handles that limits the number of concurrent threads. I then call WaitForMultipleObjects, and whenever a slot is freed, I create a new thread with its own data batch to handle.

Now I have exactly as many threads as I want (typically one per core). Even so, the overhead of multithreading is very noticeable. The reason: each data batch is small, so I'm constantly creating new threads.

My first idea, which I'm currently implementing, is simply to regroup jobs into longer serial lists, so that each new thread has 128 or 512 data batches to handle before it terminates. It works well, but it somewhat destroys granularity.
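The regrouping approach amounts to splitting the job list into fixed-size chunks, one chunk per thread. The minimal sketch below (the `chunk_t` type and the function names are hypothetical, not taken from the original code) makes the trade-off concrete: the chunk size sets both how many threads get created and how evenly the work can spread across cores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one thread's share of the work:
   jobs [first, first + count). */
typedef struct {
    size_t first;
    size_t count;
} chunk_t;

/* How many chunks, and therefore thread creations, n_jobs needs. */
size_t chunk_count(size_t n_jobs, size_t jobs_per_chunk)
{
    assert(jobs_per_chunk > 0);
    return (n_jobs + jobs_per_chunk - 1) / jobs_per_chunk;  /* ceiling division */
}

/* Returns chunk number idx; the last chunk may be shorter than the rest. */
chunk_t chunk_at(size_t n_jobs, size_t jobs_per_chunk, size_t idx)
{
    chunk_t c;
    assert(idx < chunk_count(n_jobs, jobs_per_chunk));
    c.first = idx * jobs_per_chunk;
    c.count = n_jobs - c.first;
    if (c.count > jobs_per_chunk)
        c.count = jobs_per_chunk;
    return c;
}
```

For example, with 1,000 jobs (an illustrative figure), a chunk size of 512 yields only two chunks, so on a four-core machine two cores get no work; a chunk size of 8 yields 125 chunks, i.e. 125 thread creations when each chunk gets a fresh thread.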

I was asked to look into another scenario: if the problem comes from creating threads too often, what about "pausing" them, loading a data batch, and then "resuming" the thread?

Unfortunately, I haven't been very successful.
The problem is that when a thread is suspended, WaitForMultipleObjects does not detect it as available. In fact, I can't efficiently distinguish between an active thread and a suspended one.

So I have two questions:

1) How can I detect a suspended thread, so that I can load new data into it and resume it?
2) Is this a good idea at all? After all, is CreateThread really a resource hog?
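For reference: `SuspendThread` does not change a thread handle's state. A thread handle becomes signaled only when the thread terminates, which is why `WaitForMultipleObjects` cannot report a suspended thread as free; Microsoft's documentation also notes that `SuspendThread` is primarily designed for debuggers and is not intended for thread synchronization. The usual alternative is to keep each worker alive and let it block on a synchronization object while idle (on Win32, an auto-reset event, or a `CONDITION_VARIABLE` on Vista and later), with a state flag the producer can inspect. The sketch below shows that pattern with POSIX threads so it stays portable; `worker_t`, its fields and the `worker_*` functions are hypothetical names, and squaring an `int` stands in for real batch processing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-worker "mailbox": one batch slot plus state flags,
   all guarded by one mutex. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;   /* broadcast whenever the state below changes */
    bool      has_batch;       /* a batch is waiting to be picked up */
    bool      busy;            /* the worker is processing a batch */
    bool      quit;            /* the worker should exit once nothing is pending */
    int       batch;           /* hypothetical batch payload */
    long      total;           /* sum of results, for checking */
    pthread_t thread;
} worker_t;

static void *worker_main(void *arg)
{
    worker_t *w = arg;
    pthread_mutex_lock(&w->lock);
    for (;;) {
        /* Idle: block here instead of being suspended. */
        while (!w->has_batch && !w->quit)
            pthread_cond_wait(&w->changed, &w->lock);
        if (!w->has_batch)
            break;                           /* quit requested, nothing pending */
        int batch = w->batch;
        w->has_batch = false;
        w->busy = true;
        pthread_mutex_unlock(&w->lock);

        long result = (long)batch * batch;   /* stand-in for real batch work */

        pthread_mutex_lock(&w->lock);
        w->total += result;
        w->busy = false;
        pthread_cond_broadcast(&w->changed); /* tell the producer we are free */
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

bool worker_start(worker_t *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->changed, NULL);
    w->has_batch = w->busy = w->quit = false;
    w->batch = 0;
    w->total = 0;
    return pthread_create(&w->thread, NULL, worker_main, w) == 0;
}

/* Non-blocking answer to "is this worker free for new data?" */
bool worker_is_idle(worker_t *w)
{
    pthread_mutex_lock(&w->lock);
    bool idle = !w->has_batch && !w->busy;
    pthread_mutex_unlock(&w->lock);
    return idle;
}

/* Waits until the worker is idle, then hands it a new batch and wakes it. */
void worker_submit(worker_t *w, int batch)
{
    pthread_mutex_lock(&w->lock);
    assert(!w->quit);
    while (w->has_batch || w->busy)
        pthread_cond_wait(&w->changed, &w->lock);
    w->batch = batch;
    w->has_batch = true;
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->lock);
}

/* Lets the worker finish any pending batch, joins it and returns its total. */
long worker_stop(worker_t *w)
{
    pthread_mutex_lock(&w->lock);
    w->quit = true;
    pthread_cond_broadcast(&w->changed);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    pthread_cond_destroy(&w->changed);
    pthread_mutex_destroy(&w->lock);
    return w->total;
}
```

A producer could keep one `worker_t` per core and either poll `worker_is_idle` or simply call `worker_submit`, which blocks until that particular worker is free; no thread is created or suspended after startup.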

Edit

After much testing, here are my findings on thread pooling and IO completion ports, both of which were suggested in answers to this post.

Thread pooling was tested using the older QueueUserWorkItem API.
The IO completion port approach requires CreateIoCompletionPort, GetQueuedCompletionStatus and PostQueuedCompletionStatus.
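The completion-port design amounts to a fixed set of long-lived workers blocking on one shared queue: `PostQueuedCompletionStatus` enqueues a job, `GetQueuedCompletionStatus` dequeues one, and a common shutdown idiom is to post one special "stop" packet per worker. The portable sketch below reproduces that shape with POSIX threads; it is an analogy, not the Win32 API. In particular, a real completion port also enforces the `NumberOfConcurrentThreads` limit passed to `CreateIoCompletionPort` and releases waiting threads in LIFO order, neither of which this sketch attempts. `QUEUE_CAP`, `STOP_JOB` and the function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define QUEUE_CAP 64    /* hypothetical ring-buffer capacity */
#define STOP_JOB  (-1)  /* hypothetical sentinel job, like a special "quit" packet */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
    int  jobs[QUEUE_CAP];
    int  head;            /* index of the oldest queued job */
    int  count;           /* number of queued jobs */
    long done_sum;        /* results accumulated by the workers */
} job_queue_t;

/* Plays the role of PostQueuedCompletionStatus (blocks while the ring is full). */
void queue_post(job_queue_t *q, int job)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->jobs[(q->head + q->count) % QUEUE_CAP] = job;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Plays the role of GetQueuedCompletionStatus with an INFINITE timeout. */
int queue_get(job_queue_t *q)
{
    int job;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    job = q->jobs[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return job;
}

static void *pool_worker(void *arg)
{
    job_queue_t *q = arg;
    for (;;) {
        int job = queue_get(q);
        if (job == STOP_JOB)
            break;                           /* one stop packet ends one worker */
        long result = (long)job * job;       /* stand-in for processing a batch */
        pthread_mutex_lock(&q->lock);
        q->done_sum += result;
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}

/* Runs jobs 1..n_jobs on n_threads workers that are created only once.
   Returns the sum of the results. */
long run_pool(int n_threads, int n_jobs)
{
    job_queue_t q = { .head = 0, .count = 0, .done_sum = 0 };
    pthread_t *tids;
    int started = 0;

    assert(n_threads > 0);
    tids = malloc((size_t)n_threads * sizeof *tids);
    assert(tids != NULL);
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);

    for (int i = 0; i < n_threads; i++)
        if (pthread_create(&tids[started], NULL, pool_worker, &q) == 0)
            started++;
    assert(started > 0);
    for (int j = 1; j <= n_jobs; j++)
        queue_post(&q, j);
    for (int i = 0; i < started; i++)
        queue_post(&q, STOP_JOB);            /* queued after all real jobs (FIFO) */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    free(tids);
    return q.done_sum;
}
```

With this shape, `run_pool(4, 100)` creates four threads once and feeds them 100 jobs, instead of creating 100 threads.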

1) Performance: creating many threads is very costly, and both thread pooling and IO completion ports do a great job of avoiding that cost. I am now down to 8 jobs per batch, from 512 jobs per batch earlier, with no slowdown. This is considerable. Even at 1 job per batch, the performance impact is less than 5%. Truly remarkable.

From a performance standpoint, QueueUserWorkItem wins, albeit by such a small margin (about 1%) that it is almost negligible.

2) Ease of use:
For starting threads: no question, QueueUserWorkItem is by far the easiest to set up; an IO completion port is heavyweight in comparison.
For ending threads: IO completion port wins.
For some unknown reason, Microsoft provides no C function to tell when all jobs queued with QueueUserWorkItem have completed. Implementing this basic but critical feature takes some nasty tricks. There is no excuse for such a missing feature.
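The "nasty trick" usually looks like this: keep a count of outstanding work items, increment it with `InterlockedIncrement` before each `QueueUserWorkItem`, decrement it with `InterlockedDecrement` as the callback's last action, and have whichever callback brings it to zero call `SetEvent` on an event the submitter waits on. Starting the count at 1 (the submitter's own reference, released once everything is queued) keeps the event from firing early while jobs are still being queued. The sketch below is a portable version of that pattern using C11 atomics and POSIX threads; the `tracker_*` and `demo_*` names are hypothetical, and detached threads stand in for the system pool only so that the demo runs. On Windows Vista and later, the newer thread pool API offers `WaitForThreadpoolWorkCallbacks` for callbacks submitted through a work object.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pending-job counter plus a one-shot "all done" event. */
typedef struct {
    atomic_long     pending;   /* plays the role of a LONG updated with Interlocked* */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;  /* plays the role of a manual-reset event */
} job_tracker_t;

void tracker_init(job_tracker_t *t)
{
    atomic_init(&t->pending, 1);  /* 1 = the submitter's own reference */
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->signaled = false;
}

/* Call before queuing each job (InterlockedIncrement). */
void tracker_add(job_tracker_t *t)
{
    atomic_fetch_add(&t->pending, 1);
}

/* Call as the last action of each job, and once by the submitter after it has
   queued everything (InterlockedDecrement, then SetEvent on reaching zero). */
void tracker_done(job_tracker_t *t)
{
    if (atomic_fetch_sub(&t->pending, 1) == 1) {
        pthread_mutex_lock(&t->lock);
        t->signaled = true;
        pthread_cond_broadcast(&t->cond);
        pthread_mutex_unlock(&t->lock);
    }
}

/* Blocks until every job has called tracker_done (WaitForSingleObject). */
void tracker_wait(job_tracker_t *t)
{
    pthread_mutex_lock(&t->lock);
    while (!t->signaled)
        pthread_cond_wait(&t->cond, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

/* ---- Demo only: detached threads stand in for QueueUserWorkItem. ---- */
#define MAX_DEMO_JOBS 32

typedef struct {
    job_tracker_t *tracker;
    long *out;
    long value;
} demo_job_t;

static void *demo_job(void *arg)
{
    demo_job_t *j = arg;
    job_tracker_t *t = j->tracker;
    *j->out = j->value * j->value;    /* stand-in for real work */
    tracker_done(t);                  /* nothing may touch *j after this */
    return NULL;
}

/* Runs jobs 1..n, waits for all of them, returns the sum of their results. */
long run_and_wait(int n)
{
    job_tracker_t t;
    demo_job_t jobs[MAX_DEMO_JOBS];
    long results[MAX_DEMO_JOBS] = { 0 };
    pthread_attr_t attr;
    long sum = 0;

    assert(n >= 0 && n <= MAX_DEMO_JOBS);
    tracker_init(&t);
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    for (int i = 0; i < n; i++) {
        pthread_t tid;
        jobs[i] = (demo_job_t){ &t, &results[i], i + 1 };
        tracker_add(&t);
        if (pthread_create(&tid, &attr, demo_job, &jobs[i]) != 0)
            tracker_done(&t);         /* the job never started: undo its count */
    }
    pthread_attr_destroy(&attr);
    tracker_done(&t);                 /* release the submitter's reference */
    tracker_wait(&t);
    for (int i = 0; i < n; i++)
        sum += results[i];
    pthread_cond_destroy(&t.cond);
    pthread_mutex_destroy(&t.lock);
    return sum;
}
```

The key design point is the initial count of 1: without it, a fast first job could drop the counter to zero and fire the event before the second job has even been queued.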

3) Resource control: a big win for IO completion ports, which let you finely tune the number of concurrent threads. QueueUserWorkItem offers no such control and will happily consume every CPU cycle on every available core. That alone could be a deal breaker for QueueUserWorkItem.
Note that the newer thread pool API (the successor to QueueUserWorkItem) seems to allow that control, but it is only available on Windows Vista and later.

4) Compatibility: a small win for IO completion ports, which have been available since Windows NT 4. QueueUserWorkItem has only existed since Windows 2000, which is still good enough. The newer thread pool API, however, is a no-go for Windows XP.

As you might guess, I'm pretty much torn between the two solutions; both meet my needs.
For the general case, I suggest I/O completion ports, mostly for resource control.
On the other hand, QueueUserWorkItem is easier to set up. It's quite a pity that it loses most of that simplicity by leaving end-of-jobs detection entirely to the programmer.
