Multithreading: limiting the number of concurrent threads
I need to develop an app that uses multithreading.
Basically, I have a DataTable that contains around 200k rows. From each row, I need to take a field, compare it to a webpage, and then remove it from the DataTable.
The thing is, the server serving those pages has a limit on concurrent requests, so at most I can request 3 pages at the same time.
I want to do this using the ThreadPool. I even managed to build a simple app that does that (it locks the DataTable), but I couldn't limit the number of concurrent threads: even with SetMaxThreads, the limit seems to be ignored.
Does anyone have something ready-made that does something similar? I would love to see it.
I have tried using semaphores, but ran into problems:
static SemaphoreSlim _sem = new SemaphoreSlim(3); // Capacity of 3
static List<string> records = new List<string>();

static void Main()
{
    records.Add("aaa");
    records.Add("bbb");
    records.Add("ccc");
    records.Add("ddd");
    records.Add("eee");
    records.Add("fff");
    records.Add("ggg");
    records.Add("iii");
    records.Add("jjj");
    for (int i = 0; i < records.Count; i++)
    {
        new Thread(ThreadJob).Start(records[i]);
    }
    Console.WriteLine(records.Count);
    Console.ReadLine();
}

static void ThreadJob(object id)
{
    Console.WriteLine(id + " wants to enter");
    _sem.Wait();
    Console.WriteLine(id + " is in!");      // Only three threads
    //Thread.Sleep(1000 * (int)id);         // can be here at
    Console.WriteLine(id + " is leaving");  // a time.
    lock (records)
    {
        records.Remove((string)id);
    }
    _sem.Release();
}
This runs quite nicely. The only problem is that
Console.WriteLine(records.Count);
returns different results.
Even though I understand why this happens (not all threads have finished, so I am reading records.Count before all records have been removed), I couldn't find a way to wait for all of them to finish.
Comments (4)
To wait for multiple threads to finish, you can use multiple EventWaitHandles and then call WaitHandle.WaitAll to block the main thread until all events are signalled:
Since most of these threads would end up suspended most of the time, it would be better to use the ThreadPool in this case, so you can replace new Thread with:
When you are done with the events, don't forget to Dispose them:
[Edit]
To put it all in one place, here is what your example code should look like:
(Note that I've used Semaphore instead of SemaphoreSlim because I don't have .NET 4 on this machine and I wanted to test the code before updating the answer.)
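The following is not the answerer's original code. It is a minimal sketch of the approach described above, assuming .NET 4 or later: one ManualResetEvent per work item, ThreadPool.QueueUserWorkItem instead of new Thread, a Semaphore with a capacity of 3, and WaitHandle.WaitAll to block until every item is done. The names WaitAllSketch and ProcessAll are hypothetical, and the actual page comparison is left as a placeholder. Keep in mind that WaitHandle.WaitAll accepts at most 64 handles, so a 200k-row table would have to be processed in batches or with a different completion signal.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class WaitAllSketch
{
    // Hypothetical helper: processes every record with at most maxConcurrent
    // workers inside the guarded section, then returns how many records remain.
    public static int ProcessAll(List<string> records, int maxConcurrent)
    {
        // Snapshot the items first so the loop is not affected by removals.
        string[] items = records.ToArray();
        if (items.Length == 0) return 0; // WaitAll rejects an empty array

        using (var sem = new Semaphore(maxConcurrent, maxConcurrent))
        {
            // One event per work item (WaitAll supports at most 64 handles).
            var done = new ManualResetEvent[items.Length];

            for (int i = 0; i < items.Length; i++)
            {
                string id = items[i];
                ManualResetEvent finished = new ManualResetEvent(false);
                done[i] = finished;

                ThreadPool.QueueUserWorkItem(_ =>
                {
                    sem.WaitOne();
                    try
                    {
                        // Placeholder for the real work (fetch page, compare field).
                        lock (records) { records.Remove(id); }
                    }
                    finally
                    {
                        sem.Release();
                        finished.Set(); // signal that this item is complete
                    }
                });
            }

            // Block until every work item has signalled its event.
            WaitHandle.WaitAll(done);

            foreach (ManualResetEvent e in done) e.Dispose();
        }

        return records.Count;
    }

    public static void Main()
    {
        var records = new List<string> { "aaa", "bbb", "ccc", "ddd", "eee" };
        Console.WriteLine(ProcessAll(records, 3));
    }
}
```

Because the count is read only after WaitAll returns, it no longer depends on how far the workers have progressed.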
Why not use the Parallel Extensions? That would make things a lot easier.
Anyway, what you probably want to look at is something like Semaphores. I wrote a blog post on this subject a month or two back that you might find useful: https://colinmackay.scot/2011/03/30/using-semaphores-to-restrict-access-to-resources/
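As a rough illustration of the Parallel Extensions suggestion (this is not code from the answer or from the linked post), Parallel.ForEach can be given a ParallelOptions with MaxDegreeOfParallelism set to 3, assuming .NET 4 or later. The call returns only after every item has been processed, which also covers the "wait for all to finish" part. ParallelSketch, ProcessAll and the Thread.Sleep stand-in for the web request are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSketch
{
    // Hypothetical helper: removes every record and returns the highest number
    // of items that were being processed at the same moment.
    public static int ProcessAll(List<string> records, int maxConcurrent)
    {
        int current = 0, peak = 0;
        object gate = new object();

        // Iterate over a snapshot so removals do not disturb the enumeration.
        string[] items = records.ToArray();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrent };

        Parallel.ForEach(items, options, id =>
        {
            lock (gate)
            {
                current++;
                if (current > peak) peak = current;
            }

            Thread.Sleep(20); // stand-in for the web request (assumption)
            lock (records) { records.Remove(id); }

            lock (gate) { current--; }
        });

        // Parallel.ForEach has already waited for all iterations here.
        return peak;
    }

    public static void Main()
    {
        var records = new List<string> { "aaa", "bbb", "ccc", "ddd", "eee" };
        int peak = ProcessAll(records, 3);
        Console.WriteLine("peak = " + peak + ", remaining = " + records.Count);
    }
}
```

MaxDegreeOfParallelism is an upper bound, so the observed peak may be lower than 3 on a lightly loaded machine.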
You can use Semaphore if you are on .NET 3.5, or SemaphoreSlim in .NET 4.0.
First, should Console.WriteLine(id + " is leaving"); not be a bit later, after the lock and just before it releases the semaphore?
As to the actual waiting for all of the threads to finish, Groo's answer looks better and more robust in the long term, but as a quicker/simpler solution to this specific piece of code, I think you could also get away with just calling .Join() on all of the threads you want to wait for, in sequence.
Then, when starting the threads, replace the current new Thread line with:
And then, just before the Console.WriteLine:
This will lock up if any of the threads don't terminate, though, and if you ever want to know which threads haven't finished, this method won't work.
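The code fragments this answer refers to are missing here, so the following is only a guess at the shape of the Join approach, not the answerer's code: keep each started Thread in a list, then call Join() on every one before reading the count. JoinSketch and ProcessAll are hypothetical names, and the work inside the semaphore is a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class JoinSketch
{
    // Hypothetical helper: one thread per record, at most maxConcurrent inside
    // the guarded section, Join() on each thread before returning the count.
    public static int ProcessAll(List<string> records, int maxConcurrent)
    {
        using (var sem = new SemaphoreSlim(maxConcurrent))
        {
            var threads = new List<Thread>();

            // Iterate over a snapshot, because the workers remove records.
            foreach (string item in records.ToArray())
            {
                string id = item;
                var t = new Thread(() =>
                {
                    sem.Wait();
                    try
                    {
                        // Placeholder for the real work (fetch page, compare field).
                        lock (records) { records.Remove(id); }
                    }
                    finally
                    {
                        sem.Release();
                    }
                });
                threads.Add(t); // remember the thread so it can be joined later
                t.Start();
            }

            // Join each thread in sequence; this blocks until all have ended.
            foreach (Thread t in threads) t.Join();
        }

        return records.Count;
    }

    public static void Main()
    {
        var records = new List<string> { "aaa", "bbb", "ccc", "ddd", "eee" };
        Console.WriteLine(ProcessAll(records, 3));
    }
}
```

One dedicated thread per item is reasonable for a handful of records; for something like 200k rows, a pool-based variant would likely be the more practical choice.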