I'm writing a program that needs to recursively search a folder structure, and I would like to do so in parallel using several threads.
I've already written the rather trivial synchronous version: add the root directory to a queue, then repeatedly dequeue a directory and enqueue its subdirectories until the queue is empty. I plan to use a ConcurrentQueue<T> for my queue, but I have already realized that my loops will stop prematurely. The first thread will dequeue the root directory, and every other thread could immediately see that the queue is empty and exit, leaving the first thread as the only one running. I would like each thread to loop until the queue is empty, then wait until another thread queues more directories, and keep going. I need some sort of checkpoint in my loop so that no thread exits until every thread has reached the end of the loop, but I'm not sure of the best way to do this without deadlocking when there really are no more directories to process.
Comments (3)
Use the Task Parallel Library (TPL).
Create a Task to process the first folder. Inside it, create a Task to process each subfolder (recursively) and a task for each relevant file, then wait on all the tasks for that folder. The TPL runtime uses the thread pool, which avoids creating a new thread for each small piece of work; thread creation is an expensive operation.
Note:
1 As I understand it, when a thread waits on tasks using a TPL method, the TPL will reuse that thread to run other tasks until the wait is fulfilled.
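A minimal sketch of this approach, under some assumptions: the class name TplDirectoryWalker and the choice to collect file paths into a ConcurrentBag<string> are made up for illustration, and files are simply recorded rather than each getting a task of its own. Folders that cannot be read are skipped.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
public static class TplDirectoryWalker
{
    // Returns the path of every file found under 'root'.
    public static ConcurrentBag<string> Walk(string root)
    {
        var found = new ConcurrentBag<string>();
        ProcessFolder(root, found);
        return found;
    }

    private static void ProcessFolder(string folder, ConcurrentBag<string> found)
    {
        string[] subfolders;
        try
        {
            foreach (string file in Directory.GetFiles(folder))
                found.Add(file);
            subfolders = Directory.GetDirectories(folder);
        }
        catch (UnauthorizedAccessException)
        {
            return; // assumption: unreadable folders are silently skipped
        }

        // One task per subfolder; each task recurses into its own subtree.
        var children = new List<Task>();
        foreach (string sub in subfolders)
            children.Add(Task.Run(() => ProcessFolder(sub, found)));

        // Wait for this folder's subtree before returning.
        Task.WaitAll(children.ToArray());
    }
}
```

Calling TplDirectoryWalker.Walk(path) returns only after the whole tree has been visited, so there is no explicit queue and no question of when a worker may exit. The blocking wait keeps the code short; on very deep trees an async variant using Task.WhenAll might be a better fit.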
If you want to stick with the concept of an explicit queue, have a look at the BlockingCollection<T> class. Its GetConsumingEnumerable() method returns an IEnumerable<T> that blocks when the collection has run out of items and continues as soon as new items are available. This means that whenever the collection is empty the thread is blocked rather than stopping prematurely.
However, this is mainly useful for producer-consumer scenarios. I am not sure whether your problem falls into this category.
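One way this could be adapted to the directory search, as a rough sketch: every worker is both producer and consumer, so something has to decide when no more work can arrive. The pending counter below is an addition for illustration, not something the answer proposes. It counts directories that are queued or still being processed, and the worker that brings it to zero calls CompleteAdding(), which ends the GetConsumingEnumerable() loops of all workers. The names QueueDirectoryWalker and workerCount are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
public static class QueueDirectoryWalker
{
    public static ConcurrentBag<string> Walk(string root, int workerCount)
    {
        var found = new ConcurrentBag<string>();
        using (var queue = new BlockingCollection<string>())
        {
            int pending = 1; // directories queued or currently being processed
            queue.Add(root);

            var workers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Factory.StartNew(() =>
                {
                    // Blocks while the queue is empty; ends after CompleteAdding().
                    foreach (string dir in queue.GetConsumingEnumerable())
                    {
                        try
                        {
                            foreach (string file in Directory.GetFiles(dir))
                                found.Add(file);
                            foreach (string sub in Directory.GetDirectories(dir))
                            {
                                // Count the subdirectory before it becomes visible.
                                Interlocked.Increment(ref pending);
                                queue.Add(sub);
                            }
                        }
                        catch (UnauthorizedAccessException)
                        {
                            // assumption: unreadable folders are silently skipped
                        }
                        finally
                        {
                            // The last directory to finish releases every worker.
                            if (Interlocked.Decrement(ref pending) == 0)
                                queue.CompleteAdding();
                        }
                    }
                }, TaskCreationOptions.LongRunning);
            }
            Task.WaitAll(workers);
        }
        return found;
    }
}
```

Because a directory is counted before it is added and uncounted only after its subdirectories have been added, the counter cannot reach zero while work remains, so workers neither exit early nor block forever.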
It seems that in this case your best bet is to start with one thread and then, whenever you load subdirectories, hand them to threads from the thread pool. Let each thread exit when it is done, and request new ones from the pool every time you go one level deeper into the directories. This way there is no deadlock, and your system uses threads only as it needs them. You could even decide how many threads to start based on how many folders were found.
Edit: Changed the above to make it clearer that you don't want to create new threads explicitly; instead, you want to take advantage of the thread pool to add and remove threads as needed without the overhead.
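A sketch of what this might look like with ThreadPool.QueueUserWorkItem. The answer does not say how the caller learns that the search has finished, so the pending counter and the ManualResetEventSlim below are assumptions added for illustration, as is the class name PoolDirectoryWalker.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical helper, not part of the original answer.
public sealed class PoolDirectoryWalker
{
    private readonly ConcurrentBag<string> _found = new ConcurrentBag<string>();
    private readonly ManualResetEventSlim _done = new ManualResetEventSlim(false);
    private int _pending; // work items queued or currently running

    public static ConcurrentBag<string> Walk(string root)
    {
        var walker = new PoolDirectoryWalker();
        walker.Schedule(root);
        walker._done.Wait(); // block the caller until every work item has finished
        return walker._found;
    }

    private void Schedule(string dir)
    {
        // Count the work item before handing it to the pool.
        Interlocked.Increment(ref _pending);
        ThreadPool.QueueUserWorkItem(_ => Process(dir));
    }

    private void Process(string dir)
    {
        try
        {
            foreach (string file in Directory.GetFiles(dir))
                _found.Add(file);
            foreach (string sub in Directory.GetDirectories(dir))
                Schedule(sub); // one pool work item per subdirectory
        }
        catch (UnauthorizedAccessException)
        {
            // assumption: unreadable folders are silently skipped
        }
        catch (IOException)
        {
            // assumption: folders that vanish or fail mid-scan are skipped too
        }
        finally
        {
            // The last work item to finish signals completion.
            if (Interlocked.Decrement(ref _pending) == 0)
                _done.Set();
        }
    }
}
```

Each work item returns its thread to the pool as soon as its folder has been listed, so no thread sits waiting on an empty queue; only the caller of Walk blocks.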