Parallel I/O with the TPL

Posted on 2024-12-28 05:46:41

Say there is a list of document IDs, and I want to retrieve the documents from a web service. I'm a newbie with the TPL and interested in some best practices I failed to find by googling.

Am I correct that PLINQ's AsParallel() is not suitable here, since it will partition the source ID list and thus retrieve the documents within each partition one by one?

Should I use LINQ's Select() method to convert the list to a list of Task<Document>, and then WaitAll() on it?

The Parallel class and the AsParallel() extension method both use Task<T> underneath, don't they? Is it possible to pass local state into the delegates, just as I can pass it to the Task(Action<Object>, Object) overload?
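For reference, the Select-then-wait pattern the question describes might look like the sketch below. `Document` and `GetDocumentAsync` are hypothetical stand-ins for the real web-service call; note that awaiting `Task.WhenAll` is usually preferred over the blocking `Task.WaitAll`:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical document type, for illustration only.
record Document(string Id);

static class FetchSketch
{
    // Stand-in for the real web-service call.
    static Task<Document> GetDocumentAsync(string id) =>
        Task.FromResult(new Document(id));

    static async Task<Document[]> GetAllAsync(string[] ids)
    {
        // Select starts one task per ID; WhenAll awaits them all
        // without blocking a thread (unlike Task.WaitAll).
        Task<Document>[] tasks = ids.Select(GetDocumentAsync).ToArray();
        return await Task.WhenAll(tasks);
    }
}
```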


2 Comments

爱本泡沫多脆弱 2025-01-04 05:46:41

Using AsParallel for IO is dangerous because you cannot precisely control the degree of parallelism (DOP). Your IO device will have a certain optimal DOP, but this will differ from what the TPL uses.

Also, when calling network functions, I have seen the TPL use many more threads than the number of processors. This leads to oversaturation of the network and suboptimal throughput. It can also lead to timeouts. I would not put such a thing into production because of its fragile nature.

The algorithm that the TPL uses to choose the number of threads is not entirely clear to me. I think it tries to detect whether adding more threads than there are CPUs increases throughput. But IMHO it will never use fewer threads than the number of CPUs. Imagine 64 threads hammering your web service.

If you need a precise degree of parallelism, I suggest you create the desired number of tasks/threads yourself. You can put this code into a reusable helper function ("ParallelForeachWithExactDOP").
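One possible shape for such a helper, sketched under the assumption that the work items are independent: start exactly `dop` worker tasks that drain a shared queue, so no more than `dop` operations are ever in flight. The name `ParallelForeachWithExactDOP` is taken from the suggestion above; the implementation is illustrative, not the answerer's actual code:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ParallelHelper
{
    public static Task ParallelForeachWithExactDOP<T>(
        IEnumerable<T> items, int dop, Func<T, Task> body)
    {
        // All workers pull from one thread-safe queue.
        var queue = new ConcurrentQueue<T>(items);

        // Exactly `dop` workers run concurrently; each processes
        // items one at a time until the queue is empty.
        var workers = Enumerable.Range(0, dop).Select(async _ =>
        {
            while (queue.TryDequeue(out var item))
                await body(item);
        });

        return Task.WhenAll(workers);
    }
}
```

Unlike the chunking approach below, this keeps the pipeline full: as soon as one item finishes, its worker immediately picks up the next.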

My recommendation: if you just want to run everything you have in parallel, thereby risking oversaturation and timeouts, you can indeed just use Select to spawn all the tasks at once. You should only do this if you know that the number of tasks will be in a sane range (say, there are at most 10 documents).

Here is a trick that you could also use: split your documents into chunks of 10. Then, for each chunk, you spawn all the tasks at once and wait for all of them to complete. This way you have at most 10 tasks in flight at once. This method is fairly simple, but it will provide suboptimal throughput, because most of the time fewer than 10 tasks are running and sometimes even none. Consider this a simple beginner's technique.
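The chunking trick above can be sketched as follows; `fetchAsync` is a hypothetical stand-in for the per-document web-service call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ChunkedFetch
{
    // Process IDs in batches of `chunkSize`, waiting for each
    // batch to finish before starting the next one.
    public static async Task FetchInChunksAsync(
        IReadOnlyList<string> ids,
        Func<string, Task> fetchAsync,
        int chunkSize = 10)
    {
        for (int i = 0; i < ids.Count; i += chunkSize)
        {
            var chunk = ids.Skip(i).Take(chunkSize);
            // Start every task in the chunk at once, then await the batch.
            await Task.WhenAll(chunk.Select(fetchAsync));
        }
    }
}
```

The throughput dip the answer mentions happens at each `await`: the batch finishes only when its slowest request does, so the tail of every chunk runs below the target concurrency.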

稀香 2025-01-04 05:46:41

Not sure that's a good target for parallelisation; the bottleneck is going to be the client network connection, which is shared. Can't say from here, but unless you have a lot of unused capacity (which risks hogging the network), or there's some reason a request for one document might block so that you can work on another, I don't think you are going to get a lot out of this.

Parallelisation across web services, though, that would be a goer.
