Is PLINQ inherently faster than System.Threading.Tasks.Parallel.ForEach?
Summary: I changed from System.Threading.Tasks.Parallel.ForEach and a concurrent data structure to a simple PLINQ (Parallel LINQ) query. The speedup was amazing.
So is PLINQ inherently faster than Parallel.ForEach? Or is it specific to the task?
    // Original code
    // Concurrent dictionary to store results
    var resultDict = new ConcurrentDictionary<string, MyResultType>();
    Parallel.ForEach(items, item =>
    {
        // Was PerformWork(source); "source" is not defined in this snippet
        resultDict.TryAdd(item.Name, PerformWork(item));
    });

    // New code
    var results =
        items
            .AsParallel()
            .Select(item => new { item.Name, queryResult = PerformWork(item) })
            // The anonymous type exposes Name, not SourceName
            .ToDictionary(kv => kv.Name, kv => kv.queryResult);
Notes:
Each task (PerformWork) now runs between 0 and 200 ms. It used to take longer before I optimized it, which is why I was using the Tasks.Parallel library in the first place. So I went from 2 seconds total time to ~100-200 ms total time, performing roughly the same work, just with different methods. (Wow, LINQ and PLINQ are awesome!)
Questions:
- Is the speedup due to using PLINQ instead of Parallel.ForEach?
- Or is it simply due to removing the concurrent data structure (ConcurrentDictionary), because the new code no longer needs to synchronize threads?
- Based on the answer to this related question:
Whereas PLINQ is largely based on a functional style of programming with no side-effects, side-effects are precisely what the TPL is for. If you want to actually do work in parallel as opposed to just searching/selecting things in parallel, you use the TPL.
Can I assume that, because my pattern is basically functional (inputs produce new outputs without mutation), PLINQ is the correct technology to use?
I'm looking for validation that my assumptions are correct, or an indication that I'm missing something.
Comments (2)
It's not possible to use these two code samples to make a definitive comparison between Parallel.ForEach and PLINQ. The code samples are simply too different.
The first thing that jumps out at me is that the first sample uses ConcurrentDictionary and the second uses Dictionary. These two types have very different uses and performance characteristics. To get an accurate comparison between the two technologies, you need to be consistent with the types here.
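One way to keep the types consistent is to have both approaches write into the same kind of collection. The sketch below is only an illustration under assumptions, not the asker's code: `Item`, `PerformWork`, and the `ComparisonSketch` class are hypothetical stand-ins, and the work is simulated with a short sleep. Both methods fill a ConcurrentDictionary, so any timing difference that remains can't be attributed to the collection type.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the asker's item type.
public sealed class Item
{
    public Item(string name) { Name = name; }
    public string Name { get; }
}

public static class ComparisonSketch
{
    // Hypothetical stand-in for PerformWork: simulates a few milliseconds of work.
    public static int PerformWork(Item item)
    {
        Thread.Sleep(5);
        return item.Name.Length;
    }

    // Parallel.ForEach writing into a ConcurrentDictionary (the original shape).
    public static ConcurrentDictionary<string, int> WithParallelForEach(IEnumerable<Item> items)
    {
        var results = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(items, item =>
        {
            results.TryAdd(item.Name, PerformWork(item));
        });
        return results;
    }

    // PLINQ writing into the same collection type, so only the engine differs.
    public static ConcurrentDictionary<string, int> WithPlinq(IEnumerable<Item> items)
    {
        var results = new ConcurrentDictionary<string, int>();
        items.AsParallel().ForAll(item =>
        {
            results.TryAdd(item.Name, PerformWork(item));
        });
        return results;
    }

    // Measures the wall-clock time of a single run of the given action.
    public static TimeSpan Time(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        var items = Enumerable.Range(0, 200).Select(i => new Item("item" + i)).ToList();

        // Run each variant once before measuring so JIT and thread-pool
        // start-up costs are not counted against either one.
        WithParallelForEach(items);
        WithPlinq(items);

        Console.WriteLine("Parallel.ForEach: " + Time(() => WithParallelForEach(items)).TotalMilliseconds + " ms");
        Console.WriteLine("PLINQ ForAll:     " + Time(() => WithPlinq(items)).TotalMilliseconds + " ms");
    }
}
```

Substitute the real workload for the sleep before drawing conclusions; a single timed run like this is noisy, so repeating the measurement several times gives a fairer picture.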
Based on the limited information in your sample (I asked for more details in a comment on the OP), I'm guessing you're seeing differences due to the partitioning algorithm that is used. You should read up on chunk partitioning vs. range partitioning in this blog post, where the author discusses how they differ and which types of work each is best suited to. I highly recommend reading that blog article as well as this one, which goes into a little more detail on those two types along with two other types of partitioning that can be used (though they're not applicable to your sample), and gives some visual aids to better understand the partitioning. Finally, here's yet another blog post that discusses work partitioning and how it can affect you when the default partitioning algorithm doesn't make sense for your particular workload. That post refers to a great program that helps you visualize the partitioners at work; it's part of a set of parallel samples from the PFX team.
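To make the partitioning point a bit more concrete, here is a small sketch of choosing a partitioning strategy explicitly with System.Collections.Concurrent.Partitioner instead of relying on the default. The `PartitioningSketch` class and the squaring workload are hypothetical and only there for illustration; which strategy actually suits a given workload is something to measure, not assume.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class PartitioningSketch
{
    // Range partitioning: the index space is split into contiguous ranges,
    // and each invocation of the loop body handles one whole range.
    public static long[] SquareWithRanges(int[] input)
    {
        var output = new long[input.Length];
        if (input.Length == 0)
        {
            return output; // Partitioner.Create(0, 0) would throw on an empty range.
        }

        Parallel.ForEach(Partitioner.Create(0, input.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                output[i] = (long)input[i] * input[i];
            }
        });
        return output;
    }

    // Load-balanced partitioning over an array of indices: workers request
    // elements dynamically rather than being given fixed ranges in advance.
    public static long[] SquareLoadBalanced(int[] input)
    {
        var indices = Enumerable.Range(0, input.Length).ToArray();
        var output = new long[input.Length];
        Parallel.ForEach(Partitioner.Create(indices, loadBalance: true), i =>
        {
            output[i] = (long)input[i] * input[i];
        });
        return output;
    }

    // PLINQ can also consume a custom partitioner via AsParallel().
    public static long SumOfSquaresPlinq(int[] input)
    {
        return Partitioner.Create(input, loadBalance: true)
            .AsParallel()
            .Sum(x => (long)x * x);
    }
}
```

Each worker writes only to its own slots of the preallocated array, so no concurrent collection is needed in the first two methods.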