Optimizing parallel LINQ queries
For some time now I've been structuring my code around methods with no side effects so that I can use parallel LINQ to speed things up. Along the way I've more than once found that lazy evaluation made things worse instead of better, and I would like to know if there are any tools to help with optimizing parallel LINQ queries.
I ask because I recently refactored some embarrassingly parallel code by modifying some methods and peppering AsParallel in certain key places. The run time went down from 2 minutes to 45 seconds, but it was clear from the performance monitor that in some places not all the cores on the CPU were being fully utilized. After a few false starts I forced some of the queries to execute by using ToArray, and the run time went down even further, to 16 seconds. It felt good to reduce the run time of the code, but it was also slightly disconcerting because it was not clear where in the code queries needed to be forced with ToArray. Waiting until the last minute for the query to execute was not the optimal strategy, but it was not at all clear at which points in the code some of the subqueries needed to be forced in order to utilize all the CPU cores.
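The deferred execution described above can be made visible with a small sketch. This is only an illustration, not the code from the question: Transform, source, and the calls counter are hypothetical names, and the snippet assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Threading;

int calls = 0;

// Hypothetical CPU-bound step; the counter only exists to show when it runs.
int Transform(int x)
{
    Interlocked.Increment(ref calls);
    return x * x;
}

int[] source = Enumerable.Range(0, 1000).ToArray();

// Building the query runs nothing yet: PLINQ queries are deferred like LINQ ones.
var query = source.AsParallel().Select(x => Transform(x));
int callsAfterBuild = calls;

// ToArray forces this stage to execute now and stores its results.
int[] materialized = query.ToArray();
int callsAfterToArray = calls;

// Later work reads the stored array, so Transform is not executed again.
int evenCount = materialized.Count(v => v % 2 == 0);
int callsAfterCount = calls;

Console.WriteLine($"{callsAfterBuild} {callsAfterToArray} {callsAfterCount}");
```

The counter stays at zero until ToArray is called, which is why the position of the forcing call decides when, and on which query, the parallel work actually happens.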
As it is, I have no idea how to properly place ToArray or other methods that force LINQ computations to execute in order to gain maximum CPU utilization. So are there any general guidelines and tools for optimizing parallel LINQ queries?
Here's a pseudo-code sample:
var firstQuery = someDictionary.SelectMany(FirstTransformation);
var secondQuery = firstQuery.Select(SecondTransformation);
var thirdQuery = secondQuery.Select(ThirdTransformation).Where(SomeConditionCheck);
var finalQuery = thirdQuery.Select(FinalTransformation).Where(x => x != null);
FirstTransformation, SecondTransformation, and ThirdTransformation are all CPU bound; in terms of complexity they are a few 3x3 matrix multiplications and some if branches. SomeConditionCheck is pretty much a null check. FinalTransformation is the most CPU-intensive part of the code because it performs a whole bunch of line-plane intersections, checks polygon containment for those intersections, and then extracts the intersection that is closest to a certain point on the line.
I have no idea why the places where I put AsParallel reduced the run time of the code as much as they did. I have now reached a local minimum in terms of run time, but I have no idea why. It was just dumb luck that I stumbled on it. In case you're wondering, the places to put AsParallel are the first and last lines. Putting AsParallel anywhere else will only increase the run time, sometimes by up to 20 seconds. There is also a hidden ToArray on the first line.
Comments (2)
There are a couple things going on here:
So the overall guideline here is: make sure that before you start you've got an array if possible, and only put AsParallel on the very last query before evaluation. So something like the following should work pretty well:
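The answer's own snippet is not preserved on this page. As a stand-in, here is a minimal sketch of the stated guideline (materialize an array first, put AsParallel only on the last query). The transformation bodies and the sample dictionary are invented placeholders, not the asker's code, and top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the transformations named in the question.
IEnumerable<double> FirstTransformation(KeyValuePair<int, double[]> kv) => kv.Value;
double SecondTransformation(double v) => v * 2.0;
double ThirdTransformation(double v) => v - 1.0;
bool SomeConditionCheck(double v) => v >= 0.0;
double? FinalTransformation(double v) => v > 100.0 ? (double?)null : Math.Sqrt(v);

var someDictionary = Enumerable.Range(0, 100)
    .ToDictionary(i => i, i => new double[] { i, i + 0.5 });

// Start from an array, as the guideline suggests.
double[] input = someDictionary.SelectMany(kv => FirstTransformation(kv)).ToArray();

// Intermediate stages stay ordinary lazy LINQ.
var secondQuery = input.Select(v => SecondTransformation(v));
var thirdQuery = secondQuery.Select(v => ThirdTransformation(v))
                            .Where(v => SomeConditionCheck(v));

// AsParallel appears once, on the last query; everything after it is run by PLINQ.
var finalQuery = thirdQuery.AsParallel()
                           .Select(v => FinalTransformation(v))
                           .Where(x => x != null);

// Evaluation happens here. Result order is not guaranteed without AsOrdered().
double[] results = finalQuery.Select(x => x.Value).ToArray();
Console.WriteLine(results.Length);
```

Whether this layout is actually the fastest for a given workload still has to be measured; the sketch only shows where the two calls go.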
It is nearly impossible to tell without seeing the actual code. But as a general guideline, you should consider avoiding P/LINQ during complex number crunching because the delegate and IEnumerable overhead is just too high. The speed you gain by using threads is very likely eaten up by the convenient abstractions LINQ provides.
Here is some code that calculates the sum of two integer lists, does an int-to-float comparison, and then calculates the cosine of the result. Pretty basic stuff that can be done nicely with LINQ's .Zip operator ... or the old-fashioned way with a for loop.
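The benchmark code itself did not survive on this page. The following is a rough reconstruction of the kind of comparison described, not the original: the array contents, the 0.5f threshold, and all variable names are assumptions, and no timings are implied. Top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Linq;

const int N = 100_000;
int[] a = Enumerable.Range(0, N).ToArray();
int[] b = Enumerable.Range(0, N).Select(i => 2 * i).ToArray();

// LINQ version: Zip pairs the elements; every stage costs a delegate call
// plus MoveNext()/Current calls per item.
double linqSum = a.Zip(b, (x, y) => x + y)
                  .Where(s => s > 0.5f)      // int compared against a float
                  .Select(s => Math.Cos(s))
                  .Sum();

// Old-fashioned version: direct array access in a plain loop.
double loopSum = 0.0;
for (int i = 0; i < a.Length; i++)
{
    int s = a[i] + b[i];
    if (s > 0.5f)
    {
        loopSum += Math.Cos(s);
    }
}

Console.WriteLine($"{linqSum} {loopSum}");
```

Both versions compute the same value; wrapping each in a Stopwatch on your own machine is the only reliable way to see how large the per-item overhead is for your workload.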
Update 1 with updated ParallelLinq on my Haswell 8 core machine
Update 1 End
The time difference is nearly a factor of 3 because of the IEnumerable laziness and method call overhead (I used Release mode, x32, Windows 7, .NET 4, dual core). I tried AsParallel in the LINQ version, but it actually got slower (2.3 s). If you are data driven, you should use the Parallel.For construct to get good scalability. IEnumerable in itself is a bad candidate for parallelization since
Below is a code sample to illustrate the point. If you want to optimize closer to the bare metal, you first need to get rid of abstractions that cost too much per item. An array access is much cheaper than non-inlined MoveNext() and Current method calls.
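The answer's sample is missing from this page, so what follows is only a sketch of the approach it names: Parallel.For over plain arrays, with no IEnumerable in the hot path. The input data and the per-element computation are assumptions chosen to mirror the sum/compare/cos example, and top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Threading.Tasks;

const int N = 100_000;
int[] a = new int[N];
int[] b = new int[N];
for (int i = 0; i < N; i++)
{
    a[i] = i;
    b[i] = 2 * i;
}

double[] result = new double[N];

// Each index is written by exactly one iteration, so no locking is needed.
Parallel.For(0, N, i =>
{
    int s = a[i] + b[i];
    result[i] = s > 0.5f ? Math.Cos(s) : 0.0;
});

Console.WriteLine(result[1]);
```

Because the range is known up front and indexed directly, the runtime can split the work without pulling items one at a time through an enumerator.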