Parallel LINQ query optimization



For some time now I've been structuring my code around methods with no side effects so that I can use Parallel LINQ (PLINQ) to speed things up. Along the way I've stumbled more than once on lazy evaluation making things worse instead of better, and I would like to know whether there are any tools to help with optimizing PLINQ queries.

I ask because I recently refactored some embarrassingly parallel code by modifying a few methods and peppering AsParallel into certain key places. The run time went down from 2 minutes to 45 seconds, but it was clear from the performance monitor that the CPU cores were not being fully utilized everywhere. After a few false starts I forced some of the queries to execute with ToArray, and the run time dropped further, to 16 seconds. Reducing the run time felt good, but it was also slightly disconcerting, because it was not clear where in the code the queries needed to be forced with ToArray. Waiting until the last moment for a query to execute was not the optimal strategy, but it was not at all obvious at which points some of the subqueries needed to be forced in order to utilize all the CPU cores.

As it is, I have no idea how to properly place ToArray, or other methods that force LINQ computations to execute, in order to get maximum CPU utilization. So: are there any general guidelines or tools for optimizing Parallel LINQ queries?

Here's a pseudo-code sample:

var firstQuery = someDictionary.SelectMany(FirstTransformation);
var secondQuery = firstQuery.Select(SecondTransformation);
var thirdQuery = secondQuery.Select(ThirdTransformation).Where(SomeConditionCheck);
var finalQuery = thirdQuery.Select(FinalTransformation).Where(x => x != null);

FirstTransformation, SecondTransformation, and ThirdTransformation are all CPU bound; in terms of complexity they amount to a few 3x3 matrix multiplications and some if branches. SomeConditionCheck is pretty much a null check. FinalTransformation is the most CPU-intensive part of the code, because it performs a whole bunch of line-plane intersections, checks polygon containment for those intersections, and then extracts the intersection that is closest to a certain point on the line.

I have no idea why the places where I put AsParallel reduced the run time as much as they did. I have now reached a local minimum in run time, but I don't know why; it was dumb luck that I stumbled on it. In case you're wondering, the places to put AsParallel turned out to be the first and last lines, and there is also a ToArray hiding on the first line. Putting AsParallel anywhere else only increases the run time, sometimes by up to 20 seconds.
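
For reference, a minimal sketch of the arrangement just described, reusing the placeholder names from the pseudo-code above:

    var firstQuery = someDictionary
        .AsParallel()                     // AsParallel on the first line
        .SelectMany(FirstTransformation)
        .ToArray();                       // the ToArray hiding on the first line
    var secondQuery = firstQuery.Select(SecondTransformation);
    var thirdQuery = secondQuery.Select(ThirdTransformation).Where(SomeConditionCheck);
    var finalQuery = thirdQuery
        .AsParallel()                     // AsParallel on the last line
        .Select(FinalTransformation)
        .Where(x => x != null);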


墨小沫ゞ 2025-01-10 03:40:09


There are a couple of things going on here:

  1. PLINQ parallelizes collections more efficiently than uncounted IEnumerables. If you have an array, it divides the array length by your number of CPU cores and hands the pieces out evenly. But if you have an IEnumerable of unknown length, it does a goofy exponential ramp-up type of thing, where tasks process 1, 2, 4, 8, etc. elements at a time until they hit the end of the IEnumerable (see the sketch after this list).
  2. By parallelizing all your queries, you're breaking the work up into tiny chunks. If you have M parallel queries across N elements, you end up with M*N tasks. That carries more thread overhead than parallelizing only the last query, in which case you'd end up with just N tasks.
  3. PLINQ does best when each task takes roughly the same amount of time to process; that way it can divide them up evenly among the cores. By parallelizing each of your queries, which have different performance behavior, you get M*N tasks that take varying amounts of time, and PLINQ cannot schedule them optimally (because it doesn't know ahead of time how long each one might take).
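
A hypothetical sketch of point 1, not from the original answer (HeavyWork stands in for any CPU-bound function):

    // Length unknown to PLINQ: the Where hides the source behind a plain IEnumerable,
    // so PLINQ falls back to chunked ramp-up partitioning.
    IEnumerable<int> lazy = Enumerable.Range(0, 1000000).Where(i => true);
    // Length known: PLINQ can split the index range evenly across the cores.
    int[] eager = lazy.ToArray();

    var fromLazy = lazy.AsParallel().Select(HeavyWork).ToArray();   // chunked ramp-up
    var fromEager = eager.AsParallel().Select(HeavyWork).ToArray(); // even range split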

So the overall guideline here is: make sure that you've got an array before you start (if possible), and only put AsParallel on the very last query before evaluation. Something like the following should work pretty well:

var firstQuery = someDictionary.ToArray().SelectMany(FirstTransformation); // materialize an array first so PLINQ can range-partition
var secondQuery = firstQuery.Select(SecondTransformation);
var thirdQuery = secondQuery.Select(ThirdTransformation).AsParallel().Where(SomeConditionCheck).ToArray();
var finalQuery = thirdQuery.Select(FinalTransformation).AsParallel().Where(x => x != null);
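
A side note that goes beyond the answer itself: if the cores still look under-utilized after materializing arrays, PLINQ also accepts explicit hints such as WithDegreeOfParallelism, e.g.:

    var finalQuery = thirdQuery
        .AsParallel()
        .WithDegreeOfParallelism(Environment.ProcessorCount) // explicit worker count
        .Select(FinalTransformation)
        .Where(x => x != null);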
王权女流氓 2025-01-10 03:40:09


It is nearly impossible to tell without seeing the actual code. But as a general guideline you should consider avoiding P/LINQ for complex number crunching, because the delegate and IEnumerable overhead is just too high. The speed you gain by using threads is very likely eaten up by the convenient abstractions LINQ provides.

Here is some code which calculates the sum of two integer lists, does an int-to-float conversion, and then calculates the cosine of the result. Pretty basic stuff that can be done nicely with LINQ's .Zip operator ... or the old-fashioned way with a for loop.

Update 1: updated Parallel LINQ numbers on my Haswell 8-core machine

  • Linq 0.95s
  • Linq Parallel 0.19s
  • Optimized 0.45s
  • Optimized Parallel 0.08s

Update 1 end. The original dual-core numbers were:

  • LINQ 1.65s
  • Optimized 0.64s
  • Optimized Parallel 0.40s

The time difference is nearly a factor of 3, because of IEnumerable laziness and method-call overhead (I used Release mode, x32, Windows 7, .NET 4, dual core). I tried to use AsParallel in the LINQ version, but it actually got slower (2.3s). If you are data driven you should use the Parallel.For construct to get good scalability. IEnumerable in itself is a bad candidate for parallelization, since:

  • You do not know how much work there is until you have enumerated to the end.
  • You cannot do eager chunking, because you do not know how fast the IEnumerable will return the next element (it could be a web service call or an array index access).
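
As an aside that is not part of the original answer: when the length is known up front, one way to get eager chunking is to hand the TPL explicit index ranges via Partitioner.Create (System.Collections.Concurrent), so each task processes a contiguous slice with no per-element MoveNext()/Current overhead:

    var result = new double[A.Count];
    Parallel.ForEach(Partitioner.Create(0, A.Count), range =>
    {
        // Each task receives a contiguous [from, to) slice of the index space.
        for (int i = range.Item1; i < range.Item2; i++)
        {
            var sum = A[i] + B[i];
            result[i] = Math.Cos((double)sum / ((double)((float)A[i]) + 1));
        }
    });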

Below is a code sample to illustrate the point. If you want to optimize closer to the bare metal, you first need to get rid of the abstractions that cost too much per item: an array access is much, much cheaper than non-inlined MoveNext() and Current calls.

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading.Tasks;

    class Program
    {
        static void Main(string[] args)
        {
            var A = new List<int>(Enumerable.Range(0, 10 * 1000 * 1000));
            var B = new List<int>(Enumerable.Range(0, 10 * 1000 * 1000));

            double[] Actual = UseLinq(A, B);
            double[] pActual = UseLinqParallel(A, B);

            var other = Optimized(A, B);
            var pother = OptimizedParallel(A, B);
        }

        private static double[] UseLinq(List<int> A, List<int> B)
        {
            var sw = Stopwatch.StartNew();
            // Lazy LINQ pipeline: nothing executes until the ToArray below.
            var Merged = A.Zip(B, (a, b) => a + b);
            var Converted = A.Select(a => (float)a);

            var Result = Merged.Zip(Converted, (m, c) => Math.Cos((double)m / ((double)c + 1)));

            double[] Actual = Result.ToArray();
            sw.Stop();

            Console.WriteLine("Linq {0:F2}s", sw.Elapsed.TotalSeconds);
            return Actual;
        }

        private static double[] UseLinqParallel(List<int> A, List<int> B)
        {
            var sw = Stopwatch.StartNew();
            var x = A.AsParallel();
            var y = B.AsParallel();

            var Merged = x.Zip(y, (a, b) => a + b);
            var Converted = x.Select(a => (float)a);

            var Result = Merged.Zip(Converted, (m, c) => Math.Cos((double)m / ((double)c + 1)));

            double[] Actual = Result.ToArray();
            sw.Stop();

            Console.WriteLine("Linq Parallel {0:F2}s", sw.Elapsed.TotalSeconds);
            return Actual;
        }

        private static double[] OptimizedParallel(List<int> A, List<int> B)
        {
            double[] result = new double[A.Count];
            var sw = Stopwatch.StartNew();
            Parallel.For(0, A.Count, i =>
            {
                var sum = A[i] + B[i];
                result[i] = Math.Cos((double)sum / ((double)((float)A[i]) + 1));
            });
            sw.Stop();

            Console.WriteLine("Optimized Parallel {0:F2}s", sw.Elapsed.TotalSeconds);
            return result;
        }

        private static double[] Optimized(List<int> A, List<int> B)
        {
            double[] result = new double[A.Count];
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < A.Count; i++)
            {
                var sum = A[i] + B[i];
                result[i] = Math.Cos((double)sum / ((double)((float)A[i]) + 1));
            }
            sw.Stop();

            Console.WriteLine("Optimized {0:F2}s", sw.Elapsed.TotalSeconds);
            return result;
        }
    }