Should I always use Parallel.ForEach, because more threads must speed everything up?
Does it make sense to you to use for every normal foreach a parallel.foreach loop ?
When should I start using parallel.foreach, only iterating 1,000,000 items?
No, it doesn't make sense for every foreach. Some reasons:

You can use Parallel.ForEach for this, but you shouldn't just do it blindly. Basically nothing in threading should be done blindly. Think about where it actually makes sense to parallelize. Oh, and measure the impact to make sure the benefit is worth the added complexity. (It will be harder for things like debugging.) TPL is great, but it's no free lunch.
The short answer is no, you should not just use Parallel.ForEach or related constructs on each loop that you can. Parallel has some overhead, which is not justified in loops with few, fast iterations. Also, break is significantly more complex inside these loops.

Parallel.ForEach is a request to schedule the loop as the task scheduler sees fit, based on the number of iterations in the loop, the number of CPU cores on the hardware and the current load on that hardware. Actual parallel execution is not always guaranteed, and is less likely if there are fewer cores, the number of iterations is low and/or the current load is high.

See also Does Parallel.ForEach limit the number of active threads? and Does Parallel.For use one Task per iteration?
The long answer:
We can classify loops by how they fall on two axes:

- Few iterations through to many iterations
- Each iteration is fast through to each iteration is slow
A third factor is if the tasks vary in duration very much – for instance if you are calculating points on the Mandelbrot set, some points are quick to calculate, some take much longer.
When there are few, fast iterations it's probably not worth using parallelisation in any way, most likely it will end up slower due to the overheads. Even if parallelisation does speed up a particular small, fast loop, it's unlikely to be of interest: the gains will be small and it's not a performance bottleneck in your application so optimise for readability not performance.
Where a loop has very few, slow iterations and you want more control, you may consider using Tasks to handle them, along the lines of:
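The code listing that followed this sentence hasn't survived here. A minimal sketch of the Task-based approach, where ProcessItem is a hypothetical stand-in for the slow, independent work, might look like:

```csharp
using System;
using System.Threading.Tasks;

class FewSlowIterations
{
    // Hypothetical stand-in for a slow, independent operation.
    static int ProcessItem(int item)
    {
        Task.Delay(100).Wait();   // simulate slow work
        return item * item;
    }

    static void Main()
    {
        int[] items = { 1, 2, 3 };

        // One Task per iteration gives explicit control over
        // how many run and when you wait for them.
        Task<int>[] tasks = Array.ConvertAll(items,
            item => Task.Run(() => ProcessItem(item)));

        Task.WaitAll(tasks);

        foreach (var t in tasks)
            Console.WriteLine(t.Result);
    }
}
```

Unlike Parallel.ForEach, you decide exactly which tasks start, and you can wait on, cancel, or continue from them individually.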
Where there are many iterations, Parallel.ForEach is in its element. The Microsoft documentation states that
This partitioning and dynamic re-scheduling is going to be harder to do effectively as the number of loop iterations decreases, and is more necessary if the iterations vary in duration and in the presence of other tasks running on the same machine.
I ran some code.
The test results below show a machine with nothing else running on it, and no other threads from the .Net Thread Pool in use. This is not typical (in fact in a web-server scenario it is wildly unrealistic). In practice, you may not see any parallelisation with a small number of iterations.
The test code is:
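The original listing is not preserved in this copy. A rough reconstruction of such a benchmark might look like the following, where DoWork is an assumed CPU-burning placeholder and the total work is held constant while being sliced into a varying number of items:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class ParallelTimingTest
{
    // Burn some CPU so each iteration has measurable cost.
    static void DoWork(int spins)
    {
        double x = 0;
        for (int i = 0; i < spins; i++) x += Math.Sqrt(i);
    }

    static void Main()
    {
        const int totalSpins = 20_000_000;

        foreach (int items in new[] { 1, 2, 3, 4, 8, 16 })
        {
            int spinsPerItem = totalSpins / items;
            var data = Enumerable.Range(0, items).ToArray();

            var sw = Stopwatch.StartNew();
            foreach (int item in data) DoWork(spinsPerItem);
            var serial = sw.Elapsed;

            sw.Restart();
            Parallel.ForEach(data, _ => DoWork(spinsPerItem));
            var parallel = sw.Elapsed;

            Console.WriteLine($"{items} items: serial {serial.TotalMilliseconds:F0} ms, " +
                              $"parallel {parallel.TotalMilliseconds:F0} ms");
        }
    }
}
```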
The results on a 4-core Windows 7 machine are:
Running code compiled in .NET 4 and .NET 4.5 gives much the same results.
The serial work runs are all the same. It doesn't matter how you slice it, it runs in about 2.28 seconds.
The parallel work with 1 iteration is slightly longer than no parallelism at all. 2 items is shorter, so is 3, and with 4 or more iterations the time is all about 0.8 seconds.
It is using all cores, but not with 100% efficiency. If the serial work was divided 4 ways with no overhead it would complete in 0.57 seconds (2.28 / 4 = 0.57).
In other scenarios I saw no speed-up at all with parallel 2-3 iterations. You do not have fine-grained control over that with Parallel.ForEach, and the algorithm may decide to "partition" them into just 1 chunk and run it on 1 core if the machine is busy.
No, you should definitely not do that. The important point here is not really the number of iterations, but the work to be done. If your work is really simple, executing 1000000 delegates in parallel will add a huge overhead and will most likely be slower than a traditional single threaded solution. You can get around this by partitioning the data, so you execute chunks of work instead.
E.g. consider the situation below:
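The original code sample is missing here. A sketch of such a situation, assuming a trivial per-element operation (multiplying by Math.PI is just a placeholder), might be:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class TrivialParallel
{
    static void Main()
    {
        var input = Enumerable.Range(0, 1_000_000).ToArray();
        var results = new double[input.Length];

        // One delegate invocation per element: the scheduling
        // overhead dwarfs the trivial multiplication inside.
        Parallel.ForEach(input, i => results[i] = input[i] * Math.PI);

        Console.WriteLine(results[1]);
    }
}
```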
The operation here is so simple, that the overhead of doing this in parallel will dwarf the gain of using multiple cores. This code runs significantly slower than a regular foreach loop.
By using a partition we can reduce the overhead and actually observe a gain in performance.
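One way to partition, sketched with Partitioner.Create over the index range (the per-element work is again a trivial placeholder):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class PartitionedParallel
{
    static void Main()
    {
        var input = Enumerable.Range(0, 1_000_000).ToArray();
        var results = new double[input.Length];

        // Partitioner.Create splits the index range into chunks, so each
        // task processes a block of elements in a plain inner loop and
        // the per-delegate overhead is paid once per chunk, not per item.
        Parallel.ForEach(Partitioner.Create(0, input.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                results[i] = input[i] * Math.PI;
        });

        Console.WriteLine(results[1]);
    }
}
```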
The moral of the story here is that parallelism is hard and you should only employ it after looking closely at the situation at hand. Additionally, you should profile the code both before and after adding parallelism.
Remember that regardless of any potential performance gain parallelism always adds complexity to the code, so if the performance is already good enough, there's little reason to add the complexity.
There is no lower limit for doing parallel operations. If you have only 2 items to work on but each one will take a while, it might still make sense to use Parallel.ForEach. On the other hand, if you have 1000000 items but they don't do very much, the parallel loop might not go any faster than the regular loop.

For example, I wrote a simple program to time nested loops where the outer loop ran both with a for loop and with Parallel.ForEach. I timed it on my 4-CPU (dual-core, hyperthreaded) laptop.

Here's a run with only 2 items to work on, but each takes a while:
Here's a run with millions of items to work on, but they don't do very much:
The only real way to know is to try it.
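The timing program itself is not preserved here. A sketch of such a harness, with illustrative item counts and work amounts, might be:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class OuterLoopTiming
{
    static TimeSpan Time(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        return sw.Elapsed;
    }

    static void Run(int outer, int inner)
    {
        var items = Enumerable.Range(0, outer).ToArray();
        double sink = 0;   // prevents the inner loop being optimized away

        var serial = Time(() =>
        {
            foreach (int item in items)
                for (int j = 0; j < inner; j++) sink += Math.Sqrt(j);
        });

        var parallel = Time(() =>
        {
            Parallel.ForEach(items, _ =>
            {
                double local = 0;
                for (int j = 0; j < inner; j++) local += Math.Sqrt(j);
            });
        });

        Console.WriteLine($"{outer} items x {inner} inner: " +
                          $"for {serial.TotalMilliseconds:F0} ms, " +
                          $"Parallel.ForEach {parallel.TotalMilliseconds:F0} ms");
    }

    static void Main()
    {
        Run(2, 50_000_000);   // few items, each slow
        Run(5_000_000, 2);    // many items, each trivial
    }
}
```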
In general, once you go above a thread per core, each extra thread involved in an operation will make it slower, not faster.
However, if part of each operation will block (the classic example being waiting on disk or network I/O, another being producers and consumers that are out of synch with each other) then more threads than cores can begin to speed things up again, because tasks can be done while other threads are unable to make progress until the I/O operation returns.
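For blocking work like this, one knob is ParallelOptions.MaxDegreeOfParallelism, which can be raised above the core count. A sketch, where the item list and Thread.Sleep are stand-ins for real I/O:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class IoBoundParallel
{
    static void Main()
    {
        var urls = Enumerable.Range(0, 16).Select(i => $"item-{i}").ToArray();
        int done = 0;

        // Because each "download" mostly blocks rather than burns CPU,
        // more threads than cores can still improve throughput.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 16 };
        Parallel.ForEach(urls, options, url =>
        {
            Thread.Sleep(200);   // stand-in for blocking I/O
            Interlocked.Increment(ref done);
        });

        Console.WriteLine($"fetched {done} items");
    }
}
```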
For this reason, when single-core machines were the norm, the only real justifications for multi-threading were either blocking of the sort I/O introduces, or improving responsiveness (slightly slower to perform a task, but much quicker to start responding to user input again).
Still, these days single-core machines are increasingly rare, so it would appear that you should be able to make everything at least twice as fast with parallel processing.
This will still not be the case if order is important, or something inherent to the task forces it to have a synchronised bottleneck, or if the number of operations is so small that the increase in speed from parallel processing is outweighed by the overheads involved in setting up that parallel processing. It may or may not be the case if a shared resource requires threads to block on other threads performing the same parallel operation (depending on the degree of lock contention).
Also, if your code is inherently multithreaded to begin with, you can be in a situation where you are essentially competing for resources with yourself (a classic case being ASP.NET code handling simultaneous requests). Here the advantage of parallel operation may mean that a single test operation on a 4-core machine approaches 4 times the performance, but once the number of requests needing the same task to be performed reaches 4, then since each of those 4 requests is trying to use every core, it becomes little better than if they had a core each (perhaps slightly better, perhaps slightly worse). The benefit of parallel operation hence disappears as the use changes from a single-request test to a real-world multitude of requests.
You shouldn't blindly replace every single foreach loop in your application with the parallel foreach. More threads doesn't necessarily mean that your application will work faster. You need to slice the task into smaller tasks that can run in parallel if you want to really benefit from multiple threads. If your algorithm is not parallelizable, you won't get any benefit.
No. You need to understand what the code is doing and whether it is amenable to parallelization. Dependencies between your data items can make it hard to parallelize: if a thread uses the value calculated for the previous element, it has to wait until that value is calculated and can't run in parallel. You also need to understand your target architecture, though you will typically have a multicore CPU on just about anything you buy these days. Even on a single core you can get some benefit from more threads, but only if you have some blocking tasks. You should also keep in mind that there is overhead in creating and organizing the parallel threads. If this overhead is a significant fraction of (or more than) the time your task takes, you could slow it down.
These are my benchmarks showing pure serial is slowest, along with various levels of partitioning.
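The benchmark code and numbers did not survive here. A sketch of such a benchmark, assuming chunked range partitioning via Partitioner.Create with illustrative chunk sizes:

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class PartitionBenchmark
{
    static void Main()
    {
        const int n = 10_000_000;
        var data = Enumerable.Range(0, n).ToArray();
        var results = new double[n];

        // Serial baseline.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) results[i] = Math.Sqrt(data[i]);
        Console.WriteLine($"serial:          {sw.ElapsedMilliseconds} ms");

        // Parallel with several partition (chunk) sizes.
        foreach (int chunk in new[] { 1_000, 100_000, 1_000_000 })
        {
            sw.Restart();
            Parallel.ForEach(Partitioner.Create(0, n, chunk), range =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    results[i] = Math.Sqrt(data[i]);
            });
            Console.WriteLine($"chunk {chunk,9}: {sw.ElapsedMilliseconds} ms");
        }
    }
}
```

The relative timings depend heavily on core count and machine load, which is exactly the answer's point.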