PLINQ performs worse than ordinary LINQ
Amazingly, using PLINQ did not yield benefits on a small test case I created; in fact, it was even worse than usual LINQ.
Here's the test code:
int repeatedCount = 10000000;

private void button1_Click(object sender, EventArgs e)
{
    var currTime = DateTime.Now;
    var strList = Enumerable.Repeat(10, repeatedCount);
    var result = strList.AsParallel().Sum();
    var currTime2 = DateTime.Now;
    textBox1.Text = (currTime2.Ticks - currTime.Ticks).ToString();
}

private void button2_Click(object sender, EventArgs e)
{
    var currTime = DateTime.Now;
    var strList = Enumerable.Repeat(10, repeatedCount);
    var result = strList.Sum();
    var currTime2 = DateTime.Now;
    textBox2.Text = (currTime2.Ticks - currTime.Ticks).ToString();
}
The result?
textbox1: 3437500
textbox2: 781250
So, LINQ is taking less time than PLINQ to complete a similar operation!
What am I doing wrong? Or is there a twist that I don't know about?
Edit: I've updated my code to use Stopwatch, and yet the same behavior persisted. To discount the effect of JIT, I actually tried a few times, clicking button1 and button2 in no particular order. Although the times I got varied, the qualitative behavior remained: PLINQ was indeed slower in this case.
First: Stop using DateTime to measure run time. Use a Stopwatch instead. The test code would look like:
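The snippet referenced here didn't survive in this copy; a minimal Stopwatch-based sketch of the same comparison (written as a console app rather than the question's WinForms handlers) could look like:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Benchmark
{
    const int RepeatedCount = 10000000;

    static void Main()
    {
        // Warm up both paths so JIT compilation doesn't skew the first timing.
        Enumerable.Repeat(10, 1000).Sum();
        Enumerable.Repeat(10, 1000).AsParallel().Sum();

        var sw = Stopwatch.StartNew();
        var sequential = Enumerable.Repeat(10, RepeatedCount).Sum();
        sw.Stop();
        Console.WriteLine("Sequential: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        var parallel = Enumerable.Repeat(10, RepeatedCount).AsParallel().Sum();
        sw.Stop();
        Console.WriteLine("Parallel:   {0} ms", sw.ElapsedMilliseconds);
    }
}
```

Note the warm-up calls: timing the very first invocation of each path would otherwise include JIT compilation time.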
Second: Running things in Parallel adds overhead. In this case, PLINQ has to figure out the best way to divide your collection so that it can Sum the elements safely in parallel. After that, you need to join the results from the various threads created and Sum those as well. This isn't a trivial task.
Using the code above I can see that using Sum() nets a ~95ms call. Calling .AsParallel().Sum() nets around ~185ms.
Doing a task in Parallel is only a good idea if you gain something by doing it. In this case, Sum is a simple enough task that you don't gain by using PLINQ.
This is a classic mistake -- thinking, "I'll run a simple test to compare the performance of this single-threaded code with this multi-threaded code."
A simple test is the worst kind of test you can run to measure multi-threaded performance.
Typically, parallelizing some operation yields a performance benefit when the steps you're parallelizing require substantial work. When the steps are simple -- as in, quick* -- the overhead of parallelizing your work ends up dwarfing the minuscule performance gain you would have otherwise gotten.
Consider this analogy.
You're constructing a building. If you have one worker, he has to lay bricks one by one until he's made one wall, then do the same for the next wall, and so on until all walls are built and connected. This is a slow and laborious task that could benefit from parallelization.
The right way to do this would be to parallelize the wall building -- hire, say, 3 more workers, and have each worker construct his own wall so that 4 walls can be built simultaneously. The time it takes to find the 3 extra workers and assign them their tasks is insignificant in comparison to the savings you get by getting 4 walls up in the amount of time it would have previously taken to build 1.
The wrong way to do it would be to parallelize the brick laying -- hire about a thousand more workers and have each worker responsible for laying a single brick at a time. You may think, "If one worker can lay 2 bricks per minute, then a thousand workers should be able to lay 2000 bricks per minute, so I'll finish this job in no time!" But the reality is that by parallelizing your workload at such a microscopic level, you're wasting a tremendous amount of energy gathering and coordinating all of your workers, assigning tasks to them ("lay this brick right there"), making sure no one's work is interfering with anyone else's, etc.
So the moral of this analogy is: in general, use parallelization to split up the substantial units of work (like walls), but leave the insubstantial units (like bricks) to be handled in the usual sequential manner.
*For this reason, you can actually make a pretty good approximation of the performance gain of parallelization in a more work-intensive context by taking any fast-executing code and adding Thread.Sleep(100) (or some other random number) to the end of it. Suddenly, sequential executions of this code will be slowed down by 100 ms per iteration, while parallel executions will be slowed significantly less.
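That trick can be sketched as follows (a hypothetical example, not from the answer): padding each element's work with Thread.Sleep turns the cheap per-item step into a "substantial" one, at which point AsParallel starts to pay off:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

class SimulatedWork
{
    // Simulate a "substantial" work item: the Sleep stands in for real computation.
    static int Process(int x)
    {
        Thread.Sleep(100);
        return x * 2;
    }

    static void Main()
    {
        var data = Enumerable.Range(0, 8).ToArray();

        var sw = Stopwatch.StartNew();
        var seq = data.Select(Process).Sum();
        sw.Stop();
        // Roughly 8 x 100 ms, since the sleeps run back to back.
        Console.WriteLine("Sequential: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        var par = data.AsParallel().Select(Process).Sum();
        sw.Stop();
        // Noticeably less on a multi-core machine, since the sleeps overlap.
        Console.WriteLine("Parallel:   {0} ms", sw.ElapsedMilliseconds);
    }
}
```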
Others have pointed out some flaws in your benchmarks. Here's a short console app to make it simpler:
Compilation:
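The original program and compile command didn't survive in this copy; a comparable sketch (my own names, not the answer's listing) that times a plain sequential sum, AsParallel over Enumerable.Repeat, and ParallelEnumerable.Repeat might be:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class PlinqBenchmark
{
    const int Count = 100000000;

    static void Time(string label, Func<long> action)
    {
        var sw = Stopwatch.StartNew();
        long result = action();
        sw.Stop();
        Console.WriteLine("{0}: {1} ms (sum = {2})", label, sw.ElapsedMilliseconds, result);
    }

    static void Main()
    {
        // Sum with a long selector so the total can't overflow at larger counts.
        Time("Sequential", () => Enumerable.Repeat(10, Count).Sum(x => (long)x));
        Time("AsParallel on Enumerable.Repeat", () => Enumerable.Repeat(10, Count).AsParallel().Sum(x => (long)x));
        Time("ParallelEnumerable.Repeat", () => ParallelEnumerable.Repeat(10, Count).Sum(x => (long)x));
    }
}
```

The key difference is the source: ParallelEnumerable.Repeat produces a parallel query directly, whereas AsParallel has to wrap a lazy sequential iterator.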
Results on my quad core i7 laptop; runs up to 2 cores fast, or 4 cores more slowly. Basically ParallelEnumerable.Repeat wins, followed by the sequential version, followed by parallelising the normal Enumerable.Repeat. Note that earlier versions of this answer were embarrassingly flawed by having the wrong number of elements - I'm much more confident in the results above.
Is it possible you are not taking into account JIT time? You should run your test twice and discard the first set of results.
Also, you shouldn't use DateTime to get performance timing; use the Stopwatch class instead.

PLINQ does add some overhead to the processing of a sequence, but the magnitude of the difference in your case seems excessive. PLINQ makes sense when the overhead cost is outweighed by the benefit of running the logic on multiple cores/CPUs. If you don't have multiple cores, running the processing in parallel offers no real advantage - and PLINQ should detect such a case and perform the processing sequentially.
EDIT: When creating embedded performance tests of this kind, you should make sure that you are not running them under the debugger, or with Intellitrace enabled, as those can significantly skew performance timings.
Something more important that I didn't see mentioned is that .AsParallel will have different performance depending on the collection used.
In my tests, PLINQ is faster than LINQ when it is NOT used on a lazy IEnumerable (such as Enumerable.Repeat).

The code is in VB, but it is provided to show that using .ToArray made the PLINQ version a few times faster.
Running the tests in different order will have a bit different results, so having them in one line makes moving them up and down a bit easier for me.
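In C# terms (the answer's VB listing is not reproduced here, so this is a sketch of the same idea), the comparison might look like:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class SourceTypeComparison
{
    const int Count = 10000000;

    static long TimeMs(Func<long> f)
    {
        var sw = Stopwatch.StartNew();
        f();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        // Lazy IEnumerable source: PLINQ has to pull items through the iterator,
        // which limits how cheaply the work can be partitioned.
        Console.WriteLine("IEnumerable source: {0} ms",
            TimeMs(() => Enumerable.Repeat(10, Count).AsParallel().Sum(x => (long)x)));

        // Materialized array source: PLINQ can range-partition the array cheaply.
        var arr = Enumerable.Repeat(10, Count).ToArray();
        Console.WriteLine("Array source:       {0} ms",
            TimeMs(() => arr.AsParallel().Sum(x => (long)x)));
    }
}
```

The point is that the same query can parallelize very differently depending on whether the source supports cheap indexed partitioning.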
That indeed may be the case, because you are increasing the number of context switches while not performing any action that would benefit from having threads wait for something like I/O completion. This is going to be even worse if you are running on a single-CPU box.
I'd recommend using the Stopwatch class for timing metrics. In your case it's a better measure of the interval.
Please read the Side Effects section of this article.
http://msdn.microsoft.com/en-us/magazine/cc163329.aspx
I think you can run into many conditions where PLINQ has additional data-processing patterns you must understand before you conclude that it will always have faster response times.
Justin's comment about overhead is exactly right.
Just something to consider when writing concurrent software in general, beyond the use of PLINQ:
You always need to be thinking about the "granularity" of your work items. Some problems are very well suited to parallelization because they can be "chunked" at a very high level, like raytracing entire frames concurrently (these sorts of problems are called embarrassingly parallel). When there are very large "chunks" of work, then the overhead of creating and managing multiple threads becomes negligible compared to the actual work that you want to get done.
PLINQ makes concurrent programming easier, but it doesn't mean that you can ignore thinking about the granularity of your work.