Parallel.ForEach is slower than foreach

Posted on 2024-11-08 01:23:18

Here is the code:

using (var context = new AventureWorksDataContext())
{
    IEnumerable<Customer> _customerQuery = from c in context.Customers
                                           where c.FirstName.StartsWith("A")
                                           select c;

    var watch = new Stopwatch();
    watch.Start();

    var result = Parallel.ForEach(_customerQuery, c => Console.WriteLine(c.FirstName));

    watch.Stop();
    Debug.WriteLine(watch.ElapsedMilliseconds);

    watch = new Stopwatch();
    watch.Start();

    foreach (var customer in _customerQuery)
    {
        Console.WriteLine(customer.FirstName);
    }

    watch.Stop();
    Debug.WriteLine(watch.ElapsedMilliseconds);
}

The problem is that Parallel.ForEach takes about 400 ms, whereas a regular foreach takes about 40 ms. What exactly am I doing wrong, and why doesn't this work as I expect it to?

Comments (6)

天赋异禀 2024-11-15 01:23:18

Suppose you have a task to perform. Let's say you're a math teacher and you have twenty papers to grade. It takes you two minutes to grade a paper, so it's going to take you about forty minutes.

Now let's suppose that you decide to hire some assistants to help you grade papers. It takes you an hour to locate four assistants. You each take four papers and you are all done in eight minutes. You've traded 40 minutes of work for 68 total minutes of work including the extra hour to find the assistants, so this isn't a savings. The overhead of finding the assistants is larger than the cost of doing the work yourself.

Now suppose you have twenty thousand papers to grade, so it is going to take you about 40000 minutes. Now if you spend an hour finding assistants, that's a win. You each take 4000 papers and are done in a total of 8060 minutes instead of 40000 minutes, a savings of almost a factor of 5. The overhead of finding the assistants is basically irrelevant.

Parallelization is not free. The cost of splitting up work amongst different threads needs to be tiny compared to the amount of work done per thread.
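
The arithmetic in the analogy can be written as a small cost model. This is only a sketch: it assumes the overhead is a one-time fixed cost and that the papers divide evenly among the graders. The symbols are introduced here for illustration and do not come from the answer: $N$ papers, $t = 2$ minutes per paper, $p = 5$ graders (you plus four assistants), and $T_o = 60$ minutes of overhead.

```latex
\begin{align*}
T_{\text{alone}}(N) &= N\,t \\
T_{\text{team}}(N)  &= T_o + \frac{N\,t}{p} \\
T_{\text{team}}(20) &= 60 + \frac{20 \cdot 2}{5} = 68
  \quad\text{vs.}\quad T_{\text{alone}}(20) = 40 \\
T_{\text{team}}(20000) &= 60 + \frac{20000 \cdot 2}{5} = 8060
  \quad\text{vs.}\quad T_{\text{alone}}(20000) = 40000 \\
\frac{T_{\text{alone}}(20000)}{T_{\text{team}}(20000)} &= \frac{40000}{8060} \approx 4.96
\end{align*}
```

Under these assumptions the team only wins when $N\,t > T_o + N\,t/p$, that is, when $N > \frac{T_o\,p}{t\,(p-1)} = 37.5$ papers.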

Further reading:

Amdahl's law

Gives the theoretical speedup in latency of the execution of a task at fixed workload, that can be expected of a system whose resources are improved.

Gustafson's law

Gives the theoretical speedup in latency of the execution of a task at fixed execution time, that can be expected of a system whose resources are improved.
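
For reference, the two laws are usually stated as follows. Here $p$ is the fraction of the execution time that benefits from the improved resources and $s$ is the speedup of that fraction. The numeric values below are chosen purely for illustration and are not measurements from the question.

```latex
\begin{align*}
S_{\text{Amdahl}}(s)    &= \frac{1}{(1 - p) + \dfrac{p}{s}} \\
S_{\text{Gustafson}}(s) &= (1 - p) + s\,p \\[4pt]
\text{Example } (p = 0.9,\ s = 4):\quad
S_{\text{Amdahl}} &= \frac{1}{0.1 + 0.225} \approx 3.08, \qquad
S_{\text{Gustafson}} = 0.1 + 3.6 = 3.7
\end{align*}
```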

清风挽心 2024-11-15 01:23:18

The first thing you should realize is that not all parallelism is beneficial. Parallelism carries a certain amount of overhead, and that overhead may or may not be significant depending on the complexity of what is being parallelized. Since the work in your parallel function is very small, the management overhead that the parallelism requires becomes significant, thus slowing down the overall work.

謌踐踏愛綪 2024-11-15 01:23:18

The additional overhead of creating all the threads for your enumerable, versus simply iterating the enumerable, is more than likely the cause of the slowdown. Parallel.ForEach is not a blanket performance-increasing move; you need to weigh whether the operation to be completed for each element is likely to block.

For example, if you were to make a web request or something instead of simply writing to the console, the parallel version might be faster. As it is, simply writing to the console is a very fast operation, so the overhead of creating the threads and starting them is going to be slower.
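
A rough way to see this is to compare the two loops when each iteration blocks. The sketch below is not the asker's code: `BlockingWorkDemo` and `SimulatedRequest` are hypothetical names, and `Thread.Sleep(50)` merely stands in for a blocking call such as a web request. Actual timings depend on the machine and on the thread pool.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingWorkDemo
{
    // Hypothetical stand-in for a blocking call such as a web request.
    static void SimulatedRequest(int id) => Thread.Sleep(50);

    // Runs the blocking call once per item, one after another.
    public static long TimeSequential(int[] items)
    {
        var watch = Stopwatch.StartNew();
        foreach (var item in items)
        {
            SimulatedRequest(item);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // Runs the same calls through Parallel.ForEach, so waits can overlap.
    public static long TimeParallel(int[] items)
    {
        var watch = Stopwatch.StartNew();
        Parallel.ForEach(items, item => SimulatedRequest(item));
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        int[] items = Enumerable.Range(0, 16).ToArray();
        Console.WriteLine($"foreach:          {TimeSequential(items)} ms");
        Console.WriteLine($"Parallel.ForEach: {TimeParallel(items)} ms");
    }
}
```

With a blocking body like this, the parallel loop can overlap the waits, which is the situation in which it would be expected to come out ahead.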

往昔成烟 2024-11-15 01:23:18

As the previous answer said, there is some overhead associated with Parallel.ForEach, but that is not why you can't see a performance improvement. Console.WriteLine is a synchronized operation, so only one thread can write at a time. Try changing the body to something non-blocking and you will see the performance increase (as long as the amount of work in the body is big enough to outweigh the overhead).
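
One way to try this is to keep console output out of the parallel loop entirely: compute in parallel, then print once. The following is a minimal sketch, assuming a CPU-bound body. `ComputeThenPrintDemo` and `SumOfSquares` are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ComputeThenPrintDemo
{
    // Hypothetical CPU-bound work standing in for a non-blocking loop body.
    public static long SumOfSquares(int n)
    {
        long sum = 0;
        for (int i = 1; i <= n; i++)
        {
            sum += (long)i * i;
        }
        return sum;
    }

    public static long[] ComputeAll(int[] inputs)
    {
        var results = new long[inputs.Length];
        // Each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, inputs.Length, i => results[i] = SumOfSquares(inputs[i]));
        return results;
    }

    public static void Main()
    {
        int[] inputs = Enumerable.Repeat(1_000_000, 64).ToArray();
        long[] results = ComputeAll(inputs);
        // Console output happens once, outside the parallel loop.
        Console.WriteLine(string.Join(Environment.NewLine, results));
    }
}
```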

扛刀软妹 2024-11-15 01:23:18

I like salomon's answer and would like to add that you also have the additional overhead of

  1. Allocating delegates.
  2. Calling through them.

寄居人 2024-11-15 01:23:18

You can also use a Partitioner to break the work into partitions of a chosen size, which avoids the overhead of creating many small tasks.

How to: Speed Up Small Loop Bodies

Tweaking the third parameter of Partitioner.Create to set the size of your partitions may help you attain better performance. In my case, I tried setting it to yield 2 partitions (partition size = (total elements / 2) + 1) and achieved slightly better performance (10% better) than with a foreach loop for a simple task.

Bear in mind that for very simple tasks, like in your case, this may not help much, and your performance may still be lower than with a simple foreach, for the reasons the earlier answers pointed out.
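
A minimal sketch of the range-partitioning approach follows. It assumes the work can be indexed by integer, which is not the case for the LINQ query in the question as written. `RangePartitionDemo` and `SquareRoots` are hypothetical names, and `Math.Sqrt` stands in for the small loop body.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class RangePartitionDemo
{
    public static double[] SquareRoots(int count, int rangeSize)
    {
        var results = new double[count];
        // Partitioner.Create(from, to, rangeSize) yields Tuple<int, int> ranges
        // [Item1, Item2); the third argument sets how many indices each range covers.
        var ranges = Partitioner.Create(0, count, rangeSize);
        Parallel.ForEach(ranges, range =>
        {
            // One delegate call handles a whole range with an ordinary loop inside.
            for (int i = range.Item1; i < range.Item2; i++)
            {
                results[i] = Math.Sqrt(i);
            }
        });
        return results;
    }

    public static void Main()
    {
        const int count = 1_000_000;
        // Two partitions, following the sizing described above.
        int rangeSize = (count / 2) + 1;
        double[] roots = SquareRoots(count, rangeSize);
        Console.WriteLine(roots[count - 1]);
    }
}
```

Because the delegate is invoked once per range rather than once per element, the per-element cost approaches that of a plain for loop. Whether that beats a simple foreach still depends on how much work each element needs.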
