Should I always use Parallel.ForEach, because more threads must speed everything up?

Posted 2024-10-02 01:21:42

Does it make sense to you to use a Parallel.ForEach loop in place of every normal foreach?

When should I start using Parallel.ForEach? Only when iterating 1,000,000 items?

Comments (8)

度的依靠╰つ 2024-10-09 01:21:42

No, it doesn't make sense for every foreach. Some reasons:

  • Your code may not actually be parallelizable (for example, if you're using the "results so far" for the next iteration and the order is important).
  • If you're aggregating (e.g. summing values) there are ways of using Parallel.ForEach for this (see the sketch after this list), but you shouldn't just do it blindly.
  • If your work will complete very fast anyway, there's no benefit, and it may well slow things down
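
For the aggregation case, the overload of Parallel.ForEach that takes per-thread local state is one way to sum values without taking a lock on every iteration. A minimal sketch, not from the answer itself; the numbers array is a placeholder:

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ParallelSumSketch
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1000000).ToArray(); // placeholder data

        long total = 0;
        Parallel.ForEach(
            numbers,
            () => 0L,                                    // localInit: each thread starts its own subtotal
            (value, loopState, local) => local + value,  // body: accumulate locally, no locking
            local => Interlocked.Add(ref total, local)); // localFinally: merge each thread's subtotal once

        Console.WriteLine(total); // 500000500000
    }
}

For a simple reduction like this, PLINQ can express the same idea more tersely, e.g. numbers.AsParallel().Sum(x => (long)x).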

Basically nothing in threading should be done blindly. Think about where it actually makes sense to parallelize. Oh, and measure the impact to make sure the benefit is worth the added complexity. (It will be harder for things like debugging.) TPL is great, but it's no free lunch.

苦行僧 2024-10-09 01:21:42

The short answer is no, you should not just use Parallel.ForEach or related constructs on each loop that you can.
Parallel has some overhead, which is not justified in loops with few, fast iterations. Also, break is significantly more complex inside these loops.
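
To expand on the point about break: inside Parallel.ForEach you have to go through ParallelLoopState rather than the break keyword, and iterations already running may still finish. A minimal sketch, not from the answer; the early-exit condition is a placeholder:

using System;
using System.Linq;
using System.Threading.Tasks;

class ParallelBreakSketch
{
    static void Main()
    {
        int[] items = Enumerable.Range(0, 1000).ToArray();

        ParallelLoopResult result = Parallel.ForEach(items, (item, loopState) =>
        {
            if (item == 500)           // placeholder early-exit condition
            {
                loopState.Break();     // requests that no iterations after this one are started
                return;
            }
            // ... per-item work goes here ...
        });

        Console.WriteLine(result.IsCompleted);          // false when Break was called
        Console.WriteLine(result.LowestBreakIteration); // the iteration at which Break happened
    }
}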

Parallel.ForEach is a request to schedule the loop as the task scheduler sees fit, based on number of iterations in the loop, number of CPU cores on the hardware and current load on that hardware. Actual parallel execution is not always guaranteed, and is less likely if there are fewer cores, the number of iterations is low and/or the current load is high.
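
If you want to cap that scheduling decision rather than leave it entirely to the runtime, ParallelOptions.MaxDegreeOfParallelism puts an upper bound on how many iterations run at once (it is a ceiling, not a guarantee). A minimal sketch, not from the answer:

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class MaxParallelismSketch
{
    static void Main()
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = 2 // at most 2 iterations run concurrently; fewer may be used
        };

        Parallel.ForEach(Enumerable.Range(0, 20), options, item =>
        {
            Console.WriteLine("item {0} on thread {1}", item, Thread.CurrentThread.ManagedThreadId);
        });
    }
}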

See also Does Parallel.ForEach limits the number of active threads? and Does Parallel.For use one Task per iteration?

The long answer:

We can classify loops by how they fall on two axes:

  1. Few iterations through to many iterations.
  2. Each iteration is fast through to each iteration is slow.

A third factor is if the tasks vary in duration very much – for instance if you are calculating points on the Mandelbrot set, some points are quick to calculate, some take much longer.

When there are few, fast iterations it's probably not worth using parallelisation in any way, most likely it will end up slower due to the overheads. Even if parallelisation does speed up a particular small, fast loop, it's unlikely to be of interest: the gains will be small and it's not a performance bottleneck in your application so optimise for readability not performance.

Where a loop has very few, slow iterations and you want more control, you may consider using Tasks to handle them, along the lines of:

var tasks = new List<Task>(actions.Length); 
foreach(var action in actions) 
{ 
    tasks.Add(Task.Factory.StartNew(action)); 
} 
Task.WaitAll(tasks.ToArray());
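
On .NET 4.5 and later the same pattern is often written with Task.Run, which is shorthand for the common case of Task.Factory.StartNew. An equivalent sketch, with a placeholder actions array standing in for the one above:

using System;
using System.Linq;
using System.Threading.Tasks;

class TaskRunSketch
{
    static void Main()
    {
        // Placeholder actions standing in for the 'actions' array in the snippet above.
        Action[] actions =
        {
            () => Console.WriteLine("first slow job"),
            () => Console.WriteLine("second slow job")
        };

        Task[] tasks = actions.Select(action => Task.Run(action)).ToArray();
        Task.WaitAll(tasks);
    }
}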

Where there are many iterations, Parallel.ForEach is in its element.

The Microsoft documentation states that

When a parallel loop runs, the TPL partitions the data source so that the loop can operate on multiple parts concurrently. Behind the scenes, the Task Scheduler partitions the task based on system resources and workload. When possible, the scheduler redistributes work among multiple threads and processors if the workload becomes unbalanced.

This partitioning and dynamic re-scheduling is going to be harder to do effectively as the number of loop iterations decreases, and is more necessary if the iterations vary in duration and in the presence of other tasks running on the same machine.

I ran some code.

The test results below show a machine with nothing else running on it, and no other threads from the .Net Thread Pool in use. This is not typical (in fact in a web-server scenario it is wildly unrealistic). In practice, you may not see any parallelisation with a small number of iterations.

The test code is:

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

namespace ParallelTests 
{ 
    class Program 
    { 
        private static int Fibonacci(int x) 
        { 
            if (x <= 1) 
            { 
                return 1; 
            } 
            return Fibonacci(x - 1) + Fibonacci(x - 2); 
        } 

        private static void DummyWork() 
        { 
            var result = Fibonacci(10); 
            // inspect the result so it is not optimised away. 
            // We know that the exception is never thrown. The compiler does not. 
            if (result > 300) 
            { 
                throw new Exception("failed to to it"); 
            } 
        } 

        private const int TotalWorkItems = 2000000; 

        private static void SerialWork(int outerWorkItems) 
        { 
            int innerLoopLimit = TotalWorkItems / outerWorkItems; 
            for (int index1 = 0; index1 < outerWorkItems; index1++) 
            { 
                InnerLoop(innerLoopLimit); 
            } 
        } 

        private static void InnerLoop(int innerLoopLimit) 
        { 
            for (int index2 = 0; index2 < innerLoopLimit; index2++) 
            { 
                DummyWork(); 
            } 
        } 

        private static void ParallelWork(int outerWorkItems) 
        { 
            int innerLoopLimit = TotalWorkItems / outerWorkItems; 
            var outerRange = Enumerable.Range(0, outerWorkItems); 
            Parallel.ForEach(outerRange, index1 => 
            { 
                InnerLoop(innerLoopLimit); 
            }); 
        } 

        private static void TimeOperation(string desc, Action operation) 
        { 
            Stopwatch timer = new Stopwatch(); 
            timer.Start(); 
            operation(); 
            timer.Stop(); 

            string message = string.Format("{0} took {1:mm}:{1:ss}.{1:ff}", desc, timer.Elapsed); 
            Console.WriteLine(message); 
        } 

        static void Main(string[] args) 
        { 
            TimeOperation("serial work: 1", () => Program.SerialWork(1)); 
            TimeOperation("serial work: 2", () => Program.SerialWork(2)); 
            TimeOperation("serial work: 3", () => Program.SerialWork(3)); 
            TimeOperation("serial work: 4", () => Program.SerialWork(4)); 
            TimeOperation("serial work: 8", () => Program.SerialWork(8)); 
            TimeOperation("serial work: 16", () => Program.SerialWork(16)); 
            TimeOperation("serial work: 32", () => Program.SerialWork(32)); 
            TimeOperation("serial work: 1k", () => Program.SerialWork(1000)); 
            TimeOperation("serial work: 10k", () => Program.SerialWork(10000)); 
            TimeOperation("serial work: 100k", () => Program.SerialWork(100000)); 

            TimeOperation("parallel work: 1", () => Program.ParallelWork(1)); 
            TimeOperation("parallel work: 2", () => Program.ParallelWork(2)); 
            TimeOperation("parallel work: 3", () => Program.ParallelWork(3)); 
            TimeOperation("parallel work: 4", () => Program.ParallelWork(4)); 
            TimeOperation("parallel work: 8", () => Program.ParallelWork(8)); 
            TimeOperation("parallel work: 16", () => Program.ParallelWork(16)); 
            TimeOperation("parallel work: 32", () => Program.ParallelWork(32)); 
            TimeOperation("parallel work: 64", () => Program.ParallelWork(64)); 
            TimeOperation("parallel work: 1k", () => Program.ParallelWork(1000)); 
            TimeOperation("parallel work: 10k", () => Program.ParallelWork(10000)); 
            TimeOperation("parallel work: 100k", () => Program.ParallelWork(100000)); 

            Console.WriteLine("done"); 
            Console.ReadLine(); 
        } 
    } 
} 

the results on a 4-core Windows 7 machine are:

serial work: 1 took 00:02.31 
serial work: 2 took 00:02.27 
serial work: 3 took 00:02.28 
serial work: 4 took 00:02.28 
serial work: 8 took 00:02.28 
serial work: 16 took 00:02.27 
serial work: 32 took 00:02.27 
serial work: 1k took 00:02.27 
serial work: 10k took 00:02.28 
serial work: 100k took 00:02.28 

parallel work: 1 took 00:02.33 
parallel work: 2 took 00:01.14 
parallel work: 3 took 00:00.96 
parallel work: 4 took 00:00.78 
parallel work: 8 took 00:00.84 
parallel work: 16 took 00:00.86 
parallel work: 32 took 00:00.82 
parallel work: 64 took 00:00.80 
parallel work: 1k took 00:00.77 
parallel work: 10k took 00:00.78 
parallel work: 100k took 00:00.77 
done

Running the code compiled against .NET 4 and .NET 4.5 gives much the same results.

The serial work runs are all the same. It doesn't matter how you slice it, it runs in about 2.28 seconds.

The parallel work with 1 iteration is slightly longer than no parallelism at all; 2 items is shorter, so is 3, and with 4 or more iterations the time is all about 0.8 seconds.

It is using all cores, but not with 100% efficiency. If the serial work was divided 4 ways with no overhead it would complete in 0.57 seconds (2.28 / 4 = 0.57).

In other scenarios I saw no speed-up at all with 2-3 parallel iterations. You do not have fine-grained control over that with Parallel.ForEach, and the algorithm may decide to "partition" them into just 1 chunk and run it on 1 core if the machine is busy.

绅士风度i 2024-10-09 01:21:42

No, you should definitely not do that. The important point here is not really the number of iterations, but the work to be done. If your work is really simple, executing 1000000 delegates in parallel will add a huge overhead and will most likely be slower than a traditional single threaded solution. You can get around this by partitioning the data, so you execute chunks of work instead.

E.g. consider the situation below:

Input = Enumerable.Range(1, Count).ToArray();
Result = new double[Count];

Parallel.ForEach(Input, (value, loopState, index) => { Result[index] = value*Math.PI; });

The operation here is so simple, that the overhead of doing this in parallel will dwarf the gain of using multiple cores. This code runs significantly slower than a regular foreach loop.

By using a partition we can reduce the overhead and actually observe a gain in performance.

Parallel.ForEach(Partitioner.Create(0, Input.Length), range => {
   for (var index = range.Item1; index < range.Item2; index++) {
      Result[index] = Input[index]*Math.PI;
   }
});

The moral of the story here is that parallelism is hard and you should only employ this after looking closely at the situation at hand. Additionally, you should profile the code both before and after adding parallelism.

Remember that regardless of any potential performance gain parallelism always adds complexity to the code, so if the performance is already good enough, there's little reason to add the complexity.

苏辞 2024-10-09 01:21:42

There is no lower limit for doing parallel operations. If you have only 2 items to work on but each one will take a while, it might still make sense to use Parallel.ForEach. On the other hand if you have 1000000 items but they don't do very much, the parallel loop might not go any faster than the regular loop.

For example, I wrote a simple program to time nested loops where the outer loop ran both with a for loop and with Parallel.ForEach. I timed it on my 4-CPU (dual-core, hyperthreaded) laptop.
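
The answer does not include the program itself; a minimal harness along those lines might look like the sketch below (DoWork and the iteration counts are placeholders, not the author's code):

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class NestedLoopTimingSketch
{
    // Placeholder inner work; the original post does not show its exact body.
    static void DoWork(int innerIterations)
    {
        double x = 0;
        for (int i = 0; i < innerIterations; i++)
        {
            x += Math.Sqrt(i);
        }
        if (x < 0) throw new Exception("unreachable"); // keep the loop from being optimised away
    }

    static void Time(string label, Action action)
    {
        var timer = Stopwatch.StartNew();
        action();
        timer.Stop();
        Console.WriteLine("{0}: {1}", label, timer.Elapsed);
    }

    static void Main()
    {
        int outerIterations = 2;          // swap these two values to reproduce the second run
        int innerIterations = 100000000;

        Time("for loop", () =>
        {
            for (int i = 0; i < outerIterations; i++) DoWork(innerIterations);
        });

        Time("ForEach ", () =>
        {
            Parallel.ForEach(Enumerable.Range(0, outerIterations), _ => DoWork(innerIterations));
        });
    }
}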

Here's a run with only 2 items to work on, but each takes a while:

2 outer iterations, 100000000 inner iterations:
for loop: 00:00:00.1460441
ForEach : 00:00:00.0842240

Here's a run with millions of items to work on, but they don't do very much:

100000000 outer iterations, 2 inner iterations:
for loop: 00:00:00.0866330
ForEach : 00:00:02.1303315

The only real way to know is to try it.

沫离伤花 2024-10-09 01:21:42

In general, once you go above a thread per core, each extra thread involved in an operation will make it slower, not faster.

However, if part of each operation will block (the classic example being waiting on disk or network I/O, another being producers and consumers that are out of synch with each other) then more threads than cores can begin to speed things up again, because tasks can be done while other threads are unable to make progress until the I/O operation returns.

For this reason, when single-core machines were the norm, the only real justifications for multi-threading were either blocking of the sort I/O introduces, or improving responsiveness (slightly slower to perform a task, but much quicker to start responding to user input again).

Still, these days single-core machines are increasingly rare, so it would appear that you should be able to make everything at least twice as fast with parallel processing.

This will still not be the case if order is important, or if something inherent to the task forces it to have a synchronised bottleneck, or if the number of operations is so small that the speed-up from parallel processing is outweighed by the overheads involved in setting up that parallel processing. It may or may not be the case if a shared resource requires threads to block on other threads performing the same parallel operation (depending on the degree of lock contention).

Also, if your code is inherently multithreaded to begin with, you can be in a situation where you are essentially competing for resources with yourself (a classic case being ASP.NET code handling simultaneous requests). Here the advantage of parallel operation may mean that a single test operation on a 4-core machine approaches 4 times the performance, but once the number of requests needing the same task performed reaches 4, then since each of those 4 requests is trying to use every core, it becomes little better than if they each had a core to themselves (perhaps slightly better, perhaps slightly worse). The benefit of parallel operation hence disappears as the use changes from a single-request test to a real-world multitude of requests.

默嘫て 2024-10-09 01:21:42

You shouldn't blindly replace every single foreach loop in your application with a parallel foreach. More threads doesn't necessarily mean that your application will work faster. You need to slice the task into smaller tasks which can run in parallel if you want to really benefit from multiple threads. If your algorithm is not parallelizable you won't get any benefit.

伴梦长久 2024-10-09 01:21:42

No. You need to understand what the code is doing and whether it is amenable to parallelization. Dependencies between your data items can make it hard to parallelize: if a thread uses the value calculated for the previous element, it has to wait until that value is calculated anyway and can't run in parallel. You also need to understand your target architecture, though you will typically have a multicore CPU on just about anything you buy these days. Even on a single core, you can get some benefit from more threads, but only if you have some blocking tasks. You should also keep in mind that there is overhead in creating and organizing the parallel threads. If this overhead is a significant fraction of (or more than) the time your task takes, you could slow it down.
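
As a concrete illustration of that kind of dependency, consider a running total, where every element depends on the one before it. A minimal sketch, not from the answer; the input array is a placeholder:

using System;
using System.Linq;

class RunningTotalSketch
{
    static void Main()
    {
        int[] values = Enumerable.Range(1, 10).ToArray(); // placeholder input

        // Each element of the prefix sum depends on the element computed before it,
        // so the iterations cannot simply run independently of each other.
        int[] prefix = new int[values.Length];
        prefix[0] = values[0];
        for (int i = 1; i < values.Length; i++)
        {
            prefix[i] = prefix[i - 1] + values[i];
        }

        Console.WriteLine(string.Join(", ", prefix)); // 1, 3, 6, 10, 15, 21, 28, 36, 45, 55
    }
}

A naive Parallel.For over the same body could read prefix[i - 1] before another thread has written it and produce wrong results; parallel prefix sums do exist, but they need a different algorithm, not a drop-in loop swap.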

夜血缘 2024-10-09 01:21:42

These are my benchmarks, showing that pure serial execution is the slowest, alongside various levels of partitioning.

using System;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        NativeDllCalls(true, 1, 400000000, 0);  // Seconds:     0.67 |)   595,203,995.01 ops
        NativeDllCalls(true, 1, 400000000, 3);  // Seconds:     0.91 |)   439,052,826.95 ops
        NativeDllCalls(true, 1, 400000000, 4);  // Seconds:     0.80 |)   501,224,491.43 ops
        NativeDllCalls(true, 1, 400000000, 8);  // Seconds:     0.63 |)   635,893,653.15 ops
        NativeDllCalls(true, 4, 100000000, 0);  // Seconds:     0.35 |) 1,149,359,562.48 ops
        NativeDllCalls(true, 400, 1000000, 0);  // Seconds:     0.24 |) 1,673,544,236.17 ops
        NativeDllCalls(true, 10000, 40000, 0);  // Seconds:     0.22 |) 1,826,379,772.84 ops
        NativeDllCalls(true, 40000, 10000, 0);  // Seconds:     0.21 |) 1,869,052,325.05 ops
        NativeDllCalls(true, 1000000, 400, 0);  // Seconds:     0.24 |) 1,652,797,628.57 ops
        NativeDllCalls(true, 100000000, 4, 0);  // Seconds:     0.31 |) 1,294,424,654.13 ops
        NativeDllCalls(true, 400000000, 0, 0);  // Seconds:     1.10 |)   364,277,890.12 ops
    }


static void NativeDllCalls(bool useStatic, int nonParallelIterations, int parallelIterations = 0, int maxParallelism = 0)
{
    if (useStatic) {
        Iterate<string, object>(
            (msg, cntxt) => { 
                ServiceContracts.ForNativeCall.SomeStaticCall(msg); 
            }
            , "test", null, nonParallelIterations,parallelIterations, maxParallelism );
    }
    else {
        var instance = new ServiceContracts.ForNativeCall();
        Iterate(
            (msg, cntxt) => {
                cntxt.SomeCall(msg);
            }
            , "test", instance, nonParallelIterations, parallelIterations, maxParallelism);
    }
}

static void Iterate<T, C>(Action<T, C> action, T testMessage, C context, int nonParallelIterations, int parallelIterations=0, int maxParallelism= 0)
{
    var start = DateTime.UtcNow;            
    if(nonParallelIterations == 0)
        nonParallelIterations = 1; // normalize values

    if(parallelIterations == 0)
        parallelIterations = 1; 

    if (parallelIterations > 1) {                    
        ParallelOptions options;
        if (maxParallelism == 0) // default max parallelism
            options = new ParallelOptions();
        else
            options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        if (nonParallelIterations > 1) {
            Parallel.For(0, parallelIterations, options
            , (j) => {
                for (int i = 0; i < nonParallelIterations; ++i) {
                    action(testMessage, context);
                }
            });
        }
        else { // no nonParallel iterations
            Parallel.For(0, parallelIterations, options
            , (j) => {                        
                action(testMessage, context);
            });
        }
    }
    else {
        for (int i = 0; i < nonParallelIterations; ++i) {
            action(testMessage, context);
        }
    }

    var end = DateTime.UtcNow;

    Console.WriteLine("\tSeconds: {0,8:0.00} |) {1,16:0,000.00} ops",
        (end - start).TotalSeconds, (Math.Max(parallelIterations, 1) * nonParallelIterations / (end - start).TotalSeconds));

}

}