C# multithreading threadpool parallel-processing

Migrating a single-threaded application to multithreaded, parallel execution: Monte Carlo simulation

Posted on 2024-07-26 23:23:32


I've been tasked with taking an existing single-threaded Monte Carlo simulation and optimising it. It's a C# console app with no DB access: it loads data once from a CSV file and writes its output at the end, so it's pretty much just CPU-bound. It also uses only about 50 MB of memory.

I've run it through the JetBrains dotTrace profiler. About 30% of the total execution time is spent generating uniform random numbers, and 24% translating those uniform random numbers into normally distributed random numbers.

The basic algorithm is a whole lot of nested for loops, with random number calls and matrix multiplication at the centre. Each iteration returns a double, which is added to a results list. At checkpoints every 5% of the total iteration count, this list is sorted and tested against some convergence criteria; if they are met, the program breaks out of the loops and writes the results, otherwise it proceeds to the end.
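A sketch of the loop structure described above, to make the checkpoint logic concrete. `simulateOnce` and `hasConverged` are hypothetical delegates standing in for the real nested loops and convergence criteria, which the question doesn't show; only the 5% checkpoint interval and the sort-then-test order come from the description.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the checkpointed loop described in the question.
// simulateOnce and hasConverged are hypothetical stand-ins for the app's real logic.
public static class CheckpointedLoop
{
    public static List<double> Run(int totalIterations,
                                   Func<double> simulateOnce,
                                   Func<List<double>, bool> hasConverged)
    {
        var results = new List<double>(totalIterations);
        // Check point every 5% of the total iteration count (never less than 1).
        int checkpointEvery = Math.Max(1, totalIterations / 20);

        for (int i = 1; i <= totalIterations; i++)
        {
            results.Add(simulateOnce()); // each iteration yields one double

            if (i % checkpointEvery == 0)
            {
                results.Sort();            // list is sorted before the convergence test
                if (hasConverged(results))
                    break;                 // criteria met: stop early and write results
            }
        }
        return results;
    }
}
```

The detail worth noting before parallelising is that the convergence check sorts and reads the whole results list, so it must run at a point where no worker thread is still appending to that list.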

I'd like developers to weigh in on:

Should I use new Thread or the ThreadPool?
Should I look at the Microsoft Parallel Extensions library?
Should I look at AForge.NET's Parallel.For (http://code.google.com/p/aforge/)? Any other libraries?

Some links to tutorials on the above would be most welcome, as I've never written any parallel or multi-threaded code.

Best strategies for generating normally distributed random numbers en masse and then consuming them. The app never uses uniform random numbers as they are; they are always translated to normally distributed numbers and then consumed.
Good, fast libraries (parallel?) for random number generation.
Memory considerations as I take this parallel: how much extra will I require?

The current app takes 2 hours for 500,000 iterations (so about 12 hours for 3,000,000 at the current rate). The business needs this to scale to 3,000,000 iterations and to be called multiple times a day, so it needs some heavy optimisation.

I'd particularly like to hear from people who have used Microsoft Parallel Extensions or AForge.NET Parallel.

This needs to be productionised fairly quickly, so the .NET 4 beta is ruled out, even though I know it has concurrency libraries baked in; we can look at migrating to .NET 4 later down the track once it's released. For the moment the server has .NET 2.0, and I've submitted an upgrade to .NET 3.5 SP1 (which my dev box already has) for review.

Thanks

Update

I've just tried the Parallel.For implementation, but it comes up with some weird results.
Single-threaded:

IRandomGenerator rnd = new MersenneTwister();
IDistribution dist = new DiscreteNormalDistribution(discreteNormalDistributionSize);
List<double> results = new List<double>();

for (int i = 0; i < CHECKPOINTS; i++)
{
 results.AddRange(Oblist.Simulate(rnd, dist, n));
}

To:

Parallel.For(0, CHECKPOINTS, i =>
        {
           results.AddRange(Oblist.Simulate(rnd, dist, n));
        });

Inside Simulate there are many calls to rnd.nextUniform(), and I think I am getting many values that are the same. Is this likely to happen because it is now running in parallel?

Could there also be issues with the List AddRange call not being thread-safe? I see that System.Threading.Collections.BlockingCollection might be worth using, but it only has an Add method, no AddRange, so I'd have to loop over the results and add them in a thread-safe manner. Any insight from someone who has used Parallel.For would be much appreciated. I temporarily switched to System.Random for my calls, as I was getting an exception when calling nextUniform with my Mersenne Twister implementation; perhaps it isn't thread-safe, since an array index was going out of bounds...


静待花开 2024-08-02 23:23:32


First, you need to be clear about why you think using multiple threads is an optimization, because in fact it is not. Using multiple threads will make your workload complete faster only if you have multiple processors, and even then at most as many times faster as you have CPUs available (this ratio is called the speed-up). The work is not "optimized" in the traditional sense of the word: the amount of work isn't reduced. In fact, with multithreading the total amount of work typically grows because of the threading overhead.

So in designing your application, you have to find pieces of work that can be done in a parallel or overlapping fashion. It may be possible to generate random numbers in parallel (by having multiple RNGs run on different CPUs), but that would also change the results, as you would get different random numbers. Another option is to generate the random numbers on one CPU and do everything else on other CPUs. This can give you a maximum speed-up of about 3, because the RNG still runs sequentially and still takes 30% of the load.

So if you go for this parallelization, you end up with 3 threads: thread 1 runs the RNG, thread 2 converts its output to the normal distribution, and thread 3 does the rest of the simulation.
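A back-of-the-envelope check of these speed-up figures, using the profiler numbers from the question (30% uniform generation, 24% normal transform, hence roughly 46% for everything else) and assuming the stages overlap perfectly with no queueing overhead:

```latex
% Fractions of single-threaded run time (from the profiler figures in the question)
f_{\mathrm{rng}} = 0.30,\quad f_{\mathrm{norm}} = 0.24,\quad f_{\mathrm{rest}} = 1 - 0.30 - 0.24 = 0.46

% RNG on one CPU, all other work spread over p CPUs, stages fully overlapped:
S(p) = \frac{1}{\max\left(f_{\mathrm{rng}},\ \frac{1 - f_{\mathrm{rng}}}{p}\right)}
\;\le\; \frac{1}{0.30} \approx 3.3

% Exactly three threads, one per stage: throughput is set by the slowest stage
S_{3} = \frac{1}{\max(f_{\mathrm{rng}},\ f_{\mathrm{norm}},\ f_{\mathrm{rest}})}
      = \frac{1}{\max(0.30,\ 0.24,\ 0.46)} = \frac{1}{0.46} \approx 2.2
```

The cap of about 3.3 is only reached once the non-RNG work is split across roughly three CPUs (0.70/p ≤ 0.30 requires p ≥ 2.33); with exactly three threads, the simulation stage at 46% becomes the bottleneck instead.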

For this design, a producer-consumer architecture is most appropriate. Each thread reads its input from a queue and writes its output to another queue. Each queue should be blocking, so if the RNG thread falls behind, the normalization thread automatically blocks until new random numbers are available. For efficiency, I would pass the random numbers between threads in arrays of, say, 100 (or more), to avoid synchronizing on every single random number.
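A minimal sketch of that three-thread pipeline, assuming .NET 4 or later, where `System.Collections.Concurrent.BlockingCollection<T>` provides the blocking queue (it is not available on the .NET 2.0/3.5 setup the question describes). `System.Random` stands in for the question's MersenneTwister, and the Box-Muller transform stands in for its normal-distribution step; both are substitutions, not the asker's code. The third stage only averages the normals as a placeholder for the real simulation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class PipelineSketch
{
    // Numbers passed per queue operation; even, so Box-Muller can consume pairs.
    private const int BatchSize = 100;

    // Runs the three stages and returns the mean of all normals the last stage consumed.
    public static double Run(int batchCount, int seed)
    {
        // Bounded queues: a stage that runs ahead blocks once 16 batches are waiting.
        var uniforms = new BlockingCollection<double[]>(16);
        var normals = new BlockingCollection<double[]>(16);
        double sum = 0;
        long count = 0;

        // Thread 1: uniform RNG. Only this thread ever touches rnd.
        var rngThread = new Thread(() =>
        {
            var rnd = new Random(seed);
            for (int b = 0; b < batchCount; b++)
            {
                var batch = new double[BatchSize];
                for (int i = 0; i < BatchSize; i++)
                    batch[i] = 1.0 - rnd.NextDouble(); // in (0, 1], so Math.Log never sees 0
                uniforms.Add(batch);                  // blocks while the queue is full
            }
            uniforms.CompleteAdding();                // signal: no more batches
        });

        // Thread 2: uniform -> standard normal via Box-Muller, one batch at a time.
        var normalThread = new Thread(() =>
        {
            foreach (var u in uniforms.GetConsumingEnumerable()) // blocks until a batch arrives
            {
                var z = new double[BatchSize];
                for (int i = 0; i < BatchSize; i += 2)
                {
                    double r = Math.Sqrt(-2.0 * Math.Log(u[i]));
                    double theta = 2.0 * Math.PI * u[i + 1];
                    z[i] = r * Math.Cos(theta);
                    z[i + 1] = r * Math.Sin(theta);
                }
                normals.Add(z);
            }
            normals.CompleteAdding();
        });

        // Thread 3: placeholder for the rest of the simulation.
        var simThread = new Thread(() =>
        {
            foreach (var z in normals.GetConsumingEnumerable())
                foreach (var x in z) { sum += x; count++; }
        });

        rngThread.Start(); normalThread.Start(); simThread.Start();
        rngThread.Join(); normalThread.Join(); simThread.Join(); // Join makes sum/count visible here
        return count == 0 ? 0.0 : sum / count;
    }
}
```

Because a single thread produces every uniform in order, the stream of normals is the same on every run for a given seed, which keeps the answer's point that this split does not change the random sequence.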

For this approach, you don't need any advanced threading. Just use the regular Thread class: no pool, no library. The only thing you need that is (unfortunately) not in the standard library is a blocking queue class (the Queue class in System.Collections is no good). Codeproject provides a reasonable-looking implementation of one; there are probably others.
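For .NET 2.0/3.5, where no blocking queue ships in the framework, a minimal bounded one can be built on `Monitor.Wait`/`Monitor.PulseAll`. This is a sketch, not the Codeproject implementation the answer mentions; the class name `BlockingQueue<T>` and its `Complete`/`TryDequeue` shutdown protocol are hypothetical choices made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Minimal bounded blocking queue using only .NET 2.0 features.
public sealed class BlockingQueue<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly object sync = new object();
    private readonly int capacity;
    private bool completed;

    public BlockingQueue(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    // Blocks while the queue is full.
    public void Enqueue(T item)
    {
        lock (sync)
        {
            while (items.Count >= capacity && !completed) Monitor.Wait(sync);
            if (completed) throw new InvalidOperationException("Queue has been completed.");
            items.Enqueue(item);
            Monitor.PulseAll(sync); // wake consumers waiting for data
        }
    }

    // Blocks while the queue is empty; returns false once the producer has
    // called Complete and every remaining item has been taken.
    public bool TryDequeue(out T item)
    {
        lock (sync)
        {
            while (items.Count == 0 && !completed) Monitor.Wait(sync);
            if (items.Count == 0) { item = default(T); return false; }
            item = items.Dequeue();
            Monitor.PulseAll(sync); // wake producers waiting for space
            return true;
        }
    }

    // Called by the producer when it has nothing more to add.
    public void Complete()
    {
        lock (sync)
        {
            completed = true;
            Monitor.PulseAll(sync); // release anyone still waiting
        }
    }
}
```

Passing `double[]` batches through a `BlockingQueue<double[]>` keeps the number of lock acquisitions to one per batch rather than one per random number, as the answer suggests.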


眼藏柔 2024-08-02 23:23:32


List<double> is definitely not thread-safe; see the "Thread Safety" section of the System.Collections.Generic.List<T> documentation. The reason is performance: adding thread safety is not free.

Your random number implementation also isn't thread-safe; getting the same numbers multiple times is exactly what you'd expect in this case. Let's use the following simplified model of rnd.NextUniform() to understand what is happening:

1. Calculate a pseudo-random number from the current state of the object.
2. Update the state of the object so the next call yields a different number.
3. Return the pseudo-random number.

Now, if two threads execute this method in parallel, something like this may happen:

1. Thread A calculates a random number as in step 1.
2. Thread B calculates a random number as in step 1. Thread A has not yet updated the state of the object, so the result is the same.
3. Thread A updates the state of the object as in step 2.
4. Thread B updates the state of the object as in step 2, trampling over A's state changes or maybe giving the same result.
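The interleaving above can be replayed deterministically, without real threads, using a toy generator whose `NextUniform` is split into the same steps. This is a stand-in, not the question's MersenneTwister: it is a 32-bit linear congruential generator with the widely used Numerical Recipes constants, and `Peek`/`Commit` are hypothetical helpers that expose steps 1 and 2 separately so the schedule can be forced.

```csharp
using System;

// Toy generator following the three-step model: compute, update state, return.
public sealed class ToyGenerator
{
    private uint state;

    public ToyGenerator(uint seed) { state = seed; }

    // Step 1: compute the next value from the current state, without writing it back.
    public uint Peek() { return unchecked(state * 1664525u + 1013904223u); }

    // Step 2: store the new state.
    public void Commit(uint next) { state = next; }

    // Normal single-threaded use: steps 1-3 back to back.
    public double NextUniform()
    {
        uint next = Peek();
        Commit(next);
        return next / 4294967296.0; // step 3: scale to [0, 1)
    }
}

public static class RaceReplay
{
    // Replays the schedule above: A and B both run step 1 before either runs step 2.
    // Returns { A's value, B's value, the value a third call would then get }.
    public static double[] Interleaved(uint seed)
    {
        var rnd = new ToyGenerator(seed);
        uint a = rnd.Peek();  // thread A, step 1
        uint b = rnd.Peek();  // thread B, step 1: state not yet updated, same value
        rnd.Commit(a);        // thread A, step 2
        rnd.Commit(b);        // thread B, step 2: overwrites A's update
        return new[] { a / 4294967296.0, b / 4294967296.0, rnd.NextUniform() };
    }
}
```

Two draws were requested but the state only advanced once, so one value is duplicated and the sequence then continues as if a single number had been drawn, which is consistent with the repeated values reported in the question.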

As you can see, any reasoning you can do to prove that rnd.NextUniform() works is no longer valid because two threads are interfering with each other. Worse, bugs like this depend on timing and may appear only rarely as "glitches" under certain workloads or on certain systems. Debugging nightmare!

One possible solution is to eliminate the state sharing: give each task its own random number generator, initialized with a different seed (assuming that instances do not share state in some way, for example through static fields).
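A sketch of that approach, assuming .NET 4's `Parallel.For` (System.Threading.Tasks): each iteration creates its own `System.Random`, seeded from the iteration index, and writes into its own array slot, so neither RNG state nor a list is shared; the results are merged on one thread afterwards. `SimulateChunk` is a hypothetical stand-in for `Oblist.Simulate`. Seeding `System.Random` with consecutive integers is the simplest choice but is not a guarantee of statistically independent streams; for production Monte Carlo work, a generator designed for multiple streams (or a well-separated seeding scheme) is safer.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PerTaskRng
{
    // Hypothetical stand-in for Oblist.Simulate: n results drawn with the given generator.
    private static double[] SimulateChunk(Random rnd, int n)
    {
        var values = new double[n];
        for (int i = 0; i < n; i++) values[i] = rnd.NextDouble();
        return values;
    }

    public static List<double> Run(int chunks, int n, int baseSeed)
    {
        var perChunk = new double[chunks][];

        Parallel.For(0, chunks, i =>
        {
            // Each iteration owns its generator: no RNG state is shared between tasks.
            var rnd = new Random(baseSeed + i);
            // Each iteration writes only its own slot, so the array needs no lock.
            perChunk[i] = SimulateChunk(rnd, n);
        });

        // Merge on a single thread, in chunk order, after Parallel.For has returned.
        var all = new List<double>(chunks * n);
        foreach (var chunk in perChunk) all.AddRange(chunk);
        return all;
    }
}
```

Because each chunk's seed depends only on its index, the output is reproducible regardless of how the scheduler assigns iterations to threads.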

Another (inferior) solution is to create a field holding a lock object in your MersenneTwister class like this:

private object lockObject = new object();

Then use this lock in your MersenneTwister.NextUniform() implementation:

public double NextUniform()
{
   lock(lockObject)
   {
      // original code here
   }
}

This will prevent two threads from executing the NextUniform() method in parallel. The problem with the list in your Parallel.For can be addressed in a similar manner: separate the Simulate call and the AddRange call, and then add locking around the AddRange call.
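A sketch of this lock-based variant, again assuming .NET 4's `Parallel.For`. `LockedRandom` is a hypothetical wrapper that applies the lockObject pattern above to `System.Random` (which is not thread-safe on its own), and `Simulate` is a hypothetical stand-in for `Oblist.Simulate`. The Simulate call runs without holding the results lock; only `AddRange` is serialized.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical wrapper: every draw goes through one lock.
public sealed class LockedRandom
{
    private readonly Random inner;
    private readonly object lockObject = new object();

    public LockedRandom(int seed) { inner = new Random(seed); }

    public double NextUniform()
    {
        lock (lockObject) { return inner.NextDouble(); } // one caller at a time
    }
}

public static class LockedAggregation
{
    // Hypothetical stand-in for Oblist.Simulate.
    private static double[] Simulate(LockedRandom rnd, int n)
    {
        var values = new double[n];
        for (int i = 0; i < n; i++) values[i] = rnd.NextUniform();
        return values;
    }

    public static List<double> Run(int checkpoints, int n, int seed)
    {
        var rnd = new LockedRandom(seed);
        var results = new List<double>();
        var resultsLock = new object();

        Parallel.For(0, checkpoints, i =>
        {
            double[] chunk = Simulate(rnd, n);              // results lock not held here
            lock (resultsLock) { results.AddRange(chunk); } // only the merge is serialized
        });
        return results; // chunk order depends on thread scheduling
    }
}
```

This is correct, since every value drawn is delivered exactly once, but every random number still passes through a single lock, which is the bottleneck the recommendation that follows warns about.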

My recommendation: avoid sharing any mutable state (like the RNG state) between parallel tasks if at all possible. If no mutable state is shared, no threading issues occur. This also avoids locking bottlenecks: you don't want your "parallel" tasks to wait on a single random number generator that doesn't work in parallel at all, especially when 30% of the time is spent acquiring random numbers.

Limit state sharing and locking to places where you can't avoid it, like when aggregating the results of parallel execution (as in your AddRange calls).
