Monte Carlo on GPU
Today I talked with a friend who is trying to run some Monte Carlo simulations on GPUs. Interestingly, he told me that he wanted to draw random numbers on different processors and assumed that they would be uncorrelated. But they were not.
The question is whether there is a method for drawing independent sets of numbers on several GPUs. He thought that using a different seed for each of them would solve the problem, but it does not.
If any clarification is needed, please let me know and I will ask him to provide more details.
Comments (3)
To generate completely independent random numbers, you need to use a parallel random number generator. Essentially, you choose a single seed and it generates M independent random number streams. So on each of the M GPUs you could then generate random numbers from independent streams.
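One way to picture "a single seed, M streams" is block splitting: every device reads a different, non-overlapping block of one underlying sequence. The sketch below is only an illustration under stated assumptions, not the method any particular parallel RNG library uses. It uses a toy linear congruential generator with the widely published Numerical Recipes constants, and the helper names (`jump`, `make_streams`, `draw`) are hypothetical. The blocks stay disjoint only if no stream draws more than `block_size` values and `num_streams * block_size` stays below the generator's period.

```python
# Sketch: split ONE seeded LCG sequence into non-overlapping blocks, one per device.
# The LCG constants are illustrative; a toy LCG is not suitable for real simulations.
A = 1664525
C = 1013904223
M = 2**32


def step(state):
    """Advance the generator by one position."""
    return (A * state + C) % M


def jump(state, k):
    """Advance the generator by k positions in O(log k) operations.

    One step is the affine map x -> A*x + C (mod M). Composing affine maps
    by repeated squaring gives the k-step map without iterating k times.
    """
    mul, add = 1, 0          # accumulated map, starts as the identity
    cur_mul, cur_add = A, C  # map for 2^j steps, starts at one step
    while k > 0:
        if k & 1:
            # Apply the current 2^j-step map on top of the accumulated map.
            mul, add = (cur_mul * mul) % M, (cur_mul * add + cur_add) % M
        # Square the current map: 2^j steps become 2^(j+1) steps.
        cur_mul, cur_add = (cur_mul * cur_mul) % M, (cur_mul * cur_add + cur_add) % M
        k >>= 1
    return (mul * state + add) % M


def make_streams(seed, num_streams, block_size):
    """Return one starting state per stream.

    Stream j owns positions [j*block_size, (j+1)*block_size) of the sequence.
    """
    return [jump(seed % M, j * block_size) for j in range(num_streams)]


def draw(start_state, count):
    """Draw `count` uniforms in [0, 1) from a stream's starting state."""
    out, state = [], start_state
    for _ in range(count):
        state = step(state)
        out.append(state / M)
    return out


# One seed, four streams (think: four GPUs), each with a block of one million draws.
starts = make_streams(seed=2024, num_streams=4, block_size=1_000_000)
samples = [draw(s, 3) for s in starts]
```

Because each stream's starting point is computed from the same seed, the whole run is reproducible, and the point at which streams would collide is known in advance instead of being left to chance.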
When dealing with multiple GPUs you need to be aware that you want:
It turns out that generating random numbers on each GPU core is tricky (see this question I asked a while back). In my experience experimenting with GPUs and random numbers, you only get a speed-up from generating random numbers on the GPU if you generate a large quantity at once.
Instead, I would generate random numbers on the CPU, since:
To answer your question in the comments: What do random numbers depend on?
A very basic random number generator is the linear congruential generator. Although this generator has been surpassed by newer methods, it should give you an idea of how they work. Basically, the ith random number depends on the (i-1)th random number. As you point out, if you run two streams long enough, they will overlap. The big problem is that you don't know when they will overlap.
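To make the overlap concrete, here is a minimal sketch of a linear congruential generator. The constants are the widely published Numerical Recipes 32-bit parameters, chosen only for illustration, and the helper names (`lcg`, `take`) are hypothetical. The second seed is deliberately picked to be a value that the first stream produces, which is an extreme case, but it shows why two differently seeded streams of such a generator are really two windows onto the same sequence.

```python
# Minimal linear congruential generator (LCG): x_i = (A * x_{i-1} + C) mod M.
# Constants are illustrative (Numerical Recipes 32-bit parameters).
A = 1664525
C = 1013904223
M = 2**32


def lcg(seed):
    """Yield an endless stream of integers in [0, M), starting after `seed`."""
    state = seed % M
    while True:
        state = (A * state + C) % M  # the i-th value depends only on the (i-1)-th
        yield state


def take(stream, count):
    """Collect the next `count` values from a generator."""
    return [next(stream) for _ in range(count)]


# Two "different" seeds: seed_b happens to be the 5th output of the seed_a stream.
seed_a = 12345
stream_a = take(lcg(seed_a), 10)
seed_b = stream_a[4]
stream_b = take(lcg(seed_b), 5)

# stream_b is stream_a shifted by five positions, so the two streams overlap.
print(stream_b == stream_a[5:10])  # prints True
```

In practice the second seed will not be chosen this way, so the offset between the two streams is unknown, which is exactly the difficulty described above.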
To generate iid uniform variables, you just have to initialize your generators with different seeds. With CUDA, you may use the NVIDIA cuRAND library, which implements the Mersenne Twister generator.
For example, the following code, executed by 100 kernels in parallel, will draw 10 samples of an (R^10)-uniform variable
If you take any "good" generator (e.g. Mersenne Twister), two sequences with different random seeds will be uncorrelated, whether on GPU or CPU. Hence I'm not sure what you mean by saying that taking different seeds on different GPUs was not enough. Would you elaborate?