从 CUDA 中的高斯分布生成随机数
我在互联网上进行了大量搜索,寻找一种在我的 CUDA 设备上的内核中生成随机数的方法。这些数字必须来自高斯分布。
我发现的最好的东西来自 NVIDIA 本身。这是华莱士算法,它使用均匀分布来构建高斯分布。但他们给出的代码示例缺乏解释,我真的需要了解算法是如何运行的,尤其是在设备上。例如,他们给出:
__device__ void generateRandomNumbers_wallace(
unsigned seed, // Initialization seed
float *chi2Corrections, // Set of correction values
float *globalPool, // Input random number pool
float *output // Output random numbers
unsigned tid=threadIdx.x;
// Load global pool into shared memory.
unsigned offset = __mul24(POOL_SIZE, blockIdx.x);
for( int i = 0; i < 4; i++ )
pool[tid+THREADS*i] = globalPool[offset+TOTAL_THREADS*i+tid];
__syncthreads();
const unsigned lcg_a=241;
const unsigned lcg_c=59;
const unsigned lcg_m=256;
const unsigned mod_mask = lcg_m-1;
seed=(seed+tid)&mod_mask ;
// Loop generating outputs repeatedly
for( int loop = 0; loop < OUTPUTS_PER_RUN; loop++ )
{
Transform();
unsigned intermediate_address;
i_a = __mul24(loop,8*TOTAL_THREADS)+8*THREADS *
blockIdx.x + threadIdx.x;
float chi2CorrAndScale=chi2Corrections[
blockIdx.x * OUTPUTS_PER_RUN + loop];
for( i = 0; i < 4; i++ )
output[i_a + i*THREADS]=chi2CorrAndScale*pool[tid+THREADS*i];
}
首先,许多声明的变量甚至没有在函数中使用!我真的不明白第二个循环中的“8”是什么意思。我知道其他循环中的“4”与 4x4 正交矩阵块有关,对吗?谁能让我更好地了解这里发生的事情?
无论如何,有人有我可以使用的好的代码示例吗?或者有人有另一种在 CUDA 内核中生成随机高斯数的方法吗?代码示例将不胜感激。
谢谢!
I've searched a lot over the internet to find a way to generate random numbers on my CUDA device, within a kernel. The numbers must come from a gaussian distribution.
The best thing I found was from NVIDIA itself. It is the Wallace algorithm, that uses a uniform distribution to build a gaussian one. But the code samples they give lack explanation and I really need to understand how the algorithm goes, especially on the device. For example, they give:
__device__ void generateRandomNumbers_wallace(
unsigned seed, // Initialization seed
float *chi2Corrections, // Set of correction values
float *globalPool, // Input random number pool
float *output // Output random numbers
unsigned tid=threadIdx.x;
// Load global pool into shared memory.
unsigned offset = __mul24(POOL_SIZE, blockIdx.x);
for( int i = 0; i < 4; i++ )
pool[tid+THREADS*i] = globalPool[offset+TOTAL_THREADS*i+tid];
__syncthreads();
const unsigned lcg_a=241;
const unsigned lcg_c=59;
const unsigned lcg_m=256;
const unsigned mod_mask = lcg_m-1;
seed=(seed+tid)&mod_mask ;
// Loop generating outputs repeatedly
for( int loop = 0; loop < OUTPUTS_PER_RUN; loop++ )
{
Transform();
unsigned intermediate_address;
i_a = __mul24(loop,8*TOTAL_THREADS)+8*THREADS *
blockIdx.x + threadIdx.x;
float chi2CorrAndScale=chi2Corrections[
blockIdx.x * OUTPUTS_PER_RUN + loop];
for( i = 0; i < 4; i++ )
output[i_a + i*THREADS]=chi2CorrAndScale*pool[tid+THREADS*i];
}
First of all, many of the variables declared aren't even used in the function! And I really don't get what the "8" is for in the second loop. I understand the "4" in the other loops have something to do with the 4x4 orthogonal matrix block, am I right? Could anyone give me a better idea of what is going on here?
Anyway, does anyone have any good code samples I could use? Or does anyone have another way of generating random gaussian numbers in a CUDA kernel? Code samples will be much appreciated.
Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
您可以使用 CURAND,它包含在CUDA 工具包(版本 3.2 及更高版本)。那就简单多了!
关于您发布的代码的一些注释:
blockIdx
和threadIdx
- 这些定义带块的块索引和线程索引,请参阅 CUDA编程指南了解更多信息You could use CURAND, which is included with the CUDA Toolkit (version 3.2 and later). It'd be far simpler!
A few notes on the code you posted:
blockIdx
andthreadIdx
- these define the block index and thread index with a block, see the CUDA Programming Guide for more informationBox-Muller 方法也不错。
The Box-Muller method is also good.
快速 Walsh Hadamard 变换是通过加法和减法模式完成的。因此中心极限定理适用。经过 Walsh Hadamard 变换的均匀随机数数组将具有高斯/正态分布。关于这一点有一些轻微的技术细节。该算法不是华莱士发现的。它是我自己于 1993/1994 年左右首次在 Servo Magazine 上发表的。
我在 www.code.google.com/p/lemontree 上有关于 Walsh Hadamard 变换的代码
问候,
肖恩·奥康纳
The Fast Walsh Hadamard transform is done by patterns of addition and subtraction. Hence the central limit theorem applies. An array of uniform random numbers that undergoes a Walsh Hadamard transformation will have a Gaussian/Normal distribution. There are some slight technical details about that. The algorithm was not discovered by Wallace. It was first published in Servo Magazine around 1993/1994 by myself.
I have code about the Walsh Hadamard transform at www.code.google.com/p/lemontree
Regards,
Sean O'Connor