Generating random numbers from a Gaussian distribution in CUDA

Posted on 2024-10-13 02:47:31


I've searched extensively online for a way to generate random numbers on my CUDA device, within a kernel. The numbers must come from a Gaussian distribution.

The best thing I found was from NVIDIA itself: the Wallace algorithm, which uses a uniform distribution to build a Gaussian one. But the code samples they give lack explanation, and I really need to understand how the algorithm works, especially on the device. For example, they give:

    __device__ void generateRandomNumbers_wallace(
        unsigned seed,           // Initialization seed
        float *chi2Corrections,  // Set of correction values
        float *globalPool,       // Input random number pool
        float *output)           // Output random numbers
    {
        // Not defined in this excerpt; assumed to come from the rest of NVIDIA's
        // sample: pool (a __shared__ array), Transform(), POOL_SIZE, THREADS,
        // TOTAL_THREADS and OUTPUTS_PER_RUN.
        unsigned tid = threadIdx.x;
        // Load global pool into shared memory.
        unsigned offset = __mul24(POOL_SIZE, blockIdx.x);
        for (int i = 0; i < 4; i++)
            pool[tid + THREADS * i] = globalPool[offset + TOTAL_THREADS * i + tid];
        __syncthreads();
        const unsigned lcg_a = 241;
        const unsigned lcg_c = 59;
        const unsigned lcg_m = 256;
        const unsigned mod_mask = lcg_m - 1;
        seed = (seed + tid) & mod_mask;
        // Loop generating outputs repeatedly
        for (int loop = 0; loop < OUTPUTS_PER_RUN; loop++)
        {
            Transform();
            unsigned intermediate_address;
            unsigned i_a = __mul24(loop, 8 * TOTAL_THREADS) + 8 * THREADS *
                blockIdx.x + threadIdx.x;
            float chi2CorrAndScale = chi2Corrections[
                blockIdx.x * OUTPUTS_PER_RUN + loop];
            for (int i = 0; i < 4; i++)
                output[i_a + i * THREADS] = chi2CorrAndScale * pool[tid + THREADS * i];
        }
    }

First of all, many of the declared variables aren't even used in the function! And I really don't get what the "8" is for in the second loop. I understand the "4" in the other loops has something to do with the 4x4 orthogonal matrix block, am I right? Could anyone give me a better idea of what is going on here?

Anyway, does anyone have good code samples I could use? Or does anyone have another way of generating Gaussian random numbers in a CUDA kernel? Code samples would be much appreciated.

Thanks!
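On the "4": the core step of the Wallace method, as it is usually described, mixes groups of four Gaussian pool values with an orthogonal 4x4 matrix. An orthogonal transform of independent N(0,1) values yields independent N(0,1) values and leaves their sum of squares unchanged; because the pool's sum of squares therefore never changes, the sample rescales its outputs, which is presumably what `chi2Corrections` is for. Below is a minimal sketch of such a mixing step, assuming the normalized 4x4 Hadamard matrix; the name `wallaceMix4` is hypothetical, and the exact matrix and addressing in NVIDIA's sample may differ.

```cuda
// Hypothetical helper (not from NVIDIA's sample): multiplies four pool values
// by H4 / 2, where H4 is the 4x4 Sylvester-Hadamard matrix
//   [[1,  1,  1,  1],
//    [1, -1,  1, -1],
//    [1,  1, -1, -1],
//    [1, -1, -1,  1]].
// H4 / 2 is orthogonal, so a*a + b*b + c*c + d*d is preserved and
// independent N(0,1) inputs give independent N(0,1) outputs.
__host__ __device__ inline void wallaceMix4(float &a, float &b, float &c, float &d)
{
    float s0 = a + b, d0 = a - b;   // first butterfly stage
    float s1 = c + d, d1 = c - d;
    float na = 0.5f * (s0 + s1);    // row 1: ( a + b + c + d) / 2
    float nb = 0.5f * (d0 + d1);    // row 2: ( a - b + c - d) / 2
    float nc = 0.5f * (s0 - s1);    // row 3: ( a + b - c - d) / 2
    float nd = 0.5f * (d0 - d1);    // row 4: ( a - b - c + d) / 2
    a = na; b = nb; c = nc; d = nd;
}
```

Applied on the host to (1, 2, 3, 4), it yields (5, -1, -2, 0); both tuples have a sum of squares of 30.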


黯然 2024-10-20 02:47:31

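You can use CURAND, which is included in the CUDA Toolkit (version 3.2 and later). It's far simpler!

Some notes on the code you posted:

The Wallace generator transforms Gaussian to Gaussian (i.e. not uniform to Gaussian)
CUDA code has two implicit variables, blockIdx and threadIdx, which give the block index and the thread index within a block; see the CUDA Programming Guide for more information
The code uses __mul24, which on sm_20 and later is actually slower than ordinary 32-bit multiplication, so I would avoid it (even on older architectures, for simplicity)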
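A minimal sketch of the CURAND device-API route suggested above: one `curandState` per thread, initialized with `curand_init`, then `curand_normal` to draw N(0,1) floats inside a kernel. The kernel names, the `fillGaussian` host helper, and the launch configuration are illustrative assumptions, not part of CURAND.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Each thread gets its own state: same seed, different subsequence (id),
// so the per-thread streams do not overlap.
__global__ void setupStates(curandState *states, unsigned long long seed, int n)
{
    // blockIdx/threadIdx are the implicit built-in variables; a plain
    // multiply is used instead of __mul24.
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n)
        curand_init(seed, id, 0, &states[id]);
}

// Draws one standard normal (mean 0, stddev 1) float per thread.
__global__ void drawGaussians(curandState *states, float *out, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n) {
        curandState local = states[id];   // work on a register copy
        out[id] = curand_normal(&local);
        states[id] = local;               // save state for later launches
    }
}

// Hypothetical host helper: fills host[0..n) with N(0,1) samples.
cudaError_t fillGaussian(float *host, int n, unsigned long long seed)
{
    curandState *states = nullptr;
    float *dev = nullptr;
    cudaError_t err = cudaMalloc(&states, n * sizeof(curandState));
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&dev, n * sizeof(float));
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        setupStates<<<blocks, threads>>>(states, seed, n);
        drawGaussians<<<blocks, threads>>>(states, dev, n);
        err = cudaGetLastError();
        if (err == cudaSuccess)   // cudaMemcpy waits for the kernels to finish
            err = cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);      // cudaFree(nullptr) is a no-op
    cudaFree(states);
    return err;
}
```

`curand_init` is comparatively expensive, which is why it sits in its own setup kernel here: the states can be initialized once and reused across many `drawGaussians` launches.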


⊕婉儿 2024-10-20 02:47:31

The Box-Muller method is also a good option.
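The Box-Muller transform maps two independent uniforms u1, u2 to two independent standard normals: z0 = sqrt(-2 ln u1) cos(2π u2) and z1 = sqrt(-2 ln u1) sin(2π u2). A minimal sketch as a `__host__ __device__` function (the names `boxMuller` and `boxMullerKernel` are hypothetical). The uniforms could come from any per-thread uniform generator; for example, CURAND's `curand_uniform` returns values in (0, 1], which keeps `logf(u1)` finite.

```cuda
#include <math.h>

// Box-Muller: u1 in (0, 1] and u2 in [0, 1) become two independent N(0,1) values.
__host__ __device__ inline void boxMuller(float u1, float u2, float &z0, float &z1)
{
    const float twoPi = 6.283185307f;
    float r = sqrtf(-2.0f * logf(u1));   // radius; u1 must be > 0
    float theta = twoPi * u2;            // angle
    z0 = r * cosf(theta);
    z1 = r * sinf(theta);
}

// Example kernel: each thread turns one pair of uniforms into one pair of normals.
__global__ void boxMullerKernel(const float *uniforms, float *normals, int pairs)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < pairs)
        boxMuller(uniforms[2 * i], uniforms[2 * i + 1],
                  normals[2 * i], normals[2 * i + 1]);
}
```

Since z0² + z1² = -2 ln u1 for every input pair, that identity makes a quick host-side check of the function.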


故人如初 2024-10-20 02:47:31

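The fast Walsh Hadamard transform is computed through patterns of additions and subtractions, so the central limit theorem applies: an array of uniform random numbers that has gone through a Walsh Hadamard transform will have a Gaussian/normal distribution. There are some minor technicalities to this. The algorithm was not discovered by Wallace; I first published it myself in Servo Magazine around 1993/1994.
I have code for the Walsh Hadamard transform at www.code.google.com/p/lemontree
Regards,
Sean O'Connor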
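A minimal sketch of the idea described in this answer, assuming n uniforms in [0, 1) with n a power of two: center them, apply an unnormalized in-place fast Walsh-Hadamard transform, and rescale by sqrt(12/n) so that each output has unit variance. This is an illustration written for this page, not the code at the link above; the function names are hypothetical.

```cuda
#include <math.h>

// In-place fast Walsh-Hadamard transform of x[0..n), n a power of two.
// Each butterfly is one addition and one subtraction; O(n log n) in total.
__host__ __device__ void fwht(float *x, int n)
{
    for (int h = 1; h < n; h *= 2)
        for (int i = 0; i < n; i += 2 * h)
            for (int j = i; j < i + h; ++j) {
                float a = x[j], b = x[j + h];
                x[j] = a + b;
                x[j + h] = a - b;
            }
}

// Turns n uniforms in [0, 1) into n approximately N(0,1) values.
// Every output is a +/- sum of all n centered inputs (variance n/12 before
// scaling), so by the central limit theorem it is close to Gaussian for large n.
__host__ __device__ void uniformsToGaussianWHT(float *x, int n)
{
    for (int i = 0; i < n; ++i)
        x[i] -= 0.5f;                 // center: mean 0, variance 1/12
    fwht(x, n);
    float scale = sqrtf(12.0f / n);   // rescale to unit variance
    for (int i = 0; i < n; ++i)
        x[i] *= scale;
}
```

The outputs of a single transform are uncorrelated but not fully independent, since each is a combination of the same n inputs, and their magnitude is bounded by sqrt(3n); these are likely among the technicalities the answer alludes to.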

