Passing CUDA random generator state by reference

Posted on 2024-10-01 13:44:01

Is the following code correct when passing the random generator state (CUDA toolkit 3.2, curand.lib) by reference in the functions CalculateValue(curandState *localStat) and GetExponential(curandState *localState)?

Thanks

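#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <curand_kernel.h>

/* Launch configuration; the original left these undefined, so the values are placeholders */
const int threadsPerBlock = 256;
const int totalBlocks = 64;
const int totalThreads = totalBlocks * threadsPerBlock;

/* Minimal error check; the original used CUDA_CALL without defining it */
#define CUDA_CALL(x) do { if ((x) != cudaSuccess) { \
    printf("Error at %s:%d\n", __FILE__, __LINE__); \
    return EXIT_FAILURE; } } while (0)

__device__ double GetExponential(curandState *localState) {
    /* curand_uniform_double returns a value in (0, 1], so the log is finite */
    double u1 = curand_uniform_double(localState);
    /* The original omitted the return; a unit-rate exponential is assumed here */
    return -log(u1);
}

__device__ double CalculateValue(curandState *localStat) {
    /* Use the parameter name from the signature (the original body referred to localState) */
    double x = GetExponential(localStat);
    return x;
}

__global__ void RunMonteCarloKernel(curandState *state, double *results) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;

    /* Copy state to local memory for efficiency */
    curandState localState = state[i];

    results[i] = CalculateValue(&localState);

    /* Copy state back to global memory */
    state[i] = localState;
}

__global__ void setup_kernel(curandState *state) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;

    /* Each thread gets different seed, a different sequence number, no offset */
    curand_init(i, i, 0, &state[i]);
}

int main(void) {
    double *devResults;
    curandState *devStates;

    /* Allocate space for results and prng states on device */
    CUDA_CALL(cudaMalloc((void **)&devResults, totalThreads * sizeof(double)));
    CUDA_CALL(cudaMalloc((void **)&devStates, totalThreads * sizeof(curandState)));

    /* Setup prng states */
    setup_kernel<<<totalBlocks, threadsPerBlock>>>(devStates);

    /* Kernels must be launched with an execution configuration */
    for (int i = 0; i < 1000; i++) {
        RunMonteCarloKernel<<<totalBlocks, threadsPerBlock>>>(devStates, devResults);
    }

    /* cudaFree waits for the outstanding kernels before releasing memory */
    CUDA_CALL(cudaFree(devStates));
    CUDA_CALL(cudaFree(devResults));
    return 0;
}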

Comments (1)

神经暖 2024-10-08 13:44:01

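Is there a problem? It looks OK.

You may want to look at the EstimatePiInlineP sample in the MonteCarloCURAND directory of the 3.2 SDK. It uses C++-style pass by reference to avoid taking the address of a local variable. You would still need to store the state back to global memory at the end of the kernel (as you do in your code).

Passing by C++ reference helps the compiler by making it clear that the function can operate on the data directly in the original registers. Taking the address of a local array on a GPU can hurt performance if the compiler cannot be certain that all threads handle the pointer identically (i.e. perform identical operations on it), in which case it will spill the array to local memory. It will work, but it may be slower.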
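
The reference-passing variant described above might look roughly like the sketch below. It is not the SDK sample itself: the names GetExponentialRef, CalculateValueRef, SetupKernel, RunMonteCarloKernelRef and RunReferenceDemo, the launch configuration, the fixed seed 1234, and the unit-rate exponential are all assumptions made for illustration. Note that cuRAND's device API still takes a pointer, so the address is formed only at the call into the library.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical launch configuration, chosen only for illustration.
const int kThreadsPerBlock = 128;
const int kBlocks = 32;
const int kTotalThreads = kBlocks * kThreadsPerBlock;

// The state arrives by C++ reference rather than by pointer.
__device__ double GetExponentialRef(curandState &localState) {
    // curand_uniform_double returns a value in (0, 1], so log() stays finite.
    double u1 = curand_uniform_double(&localState);
    return -log(u1);  // unit-rate exponential (an assumption of this sketch)
}

__device__ double CalculateValueRef(curandState &localState) {
    return GetExponentialRef(localState);
}

__global__ void SetupKernel(curandState *state) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    // Same seed, different sequence number per thread, no offset.
    curand_init(1234ULL, i, 0, &state[i]);
}

__global__ void RunMonteCarloKernelRef(curandState *state, double *results) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    curandState localState = state[i];           // per-thread working copy
    results[i] = CalculateValueRef(localState);  // passed by reference
    state[i] = localState;                       // store the advanced state back
}

// Host driver: fills hostResults (kTotalThreads doubles), returns 0 on success.
int RunReferenceDemo(double *hostResults) {
    double *devResults = NULL;
    curandState *devStates = NULL;
    if (cudaMalloc((void **)&devResults, kTotalThreads * sizeof(double)) != cudaSuccess) {
        return 1;
    }
    if (cudaMalloc((void **)&devStates, kTotalThreads * sizeof(curandState)) != cudaSuccess) {
        cudaFree(devResults);
        return 1;
    }
    SetupKernel<<<kBlocks, kThreadsPerBlock>>>(devStates);
    RunMonteCarloKernelRef<<<kBlocks, kThreadsPerBlock>>>(devStates, devResults);
    // cudaMemcpy blocks until both kernels finish and reports their errors.
    cudaError_t err = cudaMemcpy(hostResults, devResults,
                                 kTotalThreads * sizeof(double),
                                 cudaMemcpyDeviceToHost);
    cudaFree(devStates);
    cudaFree(devResults);
    return err == cudaSuccess ? 0 : 1;
}
```

Calling RunReferenceDemo from main with a buffer of kTotalThreads doubles yields one sample per thread. Whether this version actually keeps the state in registers depends on the compiler and target architecture, so it is worth checking the generated code rather than assuming a speedup.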
