cuda内核不改变输入数组

发布于 2024-12-03 13:09:54 字数 2023 浏览 0 评论 0原文

我的 CUDA 内核似乎没有更改我传入的数组的值，这是相关的主机代码：

dim3 grid(numNets, N); 
dim3 threads(1, 1, 1); 

// allocate the arrays and jagged arrays on the device
alloc_dev_memory( state0,  state1,    d_state0, d_state1, 
                  adjlist, d_adjlist, transfer, d_transfer,
                  indeg,   d_indeg, d_N,       d_K,      d_S,          
                  d_Spow,  d_numNets );

// operate on the device memory
kernel<<< grid, threads >>>( d_state0, d_state1, d_adjlist, d_transfer, d_indeg,
                             d_N,      d_K,      d_S,       d_Spow,     d_numNets );

// copy the new states from the device to the host
cutilSafeCall( cudaMemcpy( state0, d_state0, ens_size*sizeof(int), 
                           cudaMemcpyDeviceToHost ) );

// copy the new states from the array to the ensemble
for(int i=0; i < numNets; ++i)
    nets[i]->set_state( state0 + N*i );

这是调用的内核代码：

// this dummy kernel just sets all the values to 0 for checking later.
__global__ void kernel( int * state0,    
                        int * state1,
                    int ** adjlist,
                    luint ** transfer,
                        int * indeg,
                        int * d_N,
                    float * d_K,
                        int * d_S,
                    luint * d_Spow,
                        int * d_numNets )
{
    int       N = *d_N;
    luint * Spow = d_Spow;
    int tid = blockIdx.x*N + blockIdx.y;

    state0[tid] = 0;
    state1[tid] = 0;

    for(int k=0; k < indeg[tid]; ++k) {
        adjlist[tid][k] = 0;
    }
    for(int k=0; k < Spow[indeg[tid]]; ++k) {
        transfer[tid][k] = 0;
    }
}

然后，在使用 cudaMemcpy 将 state0 数组返回主机后，如果我循环遍历 state0 并将所有值发送到 stdout，它们与初始值相同，即使我的内核被编写为将所有值设置为零。

预期输出应该是 state0 的初始值：101111101011，后跟 state0 的最终值：（全零）

此代码输出的示例运行：

101111101011
101111101011

Press ENTER to exit...

第二行应该全零。为什么这个 CUDA 内核不影响 state0 数组？

原文

My CUDA Kernel doesn't seem to be changing the values of the arrays I pass in, here's the relevant host code:

dim3 grid(numNets, N); 
dim3 threads(1, 1, 1); 

// allocate the arrays and jagged arrays on the device
alloc_dev_memory( state0,  state1,    d_state0, d_state1, 
                  adjlist, d_adjlist, transfer, d_transfer,
                  indeg,   d_indeg, d_N,       d_K,      d_S,          
                  d_Spow,  d_numNets );

// operate on the device memory
kernel<<< grid, threads >>>( d_state0, d_state1, d_adjlist, d_transfer, d_indeg,
                             d_N,      d_K,      d_S,       d_Spow,     d_numNets );

// copy the new states from the device to the host
cutilSafeCall( cudaMemcpy( state0, d_state0, ens_size*sizeof(int), 
                           cudaMemcpyDeviceToHost ) );

// copy the new states from the array to the ensemble
for(int i=0; i < numNets; ++i)
    nets[i]->set_state( state0 + N*i );

Here is the kernel code that is called:

// this dummy kernel just sets all the values to 0 for checking later.
__global__ void kernel( int * state0,    
                        int * state1,
                    int ** adjlist,
                    luint ** transfer,
                        int * indeg,
                        int * d_N,
                    float * d_K,
                        int * d_S,
                    luint * d_Spow,
                        int * d_numNets )
{
    int       N = *d_N;
    luint * Spow = d_Spow;
    int tid = blockIdx.x*N + blockIdx.y;

    state0[tid] = 0;
    state1[tid] = 0;

    for(int k=0; k < indeg[tid]; ++k) {
        adjlist[tid][k] = 0;
    }
    for(int k=0; k < Spow[indeg[tid]]; ++k) {
        transfer[tid][k] = 0;
    }
}

Then, after using cudaMemcpy to get the state0 array back on the host, if I loop through state0 and send all the values to stdout, they are the same as the initial values, even though my kernel is written to set all values to zero.

The expected output should be the initial value of state0: 101111101011, followed by the final value of state0: (all zeros)

A sample run of this code outputs:

101111101011
101111101011

Press ENTER to exit...

The second line should be all zeros. Why isn't this CUDA kernel affecting the state0 array?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

醉生梦死 2024-12-10 13:09:55

我发现 N 和 numNets 的值是垃圾值。 N 的偏移量是错误的，因此这些值被设置在数组之外。 @pQB，你的建议正是我所需要的。

回复收藏 0 原文

~没有更多了~

关于作者

拥有

暂无简介

0 文章

0 评论

743 人气

关注发私信

胡图图

文章 0 评论 0

关注

zt006

文章 0 评论 0

关注

z祗昰~

文章 0 评论 0

关注

冰葑

文章 0 评论 0

关注

野の

文章 0 评论 0

关注

天空

文章 0 评论 0

友情链接

文江博客

cuda内核不改变输入数组

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

胡图图

zt006

z祗昰~

冰葑

野の

天空

友情链接

cuda内核不改变输入数组

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

胡图图

zt006

z祗昰~

冰葑

野の

天空

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。