CUDA kernel does not change the input array
My CUDA kernel doesn't seem to change the values of the arrays I pass in. Here is the relevant host code:
dim3 grid(numNets, N);
dim3 threads(1, 1, 1);
// allocate the arrays and jagged arrays on the device
alloc_dev_memory( state0, state1, d_state0, d_state1,
                  adjlist, d_adjlist, transfer, d_transfer,
                  indeg, d_indeg, d_N, d_K, d_S,
                  d_Spow, d_numNets );
// operate on the device memory
kernel<<< grid, threads >>>( d_state0, d_state1, d_adjlist, d_transfer, d_indeg,
                             d_N, d_K, d_S, d_Spow, d_numNets );
// copy the new states from the device to the host
cutilSafeCall( cudaMemcpy( state0, d_state0, ens_size*sizeof(int),
                           cudaMemcpyDeviceToHost ) );
// copy the new states from the array to the ensemble
for(int i=0; i < numNets; ++i)
    nets[i]->set_state( state0 + N*i );
Here is the kernel code that is called:
// this dummy kernel just sets all the values to 0 for checking later.
__global__ void kernel( int * state0,
                        int * state1,
                        int ** adjlist,
                        luint ** transfer,
                        int * indeg,
                        int * d_N,
                        float * d_K,
                        int * d_S,
                        luint * d_Spow,
                        int * d_numNets )
{
    int N = *d_N;
    luint * Spow = d_Spow;
    int tid = blockIdx.x*N + blockIdx.y;
    state0[tid] = 0;
    state1[tid] = 0;
    for(int k=0; k < indeg[tid]; ++k) {
        adjlist[tid][k] = 0;
    }
    for(int k=0; k < Spow[indeg[tid]]; ++k) {
        transfer[tid][k] = 0;
    }
}
Then, after using cudaMemcpy to get the state0 array back on the host, if I loop through state0 and send all the values to stdout, they are the same as the initial values, even though my kernel is written to set all values to zero.
The expected output is the initial value of state0 (101111101011), followed by the final value of state0 (all zeros).
A sample run of this code outputs:
101111101011
101111101011
Press ENTER to exit...
The second line should be all zeros. Why isn't this CUDA kernel affecting the state0 array?
Comments (1)
I found that the values of N and numNets were garbage values. The offset by N was wrong, so the values were being set outside of the array. @pQB, your suggestion was just what I needed.