CUDA program gives garbage values
I really do not understand why the output of the code below is not a and b.
#include<cutil.h>
#include<iostream>

__global__ void p(unsigned char **a){
    unsigned char temp[2];
    temp[0] = 'a';
    temp[1] = 'b';
    a[0] = temp;
}

void main(){
    unsigned char **a ;
    cudaMalloc((void**)&a, sizeof(unsigned char*));
    p<<<1,1>>>(a);
    unsigned char **c;
    unsigned char b[2];
    cudaMemcpy(c, a, sizeof(unsigned char *), cudaMemcpyDeviceToHost);
    cudaMemcpy(b, c[0], 2*sizeof(unsigned char), cudaMemcpyDeviceToHost);
    for( int i=0 ; i < 2; i++){
        printf("%c\n", b[i]);
    }
    getchar();
}
What is wrong with my logic?
Comments (1)
Let's leave CUDA out for now and just write a function that writes data to a user-provided array. The user passes the array via a pointer.
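The snippet that originally followed this sentence did not survive on this page. A minimal sketch of such a function, assuming the name fill_me_up used later in this answer, an int element type (suggested by the new int[2] remark), and placeholder values 1 and 2:

```cpp
// Hypothetical sketch: the caller owns the array and passes a pointer to it.
// dst must point to at least two writable ints.
void fill_me_up(int *dst)
{
    dst[0] = 1;  // placeholder values, chosen only for illustration
    dst[1] = 2;
}
```

The function never allocates or returns memory of its own; it only writes through the pointer it was given.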
Now, what you're doing with the local variable doesn't make sense: you store the address of a local variable, and that address becomes invalid as soon as you leave the function scope. The next best thing you could do is to copy the data with memcpy(), or some equivalent C++ algorithm.
OK, now on to calling that function. We must first provide the target memory and then pass a pointer to it.
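The two snippets these paragraphs refer to are missing here, so the following is only a sketch of what they might have looked like. It assumes int elements and the placeholder values 1 and 2; call_fill_me_up is a hypothetical helper name, not taken from the original answer.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Copy the local data into the caller's array instead of handing out
// the address of a local variable.
void fill_me_up(int *dst)
{
    int temp[2] = {1, 2};                 // local; gone once the function returns
    std::memcpy(dst, temp, sizeof temp);  // dst now holds its own copy
}

// Hypothetical caller: provide the target memory first, then pass the pointer.
// Returns the sum of the two elements, or -1 if the allocation failed.
int call_fill_me_up()
{
    int *buf = static_cast<int *>(std::malloc(2 * sizeof(int)));
    if (buf == nullptr) {
        return -1;
    }
    fill_me_up(buf);
    std::printf("%d %d\n", buf[0], buf[1]);  // prints "1 2"
    int sum = buf[0] + buf[1];
    std::free(buf);
    return sum;
}
```

The buffer is allocated, filled, read, and freed by the caller; fill_me_up only ever sees a pointer to memory that outlives the call.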
(In C++ you would probably have used new int[2] and delete[] your_memory instead, but by using C's malloc() the connection to CUDA hopefully becomes clear.)
When you move fill_me_up to the CUDA device, you have to give it a device pointer rather than a host pointer, so you have to set that one up first and afterwards copy the results back out, but that's about the only change.
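Applied to the kernel from the question, that change might look like the sketch below. It is not the original answer's code: it assumes a single thread, a two-byte buffer, and the CUDA runtime API (cudaMalloc, cudaMemcpy, cudaFree), and d_buf and h_buf are illustrative names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel version of fill_me_up: dst is a device pointer supplied by the host.
__global__ void fill_me_up(unsigned char *dst)
{
    unsigned char temp[2] = {'a', 'b'};
    dst[0] = temp[0];  // copy the values, not the address of temp
    dst[1] = temp[1];
}

int main()
{
    unsigned char *d_buf = nullptr;   // device memory (assumed name)
    unsigned char h_buf[2] = {0, 0};  // host copy of the result (assumed name)

    // Set up the device pointer first.
    if (cudaMalloc((void **)&d_buf, sizeof h_buf) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    fill_me_up<<<1, 1>>>(d_buf);

    // Copy the results back out. A blocking cudaMemcpy on the default
    // stream runs after the preceding kernel has finished.
    cudaError_t err = cudaMemcpy(h_buf, d_buf, sizeof h_buf,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < 2; i++) {
        std::printf("%c\n", h_buf[i]);
    }
    return 0;
}
```

Compared with the host-only version, the only differences are where the buffer lives and the extra copy back to the host; the kernel still just writes through the pointer it receives.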