CUDA: cudaMemcpy only works in emulation mode
I am just starting to learn how to use CUDA. I am trying to run some simple example code:
float *ah, *bh, *ad, *bd;
ah = (float *)malloc(sizeof(float)*4);
bh = (float *)malloc(sizeof(float)*4);
cudaMalloc((void **) &ad, sizeof(float)*4);
cudaMalloc((void **) &bd, sizeof(float)*4);
... initialize ah ...
/* copy array on device */
cudaMemcpy(ad,ah,sizeof(float)*N,cudaMemcpyHostToDevice);
cudaMemcpy(bd,ad,sizeof(float)*N,cudaMemcpyDeviceToDevice);
cudaMemcpy(bh,bd,sizeof(float)*N,cudaMemcpyDeviceToHost);
When I run in emulation mode (nvcc -deviceemu), it runs fine and actually copies the array.
But when I run it in regular mode, it runs without error but never copies the data. It's as if the cudaMemcpy lines are just ignored.
What am I doing wrong?
Thank you very much,
Jason
Comments (2)
You should check for errors, ideally after each cudaMalloc and cudaMemcpy, but doing it just once at the end will be sufficient (cudaGetErrorString(cudaGetLastError())).
Just to check the obvious:
Run the deviceQuery SDK sample to check that the device is working correctly and that all the drivers are installed and working.
N (in the memcpy) is equal to 4 (in the malloc), right?
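The error checking suggested in the first comment can be sketched roughly as follows. This is a minimal illustration, not the asker's actual program: the CHECK macro is a hypothetical helper introduced here, the initialization values are made up, and N is defined once so that the allocation and copy sizes cannot disagree.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): report the error string and exit
// if a runtime API call does not return cudaSuccess.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,        \
                    __LINE__, #call, cudaGetErrorString(err_));        \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

int main(void) {
    const int N = 4;                         // single source of truth for the size
    const size_t bytes = sizeof(float) * N;  // used for every malloc and memcpy

    float *ah = (float *)malloc(bytes);
    float *bh = (float *)malloc(bytes);
    if (ah == NULL || bh == NULL) {
        fprintf(stderr, "host malloc failed\n");
        return EXIT_FAILURE;
    }

    // Illustrative initialization: bh starts at -1 so an ignored copy is visible.
    for (int i = 0; i < N; ++i) {
        ah[i] = (float)i;
        bh[i] = -1.0f;
    }

    float *ad = NULL, *bd = NULL;
    CHECK(cudaMalloc((void **)&ad, bytes));
    CHECK(cudaMalloc((void **)&bd, bytes));

    // Host -> device, device -> device, device -> host, each one checked.
    CHECK(cudaMemcpy(ad, ah, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(bd, ad, bytes, cudaMemcpyDeviceToDevice));
    CHECK(cudaMemcpy(bh, bd, bytes, cudaMemcpyDeviceToHost));

    for (int i = 0; i < N; ++i) {
        printf("bh[%d] = %f\n", i, bh[i]);
    }

    CHECK(cudaFree(ad));
    CHECK(cudaFree(bd));
    free(ah);
    free(bh);
    return 0;
}
```

If any call fails, for example because no usable device or driver is present, the program stops at that call and prints the message from cudaGetErrorString instead of silently continuing.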
See if you have a CUDA-enabled device. You can try running the code below and see what info you get:
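A minimal sketch of such a device check, assuming only the CUDA runtime API calls cudaGetDeviceCount and cudaGetDeviceProperties. It is an illustration of the idea, not necessarily the code the commenter posted, and the choice of fields to print is arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main(void) {
    // Ask the runtime how many CUDA devices it can see.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    if (count == 0) {
        printf("No CUDA-capable device found.\n");
        return 1;
    }

    // Print a few basic properties for each device.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                    dev, cudaGetErrorString(err));
            return 1;
        }
        printf("Device %d: %s, compute capability %d.%d, %zu bytes of global memory\n",
               dev, prop.name, prop.major, prop.minor, prop.totalGlobalMem);
    }
    return 0;
}
```

Build it in regular mode (without -deviceemu) so that it reports the real hardware. An error or a count of zero points to a driver or device problem rather than a bug in the copy code.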