cudaMemcpy - checking
Can somebody give me some advice on the following?
I am copying some data from the CPU to the GPU and I need to know whether it was copied correctly.
I can check the return code of cudaMemcpy, but it would be much better if I could print the array on the GPU.
int doCopyMemory(char *Input, int InputBytes)
{
    /* Copy the needed data to the GPU */
    cudaError_t s = cudaMemcpy(SOURCE_DATA, Input, InputBytes, cudaMemcpyHostToDevice);
    if (s != cudaSuccess) return 0;  /* copy failed */
    else return 100;                 /* copy succeeded */
}
I need to see the contents of SOURCE_DATA after copying.
Thanks in advance.
Comments (3)
You could just copy the memory back again (cudaMemcpyDeviceToHost) to a different, temporary buffer on the host, and verify that this matches the original buffer.
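A minimal sketch of the copy-back check described above might look like the following. The helper name `deviceMatchesHost` is hypothetical and not part of CUDA, and the sketch assumes the device pointer refers to at least `bytes` bytes of allocated device memory.

```cuda
#include <cuda_runtime.h>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not a CUDA API): copies `bytes` bytes from device
// memory into a temporary host buffer and compares them with the original
// host data. Returns true only if the copy-back succeeded and all bytes match.
bool deviceMatchesHost(const void *devPtr, const void *hostPtr, size_t bytes)
{
    if (bytes == 0) return true;          // nothing to compare

    void *tmp = std::malloc(bytes);       // temporary host buffer
    if (tmp == NULL) return false;

    cudaError_t err = cudaMemcpy(tmp, devPtr, bytes, cudaMemcpyDeviceToHost);
    bool same = (err == cudaSuccess) && (std::memcmp(tmp, hostPtr, bytes) == 0);

    std::free(tmp);
    return same;
}
```

In the question's code, this could be called right after the host-to-device copy, for example `deviceMatchesHost(SOURCE_DATA, Input, InputBytes)`.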
Are you saying that you have seen the copy be unsuccessful, but cudaMemcpy returns cudaSuccess? I've never seen that and if you have then you should submit a bug.
On the other hand, if you're just doing additional checks for some reason (paranoia?!) then you can just copy back. You can print from the GPU (check out cuPrintf on compute capability 1.x, or just use printf in device code if your device has compute capability 2.0 or higher), but for what you are doing you're better off copying back to the host.
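For completeness, printing from the GPU with device-side printf could be sketched roughly as below. The kernel `printBytes` and the wrapper `dumpDeviceBytes` are hypothetical names, and the sketch assumes a device of compute capability 2.0 or higher. The wrapper synchronizes after the launch because device printf output is buffered and only flushed at synchronization points.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical debug kernel: a single thread prints the first `n` bytes of
// `data` as hexadecimal values, followed by a newline.
__global__ void printBytes(const char *data, int n)
{
    for (int i = 0; i < n; ++i)
        printf("%02x ", (unsigned char)data[i]);
    printf("\n");
}

// Hypothetical host wrapper: launches the kernel with one block of one
// thread and waits for it to finish so the printf buffer is flushed.
cudaError_t dumpDeviceBytes(const char *devPtr, int n)
{
    printBytes<<<1, 1>>>(devPtr, n);
    cudaError_t err = cudaGetLastError();   // reports launch errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // reports execution errors
}
```

A single thread is used on purpose so the output stays in order; this is only sensible for small debugging dumps.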
Mapped pinned memory is very useful for this debugging scenario, since you can have host and device pointers to the same memory. Just don't forget to call cudaThreadSynchronize() to make sure the GPU is done processing (or, on Windows Vista or Windows 7, that the work gets submitted to the GPU) before examining the memory.
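A rough sketch of the mapped pinned memory approach is shown below. The names `incrementAll` and `mappedMemoryRoundTrip` are hypothetical, and `n` is assumed to be positive. The sketch uses cudaDeviceSynchronize(), which replaced the now-deprecated cudaThreadSynchronize(). It also assumes a recent CUDA runtime where host memory mapping is enabled implicitly; older setups may need cudaSetDeviceFlags(cudaDeviceMapHost) before any other CUDA call.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element so that the host can observe
// the GPU's writes through the mapped buffer.
__global__ void incrementAll(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: fills a mapped pinned buffer of n ints (n > 0) on the
// host, lets the GPU modify it in place, and checks the result on the host.
bool mappedMemoryRoundTrip(int n)
{
    int *hostPtr = NULL;
    if (cudaHostAlloc((void **)&hostPtr, n * sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess)
        return false;
    for (int i = 0; i < n; ++i) hostPtr[i] = i;

    // Device-side pointer that refers to the same physical host memory.
    int *devPtr = NULL;
    bool ok = cudaHostGetDevicePointer((void **)&devPtr, hostPtr, 0) == cudaSuccess;
    if (ok) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        incrementAll<<<blocks, threads>>>(devPtr, n);
        // Wait for the GPU to finish before reading the buffer on the host.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaDeviceSynchronize() == cudaSuccess;
    }

    // No cudaMemcpy needed: the host reads the same memory directly.
    for (int i = 0; ok && i < n; ++i) ok = (hostPtr[i] == i + 1);

    cudaFreeHost(hostPtr);
    return ok;
}
```

Because the host and device pointers refer to the same memory, the buffer can be inspected in a host debugger or printed with ordinary host code once the synchronization call has returned.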