CUDA: cuPrintf causes "unspecified launch failure"?

Posted 2024-11-18 10:40:46

I have a kernel that runs twice with different grid sizes.

My problem is with cuPrintf. Without cudaPrintfInit() before the kernel launch and cudaPrintfDisplay(stdout, true) and cudaPrintfEnd() after it, I get no error. As soon as I add those calls, I get an "unspecified launch failure" error.

In my device code, there is only one block like this for printing:

if (threadIdx.x==0) {
     cuPrintf("MAX:%f x:%d y:%d\n", maxVal, blockIdx.x, blockIdx.y);
}

I'm using CUDA 4.0 with a card of compute capability 2.0, so I compile my code with this command:

nvcc LB2.0.cu -arch=compute_20 -code=sm_20

時窥 2024-11-25 10:40:46

If you are on a compute capability (CC) 2.0 GPU, you don't need cuPrintf at all: CUDA has printf built in for GPUs of CC 2.0 and higher. So just replace your call to cuPrintf with this:

#if __CUDA_ARCH__ >= 200
if (threadIdx.x==0) {
    printf("MAX:%f x:%d y:%d\n", maxVal, blockIdx.x, blockIdx.y);
}
#endif

(Note that you need the #if / #endif lines only if you compile the code for sm_20 and for earlier architectures as well. With the compilation command line you gave, you can drop them.)
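For context, here is a minimal sketch of how the snippet above might sit in a complete program. The kernel name reportMax, the helper launchReport, the grid sizes and the maxVal values are all made up for illustration, since the original kernel is not shown. The helper also checks for errors after each launch, which can help narrow down where a launch failure comes from.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the first thread of each block prints one line.
// maxVal is a placeholder; the original kernel's computation is not shown.
__global__ void reportMax(float maxVal)
{
#if __CUDA_ARCH__ >= 200   // only needed when also building for pre-2.0 targets
    if (threadIdx.x == 0) {
        printf("MAX:%f x:%d y:%d\n", maxVal, blockIdx.x, blockIdx.y);
    }
#endif
}

// Hypothetical helper: launches the kernel, waits for it, and returns
// the first error seen.
cudaError_t launchReport(dim3 grid, dim3 block, float maxVal)
{
    reportMax<<<grid, block>>>(maxVal);
    cudaError_t err = cudaGetLastError();   // errors in the launch itself
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // errors during execution; also flushes printf output
}

int main()
{
    // Two launches with different (assumed) grid sizes, as in the question.
    cudaError_t err = launchReport(dim3(2, 2), dim3(32), 1.5f);
    if (err == cudaSuccess) err = launchReport(dim3(4, 1), dim3(32), 2.5f);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

With a CUDA 4.0 toolchain this would be built with the same -arch/-code flags as in the question, because device-side printf is not available for pre-2.0 targets.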

With printf, you don't need cudaPrintfInit() or cudaPrintfDisplay(): the output is handled automatically. However, if you print a lot of data, you may need to increase the default printf FIFO size with cudaDeviceSetLimit(), passing the cudaLimitPrintfFifoSize option.
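A rough sketch of raising the printf FIFO size is below. The 8 MB figure, the kernel chatty and the helper setPrintfFifo are assumptions chosen for illustration; the right size depends on how much your kernel prints. The limit has to be set before launching any kernel that uses printf.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that prints one line per thread to produce a lot of output.
__global__ void chatty()
{
    printf("block %d thread %d\n", blockIdx.x, threadIdx.x);
}

// Hypothetical helper: requests a printf buffer of `bytes` and returns
// the size the runtime reports afterwards.
size_t setPrintfFifo(size_t bytes)
{
    size_t actual = 0;
    cudaDeviceSetLimit(cudaLimitPrintfFifoSize, bytes);
    cudaDeviceGetLimit(&actual, cudaLimitPrintfFifoSize);
    return actual;
}

int main()
{
    // Assumed size of 8 MB; set it before the first kernel that calls printf.
    size_t fifo = setPrintfFifo(8 * 1024 * 1024);
    printf("printf FIFO size: %zu bytes\n", fifo);

    chatty<<<64, 256>>>();

    // Synchronizing waits for the kernel and flushes the buffered device output.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If the buffer is too small for the volume of output, some lines may be missing from what is printed, which is the symptom to watch for before increasing the size.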
