PyCUDA: pow in device code tries to use std::pow and fails

Posted on 2024-11-01 17:18:15

The title more or less says it all. The compiler reports:

calling a host function("std::pow<int, int> ") from a __device__/__global__ function("_calc_psd") is not allowed

From my understanding, this should be using the CUDA pow function instead, but it isn't.

寂寞陪衬 2024-11-08 17:18:15

The error is exactly what the compiler reports. You can't use host functions in device code, and that includes the whole host C++ std library. CUDA includes its own standard library, described in the programming guide, but you should use either pow or powf (taken from the C standard library, no C++ or namespaces). nvcc will overload the function with the correct CUDA device function and inline the resulting code. Something like the following will work:

#include <math.h>

__device__ float func(float x) {
   // powf is the single-precision power function from the C math library
   return x * x * powf(x, 0.123456f);
}

EDIT: The bit I missed the first time is the template specifier reported in the error. It names std::pow<int, int>, so both arguments are integers. Are you sure that you are passing either float or double arguments to pow? If you are passing integers, there is no matching overload in the CUDA standard library, which is why it might be failing. If you need an integer pow function, you will have to roll your own (or cast the arguments, but pow is a rather expensive function, and I am certain a few cascaded integer multiplications will be faster).
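
As a rough illustration of the two options mentioned in the edit, here is a minimal sketch. The names ipow and pow_via_cast are hypothetical helpers, not part of CUDA or PyCUDA, and ipow assumes a non-negative exponent and a result that fits in an int. The small macro guard is only there so the same source also builds as ordinary host C++ for testing.

```cpp
#include <math.h>

// When nvcc is not the compiler, make the CUDA qualifiers expand to nothing
// so this file also builds as plain host C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: integer power by repeated squaring.
// Assumes exp >= 0 and that the result does not overflow int.
__host__ __device__ inline int ipow(int base, int exp) {
    int result = 1;
    while (exp > 0) {
        if (exp & 1) {
            result *= base;   // lowest bit set: fold the current power in
        }
        exp >>= 1;
        if (exp > 0) {
            base *= base;     // square only if another bit remains
        }
    }
    return result;
}

// Hypothetical helper: cast to double so the floating-point pow is selected
// instead of an integer template instantiation.
__host__ __device__ inline double pow_via_cast(int base, int exp) {
    return pow((double)base, (double)exp);
}
```

Inside a kernel, a call such as pow(n, 2) with integer n could then be written as ipow(n, 2), or simply n * n for small fixed exponents, which is the cascaded multiplication suggested above.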
