I follow some guidelines to deal with memory management in C++. Some examples: I never use malloc. I almost never need or use new or delete. I use smart pointers, and almost never need to write destructors.
I want to learn CUDA. I have been looking online for tutorials that match my C++ style of programming, but everything looks C-style. It is not clear to me when this C style of programming is necessary and when it is just the author's style. As an example, here is a snippet of code from an NVIDIA tutorial:
#include <cstdio>
#include <cstdlib>

int main(void)
{
  int N = 1<<20;
  float *x, *y, *d_x, *d_y;

  // host allocations
  x = (float*)malloc(N*sizeof(float));
  y = (float*)malloc(N*sizeof(float));

  // device allocations
  cudaMalloc(&d_x, N*sizeof(float));
  cudaMalloc(&d_y, N*sizeof(float));

  //... (initialization, kernel launch, and maxError computation elided)

  printf("Max error: %f\n", maxError);

  cudaFree(d_x);
  cudaFree(d_y);
  free(x);
  free(y);
}
This code uses malloc, free, owning raw pointers, and C-style arrays. Are these all necessary? Can I write modern C++-style CUDA?
CUDA started out (over a decade ago) as a largely C-style entity. Over time, the language migrated to be primarily a C++ variant/definition. For understanding, we should delineate the discussion between device code and host code.

For device code, CUDA claims compliance with a particular C++ standard, subject to various restrictions. One notable restriction is that there is no general support for standard libraries.
For device code (with some overlap with host code), there is an evolution underway to provide a set of STL-like libraries/features. But as an example, std::vector is not usable in CUDA device code (you can use new in CUDA device code).

For host code, there really isn't anything that is intended to be out-of-bounds, as long as we are talking about things that are strictly host code. The exceptions to this are undocumented issues that crop up from time to time, for example with boost and perhaps many other libraries. These aren't intentional omissions, but arise from the fact that CUDA uses a special preprocessor/front-end even for host code, coupled with incomplete testing against every imaginable library one might want to use.

It might also be worthwhile to say, regarding user-supplied libraries (as opposed to standard libraries or system libraries), that CUDA generally requires functions to be decorated appropriately in order to be usable in device code. Whether we are talking about compiled libraries or header-only libraries, these should generally be usable in host code (subject to the caveat above), but not necessarily in device code, unless the library has been specifically decorated for CUDA usage.
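As a minimal sketch of what that decoration looks like (the function names here are my own illustration, not from any particular library): a function marked __host__ __device__ is compiled for, and callable from, both sides:

__host__ __device__ float scale(float v, float a) { return a * v; }

__global__ void scale_kernel(float *data, float a, int n)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = scale(data[i], a);  // device-side call
}

// The same scale() function can also be called from ordinary host code.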
Where host code is interfacing with device code, you'll need to follow the limitations fairly closely. Again, a std::vector container cannot be easily passed to a device code function call (a CUDA kernel). But as already mentioned in the comments, there is something similar you can do with the thrust library, which is included with the CUDA toolkit install. malloc and free are not necessary. You can similarly use new and delete, or use the thrust containers.

Regarding use of raw pointers and, relatedly, C-style arrays: this will probably be more-or-less unavoidable, as these are part of C++ and there are no higher-level containers in C++ apart from what is in the standard libraries, AFAIK. Use of raw pointers at least at the host-device interface is certainly typical. If you use thrust::device_vector, for example, you will still need to extract a raw pointer to pass to the kernel.

The CUDA runtime and driver APIs still have a largely C-style feel to them. It's not formally part of CUDA, but others have created wrappers to make things more "C++-like". One such example is this library from einpoklum/eyalroz. I have no personal experience with it, but its maintenance seems relatively energetic, a going concern. And as hinted in the comments, via C++ overloads and, e.g., replaceable functionality in various containers and library constructs, you can probably build a container or construct that does what you want, perhaps by replacing standard allocators, etc.
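One simple example of that idea (my own illustration, not an official CUDA API): wrap cudaMalloc/cudaFree in a std::unique_ptr with a custom deleter, so a device allocation is released automatically in RAII style:

#include <memory>

// Custom deleter so unique_ptr releases device memory via cudaFree.
struct CudaFreeDeleter {
  void operator()(float *p) const { cudaFree(p); }
};

std::unique_ptr<float[], CudaFreeDeleter> make_device_array(int n)
{
  float *p = nullptr;
  cudaMalloc(&p, n * sizeof(float));  // error checking omitted for brevity
  return std::unique_ptr<float[], CudaFreeDeleter>(p);
}

// Usage: the allocation is freed automatically when d_x goes out of scope.
// auto d_x = make_device_array(N);
// some_kernel<<<blocks, threads>>>(d_x.get(), N);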
As already mentioned, thrust intends to provide a container/algorithm approach to leverage those kinds of C++ concepts in a CUDA environment.
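As a minimal sketch of that thrust approach (my rewrite of the tutorial snippet above, not NVIDIA's code), the manual allocations disappear, but note that raw pointers still appear at the kernel boundary:

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <cstdio>

// A user-written kernel; thrust containers own the memory, but the
// kernel itself still receives raw pointers.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = a * x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;

  // RAII containers: no malloc/free, no cudaMalloc/cudaFree.
  thrust::host_vector<float>   h_x(N, 1.0f);
  thrust::device_vector<float> d_x = h_x;     // host-to-device copy
  thrust::device_vector<float> d_y(N, 2.0f);

  // Extract raw pointers at the host-device interface.
  saxpy<<<(N + 255) / 256, 256>>>(N, 2.0f,
                                  thrust::raw_pointer_cast(d_x.data()),
                                  thrust::raw_pointer_cast(d_y.data()));
  cudaDeviceSynchronize();

  std::printf("y[0] = %f\n", (float)d_y[0]);
  return 0;  // device memory freed by the vector destructors
}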
It's not part of CUDA proper, but NVIDIA also offers a way to accelerate standard C++ code.
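That presumably refers to the standard-parallelism ("stdpar") support in NVIDIA's nvc++ compiler (part of the NVIDIA HPC SDK). A minimal sketch, assuming compilation with nvc++ -stdpar=gpu; the GPU offload is handled entirely by the compiler:

#include <algorithm>
#include <execution>
#include <vector>
#include <cstdio>

int main()
{
  int N = 1<<20;
  std::vector<float> x(N, 1.0f), y(N, 2.0f);

  // With nvc++ -stdpar=gpu, this standard algorithm can run on the GPU;
  // no CUDA-specific allocation, decoration, or pointers are involved.
  std::transform(std::execution::par_unseq,
                 x.begin(), x.end(), y.begin(), y.begin(),
                 [](float xi, float yi) { return 2.0f * xi + yi; });

  std::printf("y[0] = %f\n", y[0]);
  return 0;
}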