Alternatives to global static data in CUDA?

Posted 2024-10-16 06:17:18

I'm building a toolkit that offers a range of algorithms in CUDA. Many of these algorithms use static, global constant data that is read by all threads, declared for example like this:

static __device__ __constant__ real buf[MAX_NB];

My problem is that if I include all the .cuh files in the library, all of this memory is allocated on the device when the library is loaded, even though the user might want only one of the algorithms. Is there any way around this? Do I have to fall back on ordinary dynamically allocated memory?

I want the fastest constant memory possible that can be used by all threads at runtime. Any ideas?

Thanks!

Comments (2)

╰つ倒转 2024-10-23 06:17:18

All the constant memory declared in a .cu file is allocated at launch: each .cu file is compiled to its own .cubin and therefore belongs to a separate module. To use many different kernels that rely on constant memory, you have to split them across .cu files so that no single module overflows its constant memory. The usual maximum is 64 KB. Source: http://forums.nvidia.com/index.php?showtopic=185993
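As a rough illustration of the one-algorithm-per-file layout described above, here is a minimal sketch of what a single .cu file might look like. The file name `algo_a.cu`, the bound `ALGO_A_MAX_NB`, the functions `algo_a_upload` and `constant_memory_limit`, and the assumption that `real` is `float` are all hypothetical and not from the original post. The kernel evaluates a polynomial so that every thread in a warp reads the same constant address at each step.

```cuda
// algo_a.cu (hypothetical file: one algorithm per translation unit)
#include <cuda_runtime.h>
#include <cstddef>

typedef float real;            // assumption: the question's `real` is float
#define ALGO_A_MAX_NB 1024     // hypothetical bound (4 KB of constant memory)

// File-local constant buffer; it belongs only to this .cu file's module.
static __constant__ real algo_a_buf[ALGO_A_MAX_NB];

// Evaluates a polynomial with ncoef coefficients (Horner's rule) at in[i].
// All threads read the same algo_a_buf[k] at each loop step.
__global__ void algo_a_kernel(real* out, const real* in, int n, int ncoef)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    real acc = 0;
    for (int k = ncoef - 1; k >= 0; --k)
        acc = acc * in[i] + algo_a_buf[k];
    out[i] = acc;
}

// Host-side setter: copies `count` coefficients into the constant buffer.
// Rejects requests that would not fit before touching the device.
cudaError_t algo_a_upload(const real* host_coeffs, size_t count)
{
    if (host_coeffs == NULL || count > ALGO_A_MAX_NB)
        return cudaErrorInvalidValue;
    return cudaMemcpyToSymbol(algo_a_buf, host_coeffs, count * sizeof(real));
}

// Returns the device's constant-memory capacity in bytes (0 on failure).
size_t constant_memory_limit(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return 0;
    return prop.totalConstMem;
}
```

Querying `totalConstMem` at startup would let the toolkit check the limit on the actual device rather than relying on the usual 64 KB figure.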

叹梦 2024-10-23 06:17:18

Have you looked into texture memory? It is trickier to set up, I believe, but it can be quite fast and can be allocated dynamically.
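For what it's worth, a minimal sketch of the texture route could look like the following. It uses the texture object API (`cudaCreateTextureObject`, `tex1Dfetch`), which is one way to wrap a buffer allocated at runtime; whether it matches constant memory in speed depends on the access pattern and is not something this sketch measures. The names `make_lut`, `destroy_lut` and `scale_kernel` are hypothetical, and `float` data is an assumption.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstring>

// Reads element i of the lookup table through the texture path.
__global__ void scale_kernel(float* out, cudaTextureObject_t lut, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * tex1Dfetch<float>(lut, i);
}

// Allocates device memory at runtime, fills it from `host`, and wraps it
// in a texture object. Nothing is allocated until this is called.
cudaError_t make_lut(const float* host, size_t count,
                     float** d_buf, cudaTextureObject_t* tex)
{
    if (host == NULL || count == 0) return cudaErrorInvalidValue;

    cudaError_t err = cudaMalloc((void**)d_buf, count * sizeof(float));
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(*d_buf, host, count * sizeof(float),
                     cudaMemcpyHostToDevice);
    if (err != cudaSuccess) { cudaFree(*d_buf); return err; }

    // Describe the linear device buffer backing the texture.
    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = *d_buf;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = count * sizeof(float);

    // Return raw element values (no normalization or filtering).
    cudaTextureDesc td;
    std::memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeElementType;

    err = cudaCreateTextureObject(tex, &res, &td, NULL);
    if (err != cudaSuccess) cudaFree(*d_buf);
    return err;
}

// Releases the texture object and its backing buffer.
void destroy_lut(float* d_buf, cudaTextureObject_t tex)
{
    cudaDestroyTextureObject(tex);
    cudaFree(d_buf);
}
```

With this layout, only the algorithms a user actually initializes would consume device memory.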

If you can't use textures, the only thing I can think of for constant memory is to allocate a single constant array: smaller, hopefully, than the sum of /all/ the constants in /all/ the headers, but big enough for what anyone would need in the largest use case. You can then load different values into this array for different needs.
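The single-array idea might be sketched roughly like this. The size `SHARED_MAX_NB`, the names `shared_buf`, `load_constants`, `algo_a` and `algo_b`, and `real` being `float` are all assumptions made for illustration. The sketch also assumes the kernels that read the buffer are compiled in the same .cu file as the buffer itself; sharing one `__constant__` symbol across several .cu files would need relocatable device code.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

typedef float real;          // assumption: the question's `real` is float
#define SHARED_MAX_NB 4096   // hypothetical: sized for the largest algorithm

// The one constant buffer that every algorithm in this file reuses.
__constant__ real shared_buf[SHARED_MAX_NB];

// Loads the constants for whichever algorithm is about to run.
cudaError_t load_constants(const real* host, size_t count)
{
    if (host == NULL || count > SHARED_MAX_NB) return cudaErrorInvalidValue;
    return cudaMemcpyToSymbol(shared_buf, host, count * sizeof(real));
}

// Algorithm A: treats shared_buf[0] and shared_buf[1] as scale and offset.
__global__ void algo_a(real* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * shared_buf[0] + shared_buf[1];
}

// Algorithm B: treats shared_buf[0..ntaps-1] as filter taps.
__global__ void algo_b(real* out, const real* in, int n, int ntaps)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    real acc = 0;
    for (int k = 0; k < ntaps && k <= i; ++k)
        acc += shared_buf[k] * in[i - k];
    out[i] = acc;
}
```

A caller would invoke `load_constants` with algorithm A's values before launching `algo_a`, then again with algorithm B's values before launching `algo_b`, taking care that no kernel still reading the old contents is in flight when the buffer is overwritten.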

(I'm assuming you've confirmed that allocating constant memory for the entire library is actually a problem. Is it insufficient space, long initialization times, or something else?)
