
Dynamic memory allocation in a __global__ function

Posted on 2024-12-05 09:47:48

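I have a CC 1.1 card, and my program needs to dynamically allocate arrays inside a __global__ or __device__ function.

These arrays will be created for each executing thread.

malloc throws an error, and from what I found online, using malloc in device code is illegal for compute capability below 2.0.

Is there any workaround?

Thanks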


一江春梦 2024-12-12 09:47:48

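I would suggest using fixed-size memory:

__global__ void my_kernel(...) {

    // BLOCK_SIZE must be a compile-time constant
    __shared__ float memory[BLOCK_SIZE];

}

Dynamic allocation on the GPU is rarely needed and is likely to introduce a performance bottleneck. With compute capability 1.1 in particular, you will need to tune how the shared memory is laid out and accessed to get the best performance and avoid intra-warp memory contention (bank conflicts).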
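
To make the suggestion concrete, here is a minimal sketch of how each thread could get its own small array out of one fixed-size shared buffer, without any device-side malloc. The names `scratch_kernel`, `run_scratch_demo`, `THREADS_PER_BLOCK` and `ELEMS_PER_THREAD` are hypothetical and chosen only for illustration; the sketch assumes the per-thread array size is known at compile time and that the whole buffer fits in the shared memory available per block.

```cuda
#include <cuda_runtime.h>

// Hypothetical sizes for illustration; both must be compile-time constants.
#define THREADS_PER_BLOCK 64
#define ELEMS_PER_THREAD  8   // 64 * 8 floats = 2 KB of shared memory per block

// Each thread gets a private ELEMS_PER_THREAD-element scratch array carved
// out of one fixed-size shared buffer, so no device-side malloc is needed.
// Assumes the kernel is launched with exactly THREADS_PER_BLOCK threads.
__global__ void scratch_kernel(float *out)
{
    __shared__ float memory[THREADS_PER_BLOCK * ELEMS_PER_THREAD];

    // Interleaved layout: element i of thread t lives at i * blockDim.x + t,
    // so neighbouring threads touch neighbouring words (different banks).
    float *mine = memory + threadIdx.x;
    const int stride = blockDim.x;

    // Fill the private array with 1, 2, ..., ELEMS_PER_THREAD.
    for (int i = 0; i < ELEMS_PER_THREAD; ++i)
        mine[i * stride] = (float)(i + 1);

    // Read it back; no __syncthreads() is needed because each thread
    // only ever touches its own elements.
    float sum = 0.0f;
    for (int i = 0; i < ELEMS_PER_THREAD; ++i)
        sum += mine[i * stride];

    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

// Launches the kernel on `blocks` blocks and returns the number of threads
// whose result differs from the expected sum, or -1 on a CUDA error.
int run_scratch_demo(int blocks)
{
    const int n = blocks * THREADS_PER_BLOCK;
    float *d_out = NULL;
    if (cudaMalloc((void **)&d_out, n * sizeof(float)) != cudaSuccess)
        return -1;

    scratch_kernel<<<blocks, THREADS_PER_BLOCK>>>(d_out);

    // cudaMemcpy waits for the kernel and also reports launch failures.
    float *h_out = new float[n];
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);

    const float expected = ELEMS_PER_THREAD * (ELEMS_PER_THREAD + 1) / 2.0f;
    int bad = 0;
    if (err != cudaSuccess) {
        bad = -1;
    } else {
        for (int i = 0; i < n; ++i)
            if (h_out[i] != expected)
                ++bad;
    }

    delete[] h_out;
    cudaFree(d_out);
    return bad;
}
```

Calling `run_scratch_demo(4)` from `main` should return 0 if every thread summed its own array correctly. The interleaved indexing is a design choice rather than a requirement: giving each thread a contiguous slice (`memory + threadIdx.x * ELEMS_PER_THREAD`) also works, but with this element count it would make the threads of a warp hit the same banks at the same time.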

