Entry function uses too much shared data (0x8020 bytes + 0x10 bytes system, 0x4000 max) - CUDA error
I am using a Tesla C2050, which has compute capability 2.0 and 48 KB of shared memory. But when I try to use this shared memory, the nvcc compiler gives me the following error:
Entry function '_Z4SAT3PhPdii' uses too much shared data (0x8020 bytes + 0x10 bytes system, 0x4000 max)
SAT1 is a naive implementation of a scan algorithm, and because I am operating on images of the order of 4096x2160, I have to use double to calculate the cumulative sum. Although the Tesla C2050 does not support double, it nevertheless does the task by demoting it to float. For an image width of 4096, the shared memory size comes out greater than 16 KB, but it is well within the 48 KB limit.
Can anybody help me understand what is happening here? I am using CUDA Toolkit 3.0.
By default, Fermi cards run in a compatibility mode, with 16 KB of shared memory and 48 KB of L1 cache per multiprocessor. The API call cudaThreadSetCacheConfig can be used to change the GPU to run with 48 KB of shared memory and 16 KB of L1 cache, if you require it. You must then compile the code for compute capability 2.0 to avoid the code generation error you are seeing.
Also, your Tesla C2050 does support double precision. If you are getting compiler warnings about demoting doubles, it means you are not compiling your code for the correct architecture. Add the option that selects compute capability 2.0 to your nvcc arguments and the GPU toolchain will compile for your Fermi card, including double precision support and other Fermi-specific hardware features such as the larger shared memory size.
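The two steps in this answer (switch the cache split, then build for compute capability 2.0) can be sketched roughly as follows. This is only an illustration under stated assumptions: the kernel rowPrefixSum, the wrapper launchRowPrefixSum, and the constant names are hypothetical and are not the asker's SAT code, and the -arch=sm_20 build line in the first comment is an assumption about a suitable nvcc option, not a quote from the answer. A row of 4096 doubles takes 0x8000 bytes, which exceeds the 0x4000 limit in the error message and fits within 48 KB; the extra 0x20 bytes in the message presumably come from the kernel arguments.

```cuda
// Assumed build line; the -arch value is an assumption, not quoted from the answer:
//   nvcc -arch=sm_20 -c sat_row.cu
#include <cstddef>
#include <cuda_runtime.h>

#define ROW_WIDTH 4096  // image width from the question

// Static shared memory the kernel below needs: 4096 * 8 = 32768 bytes (0x8000).
static const size_t kRowBytes = ROW_WIDTH * sizeof(double);

// Hypothetical kernel: one block computes the inclusive prefix sum of one row.
__global__ void rowPrefixSum(const unsigned char *in, double *out, int width)
{
    __shared__ double row[ROW_WIDTH];
    const unsigned char *src = in + (size_t)blockIdx.x * width;
    double *dst = out + (size_t)blockIdx.x * width;

    // Stage the row in shared memory, converting bytes to double.
    for (int i = threadIdx.x; i < width; i += blockDim.x)
        row[i] = src[i];
    __syncthreads();

    // Deliberately naive: a single thread accumulates the running sum.
    if (threadIdx.x == 0)
        for (int i = 1; i < width; ++i)
            row[i] += row[i - 1];
    __syncthreads();

    // Write the scanned row back to global memory.
    for (int i = threadIdx.x; i < width; i += blockDim.x)
        dst[i] = row[i];
}

// Hypothetical host wrapper; dIn and dOut are device pointers holding
// height rows of width elements each.
cudaError_t launchRowPrefixSum(const unsigned char *dIn, double *dOut,
                               int width, int height)
{
    if (width <= 0 || width > ROW_WIDTH || height <= 0)
        return cudaErrorInvalidValue;

    // Ask for the 48 KB shared / 16 KB L1 split named in the answer.
    cudaError_t err = cudaThreadSetCacheConfig(cudaFuncCachePreferShared);
    if (err != cudaSuccess)
        return err;

    rowPrefixSum<<<height, 256>>>(dIn, dOut, width);
    return cudaGetLastError();
}
```

In later toolkits the same request is spelled cudaDeviceSetCacheConfig, so which call to use depends on the toolkit version installed.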
As far as I know, CUDA 3.0 supports compute capability 2.0.
I use VS 2010 with CUDA 4.1, so I assume VS 2008 is somewhat similar. Right-click the project and select Properties -> CUDA C/C++ -> Device -> Code Generation. Change it to compute_10,sm_10;compute_20,sm_20
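Before choosing code generation targets, it can help to confirm what the runtime reports for the card. The sketch below is a minimal example using cudaGetDeviceProperties; the helper names printDeviceSummary and supportsNativeDouble are hypothetical. One caveat, stated as a likelihood and not a certainty: if sm_10 stays in the target list, the sm_10 compilation pass may still reject a kernel that needs more than 16 KB of shared memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Double precision is native from compute capability 1.3 onward.
static bool supportsNativeDouble(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 3);
}

// Hypothetical helper: print what the runtime reports for one device.
cudaError_t printDeviceSummary(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess)
        return err;

    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    printf("shared memory per block: %lu bytes\n",
           (unsigned long)prop.sharedMemPerBlock);
    printf("native double precision: %s\n",
           supportsNativeDouble(prop.major, prop.minor) ? "yes" : "no");
    return cudaSuccess;
}
```

Calling printDeviceSummary(0) from main on the machine with the C2050 should show whether the reported compute capability matches the targets selected in the project settings.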