Allocating shared memory
I am trying to allocate shared memory using a constant parameter, but I am getting an error. My kernel looks like this:
__global__ void Kernel(const int count)
{
    __shared__ int a[count];
}
and I am getting an error saying:
error: expression must have a constant value
count is const! Why am I getting this error? And how can I get around this?
5 Answers
CUDA supports dynamic shared memory allocation. If you declare the shared array inside the kernel as an unsized extern __shared__ array and then pass the number of bytes required as the third argument of the kernel launch, it can be sized at run time. Be aware that the runtime only supports a single dynamically declared allocation per block. If you need more, you will need to use pointers to offsets within that single allocation. Also be aware when using pointers that shared memory uses 32-bit words, and all allocations must be 32-bit word aligned, irrespective of the type of the shared memory allocation.
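A minimal sketch of that pattern, assuming an int array and an illustrative launch configuration:

#include <cuda_runtime.h>

__global__ void Kernel(const int count)
{
    // Unsized declaration: the actual size comes from the third
    // parameter of the launch configuration.
    extern __shared__ int a[];
    if (threadIdx.x < count)
        a[threadIdx.x] = threadIdx.x;
}

int main()
{
    const int count = 128;
    // Third launch argument: dynamic shared memory size in bytes per block.
    Kernel<<<1, count, count * sizeof(int)>>>(count);
    cudaDeviceSynchronize();
    return 0;
}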
const doesn't mean "constant", it means "read-only". A constant expression is something whose value is known to the compiler at compile time.
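A small illustration of the difference, using the asker's kernel (the fixed size 64 is just an example):

__global__ void Kernel(const int count)
{
    // count is read-only inside the kernel, but its value is only known
    // at run time, so it is not a constant expression:
    // __shared__ int a[count];   // error: expression must have a constant value

    // A literal is known at compile time, so this compiles:
    __shared__ int b[64];
    if (threadIdx.x < 64)
        b[threadIdx.x] = count;
}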
Option one: declare shared memory with a constant value (not the same as const).
Option two: declare shared memory dynamically in the kernel launch configuration.
Note: pointers to dynamically allocated shared memory are all given the same address. I use two shared memory arrays below to illustrate how to manually set up two arrays in shared memory.
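A minimal sketch of both options, assuming int arrays and illustrative sizes:

// Option one: the size is a compile-time constant.
__global__ void KernelStatic(const int count)
{
    __shared__ int a[256];                 // 256 is a constant expression
    if (threadIdx.x < count && threadIdx.x < 256)
        a[threadIdx.x] = threadIdx.x;
}

// Option two: one dynamic allocation, split manually into two arrays.
// Every extern __shared__ pointer refers to the same base address, so the
// second array is placed at an offset into the single allocation.
__global__ void KernelDynamic(const int count)
{
    extern __shared__ int s[];
    int *a = s;                            // first count ints
    int *b = s + count;                    // next count ints
    if (threadIdx.x < count) {
        a[threadIdx.x] = threadIdx.x;
        b[threadIdx.x] = 2 * threadIdx.x;
    }
}

// Launch with enough bytes for both arrays:
// KernelDynamic<<<grid, block, 2 * count * sizeof(int)>>>(count);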
From the "CUDA C Programming Guide":
The execution configuration is specified by inserting an expression of the form:
where:
So by using the dynamical parameter Ns, the user can specify the total size of shared memory one kernel function can use, no matter how many shared variables there are in this kernel.
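For concreteness, a sketch of a launch spelled out in the guide's notation (the kernel and dimensions are illustrative):

#include <cuda_runtime.h>

__global__ void Kernel(const int count)
{
    extern __shared__ int a[];             // sized by Ns at launch
    if (threadIdx.x < count)
        a[threadIdx.x] = threadIdx.x;
}

void launch(int count)
{
    dim3 Dg(4);                            // grid dimensions
    dim3 Db(count);                        // block dimensions
    size_t Ns = count * sizeof(int);       // dynamic shared memory per block, in bytes
    Kernel<<<Dg, Db, Ns>>>(count);
}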
You cannot declare a shared variable like this, because its size is not a compile-time constant.
Although, if you are sure enough about the maximum size of array a, you can declare it directly with that maximum size.
But in that case you should be worried about how many blocks there are in your program, since fixing that much shared memory per block (and not using it fully) will lead to context switching with global memory (high latency), and thus poor performance...
A nicer solution to this problem is to declare the array as an unsized extern __shared__ array and allocate the memory when launching the kernel, as shown in the sketch below.
But you should also be careful here, because if you use more shared memory in a block than you assign in the kernel launch, you will get unexpected results.
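A minimal sketch of the two approaches described above (the fixed maximum of 100 is an illustrative assumption):

// Fixed upper bound known at compile time: every block always reserves
// the full 100 ints, even when count is much smaller.
__global__ void KernelFixed(const int count)
{
    __shared__ int a[100];
    if (threadIdx.x < count && threadIdx.x < 100)
        a[threadIdx.x] = threadIdx.x;
}

// Size supplied at launch time instead.
__global__ void KernelDyn(const int count)
{
    extern __shared__ int a[];
    if (threadIdx.x < count)
        a[threadIdx.x] = threadIdx.x;
}

// The third launch argument must cover everything the kernel touches;
// indexing past it gives the unexpected results mentioned above.
// KernelDyn<<<grid, block, count * sizeof(int)>>>(count);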