CUDA: How to apply __restrict__ on an array of pointers to arrays?
This kernel, which takes two __restrict__ int arrays, compiles fine:
__global__ void kerFoo( int* __restrict__ arr0, int* __restrict__ arr1, int num )
{
    for ( /* Iterate over array */ )
        arr1[i] = arr0[i]; // Copy one to other
}
However, when the same two int arrays are passed as an array of pointers, compilation fails:
__global__ void kerFoo( int* __restrict__ arr[2], int num )
{
    for ( /* Iterate over array */ )
        arr[1][i] = arr[0][i]; // Copy one to other
}
The error given by the compiler is:
error: invalid use of `restrict'
I have certain structures that are composed as an array of pointers to arrays. (For example, a struct passed to the kernel that has int* arr[16].) How do I pass them to kernels and still apply __restrict__ to them?
Comments (2)
The CUDA C manual only refers to the C99 definition of __restrict__, with no special CUDA-specific circumstances.
Since the indicated parameter is an array containing two pointers, this use of __restrict__ looks perfectly valid to me, and I see no reason for the compiler to complain, IMHO. I would ask the compiler author to verify and possibly/probably correct the issue. I'd be interested in different opinions, though.
One remark to @talonmies:
This is not strictly true. restrict tells the compiler that the pointer in question, for the duration of its lifetime, is the only pointer through which the pointed-to object can be accessed. Be aware that the object pointed to is only assumed to be an array of int. (In truth it's only one int in this case.) Since the compiler cannot know the size of the array, it is up to the programmer to guard the array's boundaries.
Filling in the comment in your code with some arbitrary iteration, we get the following program:
and this compiles fine with CUDA 10.1 (Godbolt.org).
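The program from this answer did not survive on this page. As a rough sketch only, this is what a filled-in version could look like, assuming a plain serial loop over num elements as the "arbitrary iteration". The host helper copyThroughPointerTable and the names bufs and table are hypothetical and were added here for illustration; they are not from the original answer. Because an array parameter decays to a pointer, the kernel receives a pointer to the two pointers, so the pointer table itself has to live in device memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the question with the loop filled in. The serial loop is an
// assumption; the answer only mentions "some arbitrary iteration".
__global__ void kerFoo( int* __restrict__ arr[2], int num )
{
    for ( int i = 0; i < num; ++i )
        arr[1][i] = arr[0][i]; // Copy one to other
}

// Hypothetical host helper: copies src to dst on the device through kerFoo.
// Returns true only if every CUDA call reported success.
bool copyThroughPointerTable( const int* src, int* dst, int num )
{
    const size_t bytes = num * sizeof( int );
    int*  bufs[2] = { nullptr, nullptr }; // device buffers: [0] source, [1] destination
    int** table   = nullptr;              // device-side copy of the two pointers above

    bool ok = cudaMalloc( &bufs[0], bytes ) == cudaSuccess
           && cudaMalloc( &bufs[1], bytes ) == cudaSuccess
           && cudaMalloc( &table, sizeof( bufs ) ) == cudaSuccess
           && cudaMemcpy( bufs[0], src, bytes, cudaMemcpyHostToDevice ) == cudaSuccess
           && cudaMemcpy( table, bufs, sizeof( bufs ), cudaMemcpyHostToDevice ) == cudaSuccess;

    if ( ok )
    {
        // A single thread is enough for this serial sketch.
        kerFoo<<< 1, 1 >>>( table, num );
        // This blocking copy also waits for the kernel to finish.
        ok = cudaMemcpy( dst, bufs[1], bytes, cudaMemcpyDeviceToHost ) == cudaSuccess;
    }

    // cudaFree accepts null pointers, so partial failures are cleaned up too.
    cudaFree( table );
    cudaFree( bufs[1] );
    cudaFree( bufs[0] );
    return ok;
}

int main()
{
    const int num = 8;
    int src[num], dst[num];
    for ( int i = 0; i < num; ++i ) { src[i] = i * i; dst[i] = -1; }

    if ( !copyThroughPointerTable( src, dst, num ) )
    {
        std::printf( "a CUDA call failed\n" );
        return 1;
    }
    for ( int i = 0; i < num; ++i )
        if ( dst[i] != src[i] )
        {
            std::printf( "mismatch at index %d\n", i );
            return 1;
        }
    std::printf( "copied %d elements\n", num );
    return 0;
}
```

Note that __restrict__ here is a promise made by the caller: the two buffers in the table must not overlap, which holds in this sketch because they come from separate cudaMalloc calls.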