CUDA: How to apply __restrict__ on an array of pointers to arrays?

Posted on 2024-12-19 19:45:11

This kernel using two __restrict__ int arrays compiles fine:

__global__ void kerFoo( int* __restrict__ arr0, int* __restrict__ arr1, int num )
{
    for ( /* Iterate over array */ )
        arr1[i] = arr0[i];  // Copy one to other
}

However, the same two int arrays combined into an array of pointers fail to compile:

__global__ void kerFoo( int* __restrict__ arr[2], int num )
{
    for ( /* Iterate over array */ )
        arr[1][i] = arr[0][i];  // Copy one to other
}

The error given by the compiler is:

error: invalid use of `restrict'

I have certain structures that contain an array of pointers to arrays (for example, a struct passed to the kernel that has an int* arr[16] member). How do I pass them to kernels and still apply __restrict__ to them?

Comments (2)

清眉祭 2024-12-26 19:45:11

The CUDA C manual only refers to the C99 definition of __restrict__, no special CUDA-specific circumstances.

Since the indicated parameter is an array containing two pointers, this use of __restrict__ looks perfectly valid to me, no reason for the compiler to complain IMHO. I would ask the compiler author to verify and possibly/probably correct the issue. I'd be interested in different opinions, though.

One remark on @talonmies's comment:

"The whole point of restrict is to tell the compiler that two or more pointer arguments will never overlap in memory."

This is not strictly true. restrict tells the compiler that the pointer in question is, for the duration of its lifetime, the only pointer through which the pointed-to object can be accessed. Be aware that the pointed-to object is only assumed to be an array of int. (In truth it's only one int in this case.) Since the compiler cannot know the size of the array, it is up to the programmer to guard the array's boundaries.

桃扇骨 2024-12-26 19:45:11

Filling in the comment in your code with some arbitrary iteration, we get the following program:

__global__ void kerFoo( int* __restrict__ arr[2], int num )
{
    for ( int i = 0; i < 1024; i ++)
        arr[1][i] = arr[0][i];  // Copy one to other
}

and this compiles fine with CUDA 10.1 (Godbolt.org).
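
For the struct case mentioned in the question, or for a toolchain that still rejects the array-parameter form, one possible workaround is to copy the array elements into local __restrict__-qualified pointers inside the kernel. The following is only a minimal sketch under assumptions: the struct ArrPack, the kernel kerFooPacked and the host helper runPackedCopy are hypothetical names that do not appear in the question, the two buffers are assumed not to overlap, and num is assumed to be positive. Whether the compiler actually exploits the qualifier on local pointers is not guaranteed.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical struct for illustration: holds the device pointers by value.
struct ArrPack
{
    int* arr[2];
};

__global__ void kerFooPacked( ArrPack pack, int num )
{
    // Local restrict-qualified copies of the array elements. This is the
    // programmer's promise that the two buffers do not overlap.
    const int* __restrict__ src = pack.arr[0];
    int* __restrict__       dst = pack.arr[1];

    // Grid-stride loop over the array
    for ( int i = blockIdx.x * blockDim.x + threadIdx.x; i < num; i += blockDim.x * gridDim.x )
        dst[i] = src[i];  // Copy one to other
}

// Hypothetical host helper: returns the number of mismatched elements
// after the copy, or -1 if a CUDA call fails. Assumes num > 0.
int runPackedCopy( int num )
{
    std::vector<int> hostIn( num ), hostOut( num, 0 );
    for ( int i = 0; i < num; ++i )
        hostIn[i] = i;

    const size_t bytes = static_cast<size_t>( num ) * sizeof( int );
    ArrPack pack{};
    if ( cudaMalloc( (void**)&pack.arr[0], bytes ) != cudaSuccess )
        return -1;
    if ( cudaMalloc( (void**)&pack.arr[1], bytes ) != cudaSuccess )
    {
        cudaFree( pack.arr[0] );
        return -1;
    }

    cudaError_t err = cudaMemcpy( pack.arr[0], hostIn.data(), bytes, cudaMemcpyHostToDevice );
    if ( err == cudaSuccess )
    {
        kerFooPacked<<< ( num + 255 ) / 256, 256 >>>( pack, num );
        err = cudaMemcpy( hostOut.data(), pack.arr[1], bytes, cudaMemcpyDeviceToHost );
    }

    cudaFree( pack.arr[0] );
    cudaFree( pack.arr[1] );
    if ( err != cudaSuccess )
        return -1;

    int mismatches = 0;
    for ( int i = 0; i < num; ++i )
        if ( hostOut[i] != hostIn[i] )
            ++mismatches;
    return mismatches;
}
```

Calling runPackedCopy( 1024 ) from main on a machine with a working CUDA device should return 0 if the copy succeeded. Passing the struct by value keeps the pointer array in kernel parameter space, so no extra device allocation is needed for the array of pointers itself.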
