CUDA: How to apply __restrict__ to an array of pointers to arrays?

Posted on 2024-12-19 19:45:11

This kernel using two __restrict__ int arrays compiles fine:

__global__ void kerFoo( int* __restrict__ arr0, int* __restrict__ arr1, int num )
{
    for ( /* Iterate over array */ )
        arr1[i] = arr0[i];  // Copy one to other
}

However, a kernel that takes the same two int arrays as an array of pointers fails to compile:

__global__ void kerFoo( int* __restrict__ arr[2], int num )
{
    for ( /* Iterate over array */ )
        arr[1][i] = arr[0][i];  // Copy one to other
}

The error given by the compiler is:

error: invalid use of `restrict'

I have structures that contain arrays of pointers to arrays (for example, a struct passed to the kernel that has a member int* arr[16]). How can I pass them to kernels and still apply __restrict__ to the pointers?

清眉祭 2024-12-26 19:45:11

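The CUDA C manual only refers to the C99 definition of __restrict__; it describes no special CUDA-specific cases.

Since the parameter in question is an array of two pointers, this use of __restrict__ looks perfectly valid to me, and IMHO the compiler has no reason to complain. I would ask the compiler authors to verify this and, if it is a bug, fix it. I would be interested in other opinions, though.

A remark on @talonmies's comment:

"The whole point of restrict is to tell the compiler that two or more pointer arguments will never overlap in memory."

That is not strictly true. restrict tells the compiler that, for the lifetime of the pointer in question, it is the only pointer through which the pointed-to object is accessed. Note that the pointed-to object is only assumed to be an array of int. (In fact, as far as the type goes, it is just a single int in this case.) Since the compiler cannot know the size of the array, it is up to the programmer to respect the array's bounds.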
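As an illustration of that promise applied to the struct case from the question, here is a minimal sketch, not a confirmed fix for the compiler error. It assumes nvcc accepts __restrict__ on block-scope pointers (C99 allows restrict there); whether the optimizer actually exploits it is not guaranteed. The names PtrBundle, kerBundleCopy and runBundleDemo are hypothetical and exist only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical struct modeled on the question's "int* arr[16]" member.
struct PtrBundle {
    int* arr[16];
};

// The struct is passed by value, so its 16 pointers travel as kernel arguments.
__global__ void kerBundleCopy( PtrBundle bundle, int num )
{
    // Local restrict-qualified copies: a promise that, within this kernel,
    // the ints written through dst are not also accessed through src.
    const int* __restrict__ src = bundle.arr[0];
    int* __restrict__ dst = bundle.arr[1];

    // Grid-stride loop over the array.
    for ( int i = blockIdx.x * blockDim.x + threadIdx.x; i < num;
          i += gridDim.x * blockDim.x )
        dst[i] = src[i];
}

// Returns the number of mismatched elements, or -1 on a CUDA error.
int runBundleDemo()
{
    const int num = 1024;
    const size_t bytes = num * sizeof(int);
    int hostSrc[num], hostDst[num];
    for ( int i = 0; i < num; ++i ) { hostSrc[i] = i; hostDst[i] = -1; }

    PtrBundle bundle = {};  // unused slots stay null
    bool ok = cudaMalloc( &bundle.arr[0], bytes ) == cudaSuccess
           && cudaMalloc( &bundle.arr[1], bytes ) == cudaSuccess
           && cudaMemcpy( bundle.arr[0], hostSrc, bytes,
                          cudaMemcpyHostToDevice ) == cudaSuccess;

    if ( ok ) {
        kerBundleCopy<<< 4, 128 >>>( bundle, num );
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy( hostDst, bundle.arr[1], bytes,
                         cudaMemcpyDeviceToHost ) == cudaSuccess;
    }

    cudaFree( bundle.arr[0] );  // cudaFree on a null pointer does nothing
    cudaFree( bundle.arr[1] );
    if ( !ok ) return -1;

    int mismatches = 0;
    for ( int i = 0; i < num; ++i )
        if ( hostDst[i] != hostSrc[i] ) ++mismatches;
    return mismatches;
}

int main()
{
    int result = runBundleDemo();
    printf( "mismatches: %d\n", result );
    return result == 0 ? 0 : 1;
}
```

The promise is only as good as the caller: if two slots of the struct ever point at overlapping memory, the restrict-qualified locals make the behavior undefined.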


桃扇骨 2024-12-26 19:45:11

Filling in the comment in your code with some arbitrary iteration, we get the following program:

__global__ void kerFoo( int* __restrict__ arr[2], int num )
{
    for ( int i = 0; i < 1024; i ++)
        arr[1][i] = arr[0][i];  // Copy one to other
}

and this compiles fine with CUDA 10.1 (Godbolt.org).
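To actually run that kernel, the array of two pointers has to live in device memory, because the kernel dereferences arr[0] and arr[1] on the device. The following is a minimal host-side sketch, assuming your nvcc accepts the signature as reported above. The names runPointerTableDemo, devArrays and devTable are hypothetical and not part of the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel as given above: copies 1024 ints; `num` is unused.
__global__ void kerFoo( int* __restrict__ arr[2], int num )
{
    for ( int i = 0; i < 1024; i ++)
        arr[1][i] = arr[0][i];  // Copy one to other
}

// Returns the number of mismatched elements, or -1 on a CUDA error.
int runPointerTableDemo()
{
    const int count = 1024;  // must match the kernel's hard-coded loop bound
    const size_t bytes = count * sizeof(int);
    int hostSrc[count], hostDst[count];
    for ( int i = 0; i < count; ++i ) { hostSrc[i] = i; hostDst[i] = -1; }

    int* devArrays[2] = { nullptr, nullptr };  // host copy of the two device pointers
    int** devTable = nullptr;                  // device copy of that pointer table

    bool ok = cudaMalloc( &devArrays[0], bytes ) == cudaSuccess
           && cudaMalloc( &devArrays[1], bytes ) == cudaSuccess
           && cudaMalloc( &devTable, sizeof(devArrays) ) == cudaSuccess
           && cudaMemcpy( devArrays[0], hostSrc, bytes,
                          cudaMemcpyHostToDevice ) == cudaSuccess
           && cudaMemcpy( devTable, devArrays, sizeof(devArrays),
                          cudaMemcpyHostToDevice ) == cudaSuccess;

    if ( ok ) {
        // One thread is enough: every thread would perform the whole copy.
        kerFoo<<< 1, 1 >>>( devTable, count );
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy( hostDst, devArrays[1], bytes,
                         cudaMemcpyDeviceToHost ) == cudaSuccess;
    }

    cudaFree( devTable );  // cudaFree on a null pointer does nothing
    cudaFree( devArrays[0] );
    cudaFree( devArrays[1] );
    if ( !ok ) return -1;

    int mismatches = 0;
    for ( int i = 0; i < count; ++i )
        if ( hostDst[i] != hostSrc[i] ) ++mismatches;
    return mismatches;
}

int main()
{
    int result = runPointerTableDemo();
    printf( "mismatches: %d\n", result );
    return result == 0 ? 0 : 1;
}
```

Passing a host-side array such as devArrays directly to the kernel would not work here, since the kernel would then read the pointer table from host memory.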
