Can I call a "function-like macro" defined in a header file from a CUDA __global__ function?

Posted 2024-09-12 00:26:06

This is part of my header file aes_locl.h:

.
.
# define SWAP(x) (_lrotl(x, 8) & 0x00ff00ff | _lrotr(x, 8) & 0xff00ff00) 
# define GETU32(p) SWAP(*((u32 *)(p))) 
# define PUTU32(ct, st) { *((u32 *)(ct)) = SWAP((st)); } 
.
.

Now, in the .cu file, I have declared a __global__ function and included the header file like this:

#include "aes_locl.h"
.....
__global__ void cudaEncryptKern(u32* _Te0, u32* _Te1, u32* _Te2, u32* _Te3, unsigned char* in, u32* rdk, unsigned long* length)
{
    u32 *rk = rdk;
    u32 s0, s1, s2, s3, t0, t1, t2, t3;

    s0 = GETU32(in + threadIdx.x*(i) ) ^ rk[0];
}

This leads me to the following error message:

error: calling a host function from a __device__/__global__ function is only allowed in device emulation mode

I have sample code where the programmer calls the macro exactly in that way.

Can I call it this way, or is this not possible at all? If it is not, I would appreciate some hints on the best way to rewrite the macros and assign the desired value to s0.

Thank you very much in advance!

Comments (3)

累赘 2024-09-19 00:26:06

I think the problem is not the macros themselves - the compilation process nvcc uses for CUDA code runs the C preprocessor in the usual way, so using header files like this should be fine. I believe the problem is in your calls to _lrotl and _lrotr.

You ought to be able to check that that is indeed the problem by temporarily removing those calls.

You should check the CUDA programming guide to see what functionality you need to replace those calls to run on the GPU.
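
For example (a minimal sketch, not taken from the original answer; the helper names rotl32 and rotr32 are made up here, and u32 is assumed to be a 32-bit unsigned type as in aes_locl.h), the Windows-only _lrotl/_lrotr intrinsics can be replaced with __host__ __device__ helpers built from plain shifts, so the existing macros compile unchanged inside the kernel:

#include <stdint.h>

typedef uint32_t u32;

// Shift-based rotates that compile for both host and device code
__host__ __device__ static inline u32 rotl32(u32 x, int n)
{
    return (x << n) | (x >> (32 - n));   // rotate left by n, 0 < n < 32
}

__host__ __device__ static inline u32 rotr32(u32 x, int n)
{
    return (x >> n) | (x << (32 - n));   // rotate right by n, 0 < n < 32
}

// Same macros as in aes_locl.h, but using the device-compatible rotates
#define SWAP(x)        (rotl32((x), 8) & 0x00ff00ff | rotr32((x), 8) & 0xff00ff00)
#define GETU32(p)      SWAP(*((u32 *)(p)))
#define PUTU32(ct, st) { *((u32 *)(ct)) = SWAP((st)); }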

愛放△進行李 2024-09-19 00:26:06

The hardware doesn't have a built-in rotate instruction, and so there is no intrinsic to expose it (you can't expose something that doesn't exist!).

It's fairly simple to implement with shifts and masks, though. For example, if x is 32 bits, then to rotate left by eight bits you can do:

((x << 8) | (x >> 24))

Here x << 8 pushes everything left by eight bits (i.e. discarding the leftmost eight bits), x >> 24 pushes everything right by twenty-four bits (i.e. discarding all but the leftmost eight bits), and bitwise ORing them together gives the result you need.

// # define SWAP(x) (_lrotl(x, 8) & 0x00ff00ff | _lrotr(x, 8) & 0xff00ff00)
# define SWAP(x) (((x << 8) | (x >> 24)) & 0x00ff00ff | ((x >> 8) | (x << 24)) & 0xff00ff00)

You could of course tidy this up by recognising that all SWAP does is reverse the byte order of a 32-bit word, which can be written directly as:

# define SWAP(x) (((x) >> 24) | (((x) & 0x00ff0000) >> 8) | (((x) & 0x0000ff00) << 8) | ((x) << 24))
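
As a quick sanity check (a self-contained sketch that is not part of the original answer; the kernel name swapKernel and the test value are made up, and error checking is omitted for brevity), the byte-reversing SWAP can be exercised from a tiny kernel:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t u32;

#define SWAP(x) (((x) >> 24) | (((x) & 0x00ff0000) >> 8) | (((x) & 0x0000ff00) << 8) | ((x) << 24))

__global__ void swapKernel(u32 *data)
{
    data[threadIdx.x] = SWAP(data[threadIdx.x]);   // byte-reverse one word per thread
}

int main(void)
{
    u32 h = 0x11223344, *d;
    cudaMalloc((void **)&d, sizeof(u32));
    cudaMemcpy(d, &h, sizeof(u32), cudaMemcpyHostToDevice);
    swapKernel<<<1, 1>>>(d);
    cudaMemcpy(&h, d, sizeof(u32), cudaMemcpyDeviceToHost);
    printf("0x%08x\n", (unsigned)h);   // expected output: 0x44332211
    cudaFree(d);
    return 0;
}
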
自由范儿 2024-09-19 00:26:06

The error says what the problem really is. You are calling a function/macro defined in another file (which belongs to the CPU code) from inside the CUDA function. This is impossible!

You cannot call CPU functions/macros/code from a GPU function.

You should put your definitions (does _lrotl() exist in CUDA?) inside the same file that will be compiled by nvcc.
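
One way to follow that advice while keeping the original host behaviour (a sketch under the assumption that the host side is built with MSVC, where _lrotl/_lrotr are declared in <stdlib.h>; the helper names rotl8 and rotr8 are made up here) is to branch on __CUDA_ARCH__, which nvcc defines only while it is compiling device code:

#include <stdlib.h>   // _lrotl/_lrotr on the MSVC host side
#include <stdint.h>

typedef uint32_t u32;

__host__ __device__ static inline u32 rotl8(u32 x)
{
#ifdef __CUDA_ARCH__
    return (x << 8) | (x >> 24);   // device path: plain shifts
#else
    return (u32)_lrotl(x, 8);      // host path: keep the original intrinsic
#endif
}

__host__ __device__ static inline u32 rotr8(u32 x)
{
#ifdef __CUDA_ARCH__
    return (x >> 8) | (x << 24);
#else
    return (u32)_lrotr(x, 8);
#endif
}

#define SWAP(x)   (rotl8(x) & 0x00ff00ff | rotr8(x) & 0xff00ff00)
#define GETU32(p) SWAP(*((u32 *)(p)))

Placed in the header (or the .cu file) that nvcc compiles, this gives the __global__ kernel a shift-based SWAP while the CPU code keeps using the intrinsic.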
