Can I call a "function-like macro" defined in a header file from a CUDA __global__ function?

Posted 2024-09-12 00:26:06

This is part of my header file aes_locl.h:

.
.
# define SWAP(x) (_lrotl(x, 8) & 0x00ff00ff | _lrotr(x, 8) & 0xff00ff00) 
# define GETU32(p) SWAP(*((u32 *)(p))) 
# define PUTU32(ct, st) { *((u32 *)(ct)) = SWAP((st)); } 
.
.

Now, in the .cu file, I have declared a __global__ function and included the header file like this:

#include "aes_locl.h"
.....
__global__ void cudaEncryptKern(u32* _Te0, u32* _Te1, u32* _Te2, u32* _Te3, unsigned char* in, u32* rdk, unsigned long* length)
{
    u32 *rk = rdk;
    u32 s0, s1, s2, s3, t0, t1, t2, t3;

    s0 = GETU32(in + threadIdx.x*(i) ) ^ rk[0];
}

This leads me to the following error message:

error: calling a host function from a __device__/__global__ function is only allowed in device emulation mode

I have sample code where the programmer calls the macro exactly in that way.

Can I call it this way, or is this not possible at all? If it is not, I would appreciate some hints on the best way to rewrite the macros and assign the desired value to s0.

Thank you very much in advance!

Comments (3)

累赘 2024-09-19 00:26:06

I think the problem is not the macros themselves - the compilation process nvcc uses for CUDA code runs the C preprocessor in the usual way, so using header files like this should be fine. I believe the problem is in your calls to _lrotl and _lrotr.

You ought to be able to check that that is indeed the problem by temporarily removing those calls.

You should check the CUDA programming guide to see what functionality you need to replace those calls to run on the GPU.
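
For example (a minimal sketch, not taken from the original answer; the helper names rotl32 and rotr32 are made up here, and u32 is assumed to be a 32-bit unsigned type as in aes_locl.h), the Windows-only _lrotl/_lrotr intrinsics can be replaced with __host__ __device__ helpers built from plain shifts, so the existing macros compile unchanged inside the kernel:

#include <stdint.h>

typedef uint32_t u32;

// Shift-based rotates that compile for both host and device code
__host__ __device__ static inline u32 rotl32(u32 x, int n)
{
    return (x << n) | (x >> (32 - n));   // rotate left by n, 0 < n < 32
}

__host__ __device__ static inline u32 rotr32(u32 x, int n)
{
    return (x >> n) | (x << (32 - n));   // rotate right by n, 0 < n < 32
}

// Same macros as in aes_locl.h, but using the device-compatible rotates
#define SWAP(x)        (rotl32((x), 8) & 0x00ff00ff | rotr32((x), 8) & 0xff00ff00)
#define GETU32(p)      SWAP(*((u32 *)(p)))
#define PUTU32(ct, st) { *((u32 *)(ct)) = SWAP((st)); }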

愛放△進行李 2024-09-19 00:26:06

The hardware doesn't have a built-in rotate instruction, and so there is no intrinsic to expose it (you can't expose something that doesn't exist!).

It's fairly simple to implement with shifts and masks, though. For example, if x is 32 bits, then to rotate left by eight bits you can do:

((x << 8) | (x >> 24))

Here x << 8 pushes everything left by eight bits (i.e. discarding the leftmost eight bits), x >> 24 pushes everything right by twenty-four bits (i.e. discarding all but the leftmost eight bits), and bitwise ORing them together gives the result you need.

// # define SWAP(x) (_lrotl(x, 8) & 0x00ff00ff | _lrotr(x, 8) & 0xff00ff00)
# define SWAP(x) (((x << 8) | (x >> 24)) & 0x00ff00ff | ((x >> 8) | (x << 24)) & 0xff00ff00)

You could of course tidy this up by recognising that all SWAP does is reverse the byte order of a 32-bit word, which can be written directly as:

# define SWAP(x) (((x) >> 24) | (((x) & 0x00ff0000) >> 8) | (((x) & 0x0000ff00) << 8) | ((x) << 24))
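
As a quick sanity check (a self-contained sketch that is not part of the original answer; the kernel name swapKernel and the test value are made up, and error checking is omitted for brevity), the byte-reversing SWAP can be exercised from a tiny kernel:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t u32;

#define SWAP(x) (((x) >> 24) | (((x) & 0x00ff0000) >> 8) | (((x) & 0x0000ff00) << 8) | ((x) << 24))

__global__ void swapKernel(u32 *data)
{
    data[threadIdx.x] = SWAP(data[threadIdx.x]);   // byte-reverse one word per thread
}

int main(void)
{
    u32 h = 0x11223344, *d;
    cudaMalloc((void **)&d, sizeof(u32));
    cudaMemcpy(d, &h, sizeof(u32), cudaMemcpyHostToDevice);
    swapKernel<<<1, 1>>>(d);
    cudaMemcpy(&h, d, sizeof(u32), cudaMemcpyDeviceToHost);
    printf("0x%08x\n", (unsigned)h);   // expected output: 0x44332211
    cudaFree(d);
    return 0;
}
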
自由范儿 2024-09-19 00:26:06

The error says what the problem really is. You are calling a function/macro defined in another file (which belongs to the CPU code) from inside the CUDA function. This is impossible!

You cannot call CPU functions/macros/code from a GPU function.

You should put your definitions (does _lrotl() exist in CUDA?) inside the same file that will be compiled by nvcc.
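
One way to follow that advice while keeping the original host behaviour (a sketch under the assumption that the host side is built with MSVC, where _lrotl/_lrotr are declared in <stdlib.h>; the helper names rotl8 and rotr8 are made up here) is to branch on __CUDA_ARCH__, which nvcc defines only while it is compiling device code:

#include <stdlib.h>   // _lrotl/_lrotr on the MSVC host side
#include <stdint.h>

typedef uint32_t u32;

__host__ __device__ static inline u32 rotl8(u32 x)
{
#ifdef __CUDA_ARCH__
    return (x << 8) | (x >> 24);   // device path: plain shifts
#else
    return (u32)_lrotl(x, 8);      // host path: keep the original intrinsic
#endif
}

__host__ __device__ static inline u32 rotr8(u32 x)
{
#ifdef __CUDA_ARCH__
    return (x >> 8) | (x << 24);
#else
    return (u32)_lrotr(x, 8);
#endif
}

#define SWAP(x)   (rotl8(x) & 0x00ff00ff | rotr8(x) & 0xff00ff00)
#define GETU32(p) SWAP(*((u32 *)(p)))

Placed in the header (or the .cu file) that nvcc compiles, this gives the __global__ kernel a shift-based SWAP while the CPU code keeps using the intrinsic.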
