How can I get the ARM UMULL instruction to produce its output in a uint64_t in gcc?

Posted on 2024-08-19 19:17:37

I would like to introduce some assembly code into a c99 codebase. I want to use the ARM UMULL instruction to multiply two uint32_t values and get the result directly into a uint64_t.

Since a uint64_t needs two registers, how do I specify the outputs and the constraints of the asm block?

相思故 2024-08-26 19:17:37

Good question!

The following code makes GCC emit what you want at -O or higher, without resorting to assembler:


uint32_t a, b;
uint64_t c;
...
c = (uint64_t)a * (uint64_t)b;

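or, if you feel you must use machine-specific asm, you can write the following (spelled __asm__ because plain asm is not recognized as a keyword under -std=c99):

uint32_t a, b;
uint64_t c;

__asm__ ("umull %Q0, %R0, %1, %2" : "=r"(c) : "r"(a), "r"(b));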


Operand %0 names the first register of the pair that holds c, and the %Q and %R modifiers pick out the registers holding the lower and upper 32 bits of that pair, respectively. See umulsidi3 in gcc/config/arm/arm.md for an example.

However, if you can stay in C, that gives the optimizer a chance to do more and is kinder to readers of your program.
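As a rough sketch of how the two variants above could live side by side, the function below uses the inline asm only when building for a 32-bit ARM target that has UMULL, and falls back to the plain C multiply everywhere else. The helper name mul_u32_wide and the preprocessor guard are illustrative assumptions, not part of the answer. The sketch also uses the earlyclobber constraint "=&r" instead of "=r": as far as I know, cores older than ARMv6 require the UMULL destination registers to differ from the first source register, and "=&r" keeps the outputs away from all inputs at a small cost in register allocation freedom.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption): 32x32 -> 64 bit unsigned multiply. */
static inline uint64_t mul_u32_wide(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && defined(__arm__) && \
    (!defined(__thumb__) || defined(__thumb2__))
    uint64_t c;
    /* %Q0 / %R0 select the low / high register of the pair holding c.
       "=&r" (earlyclobber) stops GCC from reusing an input register
       for the output pair. */
    __asm__ ("umull %Q0, %R0, %1, %2" : "=&r"(c) : "r"(a), "r"(b));
    return c;
#else
    /* Portable fallback; on ARM, GCC at -O normally turns this into umull. */
    return (uint64_t)a * (uint64_t)b;
#endif
}
```

Compiling with -S, with and without the asm path, is a quick way to check whether the hand-written version actually buys anything over the C expression.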

凉墨 2024-08-26 19:17:37

The umull instruction writes its result into two 32-bit registers. I suggest explicitly reassembling the 64-bit value with something like this:

/* assuming the 64-bit result was stored in "hi" (upper
   half) and "lo" (lower half) */
uint64_t v = ((uint64_t)hi << 32) | (uint64_t)lo;

The compiler's optimizer should notice that the left shift is pure data routing, so the resulting code should be fine. To be sure, compile with -S and check the assembly output.
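To make the "hi" and "lo" of this answer concrete, here is a minimal sketch in which the asm block has two separate 32-bit outputs that are then recombined in C. The function name mul_u32_split and the preprocessor guard are assumptions made for illustration; on non-ARM targets the halves are derived from an ordinary C multiply so the snippet still compiles.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption): umull with two 32-bit outputs. */
static inline uint64_t mul_u32_split(uint32_t a, uint32_t b)
{
    uint32_t lo, hi;
#if defined(__GNUC__) && defined(__arm__) && \
    (!defined(__thumb__) || defined(__thumb2__))
    /* umull RdLo, RdHi, Rn, Rm: %0 receives the low word, %1 the high word.
       "=&r" keeps both outputs in registers distinct from the inputs. */
    __asm__ ("umull %0, %1, %2, %3"
             : "=&r"(lo), "=&r"(hi)
             : "r"(a), "r"(b));
#else
    /* Portable fallback: split an ordinary 64-bit product into halves. */
    uint64_t p = (uint64_t)a * (uint64_t)b;
    lo = (uint32_t)p;
    hi = (uint32_t)(p >> 32);
#endif
    /* Reassemble the 64-bit value as suggested in the answer. */
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}
```

Compared with the %Q/%R form, this version does not depend on ARM-specific operand modifiers, at the price of relying on the optimizer to keep hi and lo in a register pair without extra moves.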
