How to print a register number with gcc-style inline assembly?

Posted on 2025-01-11 17:42:43


Inspired by a recent question.

One use case for gcc-style inline assembly is to encode instructions that neither the compiler nor the assembler knows about. For example, I gave this example of how to use the rdrand instruction with a toolchain too old to support it:

/* "rdrand %%rax ; setc %b1" */
asm volatile (".byte 0x48, 0x0f, 0xc7, 0xf0; setc %b1"
    : "=a"(result), "=qm"(success) :: "cc");

Unfortunately, hard-coding the instruction means that you also need to hard-code the registers used with it, greatly reducing the compiler's freedom to perform register allocation.

On some architectures (like RISC-V with its .insn directive), the assembler provides a way to systematically build raw instructions, but that seems to be the exception.

A simple solution would be a way to obtain the bare number of a register so that it can be encoded into the instruction by hand. For example, suppose a template modifier X existed that printed the number of the chosen register. Then the example above could be made more flexible as follows:

/* "rdrand %0 ; setc %b1" */
asm volatile (".byte 0x48 | (%X0 >> 3), 0x0f, 0xc7, 0xf0 | (%X0 & 7); setc %b1"
    : "=r"(result), "=qm"(success) :: "cc");
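For reference, the arithmetic in the hypothetical %X0 expressions can be checked in plain C. The sketch below uses a made-up helper name, encode_rdrand_r64, and assumes the X modifier would print the hardware register number used in the x86 encoding (rax=0, rcx=1, rdx=2, rbx=3, rsp=4, rbp=5, rsi=6, rdi=7, r8 to r15 = 8 to 15), not some compiler-internal numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the four bytes of "rdrand r64" for hardware register
   number reg (0..15), i.e. what the %X0 expressions above would compute. */
static void encode_rdrand_r64(unsigned reg, uint8_t out[4])
{
    assert(reg < 16);
    out[0] = (uint8_t)(0x48 | (reg >> 3)); /* REX.W, plus REX.B for r8..r15 */
    out[1] = 0x0f;                         /* two-byte opcode escape        */
    out[2] = 0xc7;                         /* opcode 0F C7                  */
    out[3] = (uint8_t)(0xf0 | (reg & 7));  /* ModRM: mod=11, reg=6, rm=low 3 bits */
}
```

For register 0 (rax) this reproduces the hard-coded bytes 0x48, 0x0f, 0xc7, 0xf0; for r9 (number 9) it gives 0x49, 0x0f, 0xc7, 0xf1.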

Similarly, if there were a way to have gcc print 12 instead of v12 for SIMD register 12 on ARM64, it would be possible to do something like this:

float32x4_t add3(float32x4_t a, float32x4_t b)
{
    float32x4_t c;

    /* fadd %0, %1, %2 */
    asm (".inst 0x4e20d400 + %X0 + (%X1<<5) + (%X2<<16)" : "=w"(c) : "w"(a), "w"(b));

    return c;
}
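Likewise, the A64 encoding that such a template would build can be checked in plain C. This is a sketch with a hypothetical helper, encode_fadd_4s, assuming the base opcode of FADD Vd.4S, Vn.4S, Vm.4S with all register fields zero is 0x4E20D400, with Rd in bits 0 to 4, Rn in bits 5 to 9 and Rm in bits 16 to 20.

```c
#include <assert.h>
#include <stdint.h>

/* Base opcode of FADD Vd.4S, Vn.4S, Vm.4S with all register fields zero. */
#define FADD_4S_BASE UINT32_C(0x4e20d400)

/* Hypothetical helper: combine the base opcode with three SIMD register
   numbers (0..31), as the %X0/%X1/%X2 expression above would. */
static uint32_t encode_fadd_4s(unsigned rd, unsigned rn, unsigned rm)
{
    assert(rd < 32 && rn < 32 && rm < 32);
    return FADD_4S_BASE
         | rd           /* Rd: bits 0..4   */
         | (rn << 5)    /* Rn: bits 5..9   */
         | (rm << 16);  /* Rm: bits 16..20 */
}
```

For example, encode_fadd_4s(0, 1, 2) gives 0x4E22D420, the encoding of fadd v0.4s, v1.4s, v2.4s. The register fields do not overlap, so OR-ing and adding them give the same result.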

Is there a way to obtain the register number? If not, what other options exist for encoding instructions that neither the compiler nor the assembler knows about, without hard-coding register numbers?


只涨不跌 2025-01-18 17:42:43


I've actually had the same problem and came up with the following solution.

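#include <arm_neon.h>  // float32x4_t

// Each REG_CONST(n) emits a top-level ".equ .L__reg_const__v<n>, <n>" directive.
#define REG_CONST(n) asm(".equ .L__reg_const__v" #n ", " #n);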

REG_CONST(0)
REG_CONST(1)
REG_CONST(2)
REG_CONST(3)
// ... repeat this for all register numbers ...
REG_CONST(27)
REG_CONST(28)
REG_CONST(29)
REG_CONST(30)
REG_CONST(31)  // AArch64 has 32 SIMD registers, v0..v31

float32x4_t add3(float32x4_t a, float32x4_t b) {
    float32x4_t c;
    // fadd %0, %1, %2
    asm(".inst 0x4e20d400 | .L__reg_const__%0 | (.L__reg_const__%1 << 5) | (.L__reg_const__%2 << 16)"
        : "=w"(c) : "w"(a), "w"(b));

    return c;
}
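To see exactly what the assembler receives from each REG_CONST(n), the same stringification can be captured as plain C strings. REG_CONST_TEXT and reg_const_directive are hypothetical names for this illustration. #n turns the macro argument into a string literal, and adjacent string literals are concatenated, so every invocation yields one ".equ name, value" line. The table covers v0 to v31, since AArch64 has 32 SIMD registers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper mirroring REG_CONST(n), but producing the directive
   text as a C string instead of wrapping it in a top-level asm(). */
#define REG_CONST_TEXT(n) ".equ .L__reg_const__v" #n ", " #n

/* One directive per AArch64 SIMD register, v0..v31. */
static const char *const reg_const_directives[32] = {
    REG_CONST_TEXT(0),  REG_CONST_TEXT(1),  REG_CONST_TEXT(2),  REG_CONST_TEXT(3),
    REG_CONST_TEXT(4),  REG_CONST_TEXT(5),  REG_CONST_TEXT(6),  REG_CONST_TEXT(7),
    REG_CONST_TEXT(8),  REG_CONST_TEXT(9),  REG_CONST_TEXT(10), REG_CONST_TEXT(11),
    REG_CONST_TEXT(12), REG_CONST_TEXT(13), REG_CONST_TEXT(14), REG_CONST_TEXT(15),
    REG_CONST_TEXT(16), REG_CONST_TEXT(17), REG_CONST_TEXT(18), REG_CONST_TEXT(19),
    REG_CONST_TEXT(20), REG_CONST_TEXT(21), REG_CONST_TEXT(22), REG_CONST_TEXT(23),
    REG_CONST_TEXT(24), REG_CONST_TEXT(25), REG_CONST_TEXT(26), REG_CONST_TEXT(27),
    REG_CONST_TEXT(28), REG_CONST_TEXT(29), REG_CONST_TEXT(30), REG_CONST_TEXT(31),
};

/* Returns the assembler line that REG_CONST(n) would emit. */
static const char *reg_const_directive(unsigned n)
{
    assert(n < 32);
    return reg_const_directives[n];
}
```

For example, strcmp(reg_const_directive(7), ".equ .L__reg_const__v7, 7") is 0.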

How does this work?

Keep in mind that placeholders like %0, %1, ... are filled in with register names by simple string substitution in the compiler, before the result is passed to the assembler.
In assembly files, the .equ directive defines a symbol that stands for an integer. (Symbols that start with .L are local and do not appear in the generated object file, so the symbol table is not needlessly cluttered.)
Each invocation of the REG_CONST macro defines one such local symbol: .L__reg_const__v0 equal to 0, .L__reg_const__v1 equal to 1, .L__reg_const__v2 equal to 2, and so on.
The macros are intentionally placed at the top of the file, outside any function, because the resulting top-level asm(".equ .L__reg_const__v0, 0") statements are supposed to end up at the top of the assembly file.
In the asm(".inst ...") template inside add3, %0, %1 and %2 are then replaced with whatever registers the compiler selected for c, a and b, respectively.
Since the placeholder is written directly after .L__reg_const__ with no space in between, the substitution turns it into a symbol name such as .L__reg_const__v7.
That is exactly the name of one of the integer symbols defined at the top, so the assembler treats it as a symbol reference and replaces it with the integer value defined for it.
After the symbols are evaluated, the result is a purely numeric expression, and the assembler happily ORs the integer values together, yielding the desired opcode.
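The substitution step can be modelled in a few lines of C. This is a simplified sketch, not how GCC implements it. The function name substitute_operands is made up, and the model only handles single-digit %N operands and ignores modifiers and %%. It shows why the missing space matters: the register name is glued onto the .L__reg_const__ prefix and forms one symbol name.

```c
#include <assert.h>
#include <string.h>

/* Simplified model of operand substitution in an asm template: every "%N"
   (single digit N < nregs) is replaced verbatim by the register name chosen
   for operand N. Modifiers and "%%" are not handled. */
static void substitute_operands(const char *tmpl, const char *const regs[],
                                int nregs, char *out, size_t outsz)
{
    size_t len = 0;
    assert(outsz > 0);
    out[0] = '\0';
    for (const char *p = tmpl; *p != '\0'; ++p) {
        char single[2] = { *p, '\0' };
        const char *piece = single;          /* default: copy one character */
        if (p[0] == '%' && p[1] >= '0' && p[1] <= '9' && p[1] - '0' < nregs) {
            piece = regs[p[1] - '0'];        /* e.g. "%1" -> "v17" */
            ++p;                             /* skip the operand digit */
        }
        size_t plen = strlen(piece);
        if (len + plen >= outsz)
            break;                           /* truncate instead of overflowing */
        memcpy(out + len, piece, plen + 1);  /* also copies the terminator */
        len += plen;
    }
}
```

With c in v3, a in v17 and b in v31, the answer's template becomes ".inst 0x4e20d400 | .L__reg_const__v3 | (.L__reg_const__v17 << 5) | (.L__reg_const__v31 << 16)". The assembler can then evaluate this using the .equ definitions.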