GCC inline assembler, mixing register sizes (x86)

Posted 2024-07-06 02:17:08

Does anyone know how I can get rid of the following assembler warning?

Code is x86, 32 bit:

int test (int x)
{
  int y;
  // do a bit-rotate by 8 on the lower word. leave upper word intact.
  asm ("rorw $8, %0\n\t": "=q"(y) :"0"(x));
  return y;
}

If I compile it I get the following (very valid) warning:

Warning: using `%ax' instead of `%eax' due to `w' suffix

What I'm looking for is a way to tell the compiler/assembler that I want to access the lower 16-bit sub-register of %0. Accessing the byte sub-registers (in this case AL and AH) would be nice to know as well.

I've already chosen the "q" constraint, so the compiler is forced to use EAX, EBX, ECX or EDX. That way I've made sure the compiler has to pick a register that has sub-registers.

I know that I can force the asm-code to use a specific register (and its sub-registers), but I want to leave the register-allocation job up to the compiler.

空‖城人不在 2024-07-13 02:17:08

You can use %w0 if I remember right. I just tested it, too. :-)

int
test(int x)
{
    int y;
    asm ("rorw $8, %w0" : "=q" (y) : "0" (x));
    return y;
}

Edit: In response to the OP, yes, you can do the following too:

int
test(int x)
{
    int y;
    asm ("xchg %b0, %h0" : "=Q" (y) : "0" (x));
    return y;
}

For x86 it's documented in the x86 Operand Modifiers section of the Extended Asm part of the manual.

For non-x86 instruction sets, you may have to dig through their .md files in the GCC source. For example, gcc/config/i386/i386.md was the only place to find this before it was officially documented.
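To make the effect of the two variants above concrete, here is a minimal sketch that wraps each in a function and checks the result on a sample value. It assumes an x86 or x86-64 target and GCC (or a compatible compiler) with the default AT&T syntax; the function names and the switch from `int` to `uint32_t` are my own choices for illustration, not part of the original answer.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate only the low 16 bits by 8; %w0 prints the 16-bit name (e.g. %ax). */
static uint32_t swap_low_word_ror(uint32_t x)
{
    uint32_t y;
    __asm__ ("rorw $8, %w0" : "=q" (y) : "0" (x) : "cc");
    return y;
}

/* Same effect: exchange the low byte (%b0, e.g. %al) with the high byte
   (%h0, e.g. %ah). "Q" restricts the choice to a/b/c/d registers. */
static uint32_t swap_low_word_xchg(uint32_t x)
{
    uint32_t y;
    __asm__ ("xchg %b0, %h0" : "=Q" (y) : "0" (x));
    return y;
}

/* Hypothetical self-check: the upper word must stay intact. */
static void check_swaps(void)
{
    assert(swap_low_word_ror(0x12345678u) == 0x12347856u);
    assert(swap_low_word_xchg(0x12345678u) == 0x12347856u);
}
```

Both functions leave the upper 16 bits untouched and swap the two low bytes, so the register choice stays with the compiler while the template still names the right sub-register.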

余生一个溪 2024-07-13 02:17:08

Long ago, but I'll likely need this for my own future reference...

Adding on to Chris's fine answer: the key is using a modifier between the '%' and the operand number. For example, "MOV %1, %0" might become "MOV %q1, %w0".

I couldn't find anything in constraints.md, but /gcc/config/i386/i386.c had this potentially useful comment in the source for print_reg():

/* Print the name of register X to FILE based on its machine mode and number.
   If CODE is 'w', pretend the mode is HImode.
   If CODE is 'b', pretend the mode is QImode.
   If CODE is 'k', pretend the mode is SImode.
   If CODE is 'q', pretend the mode is DImode.
   If CODE is 'x', pretend the mode is V4SFmode.
   If CODE is 't', pretend the mode is V8SFmode.
   If CODE is 'h', pretend the reg is the 'high' byte register.
   If CODE is 'y', print "st(0)" instead of "st", if the reg is stack op.
   If CODE is 'd', duplicate the operand for AVX instruction.
 */

A comment further down, for ix86_print_operand(), offers an example:

b -- print the QImode name of the register for the indicated operand.
%b0 would print %al if operands[0] is reg 0.
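As a rough illustration of how these size modifiers combine in one template, the sketch below zero-extends the low byte and the low word of a value. It assumes an x86 or x86-64 target with AT&T syntax; the function names are hypothetical and only serve the example.

```c
#include <assert.h>
#include <stdint.h>

/* %b1 prints the 8-bit name of the input register, %k0 the 32-bit name
   of the output register. "q" keeps the input in a byte-addressable register. */
static uint32_t low_byte(uint32_t in)
{
    uint32_t out;
    __asm__ ("movzbl %b1, %k0" : "=r" (out) : "q" (in));
    return out;
}

/* %w1 prints the 16-bit name of the input register. */
static uint32_t low_word(uint32_t in)
{
    uint32_t out;
    __asm__ ("movzwl %w1, %k0" : "=r" (out) : "r" (in));
    return out;
}

/* Hypothetical self-check for the two helpers. */
static void check_modifiers(void)
{
    assert(low_byte(0x12345678u) == 0x78u);
    assert(low_word(0x12345678u) == 0x5678u);
}
```

The modifier only changes which name of the register is printed; the constraint still decides which register the compiler may pick.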

A few more useful options are listed under Output Template of the GCC Internals documentation:

‘%cdigit’ can be used to substitute an operand that is a constant
value without the syntax that normally indicates an immediate operand.
‘%ndigit’ is like ‘%cdigit’ except that the value of the constant is
negated before printing.
‘%adigit’ can be used to substitute an operand as if it were a memory
reference, with the actual operand treated as the address. This may be
useful when outputting a “load address” instruction, because often the
assembler syntax for such an instruction requires you to write the
operand as if it were a memory reference.
‘%ldigit’ is used to substitute a label_ref into a jump instruction.
‘%=’ outputs a number which is unique to each instruction in the
entire compilation. This is useful for making local labels to be
referred to more than once in a single template that generates
multiple assembler instructions.
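The '%=' item is easiest to see in a template that needs its own labels. The following sketch, assuming an x86 or x86-64 ELF target with AT&T syntax, counts loop iterations in asm; the function name and the `.Lloop`/`.Ldone` label names are made up for the example.

```c
#include <assert.h>

/* Counts down from n, incrementing iters once per iteration.
   %= expands to a number unique to this asm instance, so the labels do
   not collide when the function is inlined at several call sites. */
static inline unsigned count_iterations(unsigned n)
{
    unsigned iters = 0;
    __asm__ volatile (
        "testl %1, %1\n\t"
        "jz .Ldone%=\n"
        ".Lloop%=:\n\t"
        "incl %0\n\t"
        "decl %1\n\t"
        "jnz .Lloop%=\n"
        ".Ldone%=:"
        : "+r" (iters), "+r" (n)
        :
        : "cc");
    return iters;
}

/* Hypothetical self-check; the two calls may be inlined side by side. */
static void check_count(void)
{
    assert(count_iterations(0) == 0);
    assert(count_iterations(3) + count_iterations(4) == 7);
}
```

Without '%=' a fixed label name would be emitted once per inlined copy, and the assembler would reject the duplicate definitions.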

The '%c1' construct (the 'c' modifier applied to operand 1) allows one to properly format an LEA instruction using an offset:

#define ASM_LEA_ADD_BYTES(ptr, bytes)                            \
    __asm volatile("lea %c1(%0), %0" :                           \
                   /* reads/writes %0 */  "+r" (ptr) :           \
                   /* reads */ "i" (bytes));

Note the crucial but sparsely documented 'c' in '%c1'. This macro is equivalent to

ptr = (char *)ptr + bytes

but without making use of the usual integer arithmetic execution ports.
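For a quick sanity check of the 'c' modifier, here is a small usage sketch. It assumes an x86 or x86-64 target with AT&T syntax; `LEA_ADD_BYTES` is a renamed variant of the macro above without the trailing semicolon, and `advance_by_16` is a hypothetical helper.

```c
#include <assert.h>

/* %c1 prints the constant as a bare number (16), not as an immediate ($16),
   which is the form a displacement in "disp(base)" requires. */
#define LEA_ADD_BYTES(ptr, bytes) \
    __asm__ volatile ("lea %c1(%0), %0" : "+r" (ptr) : "i" (bytes))

/* Advances a pointer by a compile-time constant of 16 bytes. */
static char *advance_by_16(char *p)
{
    LEA_ADD_BYTES(p, 16);
    return p;
}

/* Hypothetical self-check: the result matches plain pointer arithmetic. */
static void check_lea(void)
{
    char buf[32];
    assert(advance_by_16(buf) == buf + 16);
}
```

The "i" constraint requires `bytes` to be a compile-time constant, so the macro is only usable with literals or constant expressions.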

Edit to add:

Making direct calls in x64 can be difficult, as it requires yet another undocumented modifier: '%P0' (which seems to be for PIC)

#define ASM_CALL_FUNC(func)                                         \
    __asm volatile("call %P0" :                                     \
              /* no writes */ :                                     \
              /* reads %0 */ "i" (func))

A lower case 'p' modifier also seems to function the same in GCC, although only the capital 'P' is recognized by ICC. More details are probably available at /gcc/config/i386/i386.c. Search for "'p'".

孤者何惧 2024-07-13 02:17:08

While I'm thinking about it ... you should replace the "q" constraint with a capital "Q" constraint in Chris's second solution:

int
test(int x)
{
    int y;
    asm ("xchg %b0, %h0" : "=Q" (y) : "0" (x));
    return y;
}

"q" and "Q" are slightly different in 64-bit mode, where you can get the lowest byte for all of the integer registers (ax, bx, cx, dx, si, di, sp, bp, r8-r15). But you can only get the second-lowest byte (e.g. ah) for the four original 386 registers (ax, bx, cx, dx).

爱格式化 2024-07-13 02:17:08

So apparently there are tricks to do this... but it may not be so efficient. 32-bit x86 processors are generally slow at manipulating 16-bit data in general purpose registers. You ought to benchmark it if performance is important.

Unless this is (a) performance critical and (b) proves to be much faster, I would save myself some maintenance hassle and just do it in C:

uint32_t y, hi=(x&~0xffff), lo=(x&0xffff);
y = hi + (((lo >> 8) + (lo << 8))&0xffff);

With GCC 4.2 and -O2 this gets optimized down to six instructions...
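The two-line snippet above can be wrapped into a complete function roughly as follows. This is a sketch in plain C with no inline asm; the function name is my own, and it uses `|` instead of `+` to combine the disjoint bit fields, which gives the same result here.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two low bytes of x and leave the upper 16 bits intact. */
static uint32_t swap_low_bytes(uint32_t x)
{
    uint32_t hi = x & ~UINT32_C(0xffff);   /* upper word, unchanged */
    uint32_t lo = x & UINT32_C(0xffff);    /* lower word to be rotated */
    return hi | (((lo >> 8) | (lo << 8)) & UINT32_C(0xffff));
}

/* Hypothetical self-check with the same sample value as the asm versions. */
static void check_swap_low_bytes(void)
{
    assert(swap_low_bytes(0x12345678u) == 0x12347856u);
}
```

Whether this or the asm version is faster depends on the compiler and the CPU, so it is worth measuring before choosing.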

About the author

赠佳期
