AT&T asm inline C++ problem

Posted 2024-08-15 16:53:02

My Code

const int howmany = 5046;
char buffer[howmany];
    asm("lea     buffer,%esi"); //Get the address of buffer
    asm("mov     howmany,%ebx");         //Set the loop number
    asm("buf_loop:");                      //Label for beginning of loop
    asm("movb     (%esi),%al");             //Copy buffer[x] to al
    asm("inc     %esi");                   //Increment buffer address
    asm("dec     %ebx");                   //Decrement loop count
    asm("jnz     buf_loop");              //jump to buf_loop if(ebx>0)

My Problem

I am using the gcc compiler. For some reason my buffer/howmany variables are undefined in the eyes of my asm. I'm not sure why. I just want to move the beginning address of my buffer array into the esi register, loop it 'howmany' times while copying each element to the al register.

凉墨 2024-08-22 16:53:02

Are you using the inline assembler in gcc? (If not, in what other C++ compiler, exactly?)

If gcc, see the details here, and in particular this example:

    asm ("leal (%1,%1,4), %0"
         : "=r" (five_times_x)
         : "r" (x) 
         );

%0 and %1 are referring to the C-level variables, and they're listed specifically as the second (for outputs) and third (for inputs) parameters to asm. In your example you have only "inputs" so you'd have an empty second operand (traditionally one uses a comment after that colon, such as /* no output registers */, to indicate that more explicitly).
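
As a self-contained sketch of that example (assuming GCC or Clang on x86 with the default AT&T syntax), the leal statement can be wrapped in a small function. The name five_times and the non-x86 fallback branch are my own additions for illustration, not part of the GCC documentation.

```cpp
// Hypothetical wrapper around the leal example; assumes GCC/Clang, AT&T syntax.
int five_times(int x)
{
    int five_times_x;
#if defined(__i386__) || defined(__x86_64__)
    asm("leal (%1,%1,4), %0"      // %0 = %1 + %1*4
        : "=r" (five_times_x)     // operand 0: output, any register
        : "r" (x)                 // operand 1: input, any register
        );
#else
    five_times_x = x * 5;         // portable fallback so the sketch builds off x86
#endif
    return five_times_x;
}
```

The compiler picks the registers and moves x into one of them before the statement runs, so the asm itself never names a C variable.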

羁拥 2024-08-22 16:53:02

The part that declares an array like that

int howmany = 5046;
char buffer[howmany];

is not valid C++. In C++ it is impossible to declare an array that has "variable" or run-time size. In C++ array declarations the size is always a compile-time constant.

If your compiler allows this array declaration, it means that it implements it as an extension. In that case you have to do your own research to figure out how it implements such a run-time sized array internally. I would guess that internally buffer will be implemented as a pointer, not as a true array. If my guess is correct and it is really a pointer, then the proper way to load the address of the array into esi might be

mov buffer,%esi

and not a lea, as in your code. lea will only work with "normal" compile-time sized arrays, but not with run-time sized arrays.

Another question is whether you really need a run-time sized array in your code. Could it be that you just made it so by mistake? If you simply change the howmany declaration to

const int howmany = 5046;

the array will turn into a "normal" C++ array and your code might start working as is (i.e. with lea).
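
A quick way to check which kind of array you have is to ask for its size at compile time. This is only a sketch in plain standard C++; the helper buffer_size is a hypothetical name added for illustration.

```cpp
#include <cstddef>

const int howmany = 5046;   // constant expression, so the bound below is fixed at compile time
char buffer[howmany];       // ordinary array of 5046 chars, not a run-time sized one

// This only compiles because the array size is a compile-time constant.
static_assert(sizeof(buffer) == 5046, "buffer has a compile-time size");

// Hypothetical helper, just to expose the size to callers.
std::size_t buffer_size() { return sizeof(buffer); }
```

If howmany were a non-const int, this namespace-scope array declaration would be rejected, because a run-time bound is not allowed there.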

掩耳倾听 2024-08-22 16:53:02

All of those asm instructions need to be in the same asm statement if you want to be sure they're contiguous (without compiler-generated code between them), and you need to declare input / output / clobber operands or you will step on the compiler's registers.

You can't use lea or mov to/from a C variable name (except for global / static symbols which are actually defined in the compiler's asm output, but even then you usually shouldn't).

Instead of using mov instructions to set up inputs, ask the compiler to do it for you using input operand constraints. If the first or last instruction of a GNU C inline asm statement is a mov, usually that means you're doing it wrong and writing inefficient code.
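
A minimal sketch of that idea, assuming GCC or Clang on x86 with AT&T syntax: the function below adds two ints without a single mov in the template, because the constraints tell the compiler where the values have to be. The name add_ints and the fallback branch are hypothetical, added only for illustration.

```cpp
// Hypothetical example: let constraints place the inputs instead of writing mov.
int add_ints(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    asm("addl %[b], %[a]"    // a += b
        : [a] "+r" (a)       // read-write operand, any register
        : [b] "r" (b)        // input-only operand, any register
        : "cc");             // the flags are modified
#else
    a += b;                  // portable fallback so the sketch builds off x86
#endif
    return a;
}
```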

And BTW, GNU C++ allows C99-style variable-length arrays, so howmany is allowed to be non-const and even set in a way that doesn't optimize away to a constant. Any compiler that can compile GNU-style inline asm will also support variable-length arrays.


How to write your loop properly

If this looks over-complicated, then https://gcc.gnu.org/wiki/DontUseInlineAsm. Write a stand-alone function in asm so you can just learn asm instead of also having to learn about gcc and its complex but powerful inline-asm interface. You basically have to know asm and understand compilers to use it correctly (with the right constraints to prevent breakage when optimization is enabled).

Note the use of named operands like %[ptr] instead of %2 or %%ebx. Letting the compiler choose which registers to use is normally a good thing, but for x86 there are letters other than "r" you can use, like "=a" for rax/eax/ax/al specifically. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, and also other links in the inline-assembly tag wiki.

I also used buf_loop%=: to append a unique number to the label, so if the optimizer clones the function or inlines it multiple places, the file will still assemble.
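
Here is a small sketch of why that matters, assuming GCC or Clang on x86 with AT&T syntax. The names count_down and count_twice are hypothetical. count_down is likely to be inlined twice into count_twice, and each copy of the asm gets its own label number from %=.

```cpp
// Hypothetical helper: counts loop iterations with a label made unique by %=.
static inline unsigned count_down(unsigned n)
{
    if (n == 0) return 0;                 // the asm loop below needs n >= 1
    unsigned iterations = 0;
#if defined(__i386__) || defined(__x86_64__)
    asm("count_loop%=:             \n\t"  // do {
        "   inc   %[iters]         \n\t"
        "   dec   %[n]             \n\t"
        "   jnz   count_loop%=     \n\t"  // } while (--n != 0)
        : [iters] "+r" (iterations), [n] "+r" (n)
        :                                 // no input-only operands
        : "cc");
#else
    for (; n != 0; --n) ++iterations;     // portable fallback so the sketch builds off x86
#endif
    return iterations;
}

// Two uses in one function: with a fixed label name this could fail to assemble.
unsigned count_twice(unsigned a, unsigned b)
{
    return count_down(a) + count_down(b);
}
```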

Source + compiler asm output on the Godbolt compiler explorer.

void ext(char *);

int foo(void) 
{
    int howmany = 5046;   // could be a function arg
    char buffer[howmany];
    //ext(buffer);

    const char *bufptr = buffer;  // copy the pointer to a C var we can use as a read-write operand
    unsigned char result;
    asm("buf_loop%=:  \n\t"                 // do {
        "   movb     (%[ptr]), %%al \n\t"   // Copy buffer[x] to al
        "   inc     %[ptr]        \n\t"
        "   dec     %[count]      \n\t"
        "   jnz     buf_loop%=    \n\t"      // } while(--count != 0)
       :   [res]"=a"(result)      // al = write-only output
         , [count] "+r" (howmany) // input/output operand, any register
         , [ptr] "+r" (bufptr)
       : // no input-only operands
       : "memory"   // we read memory that isn't an input operand, only pointed to by inputs
    );
    return result;
}

I used %%al as an example of how to write register names explicitly: Extended Asm (with operands) needs a double % to get a literal % in the asm output. You could also use %[res] or %0 and let the compiler substitute %al in its asm output. (And then you'd have no reason to use a specific-register constraint unless you wanted to take advantage of cbw or lodsb or something like that.) result is unsigned char, so the compiler will pick a byte register for it. If you want the low byte of a wider operand, you could use %b[count] for example.
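
To illustrate the %b modifier, here is a sketch that copies the low byte of a 32-bit value, assuming GCC or Clang on x86 with AT&T syntax. The name low_byte is hypothetical, and I chose the "q" constraint so that 32-bit builds only get registers that have a byte form.

```cpp
// Hypothetical example of the %b operand modifier (low byte of a wider operand).
unsigned char low_byte(unsigned value)
{
    unsigned char out;
#if defined(__i386__) || defined(__x86_64__)
    asm("movb %b[src], %[dst]"   // e.g. movb %dl, %al
        : [dst] "=q" (out)       // byte-sized output, so a byte register name is printed
        : [src] "q" (value));    // 32-bit input; %b prints its low-byte register name
#else
    out = static_cast<unsigned char>(value);  // portable fallback
#endif
    return out;
}
```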

This uses a "memory" clobber, which is inefficient. You don't need the compiler to spill everything to memory, only to make sure that the contents of buffer[] in memory match the C abstract machine state. (This is not guaranteed by passing a pointer to it in a register).

gcc7.2 -O3 output:

    pushq   %rbp
    movl    $5046, %edx
    movq    %rsp, %rbp
    subq    $5056, %rsp
    movq    %rsp, %rcx         # compiler-emitted to satisfy our "+r" constraint for bufptr
    # start of the inline-asm block
    buf_loop18:  
       movb     (%rcx), %al 
       inc     %rcx        
       dec     %edx      
       jnz     buf_loop18
    # end of the inline-asm block

    movzbl  %al, %eax
    leave
    ret

Without a memory clobber or input constraint, leave appears before the inline asm block, releasing that stack memory before the inline asm uses the now-stale pointer. A signal-handler running at the wrong time would clobber it.


A more efficient way is to use a dummy memory operand which tells the compiler that the entire array is a read-only memory input to the asm statement. See get string length in inline GNU Assembler for more about this flexible-array-member trick for telling the compiler you read all of an array without specifying the length explicitly.

In C you can define a new type inside a cast, but you can't in C++, hence the using instead of a really complicated input operand.

int bar(unsigned howmany)
{
    //int howmany = 5046;
    char buffer[howmany];
    //ext(buffer);
    buffer[0] = 1;
    buffer[100] = 100;   // test whether we got the input constraints right

    //using input_t = const struct {char a[howmany];};  // requires a constant size
    using flexarray_t = const struct {char a; char x[];};
    const char *dummy;
    unsigned char result;
    asm("buf_loop%=:  \n\t"                 // do {
        "   movb     (%[ptr]), %%al \n\t"   // Copy buffer[x] to al
        "   inc     %[ptr]        \n\t"
        "   dec     %[count]      \n\t"
        "   jnz     buf_loop%=    \n\t"      // } while(--count != 0)
       : [res]"=a"(result)        // al = write-only output
         , [count] "+r" (howmany) // input/output operand, any register
         , "=r" (dummy)           // output operand in the same register as buffer input, so we can modify the register
       : [ptr] "2" (buffer)     // matching constraint for the dummy output
         , "m" (*(flexarray_t *) buffer)  // whole buffer as an input operand

           //, "m" (*buffer)        // just the first element: doesn't stop the buffer[100]=100 store from sinking past the inline asm, even if you used asm volatile
       : // no clobbers
    );
    buffer[100] = 101;
    return result;
}

I also used a matching constraint so buffer could be an input directly, and the output operand in the same register means we can modify that register. We got the same effect in foo() by using const char *bufptr = buffer; and then using a read-write constraint to tell the compiler that the new value of that C variable is what we leave in the register. Either way we leave a value in a dead C variable that goes out of scope without being read, but the matching constraint way can be useful for macros where you don't want to modify the value of your input (and don't need the type of your input: int dummy would work fine, too.)

The buffer[100] = 100; and buffer[100] = 101; assignments are there to show that they both appear in the asm, instead of being merged across the inline-asm (which does happen if you leave out the "m" input operand). IDK why the buffer[100] = 101; isn't optimized away; it's dead so it should be. Also note that asm volatile doesn't block this reordering, so it's not an alternative to a "memory" clobber or using the right constraints.
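
When the array size is known at compile time you don't need the flexible-array trick: a cast to pointer-to-array of the right size can be used as the dummy memory input. The sketch below assumes GCC or Clang on x86 with AT&T syntax. sum_bytes is a hypothetical name, and it sums the bytes instead of just loading them so that the result is easy to check.

```cpp
#include <cstddef>

// Hypothetical example: whole fixed-size array as a dummy "m" input, no "memory" clobber.
template <std::size_t N>
unsigned sum_bytes(const unsigned char (&buf)[N])
{
    unsigned sum = 0;
#if defined(__i386__) || defined(__x86_64__)
    const unsigned char *p = buf;   // pointer copy we are allowed to modify
    std::size_t count = N;          // N >= 1 for any array, so the do-while form is safe
    unsigned tmp;
    asm("sum_loop%=:                   \n\t"   // do {
        "   movzbl  (%[p]), %[tmp]     \n\t"   //   tmp = *p (zero-extended)
        "   addl    %[tmp], %[sum]     \n\t"   //   sum += tmp
        "   inc     %[p]               \n\t"
        "   dec     %[count]           \n\t"
        "   jnz     sum_loop%=         \n\t"   // } while (--count != 0)
        : [sum] "+r" (sum), [tmp] "=&r" (tmp), [p] "+r" (p), [count] "+r" (count)
        : "m" (*(const unsigned char (*)[N]) buf)  // dummy input: all N bytes are read
        : "cc");
#else
    for (std::size_t i = 0; i < N; ++i) sum += buf[i];  // portable fallback
#endif
    return sum;
}
```

The "m" operand is never named in the template. It is there only so the compiler knows the array contents must be up to date in memory before the statement runs.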
