How can I indicate that the memory *pointed* to by an inline ASM argument may be used?

Posted on 2025-02-11 07:57:17

Consider the following small function:

void foo(int* iptr) {
    iptr[10] = 1;
    __asm__ volatile ("nop"::"r"(iptr):);
    iptr[10] = 2;
}

Using gcc, this compiles to:

foo:
        nop
        mov     DWORD PTR [rdi+40], 2
        ret

Note in particular that the first write through iptr, iptr[10] = 1, doesn't occur at all: the inline asm nop is the first thing in the function, and only the final write of 2 appears (after the asm statement). Apparently the compiler decides that it only needs to provide an up-to-date version of the value of iptr itself, not of the memory it points to.

I can tell the compiler that memory must be up to date with a memory clobber, like so:

void foo(int* iptr) {
    iptr[10] = 1;
    __asm__ volatile ("nop"::"r"(iptr):"memory");
    iptr[10] = 2;
}

which results in the expected code:

foo:
        mov     DWORD PTR [rdi+40], 1
        nop
        mov     DWORD PTR [rdi+40], 2
        ret

However, this is too strong a condition, since it tells the compiler that all memory has to be up to date, i.e. every pending store must be written before the asm. For example, in the following function:

void foo2(int* iptr, long* lptr) {
    iptr[10] = 1;
    lptr[20] = 100;
    __asm__ volatile ("nop"::"r"(iptr):);
    iptr[10] = 2;
    lptr[20] = 200;
}

The desired behavior is to let the compiler optimize away the first write to lptr[20], but not the first write to iptr[10]. Adding a "memory" clobber to the asm statement cannot achieve this, because it means both writes have to occur:

foo2:
        mov     DWORD PTR [rdi+40], 1
        mov     QWORD PTR [rsi+160], 100 ; lptr[20] written unnecessarily
        nop
        mov     DWORD PTR [rdi+40], 2
        mov     QWORD PTR [rsi+160], 200
        ret

Is there some way to tell compilers that accept GCC extended asm syntax that the input to the asm includes the pointer and anything it can point to?

Answer by 游魂, 2025-02-18 07:57:17

That's correct; asking for a pointer as input to inline asm does not imply that the pointed-to memory is also an input or output or both. With a register input and register output, for all gcc knows your asm just aligns a pointer by masking off the low bits, or adds a constant to it. (In which case you would want it to optimize away a dead store.)
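As a hedged illustration of that point, here is an asm statement that takes an address in a register and only rounds it down to a 16-byte boundary, never dereferencing it. The helper name align_down_16 is hypothetical; the asm path assumes x86 with GCC's default AT&T syntax, and other targets fall back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper: the asm only transforms the address value.
   Nothing here reads or writes the pointed-to memory, which is why a
   plain "r" input does not make GCC flush stores to *ptr first. */
static void *align_down_16(void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
#if defined(__x86_64__) || defined(__i386__)
    /* AT&T syntax: p &= -16 (clear the low 4 bits); "and" writes flags. */
    __asm__ ("and %1, %0" : "+r"(p) : "i"(-16) : "cc");
#else
    p &= ~(uintptr_t)15;
#endif
    return (void *)p;
}
```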

The simple option is asm volatile and a "memory" clobber (footnote 1).

The narrower, more specific way you're asking for is to use a "dummy" memory operand as well as the pointer in a register. Your asm template doesn't reference this operand (except maybe inside an asm comment, to see what the compiler picked). It tells the compiler which memory you actually read, write, or read+write.

Dummy memory input: "m" (*(const int (*)[]) iptr)
or output: "=m" (*(int (*)[]) iptr). Or of course "+m" with the same syntax.
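A minimal sketch of foo2 from the question, rewritten with a dummy input operand in place of the "memory" clobber. Operand 1 is never referenced in the template; it only declares that the asm reads the int array that iptr points to. Because the dummy operand has type int and the lptr stores are long, GCC is allowed (under strict aliasing, which -O2 enables) to treat the first lptr[20] store as dead; inspect the generated asm to confirm what your compiler actually does. demo_foo2 is a hypothetical driver added only to check the final values.

```c
/* foo2 from the question: the asm now declares *iptr (unknown length)
   as an input, instead of clobbering all of memory. */
void foo2(int *iptr, long *lptr) {
    iptr[10] = 1;      /* must be stored before the asm: the asm reads it */
    lptr[20] = 100;    /* not an asm input, so this store may be dropped  */
    __asm__ volatile ("nop" :: "r"(iptr), "m"(*(const int (*)[])iptr));
    iptr[10] = 2;
    lptr[20] = 200;
}

/* Hypothetical driver: returns 1 if the final values are as expected. */
int demo_foo2(void) {
    int ibuf[11] = {0};
    long lbuf[21] = {0};
    foo2(ibuf, lbuf);
    return ibuf[10] == 2 && lbuf[20] == 200;
}
```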

That syntax is casting to a pointer-to-array and dereferencing, so the actual input is a C array. (If you actually have an array, not pointer, you don't need any casting and can just ask for it as a memory operand.)
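For the case of a real array, a sketch might look like this: the array object itself is the "m" operand, with no cast. The names table, touch_table and demo_table are hypothetical.

```c
/* A genuine array: name it directly as the memory operand. */
static int table[16];

void touch_table(void) {
    table[3] = 7;                            /* kept: asm reads all of table */
    __asm__ volatile ("nop" :: "m"(table));  /* whole array is a dummy input */
    table[3] = 8;
}

/* Hypothetical driver returning the final value of table[3]. */
int demo_table(void) {
    touch_table();
    return table[3];
}
```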

If you leave the size unspecified with [], that tells GCC that any memory accessed relative to that pointer is an input, output, or in/out operand. If you use [10] or [some_variable], that tells the compiler the specific size. With runtime-variable sizes, gcc in practice misses the optimization that iptr[size+1] is not part of the input.
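A sketch of the two sized forms, under the assumption that the caller knows how many elements the asm touches. foo_fixed declares exactly iptr[0] through iptr[10] as read; foo_runtime uses a pointer-to-VLA cast and assumes n > 0, since C requires variable-length array sizes to be positive. All function names here are hypothetical.

```c
#include <stddef.h>

/* Fixed size: exactly 11 ints starting at iptr are declared as read. */
void foo_fixed(int *iptr) {
    iptr[10] = 1;
    __asm__ volatile ("nop" :: "r"(iptr), "m"(*(const int (*)[11])iptr));
    iptr[10] = 2;
}

/* Runtime size via a variably modified cast; assumes n > 0. */
void foo_runtime(int *iptr, size_t n) {
    iptr[0] = 1;
    __asm__ volatile ("nop" :: "r"(iptr), "m"(*(const int (*)[n])iptr));
    iptr[0] = 2;
}

/* Hypothetical driver: returns buf[10] * 10 + buf[0]. */
int demo_sized(void) {
    int buf[11] = {0};
    foo_fixed(buf);
    foo_runtime(buf, 11);
    return buf[10] * 10 + buf[0];
}
```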

GCC documents this and therefore supports it. I think it's not a strict-aliasing violation if the array element type is the same as the type the pointer points to, or maybe if it's char.
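If the pointed-to type is unknown or mixed, one hedged option is the char form, since char lvalues may alias any object type in C. The helper asm_reads_bytes and struct sample below are hypothetical; the empty template emits no instructions and exists only to carry the operand declarations.

```c
/* Hypothetical helper: declares every byte reachable from buf
   (length unspecified) as read by the asm. */
static inline void asm_reads_bytes(const void *buf) {
    __asm__ volatile ("" :: "r"(buf), "m"(*(const char (*)[])buf));
}

struct sample { int a; long b; };

/* Hypothetical driver: the stores before the call must be visible
   to the asm; the function returns s.a + s.b afterwards. */
long demo_bytes(void) {
    struct sample s = {0, 0};
    s.a = 1;
    s.b = 10;
    asm_reads_bytes(&s);
    s.a = 2;
    return s.a + s.b;
}
```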

(from the GCC manual)
An x86 example where the string memory argument is of unknown length.

   asm("repne scasb"
    : "=c" (count), "+D" (p)
    : "m" (*(const char (*)[]) p), "0" (-1), "a" (0));
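Wrapped into a function, the manual's example might look like the sketch below (asm_strlen is a hypothetical name). It assumes x86 or x86-64 with GCC's default AT&T syntax and the ABI-guaranteed clear direction flag; other targets use a plain C loop. RCX starts at -1 and repne scasb decrements it once per byte scanned, including the terminating zero, so the length is ~count - 1. No volatile is used because the result depends only on declared inputs, including the dummy memory input.

```c
#include <stddef.h>

static size_t asm_strlen(const char *s) {
#if defined(__x86_64__) || defined(__i386__)
    size_t count;
    const char *p = s;                 /* scasb advances this copy */
    __asm__ ("repne scasb"
             : "=c"(count), "+D"(p)
             : "m"(*(const char (*)[])p), /* dummy: string of unknown length */
               "0"((size_t)-1),           /* RCX/ECX = max count */
               "a"(0)                     /* AL = byte to search for */
             : "cc");
    return ~count - 1;
#else
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
#endif
}
```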

If you can avoid using an early-clobber on the pointer input operand, the dummy memory input operand will typically pick a simple addressing mode using that same register.

But if you do use an early-clobber for strict correctness of an asm loop, sometimes a dummy operand will make gcc waste instructions (and an extra register) on a base address for the memory operand. Check the asm output of the compiler.


Background:

This is a widespread bug in inline-asm examples, and it often goes undetected because the asm is wrapped in a function that doesn't inline into any callers that would tempt the compiler into reordering or merging stores, or doing dead-store elimination.

GNU C inline asm syntax is designed around describing a single instruction to the compiler. The intent is that you tell the compiler about a memory input or memory output with an "m" or "=m" operand constraint, and it picks the addressing mode.
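A single-instruction sketch in that intended style: the "=m" operand names the exact object written, and the compiler substitutes whatever addressing mode it likes for %0. store_int and demo_store are hypothetical names; the asm path assumes x86 with AT&T syntax.

```c
/* One instruction, with the memory it writes declared as "=m". */
static inline void store_int(int *dst, int v) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movl %1, %0" : "=m"(*dst) : "ri"(v));
#else
    *dst = v;
#endif
}

/* Hypothetical driver returning the stored value. */
int demo_store(void) {
    int x = 0;
    store_int(&x, 42);
    return x;
}
```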

Writing whole loops in inline asm requires care to make sure the compiler really knows what's going on (or asm volatile plus a "memory" clobber), otherwise you risk breakage when changing the surrounding code, or enabling link-time optimization that allows for cross-file inlining.

See also Looping over arrays with inline assembly for using an asm statement as the loop body, still doing the loop logic in C. With actual (non-dummy) "m" and "=m" operands, the compiler can unroll the loop by using displacements in the addressing modes it chooses.
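A minimal sketch of that pattern, assuming x86 with AT&T syntax (other targets use plain C): the loop stays in C, and each iteration's asm declares exactly one element as a "+m" operand, so GCC knows precisely what is read and written and may unroll using displacements. add_to_each and demo_add are hypothetical names.

```c
#include <stddef.h>

/* Loop logic in C; the asm is just the loop body on one element. */
static void add_to_each(int *arr, size_t n, int delta) {
    for (size_t i = 0; i < n; i++) {
#if defined(__x86_64__) || defined(__i386__)
        __asm__ ("addl %1, %0" : "+m"(arr[i]) : "ri"(delta) : "cc");
#else
        arr[i] += delta;
#endif
    }
}

/* Hypothetical driver: sum of {1,2,3,4} after adding 10 to each. */
int demo_add(void) {
    int a[4] = {1, 2, 3, 4};
    add_to_each(a, 4, 10);
    return a[0] + a[1] + a[2] + a[3];
}
```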


Footnote 1: A "memory" clobber gets the compiler to treat the asm like a non-inline function call (that could read or write any memory except for locals that escape analysis has proved have not escaped). The escape analysis includes input operands to the asm statement itself, but also any global or static variables that any earlier call could have stored pointers into. So usually local loop counters don't have to be spilled/reloaded around an asm statement with a "memory" clobber.
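To make the escape-analysis point concrete, here is a hedged sketch (sum_with_barrier and demo_barrier are hypothetical names) of a "memory" clobber used as a compiler barrier inside a loop. The empty template emits no instructions. The locals total and i never have their address taken, so the clobber need not force them to memory, whereas anything reachable through a must be assumed possibly modified at each barrier. Whether they really stay in registers is up to the optimizer; check the asm output.

```c
/* Compiler barrier in a loop; total and i have not escaped. */
long sum_with_barrier(const int *a, int n) {
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += a[i];
        __asm__ volatile ("" ::: "memory");
    }
    return total;
}

/* Hypothetical driver. */
long demo_barrier(void) {
    static const int a[5] = {1, 2, 3, 4, 5};
    return sum_with_barrier(a, 5);
}
```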

asm volatile is necessary to make sure the asm isn't optimized away even if its output operands are unused (because you require the undeclared side effect of writing memory to happen).

Or, for memory that is only read by the asm, you need the asm to run again if the same input buffer contains different input data. Without volatile, the asm statement could be CSEd (common-subexpression eliminated) out of a loop. (A "memory" clobber does not make the optimizer treat all memory as an input when considering whether the asm statement even needs to run.)

asm with no output operands is implicitly volatile, but it's a good idea to make it explicit. (The GCC manual has a section on asm volatile).

e.g. asm("... sum an array ..." : "=r"(sum) : "r"(pointer), "r"(end_pointer) : "memory") has an output operand, so it is not implicitly volatile. If you used it like

 arr[5] = 1;
 total += asm_sum(arr, len);
 memcpy(arr, foo, len);
 total += asm_sum(arr, len);

Without volatile, the second asm_sum could be optimized away, on the assumption that the same asm with the same input operands (pointer and length) will produce the same output. You need volatile for any asm that's not a pure function of its explicit input operands. Once it can't be optimized away, the "memory" clobber has the desired effect of requiring memory to be in sync.
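A hedged sketch of such an asm_sum, assuming x86-64 with AT&T syntax (other targets use a C loop). The asm walks [arr, arr + len) itself; sum and tmp are early-clobber because they are written while end is still needed. Since it has an output operand, it is volatile explicitly so that, together with the "memory" clobber, the second call is re-executed after memcpy changes the array. demo_asm_sum mirrors the usage shown earlier, with hypothetical array contents.

```c
#include <stddef.h>
#include <string.h>

static long asm_sum(const int *arr, size_t len) {
#if defined(__x86_64__)
    long sum, tmp;
    const int *p = arr;
    const int *end = arr + len;
    __asm__ volatile (
        "xor    %[sum], %[sum]\n\t"      /* sum = 0 */
        "cmp    %[end], %[p]\n\t"
        "jae    2f\n"                    /* empty range */
        "1:\n\t"
        "movslq (%[p]), %[tmp]\n\t"      /* load and sign-extend one int */
        "add    %[tmp], %[sum]\n\t"
        "add    $4, %[p]\n\t"
        "cmp    %[end], %[p]\n\t"
        "jb     1b\n"
        "2:"
        : [sum] "=&r"(sum), [tmp] "=&r"(tmp), [p] "+r"(p)
        : [end] "r"(end)
        : "cc", "memory");
    return sum;
#else
    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += arr[i];
    return sum;
#endif
}

/* Hypothetical driver: 1 before the memcpy, 1+2+...+8 = 36 after. */
long demo_asm_sum(void) {
    int arr[8] = {0};
    const int foo[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    long total = 0;
    arr[5] = 1;
    total += asm_sum(arr, 8);
    memcpy(arr, foo, sizeof arr);
    total += asm_sum(arr, 8);
    return total;
}
```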
