Why do Clang and GCC produce this suboptimal output (copying the struct) to pass a pointer to a by-value struct arg?

Posted 2025-02-13 09:47:34 · 853 characters · 0 views · 0 comments

I have a C program like:

#include <stdio.h>

struct sa {
    char buffer[24];
};

void proceed(const struct sa *data);

static inline void func(struct sa sa) {
    proceed(&sa);
}

void test(struct sa sa) {
    func(sa);
}

It seems that in optimal assembly for the test function, the address of its sa argument could be passed directly to proceed, since proceed is guaranteed not to change data. However, the compiler (both x86-64 clang 14.0 and gcc 12.1 at -O3) emits assembly like:

test:                                   # @test
        sub     rsp, 24
        mov     rax, qword ptr [rsp + 48]
        mov     qword ptr [rsp + 16], rax
        movaps  xmm0, xmmword ptr [rsp + 32]
        movaps  xmmword ptr [rsp], xmm0
        mov     rdi, rsp
        call    proceed
        add     rsp, 24
        ret

Note that in the output, the whole sa struct is copied from [rsp + 32] to [rsp]. Why doesn't the compiler eliminate this copy?

Comments (1)

蓝天 2025-02-20 09:47:34

This is pretty clearly a missed optimization bug since it only happens with that extra level of inlining func().

You can report bugs at https://github.com/llvm/llvm-project/issues (clang) and https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc (GCC). GCC bug reports prefer AT&T syntax, so select that in your Godbolt link. It's generally good to include a Godbolt link in a GCC bug report along with the actual code and asm. That way future readers can quickly check whether it's been fixed, or play around with it.

For GCC, use the keyword missed-optimization.


"... since the proceed function is guaranteed not to change data"

No: proceed can legally cast away const and modify *data, because the original pointed-to object (the by-value parameter) is not const. But test still doesn't need to copy. Your function owns its stack arg space, and can let other functions modify the only copy if it wants.

Most calling conventions work this way, including all System V conventions such as x86-64 SysV, which is in use here. I think Windows x64 does too. There, args larger than 8 bytes are passed by non-constant(?) reference to space reserved by the caller. The pointer goes in a register, or on the stack if there are 4 or more args before it.
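This is a minimal sketch of the const point. It gives proceed a hypothetical definition that casts away const and writes through data. The question never defines proceed, and test_and_peek is an illustrative name. This write is well-defined because the object behind the pointer, the function's own by-value parameter, was not itself defined const.

```c
/* Sketch: proceed() legally writes through a pointer-to-const.
 * proceed's body and test_and_peek() are hypothetical illustrations. */
struct sa {
    char buffer[24];
};

/* Hypothetical definition of proceed(). The const in its parameter type is
 * a promise the type system does not enforce: casting it away and writing
 * is well-defined because the pointed-to object was not defined const. */
void proceed(const struct sa *data)
{
    struct sa *writable = (struct sa *)data;
    writable->buffer[0] = 'X';
}

/* Like the question's test() with func() inlined, but it returns the first
 * byte so the effect is visible. The by-value parameter sa belongs to this
 * call, so proceed() modifying it cannot affect the caller's object. */
char test_and_peek(struct sa sa)
{
    proceed(&sa);
    return sa.buffer[0];   /* 'X': proceed() wrote into this copy */
}
```

Take a caller holding struct sa s = { "abc" };. Calling test_and_peek(s) returns 'X', while s.buffer[0] is still 'a'. The callee is free to modify the only copy it owns.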

The caller of test can't assume its copy is unmodified, so another call with the same arg would need to re-copy the struct after this one returns. Even a function declared as foo(const struct sa); works this way. There's no way to declare or promise that a function doesn't reuse its stack arg space as scratch space or for args to tail-calls.
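Here is a C-level sketch of why the caller must re-copy. The proceed body and the seen/calls bookkeeping are hypothetical, added only to make the effect observable. The sketch assumes proceed uses the struct as scratch space by overwriting it, which is legal because the object is not const. In C this is simply pass-by-value. In the x86-64 SysV asm it is what forces the caller to store a fresh copy of the struct into the outgoing argument area before every call.

```c
#include <string.h>

struct sa {
    char buffer[24];
};

/* Hypothetical bookkeeping so the effect is observable. */
static char seen[2];
static int calls;

/* Hypothetical proceed(): records the first byte it receives, then
 * overwrites the whole struct as scratch space (legal: not a const object). */
void proceed(const struct sa *data)
{
    struct sa *scratch = (struct sa *)data;
    if (calls < 2)
        seen[calls] = scratch->buffer[0];
    calls++;
    memset(scratch->buffer, 0, sizeof scratch->buffer);
}

void test(struct sa sa)
{
    proceed(&sa);
}

/* Two calls with the same argument: each call must start from an unmodified
 * copy of s, even though the previous callee clobbered the argument memory
 * it owned. At the asm level, that means one copy per call. */
void call_twice(void)
{
    struct sa s = { "abc" };
    test(s);
    test(s);
}
```

After call_twice(), both entries of seen hold 'a'. The second call did not observe the first call's memset.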


This test case on Godbolt demonstrates that it's a missed optimization. If func is noinline, test tail-calls it with just jmp func, without copying any args. That non-inline definition of func doesn't copy either. It just does the expected RSP alignment, then lea rdi, [rsp+16] / call proceed to pass a pointer to its own stack arg.

So adding __attribute__((noinline)) to your func results in test calling proceed without copying the arg, at the cost of one extra jmp in the execution path. If that's legal, it would also be legal to do the same thing when inlining func.

struct sa {
    char buffer[24];
};
void proceed(const struct sa *data);

__attribute__((noinline))
static void func(struct sa sa) {
    proceed(&sa);
}

void test_struct(struct sa sa) {
    func(sa);
}
// same as non-inline func()
// void test_struct_direct(struct sa sa) { proceed(&sa); }

# clang (trunk) -O3
# GCC is equivalent but uses sub/add instead of dummy push/pop
test_struct:
        jmp     func                            # TAILCALL
func:
        pushq   %rax                  # re-align the stack by 16
        leaq    16(%rsp), %rdi
        callq   proceed
        popq    %rax                  # clean up the stack
        retq

Feel free to use a short link to that exact Godbolt setup in your bug report, or to a version with something commented or uncommented. It uses nightly builds of GCC and clang, so devs will know it's not already fixed. Also feel free to link this Stack Overflow Q&A. However, your bug report should be self-contained: point out that the optimization is legal, and that uncommenting __attribute__((noinline)) makes the difference.

(So probably for a bug report, you'd want noinline commented out, and uncomment the test_struct_direct version that manually inlines func.)
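Following that suggestion, this is a sketch of what the reduced test case for a report might look like. The noinline attribute on func is commented out, and test_struct_direct is the hand-inlined version. proceed is deliberately left undefined in this file, so it stays an opaque external call. The file name in the compile command is only an example. The question and answer report that test_struct gets the copy and the direct version does not. That holds for the versions they tested, and newer compilers may behave differently.

```c
/* Reduced test case for a missed-optimization report (sketch).
 * Compile only, e.g.:  gcc -O3 -S testcase.c   or   clang -O3 -S testcase.c
 * proceed() is left undefined here so it stays an opaque external call. */
struct sa {
    char buffer[24];
};

void proceed(const struct sa *data);

/* __attribute__((noinline)) */   /* uncommenting this avoids the copy */
static void func(struct sa sa) {
    proceed(&sa);
}

/* Reported (clang 14 / GCC 12.1, -O3): copies sa to a new stack slot
 * before calling proceed. This is the missed optimization. */
void test_struct(struct sa sa) {
    func(sa);
}

/* func() inlined by hand: reported to pass a pointer to the incoming
 * stack arg directly, with no copy. */
void test_struct_direct(struct sa sa) {
    proceed(&sa);
}
```

Both entry points are observably equivalent: proceed sees the same bytes either way. That equivalence is the core of the argument that the compiler may emit the copy-free code for test_struct too.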
