Why do Clang and GCC produce this sub-optimal output (copying a struct) for passing a pointer to a by-value struct arg?
I have a C program like:
#include <stdio.h>

struct sa {
    char buffer[24];
};

void proceed(const struct sa *data);

static inline void func(struct sa sa) {
    proceed(&sa);
}

void test(struct sa sa) {
    func(sa);
}
It seems that in the optimal assembly output of the test function, the address of its sa argument could be passed directly to proceed, since proceed is guaranteed not to change data. However, the compiler (both x86-64 clang 14.0 and gcc 12.1, at the -O3 optimization level) emits assembly like:
test:                                   # @test
        sub     rsp, 24
        mov     rax, qword ptr [rsp + 48]
        mov     qword ptr [rsp + 16], rax
        movaps  xmm0, xmmword ptr [rsp + 32]
        movaps  xmmword ptr [rsp], xmm0
        mov     rdi, rsp
        call    proceed
        add     rsp, 24
        ret
Note that in the output, the whole sa struct is copied from [rsp + 32] to [rsp]. Why does the compiler not eliminate this copy?
This is pretty clearly a missed-optimization bug, since it only happens with that extra level of inlining func().

You can report bugs on https://github.com/llvm/llvm-project/issues and https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc (GCC bug reports prefer AT&T syntax, so select that in your Godbolt link; it's generally good to include a Godbolt link in a GCC bug report along with the actual code and asm, so it's quick for future readers to check whether it's been fixed, or to play around with it.)
For GCC, use the keyword missed-optimization.

No, proceed isn't guaranteed to leave data unmodified: it's legal to cast away const because the original pointed-to object is not const. But test still doesn't need to copy; your function owns its stack arg space, and can let other functions modify the only copy if it wants. (Most calling conventions work this way, including all System V conventions such as x86-64 SysV, which is in use here. Also, I think, Windows x64, where args larger than 8 bytes are passed by non-constant(?) reference to space reserved by the caller, with a pointer in a register, or on the stack if there are 4 or more args before it.)

The caller of test can't assume its stack arg is unmodified, so another call with the same arg would need to re-copy the struct after this one returned. Even a declaration like foo(const struct sa) would work this way; there's no way to declare or promise that a function doesn't reuse its stack arg space for scratch space or for args to tail-calls.

This test-case on Godbolt demonstrates that it's a missed optimization:
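The Godbolt link itself isn't reproduced here, so the following is only a rough sketch of what such a test case might look like, not the exact source behind the link. The name test_struct_direct comes from this answer; its body, the stand-in definition of proceed, and the helpers sum_seen_by and check_equivalent are assumptions added so the file links and runs on its own. To look at the asm, swap the stand-in definition for the bare declaration from the question, otherwise the compiler can see into proceed.

```c
#include <assert.h>
#include <string.h>

struct sa {
    char buffer[24];
};

/* ASSUMPTION: stand-in definition so the sketch links and runs. In the
   question proceed() is only declared; to inspect the asm, replace this
   definition with the bare declaration. */
static int last_sum;

void proceed(const struct sa *data) {
    int sum = 0;
    for (size_t i = 0; i < sizeof data->buffer; i++)
        sum += data->buffer[i];
    last_sum = sum;
}

/* Uncomment the attribute to keep func() out of line (GCC/Clang syntax). */
/* __attribute__((noinline)) */
static inline void func(struct sa sa) {
    proceed(&sa);
}

/* The version from the question. */
void test(struct sa sa) {
    func(sa);
}

/* func() inlined by hand: nothing here requires a copy. */
void test_struct_direct(struct sa sa) {
    proceed(&sa);
}

/* Hypothetical helper: calls one entry point with 24 bytes of value 1 and
   returns the byte sum that proceed() observed. */
int sum_seen_by(void (*entry)(struct sa)) {
    struct sa arg;
    memset(arg.buffer, 1, sizeof arg.buffer);
    last_sum = 0;
    entry(arg);
    return last_sum;
}

/* Both entry points must be indistinguishable to proceed(). */
void check_equivalent(void) {
    assert(sum_seen_by(test) == sum_seen_by(test_struct_direct));
}
```

Since test and test_struct_direct have the same observable behaviour, a compiler is free to emit the same code for both; comparing their asm at -O3 is what shows whether the copy is still there.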
test will tail-call with just jmp func if func is noinline, not copying any args there. And that non-inline definition of func won't copy either: just the expected RSP alignment, then lea rdi, [rsp+16] / call proceed to pass a pointer to its stack arg.

So adding __attribute__((noinline)) to your func will result in your test calling proceed without copying the arg, with just an extra jmp in the path of execution. If that's legal, it would also be legal to do that when inlining func.

Feel free to shortlink that exact Godbolt link in your bug report, or with something commented or uncommented; it uses nightly builds of GCC and clang, so devs will know it's not already fixed. Also feel free to link this Stack Overflow Q&A, but your bug report should be self-contained and point out that the optimization is legal, and that uncommenting __attribute__((noinline)) makes the difference.

(So probably for a bug report, you'd want noinline commented out, and uncomment the test_struct_direct version that manually inlines func.)
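To make the earlier point about const concrete, here is a minimal sketch, assuming a hypothetical definition of proceed (the question only declares it) that casts away const and writes through the pointer. The helper run_demo is also made up for illustration. The write is well-defined because the object behind the pointer, the by-value parameter of func, is not itself const; and because the struct was passed by value, the caller's own object never sees the change at the C level.

```c
#include <assert.h>
#include <string.h>

struct sa {
    char buffer[24];
};

/* ASSUMPTION: a hypothetical definition of proceed(); the question only
   declares it. The const in the signature does not prevent this. */
void proceed(const struct sa *data) {
    struct sa *writable = (struct sa *)data;  /* casting away const is legal */
    writable->buffer[0] = 'X';  /* defined only because the object is non-const */
}

static inline void func(struct sa sa) {
    proceed(&sa);
}

void test(struct sa sa) {
    func(sa);
}

/* Hypothetical helper: returns 1 if the caller's object is unchanged after
   test(), having first checked that proceed() really does write. */
int run_demo(void) {
    struct sa original;
    memset(original.buffer, 'a', sizeof original.buffer);

    struct sa copy = original;
    proceed(&copy);                 /* direct call: the write is visible */
    assert(copy.buffer[0] == 'X');

    test(original);                 /* by value: only test()'s arg is written */
    return original.buffer[0] == 'a';
}
```

This is why the compiler can't rely on const to prove proceed leaves the struct alone, and also why it doesn't need to: the only object proceed can reach is the argument that test already owns.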