Is a union more efficient than a shift on modern compilers?
Consider the simple code:
UINT64 result;
UINT32 high, low;
...
result = ((UINT64)high << 32) | (UINT64)low;
Do modern compilers turn that into a real barrel shift on high, or optimize it to a simple copy to the right location?
If not, then using a union would seem to be more efficient than the shift that most people appear to use. However, having the compiler optimize this is the ideal solution.
I'm wondering how I should advise people when they do require that extra little bit of performance.
Comments (4)
EDIT: This response is based on an earlier version of the OP's code that did not have a cast.
This code
result = (high << 32) | low;
is actually going to have undefined results. Since high is a 32-bit value, you're shifting a 32-bit value by 32 bits (the full width of the value), so the behavior is undefined and will depend on how the compiler and platform decide to handle the shift. The result of that undefined shift is then OR'd with low, which is again undefined, since you're OR'ing an undefined value with a defined one. The end result will most likely not be the 64-bit value you want.
For instance, in the code emitted by gcc -S on OSX 10.6, the shift takes place only on a 32-bit value, in a 32-bit register, with a 32-bit assembly instruction. The result ends up being exactly the same as
high | low
without any shifting at all, because in this case shal $32, %eax just returns the value that was originally in EAX. You're not getting a 64-bit result.
To avoid that, cast high to a uint64_t before shifting, like:
result = ((uint64_t)high << 32) | low;
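As a minimal sketch of the cast-before-shift fix described above (the function name combine is hypothetical, and uint32_t/uint64_t from <stdint.h> are assumed to stand in for the UINT32/UINT64 types in the question):

```c
#include <stdint.h>

/* Hypothetical helper: widen high to 64 bits BEFORE shifting, so the
 * shift count (32) is smaller than the width of the shifted operand.
 * Without the cast, high << 32 shifts a 32-bit value by its full
 * width, which is undefined behavior in C. */
uint64_t combine(uint32_t high, uint32_t low)
{
    /* low is promoted to 64 bits automatically by the | operator. */
    return ((uint64_t)high << 32) | low;
}
```

Calling combine(0xDEADBEEF, 0x01234567) should yield 0xDEADBEEF01234567, since the cast guarantees the upper half is not lost.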
Modern compilers are smarter than what you might think ;-) (so yes, I think you can expect a barrel shift on any decent compiler).
Anyway, I would use the option that has a semantic closer to what you are actually trying to do.
If this is supposed to be platform independent, then the only option is to use shifts here.
With
union { r64; struct{low;high}}
you cannot tell which half of r64 the low/high fields will map to. Think about endianness.
Modern compilers are pretty good at handling such shifts.
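To illustrate the endianness point, here is a rough sketch (all names are hypothetical, not from the original post) in which the union version has to probe the byte order at run time before it can agree with the shift version. It assumes a plain little- or big-endian platform and a struct of two uint32_t members with no padding.

```c
#include <stdint.h>

/* Hypothetical overlay of one 64-bit value and two 32-bit halves. */
typedef union {
    uint64_t r64;
    struct { uint32_t first; uint32_t second; } parts;
} split64;

/* Portable version: the shift works on values, not on memory layout. */
uint64_t combine_shift(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Returns 1 when the first struct member overlays the low 32 bits
 * of r64 (little-endian layout), 0 otherwise. */
int first_member_is_low(void)
{
    split64 u;
    u.r64 = 1;
    return u.parts.first == 1;
}

/* Union version: which member receives which half depends on the
 * byte order, so the layout must be checked first. */
uint64_t combine_union(uint32_t high, uint32_t low)
{
    split64 u;
    if (first_member_is_low()) {
        u.parts.first = low;
        u.parts.second = high;
    } else {
        u.parts.first = high;
        u.parts.second = low;
    }
    return u.r64;
}
```

The extra layout check is exactly what the shift version avoids, which is why the shift is the safer default for portable code.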
I wrote the following (hopefully valid) test:
Running a diff of the unoptimized output of gcc -S:
I don't know assembly, so it's hard for me to analyze that. However, it looks like some shifting is taking place, as expected, on the non-union (top) version.
But with optimizations (-O2) enabled, the output was identical. So the same code was generated, and both ways will have the same performance (gcc version 4.5.2 on Linux/AMD64).
Partial output of the optimized (-O2) code, with or without the union:
The snippet begins immediately after the jump generated by the if line.
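For anyone who wants to repeat this kind of comparison, one possible setup is sketched below. This is not the original test, which is not shown here; the function name combine and the USE_UNION macro are hypothetical, and the union branch assumes a little-endian target such as AMD64.

```c
#include <stdint.h>

/* Hypothetical experiment: compile this file twice and diff the output:
 *   gcc -O2 -S file.c            (shift version)
 *   gcc -O2 -S -DUSE_UNION file.c (union version)
 */
uint64_t combine(uint32_t high, uint32_t low)
{
#ifdef USE_UNION
    /* Assumes little-endian: the first member overlays the low half. */
    union {
        uint64_t r64;
        struct { uint32_t low; uint32_t high; } s;
    } u;
    u.s.low = low;
    u.s.high = high;
    return u.r64;
#else
    /* Portable version: cast first, then shift and OR. */
    return ((uint64_t)high << 32) | low;
#endif
}
```

Whether the two builds produce identical assembly will depend on the compiler version and target, so the diff is worth checking on the toolchain actually in use rather than taken for granted.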