Is there a performance penalty for mixing SSE integer and floating-point SIMD instructions?
I've used x86 SIMD instructions (SSE1–SSE4) in the form of intrinsics quite a lot lately. What I found frustrating is that the SSE ISA has several simple instructions that are available only for floats or only for integers, but that in theory should perform equally for both. For example, both float and double vectors have instructions to load the upper 64 bits of a 128-bit vector from an address (movhps, movhpd), but there's no such instruction for integer vectors.
My question:
Is there any reason to expect a performance hit when using floating-point instructions on integer vectors, e.g. using movhps to load data into an integer vector?
I wrote several tests to check that, but I suppose their results are not credible. It's really hard to write a correct test that explores all corner cases for such things, especially when the instruction scheduling is most probably involved here.
Related question:
Other trivially similar cases also have several instructions that do basically the same thing. For example, I can do a bitwise OR with por, orps, or orpd. Can anyone explain the purpose of these additional instructions? I guess this might be related to different scheduling algorithms applied to each instruction.
From an expert (obviously not me :P): http://www.agner.org/optimize/optimizing_assembly.pdf [13.2 Using vector instructions with other types of data than they are intended for (pages 118-119)]: