Why does my data not seem to be aligned?
I'm trying to figure out how best to pre-calculate some sine and cosine values, store them in aligned blocks, and then use them later for SSE calculations:
At the beginning of my program, I create an object with this member:
static __m128 *m_sincos;
then I initialize that member in the constructor:
m_sincos = (__m128*) _aligned_malloc(Bins*sizeof(__m128), 16);
for (int t=0; t<Bins; t++)
m_sincos[t] = _mm_set_ps(cos(t), sin(t), sin(t), cos(t));
When I go to use m_sincos, I run into three problems:
-The data does not seem to be aligned
movaps xmm0, m_sincos[t] //crashes
movups xmm0, m_sincos[t] //does not crash
-The variables do not seem to be correct
movaps result, xmm0 // returns values that are not what is in m_sincos[t]
//Although, putting a watch on m_sincos[t] displays the correct values
-What really confuses me is that this makes everything work (but is too slow):
__m128 _sincos = m_sincos[t];
movaps xmm0, _sincos
movaps result, xmm0
Comments (2)
m_sincos[t] is a C expression. In an assembly instruction (inside an __asm block?), however, it's interpreted as an x86 addressing mode, with a completely different result. For example, VS2008 SP1 compiles movaps xmm0, m_sincos[t] into an instruction that addresses t itself (see the disassembly window when the app crashes in debug mode).
That interpretation attempts to copy a 128-bit value stored at the address of the variable t into xmm0. t, however, is a 32-bit value at a likely unaligned address. Executing the instruction is likely to cause an alignment fault, and would get you incorrect results in the odd case where t's address is aligned.
You could fix this by using an appropriate x86 addressing mode; the slow but clear way is to compute the address of the table element explicitly.
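A minimal sketch of what such an explicit address computation could look like, not the code from the original answer. The helper name load_entry and its parameters are hypothetical. The __asm path assumes 32-bit MSVC and has only been reasoned through, not verified on that toolchain. Every other compiler takes the intrinsic fallback, which performs the same aligned load.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>

// Hypothetical helper: fetch table[t] with an explicitly computed address.
__m128 load_entry(const __m128* table, int t) {
    // movaps (and _mm_load_ps) require a 16-byte aligned address.
    assert((reinterpret_cast<std::uintptr_t>(table) & 15) == 0);
    __m128 result;
#if defined(_MSC_VER) && defined(_M_IX86)
    // Assumption: 32-bit MSVC inline assembly; eax and ecx may be clobbered freely.
    __asm {
        mov    eax, table          // eax = base address held in the pointer
        mov    ecx, t              // ecx = index
        shl    ecx, 4              // ecx = t * 16, i.e. t * sizeof(__m128)
        movaps xmm0, [eax + ecx]   // aligned load of table[t]
        movaps result, xmm0        // store into the aligned local
    }
#else
    // Fallback for compilers without MSVC-style __asm: the same aligned load.
    result = _mm_load_ps(reinterpret_cast<const float*>(table + t));
#endif
    return result;
}
```

The point of the sketch is that the pointer value and the scaled index are both placed in registers before the load, so the instruction addresses the table element rather than the variable t.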
Sidenote:
When I put this in a complete program, something odd occurs. If you keep an eye on the registers window while it runs, you might notice that, although the results are correct, xmm0 is getting the correct value before the movaps instruction is executed. How does that happen?
A look at the generated assembly code shows that _mm_set_ps() loads the sin/cos results into xmm0, then saves them to the memory address of m_sincos[t]. But the value remains in xmm0 too. _mm_set_ps is an 'intrinsic', not a function call; it does not attempt to restore the values of the registers it uses after it's done.
If there's a lesson to take from this, it might be that when using the SSE intrinsic functions, use them throughout, so the compiler can optimize things for you. Otherwise, if you're using inline assembly, use that throughout too.
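As a rough illustration of the "intrinsics throughout" option, the sketch below builds the table from the question and uses an entry without any inline assembly. The function names make_sincos_table and scale_by_entry are hypothetical, the multiply is only a stand-in for whatever the real SSE calculation is, and _mm_malloc/_mm_free are swapped in for _aligned_malloc as an assumption so the sketch also builds outside MSVC.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h>

// Hypothetical helper: allocate and fill the table the way the question does.
// The caller releases it with _mm_free.
__m128* make_sincos_table(int bins) {
    assert(bins > 0);
    __m128* table = static_cast<__m128*>(_mm_malloc(bins * sizeof(__m128), 16));
    if (!table) return nullptr;
    for (int t = 0; t < bins; ++t) {
        const float s = std::sin(static_cast<float>(t));
        const float c = std::cos(static_cast<float>(t));
        // _mm_set_ps takes elements from highest to lowest: memory order is c, s, s, c.
        table[t] = _mm_set_ps(c, s, s, c);
    }
    return table;
}

// Hypothetical use of an entry: element-wise multiply, standing in for the
// real calculation. The compiler picks the registers and the load instruction.
__m128 scale_by_entry(const __m128* table, int t, __m128 samples) {
    return _mm_mul_ps(samples, table[t]);
}
```

Because table[t] is an ordinary C++ expression here, the compiler computes the element address itself and can keep values in registers across the surrounding loop.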
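You should always use the intrinsics, or even just turn on the compiler's SSE code generation and leave it to the compiler, rather than coding the instructions explicitly. This is because __asm is not portable to 64-bit code (MSVC does not support inline assembly for x64 targets).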