Confusion about the XMM register bitmap
Sorry, I don't have a good title...
I was reading this thread: Vector Matrix Multiplication In SSE
The original poster had the following code:
// xmm0 = (v0,v1,v2,v3)
movups xmm0, [eax]
// xmm0 = (v0,v0,v0,v0)
// xmm1 = (v1,v1,v1,v1)
// xmm2 = (v2,v2,v2,v2)
// xmm3 = (v3,v3,v3,v3)
shufps xmm3, xmm0, 255
shufps xmm2, xmm0, 170
shufps xmm1, xmm0, 85
shufps xmm0, xmm0, 0
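The immediates 0, 85, 170 and 255 in the code above are each four identical 2-bit selectors packed into one byte. The following C sketch (the helper `decode_imm8` is a hypothetical name, not a library function) unpacks an imm8 following the SHUFPS description in the Intel manual: bits 1:0 choose the element for result bits 31:0, bits 3:2 for bits 63:32, bits 5:4 for bits 95:64, and bits 7:6 for bits 127:96.

```c
#include <assert.h>

/* Hypothetical helper: split a SHUFPS imm8 into its four 2-bit selectors.
   sel[0] comes from imm8 bits 1:0 and picks the element for result bits 31:0;
   sel[3] comes from imm8 bits 7:6 and picks the element for result bits 127:96. */
void decode_imm8(unsigned imm8, unsigned sel[4])
{
    assert(imm8 <= 0xFFu); /* the immediate is only 8 bits wide */
    for (unsigned pos = 0; pos < 4; ++pos) {
        sel[pos] = (imm8 >> (2 * pos)) & 0x3u;
    }
}
```

Decoded this way, 255 (11 11 11 11) gives selector 3 everywhere, 170 (10 10 10 10) gives 2, 85 (01 01 01 01) gives 1, and 0 gives 0. Selector n names element n, i.e. bits 32n to 32n+31 of the source register.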
Someone said the following:
But what really happens according to the manual: (a, b, c, d) means a is bits 0 to 31, b is bits 32 to 63, and so on.
// xmm0 = (v0,v1,v2,v3)
movups xmm0, [eax]
// xmm0 = (v0, v0, v0, v0)
shufps xmm0, xmm0, 0
This makes sense to me, since in the linear array model [elt0, elt1, elt2, ....] elt0 is Array[0].
What confuses me is that, according to the manual, the bitmap of an xmm register is [127...0] (see the picture below).
Like the original poster, I looked at the bitmap and thought that the leftmost element of [elt0, elt1, elt2, elt3], i.e. elt0, was selected by the bits "11".
So if I want xmm0 to contain only v0 (in all four elements), I would write:
shufps xmm0, xmm0, 0xFF // 11 11 11 11 === 0xFF
Which explanation is correct?
Answer
There may be some confusion because bits in xmm registers (and all other registers, BTW) are numbered right-to-left, i.e. the lowest bit is on the right and the highest bit is on the left:
[127 ... 0]
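To see the right-to-left convention on something small, here is a sketch that writes an 8-bit value the way register diagrams are drawn, highest bit first. The function name `bits_msb_first` is made up for this example.

```c
#include <assert.h>

/* Write the bits of an 8-bit value x into out[0..7], highest bit first,
   as register diagrams do: out[0] is bit 7 (leftmost), out[7] is bit 0 (rightmost). */
void bits_msb_first(unsigned x, char out[9])
{
    assert(x <= 0xFFu);
    for (int bit = 7; bit >= 0; --bit) {
        out[7 - bit] = ((x >> bit) & 1u) ? '1' : '0';
    }
    out[8] = '\0';
}
```

For example, `bits_msb_first(1, buf)` produces "00000001": bit 0 is the rightmost character. Scaling the same convention up to 128 bits gives the manual's [127 ... 0] picture.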
If you consider the content of an xmm register as 32-bit dwords, they are also arranged right-to-left:
[ dword3 | dword2 | dword1 | dword0 ]
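The dword numbering follows directly from the bit numbering: dword i occupies bits 32*i through 32*i+31, so dword 0 is the rightmost one in a [127 ... 0] diagram. A tiny sketch of that arithmetic, using a hypothetical helper name:

```c
#include <assert.h>

/* Bit range of 32-bit element `lane` (0..3) inside a 128-bit xmm register.
   Lane 0 is bits 31:0 (drawn rightmost); lane 3 is bits 127:96 (drawn leftmost). */
void lane_bit_range(unsigned lane, unsigned *low_bit, unsigned *high_bit)
{
    assert(lane < 4);
    *low_bit = 32u * lane;
    *high_bit = 32u * lane + 31u;
}
```

This is the same mapping as "(a, b, c, d) means a is bits 0 to 31, b is bits 32 to 63, and so on": lane 0 is a, lane 3 is d.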
The source of this confusion is that if you have an array (v0, v1, v2, v3) in memory and you load this array into an xmm register, the elements appear in the xmm register in reversed order when the register is drawn with bit 127 on the left:
[ v3 | v2 | v1 | v0 ]
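The "reversal" is only a matter of how the register is drawn; the load itself does not reorder anything. The C sketch below assumes a hypothetical `xmm_model` type with four float lanes, where lane[i] stands for bits 32*i+31..32*i. The MOVUPS model copies the 16 bytes unchanged, so the element at the lowest address lands in lane 0, and listing the lanes from bit 127 down to bit 0 yields v3, v2, v1, v0.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of an xmm register as four float lanes:
   lane[i] holds bits 32*i+31 .. 32*i, so lane[0] is bits 31:0. */
typedef struct { float lane[4]; } xmm_model;

/* Model of MOVUPS xmm, [p]: copy 16 bytes unchanged. On x86 the byte at the
   lowest address lands in bits 7:0, so array element p[0] ends up in lane 0. */
xmm_model movups_model(const float *p)
{
    xmm_model r;
    assert(p != NULL);
    memcpy(r.lane, p, sizeof r.lane);
    return r;
}

/* List the lanes in the order the manual draws them: bit 127 first. */
void diagram_order(const xmm_model *r, float out[4])
{
    for (int i = 0; i < 4; ++i) {
        out[i] = r->lane[3 - i];
    }
}
```

For an input array {v0, v1, v2, v3}, `movups_model` puts v0 in lane[0], while `diagram_order` lists v3, v2, v1, v0: same data, two ways of writing it down.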
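Therefore, the right way to copy the first dword into all dwords of an xmm register is
shufps xmm0, xmm0, 0
Every 2-bit selector in the immediate is 00, so each result dword is taken from bits 31:0, i.e. v0. Conversely, 0xFF (11 11 11 11) would broadcast v3.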
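The whole argument can be checked with a small model of SHUFPS. This is a C sketch of the semantics described in the Intel manual, not a real intrinsic; the `xmm_model` type and `shufps_model` function are hypothetical names. Lane i stands for bits 32*i+31..32*i, and for the legacy SSE form the two low result lanes are chosen from the first (destination) operand and the two high lanes from the second (source) operand.

```c
#include <assert.h>

/* Hypothetical register model: lane[i] holds bits 32*i+31 .. 32*i. */
typedef struct { float lane[4]; } xmm_model;

/* Model of legacy-SSE SHUFPS dst, src, imm8: result lanes 0 and 1 are picked
   from dst, lanes 2 and 3 from src, each by a 2-bit selector read from imm8
   starting at bits 1:0 (for lane 0) up to bits 7:6 (for lane 3). */
xmm_model shufps_model(xmm_model dst, xmm_model src, unsigned imm8)
{
    xmm_model r;
    assert(imm8 <= 0xFFu);
    for (unsigned pos = 0; pos < 4; ++pos) {
        unsigned sel = (imm8 >> (2 * pos)) & 0x3u;
        r.lane[pos] = (pos < 2) ? dst.lane[sel] : src.lane[sel];
    }
    return r;
}
```

With a register holding v0..v3 in lanes 0..3, `shufps_model(x, x, 0)` fills every lane with v0 and `shufps_model(x, x, 0xFF)` fills every lane with v3. Because the low two lanes come from the first operand, this broadcast idiom uses the same register as both operands.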
Also, if you want to do load-and-broadcast of a single float into all elements of an xmm register, for performance reasons it is better to use
The AVX instruction set (supported in the recent Intel Sandy Bridge and AMD Bulldozer CPUs) has a special instruction, vbroadcastss, which performs the load-and-broadcast in one step:
vbroadcastss xmm0, [eax]
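As a sketch of what that single instruction computes, here is a C model using a hypothetical `xmm_model` type (lane[i] stands for bits 32*i+31..32*i): one 32-bit float is read from memory and written to all four lanes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register model: lane[i] holds bits 32*i+31 .. 32*i. */
typedef struct { float lane[4]; } xmm_model;

/* Model of VBROADCASTSS xmm, m32: read one float from memory and
   replicate it into every lane. */
xmm_model vbroadcastss_model(const float *mem)
{
    xmm_model r;
    assert(mem != NULL);
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = *mem;
    }
    return r;
}
```

The result for `&v[0]` equals what `shufps xmm0, xmm0, 0` produces after loading the array, but only 4 bytes are read from memory. In C with AVX enabled, the corresponding intrinsic is `_mm_broadcast_ss` from `<immintrin.h>`.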
The SSE3 instruction set includes a similar instruction, MOVDDUP, which, however, only works for doubles.
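For comparison, a C sketch of MOVDDUP with a memory operand, using a hypothetical two-lane `xmm_pd_model` type: the register holds two 64-bit doubles (lane 0 is bits 63:0, lane 1 is bits 127:64), and the instruction copies one double into both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an xmm register viewed as two doubles:
   lane[0] is bits 63:0, lane[1] is bits 127:64. */
typedef struct { double lane[2]; } xmm_pd_model;

/* Model of SSE3 MOVDDUP xmm, m64: load one 64-bit double and duplicate it
   into both halves. It operates on 64-bit elements only, so it cannot
   broadcast a single float. */
xmm_pd_model movddup_model(const double *mem)
{
    xmm_pd_model r;
    assert(mem != NULL);
    r.lane[0] = *mem;
    r.lane[1] = *mem;
    return r;
}
```

In C, the SSE3 intrinsic for the load form is `_mm_loaddup_pd` from `<pmmintrin.h>`.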