Confusion about the XMM register bit layout

Posted 2024-12-18 07:57:15

Sorry I don't have a good title...

I was reading this thread: Vector Matrix Multiplication In SSE

The original poster had the following code:

// xmm0 = (v0,v1,v2,v3)
movups xmm0, [eax]

// xmm0 = (v0,v0,v0,v0)
// xmm1 = (v1,v1,v1,v1)
// xmm2 = (v2,v2,v2,v2)
// xmm3 = (v3,v3,v3,v3)
shufps xmm3, xmm0, 255
shufps xmm2, xmm0, 170
shufps xmm1, xmm0, 85
shufps xmm0, xmm0, 0

Someone replied with the following:

But what really happens, according to the manual: (a, b, c, d) means a is bits 0 to 31, b is bits 32 to 63, and so on.

// xmm0 = (v0,v1,v2,v3)
movups xmm0, [eax]

// xmm0 = (v0, v0, v0, v0)
shufps xmm0, xmm0, 0

This makes sense to me, since in a linear array model [elt0, elt1, elt2, ...], elt0 is Array[0].

What confuses me is that, according to the manual, the bit layout of an xmm register is drawn as [127...0] (see the picture below).

Like the original poster, I looked at that diagram and assumed that the leftmost element of [elt0, elt1, elt2, elt3] is the one selected by index 11 (binary).

So if I want xmm0 to contain only v0, I would write:

shufps xmm0, xmm0, 0xFF  // 11 11 11 11  === 0xFF

Which explanation is correct?

[Image: XMM register bit layout from the manual, bits 127 down to 0]

思念绕指尖 2024-12-25 07:57:15

There may be some confusion because bits in xmm registers (and in all other registers, by the way) are numbered right-to-left, i.e. the lowest bit is on the right and the highest bit is on the left:

xmm0 = [bit 127, bit 126, ..., bit 1, bit 0]

If you view the contents of an xmm register as 32-bit dwords, they are also arranged right-to-left:

xmm0 = [dword 3, dword 2, dword 1, dword 0]

The source of this confusion is that if you have an array in memory

float A[4] = { 0.0f, 1.0f, 2.0f, 3.0f };

and you load this array into an xmm register, the elements appear in reversed order when the register is written in this high-to-low notation (A[0] still ends up in the lowest dword, bits 0 to 31):

; xmm0 = (A3 = 3.0f, A2 = 2.0f, A1 = 1.0f, A0 = 0.0f) after the load
movups xmm0, [A]
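To make the two orderings concrete, here is a minimal plain-C sketch. It does not touch a real register: `model_movups` and `format_high_to_low` are hypothetical helper names, and the array `lanes` is only a model in which `lanes[i]` stands for bits 32*i to 32*i+31. The point is that the load itself does not reorder anything; only the way the register is written down is reversed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Models movups: memory element i lands in lane i, so lanes[0]
   (bits 0..31) receives mem[0]. Nothing is reordered by the load. */
static void model_movups(float lanes[4], const float mem[4])
{
    memcpy(lanes, mem, 4 * sizeof(float));
}

/* Writes the four lanes the way the manuals draw a register:
   highest dword on the left, lowest dword on the right. */
static void format_high_to_low(const float lanes[4], char *out, size_t n)
{
    assert(out != NULL && n > 0);
    snprintf(out, n, "[%.1f, %.1f, %.1f, %.1f]",
             lanes[3], lanes[2], lanes[1], lanes[0]);
}
```

Loading `{ 0.0f, 1.0f, 2.0f, 3.0f }` through this model leaves `lanes[0] == 0.0f`, while the formatted string lists 3.0 first, which is exactly the mismatch between array notation and register diagrams.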

Therefore, the right way to copy the first dword (dword 0, which holds A[0]) into all dwords of an xmm register is:

shufps xmm0, xmm0, 0
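The way the immediate selects elements can be sketched in plain C. This is only a model of the documented SHUFPS behaviour, not the instruction itself, and `model_shufps` and `broadcast_imm` are hypothetical names. Each 2-bit field of the immediate is an index into the dwords numbered as above, so index 0 means bits 0 to 31. Note that the two low result dwords are taken from the destination and the two high ones from the source, so a full broadcast needs both operands to be the same register.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C model of "shufps dst, src, imm8".
   Lane i stands for bits 32*i .. 32*i+31 of the register. */
static void model_shufps(float dst[4], const float src[4], uint8_t imm)
{
    float r[4];
    r[0] = dst[imm & 3];         /* imm bits 1:0 select from dst */
    r[1] = dst[(imm >> 2) & 3];  /* imm bits 3:2 select from dst */
    r[2] = src[(imm >> 4) & 3];  /* imm bits 5:4 select from src */
    r[3] = src[(imm >> 6) & 3];  /* imm bits 7:6 select from src */
    for (int i = 0; i < 4; ++i) {
        dst[i] = r[i];           /* written last, so dst may alias src */
    }
}

/* Immediate that repeats the same 2-bit lane index in all four fields:
   lane 0 -> 0, lane 1 -> 85, lane 2 -> 170, lane 3 -> 255. */
static uint8_t broadcast_imm(unsigned lane)
{
    assert(lane < 4);
    return (uint8_t)(lane * 0x55u);
}
```

With both operands set to a model register holding `{ A[0], A[1], A[2], A[3] }`, an immediate of 0 yields four copies of A[0], whereas 0xFF yields four copies of A[3].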

Also, if you want to load a single float and broadcast it to all elements of an xmm register, it is better for performance to use:

; MOVSS can be much faster than MOVUPS, and is never slower
; Load A[0] into low dword of xmm0
movss xmm0, [A]
; Copy low dword of xmm0 to all dwords of xmm0
shufps xmm0, xmm0, 0
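For readers working in C rather than assembly, the same load-then-shuffle pattern can be sketched with the SSE intrinsics from `<xmmintrin.h>`. The function name `load_broadcast` is hypothetical, the scalar fallback exists only so the sketch builds on targets without SSE, and a compiler is free to emit different instructions than the two shown above.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>

/* Loads one float and copies it into all four lanes of out. */
static void load_broadcast(const float *p, float out[4])
{
    assert(p != NULL);
    __m128 v = _mm_load_ss(p);    /* lane 0 = *p, lanes 1..3 = 0 */
    v = _mm_shuffle_ps(v, v, 0);  /* immediate 0: every lane takes lane 0 */
    _mm_storeu_ps(out, v);        /* unaligned store of all four lanes */
}
#else
/* Scalar fallback for targets without SSE (an assumption of this sketch). */
static void load_broadcast(const float *p, float out[4])
{
    assert(p != NULL);
    for (int i = 0; i < 4; ++i) {
        out[i] = *p;
    }
}
#endif
```

Calling `load_broadcast(A, out)` fills `out` with four copies of `A[0]`, matching the comment on the shufps line above.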

The AVX instruction set (supported starting with Intel Sandy Bridge and AMD Bulldozer CPUs) has a dedicated instruction, vbroadcastss, which performs the load and the broadcast in one step:

; xmm0 = (A[0], A[0], A[0], A[0]) after execution of vbroadcastss
vbroadcastss xmm0, [A]

The SSE3 instruction set includes a similar instruction, MOVDDUP, which, however, only works for doubles:

const double B = 2.718281828459045;

; xmm0 = ( 2.718281828459045, 2.718281828459045 ) after execution of movddup
movddup xmm0, [B]
