SSE/SSE2: multiplying a double matrix by a float vector
I have to implement matrix-vector multiplication using SSE/SSE2.
The vector and the matrix are both large.
The matrix is double, the vector is float.
The point is that I have to do all the calculations on floats: when I read data from the matrix I promote it to float, do the calculations, and get a float vector. (Later, after some additional calculations on floats, I have to add some float values (a float matrix) to double values (a double matrix).)
My question is how to do this using SSE/SSE2. The problem is with the doubles: I have a pointer to double (double*) and I somehow have to convert 4 doubles into 4 floats to fit in an __m128. Are there any instructions to do that?
Answers (2)
You need to call __m128 _mm_cvtpd_ps(__m128d a) (CVTPD2PS) twice to get two single-precision float vectors, each holding two of your original double-precision values in its low two lanes. Then merge these two float vectors into a single vector, using e.g. __m128 _mm_shuffle_ps(__m128 a, __m128 b, unsigned int imm8) (SHUFPS).
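
The convert-twice-then-shuffle idea above might look roughly like the sketch below. The helper names load4_pd_as_ps and matvec_ps are made up for illustration, and the code assumes a row-major matrix and makes no alignment assumptions (hence the unaligned loads); it is not taken from the original answer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE header */
#include <stddef.h>

/* Hypothetical helper: read 4 doubles and return them as 4 floats
   in a single __m128, in the same order. */
static __m128 load4_pd_as_ps(const double *src)
{
    __m128 f01 = _mm_cvtpd_ps(_mm_loadu_pd(src));      /* { f0, f1, 0, 0 } */
    __m128 f23 = _mm_cvtpd_ps(_mm_loadu_pd(src + 2));  /* { f2, f3, 0, 0 } */
    /* Low two lanes of f01, then low two lanes of f23: { f0, f1, f2, f3 } */
    return _mm_shuffle_ps(f01, f23, _MM_SHUFFLE(1, 0, 1, 0));
}

/* Hypothetical kernel: y = A * x with all arithmetic done in float.
   A is assumed to be row-major with `rows` x `cols` doubles. */
static void matvec_ps(const double *A, const float *x, float *y,
                      size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; ++i) {
        const double *row = A + i * cols;
        __m128 acc = _mm_setzero_ps();
        size_t j = 0;
        for (; j + 4 <= cols; j += 4) {
            __m128 a = load4_pd_as_ps(row + j);  /* 4 matrix entries as floats */
            __m128 v = _mm_loadu_ps(x + j);      /* 4 vector entries */
            acc = _mm_add_ps(acc, _mm_mul_ps(a, v));
        }
        /* Horizontal sum of the 4 partial sums. */
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
        /* Scalar tail for columns not covered by the 4-wide loop. */
        for (; j < cols; ++j)
            sum += (float)row[j] * x[j];
        y[i] = sum;
    }
}
```

If the rows are known to be 16-byte aligned, _mm_load_pd and _mm_load_ps could replace the unaligned loads. _mm_movelh_ps(f01, f23) would be another way to do the merge.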
Changing from double to float reduces the level of precision; it does not increase it. For more accuracy, you should do the computations on doubles (promoting the vector to that type), then possibly cast the result back down to float afterwards. The instructions you need for conversion are cvtps2pd (float to double) and/or cvtpd2ps (double to float). These convert only two values at a time (since only two doubles fit into an SSE register), so you will need to do your conversion in two parts.
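
As a rough sketch of the compute-in-double alternative: the float vector is widened two values at a time with _mm_cvtps_pd, the products are accumulated in double, and only the final sum is narrowed to float. The function name matvec_pd, the row-major layout, and the unaligned loads are assumptions made for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE header */
#include <stddef.h>

/* Hypothetical kernel: y = A * x, accumulating in double precision.
   A is assumed to be row-major with `rows` x `cols` doubles. */
static void matvec_pd(const double *A, const float *x, float *y,
                      size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; ++i) {
        const double *row = A + i * cols;
        __m128d acc = _mm_setzero_pd();
        size_t j = 0;
        for (; j + 4 <= cols; j += 4) {
            __m128 v = _mm_loadu_ps(x + j);
            /* Widen in two parts: cvtps2pd converts only the low two floats. */
            __m128d v01 = _mm_cvtps_pd(v);                    /* x[j],   x[j+1] */
            __m128d v23 = _mm_cvtps_pd(_mm_movehl_ps(v, v));  /* x[j+2], x[j+3] */
            acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(row + j), v01));
            acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(row + j + 2), v23));
        }
        /* Horizontal sum of the 2 partial sums. */
        double lanes[2];
        _mm_storeu_pd(lanes, acc);
        double sum = lanes[0] + lanes[1];
        /* Scalar tail for columns not covered by the 4-wide loop. */
        for (; j < cols; ++j)
            sum += row[j] * (double)x[j];
        y[i] = (float)sum;  /* narrow once, at the end */
    }
}
```

This processes two multiplications per instruction instead of four, so it trades some throughput for keeping the matrix's full precision during accumulation.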