Converting unsigned chars to floats in assembly (to prepare for float vector calculations)
I am trying to optimize a function using SSE2. I'm wondering if I can prepare the data for my assembly code better than this way. My source data is a bunch of unsigned chars from pSrcData. I copy it into this array of floats because my calculations need to be done in floating point.
unsigned char *pSrcData = GetSourceDataPointer();
__declspec(align(16)) float vVectX[4];
vVectX[0] = (float)pSrcData[0];
vVectX[1] = (float)pSrcData[2];
vVectX[2] = (float)pSrcData[4];
vVectX[3] = (float)pSrcData[6];
__asm
{
movaps xmm0, [vVectX]
[...] // do some floating point calculations on float vectors using addps, mulps, etc
}
Is there a quicker way for me to cast every other byte of pSrcData to a float and store it into vVectX?
Thanks!
Answers (2)
(1) AND with a mask to zero out the odd bytes (PAND).
(2) Unpack from 16 bits to 32 bits (PUNPCKLWD with a zero vector).
(3) Convert the 32-bit ints to floats (CVTDQ2PS).
Three instructions, once the eight source bytes are in an XMM register.
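The three steps map directly onto SSE2 intrinsics. Below is a minimal sketch, assuming a little-endian x86 target with SSE2 and at least 8 readable bytes at `src`. The function name `even_bytes_to_floats` is hypothetical, and the initial 64-bit load (`_mm_loadl_epi64`, i.e. MOVQ) is an extra instruction that the three-instruction count above does not include.

```c
#include <emmintrin.h> /* SSE2 intrinsics (also pulls in SSE's xmmintrin.h) */

/* Hypothetical helper: convert src[0], src[2], src[4], src[6] to four floats.
   Assumes src points to at least 8 readable bytes (little-endian x86). */
static __m128 even_bytes_to_floats(const unsigned char *src)
{
    /* Load 8 bytes into the low 64 bits (MOVQ); the upper 64 bits are zeroed. */
    __m128i v = _mm_loadl_epi64((const __m128i *)src);

    /* (1) PAND: each 16-bit word i holds src[2i] | (src[2i+1] << 8),
       so masking with 0x00FF keeps the even bytes and zeroes the odd ones. */
    v = _mm_and_si128(v, _mm_set1_epi16(0x00FF));

    /* (2) PUNPCKLWD with a zero vector: widen the four low 16-bit words
       to four 32-bit integers. */
    v = _mm_unpacklo_epi16(v, _mm_setzero_si128());

    /* (3) CVTDQ2PS: 32-bit ints -> floats (values are 0..255, so exact). */
    return _mm_cvtepi32_ps(v);
}
```

The result is already in a register, so it can feed `addps`/`mulps` directly instead of going through the aligned `vVectX` array. With input bytes `{10, 99, 20, 99, 30, 99, 255, 99}`, storing the result with `_mm_storeu_ps` gives the floats 10, 20, 30 and 255.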
Super old thread, I realise, but I was searching for code to do this myself. This is my solution, which I think is simpler:
However benchmarking shows it no faster than just looping over the array in C, with compiler optimisation enabled. Maybe the approach will be more useful as the initial stage of a bunch of AVX computations.
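For reference, a plain C loop of the kind this answer compares against might look like the sketch below. The function name and signature are hypothetical, and whether the compiler actually vectorises the loop depends on the compiler, its flags and the target.

```c
#include <stddef.h>

/* Hypothetical helper: convert every other byte of src (indices 0, 2, 4, ...)
   to float. dst must have room for n floats, and src must hold at least
   2*n - 1 bytes. With optimisation enabled, a compiler may vectorise this. */
void even_bytes_to_floats_scalar(const unsigned char *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (float)src[2 * i];
}
```

Because it has no intrinsics, this version stays portable. The SSE2 form mainly helps when the converted values go straight into further vector arithmetic instead of being written back to memory.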