SSE intrinsics cause normal floating-point operations to return -1.#INV
I am having a problem with an SSE method I am writing that performs audio processing. I have implemented an SSE random function based on Intel's paper.
I also have a method that converts from Float to S16, also using SSE. The conversion is quite simple:
unsigned int Float_S16LE(float *data, const unsigned int samples, uint8_t *dest)
{
    int16_t *dst = (int16_t*)dest;
    const __m128 mul = _mm_set_ps1((float)INT16_MAX);
    __m128 rand;
    const uint32_t even = samples & ~0x3;
    for(uint32_t i = 0; i < even; i += 4, data += 4, dst += 4)
    {
        /* random round to dither */
        FloatRand4(-0.5f, 0.5f, NULL, &rand);
        __m128 rmul = _mm_add_ps(mul, rand);
        __m128 in = _mm_mul_ps(_mm_load_ps(data),rmul);
        __m64 con = _mm_cvtps_pi16(in);
        memcpy(dst, &con, sizeof(int16_t) * 4);
    }
}
FloatRand4 is defined as follows:
static inline void FloatRand4(const float min, const float max, float result[4], __m128 *sseresult = NULL)
{
    const float delta = (max - min) / 2.0f;
    const float factor = delta / (float)INT32_MAX;
    ...
}
If sseresult != NULL, the __m128 result is returned through it and result is unused.
This performs perfectly on the first loop iteration, but on the next iteration delta becomes -1.#INF instead of 1.0. If I comment out the line __m64 con = _mm_cvtps_pi16(in); the problem goes away.
I think the FPU is getting into an unknown state or something.
Comments (2)
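Mixing MMX integer arithmetic (the __m64 that _mm_cvtps_pi16 returns lives in an MMX register) with regular x87 floating-point math can produce weird results, because the MMX registers alias the x87 FPU registers. If you execute EMMS once the MMX code is done and before any further floating-point math, the FPU is reset into a correct state. Microsoft has Guidelines for When to Use EMMS.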
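As a rough sketch of that advice, the conversion loop could issue EMMS through the `_mm_empty()` intrinsic after its last MMX operation. The helper name `FloatToS16_mmx` and its signature are hypothetical, the dithering step from the question is left out, and `count` is assumed to be a multiple of 4 to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <xmmintrin.h>  // SSE intrinsics, including _mm_cvtps_pi16 and _mm_empty

// Hypothetical helper (not from the question): scales floats in [-1, 1]
// to int16_t, four samples per iteration. Dithering is omitted.
void FloatToS16_mmx(const float* src, int16_t* dst, std::size_t count)
{
    assert(count % 4 == 0);  // simplification for this sketch
    const __m128 scale = _mm_set_ps1(static_cast<float>(INT16_MAX));
    for (std::size_t i = 0; i < count; i += 4) {
        // Unaligned load, so src needs no particular alignment here.
        const __m128 in = _mm_mul_ps(_mm_loadu_ps(src + i), scale);
        // The __m64 result is held in an MMX register, which aliases x87 state.
        const __m64 packed = _mm_cvtps_pi16(in);
        std::memcpy(dst + i, &packed, sizeof packed);
    }
    // Emit EMMS once, after the last MMX instruction and before any x87 math.
    _mm_empty();
}
```

Calling `_mm_empty()` once after the loop rather than inside it keeps the cost low, but that only works if no x87 floating-point code runs between iterations. In the question's loop, FloatRand4 does scalar float math on every iteration, so there the call would have to come right after the `memcpy`.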
If it's impossible to do an unaligned stream or store of an __m64, I'd either keep the result inside an __m128i and do a masked write with _mm_maskmoveu_si128, or store those 8 bytes by hand.
http://msdn.microsoft.com/en-us/library/bytwczae.aspx
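One way to keep the result in an __m128i, sketched here under some assumptions, is to convert with SSE2 and write only the low 8 bytes with `_mm_storel_epi64`. This swaps in a plain 64-bit store instead of the masked write mentioned above. The helper name `FloatToS16_sse2` is hypothetical, dithering is omitted, and `count` is again assumed to be a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper (not from the thread): same conversion as the
// question's loop, but without touching any MMX register.
void FloatToS16_sse2(const float* src, int16_t* dst, std::size_t count)
{
    assert(count % 4 == 0);  // simplification for this sketch
    const __m128 scale = _mm_set_ps1(static_cast<float>(INT16_MAX));
    for (std::size_t i = 0; i < count; i += 4) {
        const __m128 in = _mm_mul_ps(_mm_loadu_ps(src + i), scale);
        // Four floats to four int32 values, using the current rounding mode.
        const __m128i i32 = _mm_cvtps_epi32(in);
        // Saturating pack to int16; the low 64 bits hold the four results.
        const __m128i i16 = _mm_packs_epi32(i32, i32);
        // Store only the low 8 bytes; no alignment is required.
        _mm_storel_epi64(reinterpret_cast<__m128i*>(dst + i), i16);
    }
}
```

Because no __m64 value is ever created, no EMMS is needed afterwards, and out-of-range samples saturate to the int16_t limits instead of wrapping.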