SSE intrinsics cause normal floating-point operations to return -1.#INV

Posted 2024-12-29 18:17:17


I am having a problem with an SSE method I am writing that performs audio processing. I have implemented an SSE random function based on Intel's paper here:

http://software.intel.com/en-us/articles/fast-random-number-generator-on-the-intel-pentiumr-4-processor/

I also have a method that converts from Float to S16, also using SSE. The conversion is quite simple:

unsigned int Float_S16LE(float *data, const unsigned int samples, uint8_t *dest)
{
  int16_t *dst = (int16_t*)dest;
  const __m128 mul = _mm_set_ps1((float)INT16_MAX);
  __m128 rand;
  const uint32_t even = samples & ~0x3;   /* was "count", which is not declared */
  for(uint32_t i = 0; i < even; i += 4, data += 4, dst += 4)
  {
    /* random round to dither */
    FloatRand4(-0.5f, 0.5f, NULL, &rand);

    __m128 rmul = _mm_add_ps(mul, rand);
    __m128 in = _mm_mul_ps(_mm_load_ps(data),rmul);
    __m64 con = _mm_cvtps_pi16(in);

    memcpy(dst, &con, sizeof(int16_t) * 4);
  }
  return even;   /* assumed: report the number of samples converted */
}

FloatRand4 is defined as follows:

static inline void FloatRand4(const float min, const float max, float result[4], __m128 *sseresult = NULL)
{
  const float delta  = (max - min) / 2.0f;
  const float factor = delta / (float)INT32_MAX;
  ...
}

If sseresult != NULL, the __m128 result is returned through sseresult and result is unused.
This works perfectly on the first iteration of the loop, but on the next iteration delta becomes -1.#INF instead of 1.0. If I comment out the line __m64 con = _mm_cvtps_pi16(in); the problem goes away.

I think that the FPU is getting into an unknown state or something.
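FloatRand4's body is elided above. Purely to illustrate the general technique of keeping four generators in one __m128i and mapping them to floats in [min, max) without leaving SSE registers, here is a hypothetical SSE2 sketch. It is not the code from Intel's article or the asker's elided body; the names Rand4State, MulLo32 and FloatRand4Sketch, the seeding, and the LCG constants 1664525/1013904223 (the well-known Numerical Recipes pair) are all assumptions.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical state: four independent 32-bit LCGs, one per SSE lane.
struct Rand4State {
  __m128i seed;
};

// SSE2 has no 32-bit low multiply (_mm_mullo_epi32 is SSE4.1), so build it
// from two _mm_mul_epu32 calls: one for lanes 0/2, one for lanes 1/3.
static inline __m128i MulLo32(__m128i a, __m128i b)
{
  __m128i even = _mm_mul_epu32(a, b);                                         // 64-bit products of lanes 0 and 2
  __m128i odd  = _mm_mul_epu32(_mm_srli_epi64(a, 32), _mm_srli_epi64(b, 32)); // 64-bit products of lanes 1 and 3
  even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));                    // low halves -> dwords 0,1
  odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));
  return _mm_unpacklo_epi32(even, odd);                                       // back in lane order 0,1,2,3
}

// Advance all four LCGs (mod 2^32) and map each lane to a float in [min, max)
// (up to float rounding for general min/max).
static inline __m128 FloatRand4Sketch(Rand4State &st, float min, float max)
{
  const __m128i a = _mm_set1_epi32(1664525);
  const __m128i c = _mm_set1_epi32(1013904223);
  st.seed = _mm_add_epi32(MulLo32(st.seed, a), c);

  // Top 24 bits -> exact float in [0, 2^24), then scale to [0, 1).
  __m128 u = _mm_mul_ps(_mm_cvtepi32_ps(_mm_srli_epi32(st.seed, 8)),
                        _mm_set1_ps(1.0f / 16777216.0f));
  return _mm_add_ps(_mm_set1_ps(min), _mm_mul_ps(u, _mm_set1_ps(max - min)));
}
```

The scalar `max - min` in this sketch is exactly the kind of ordinary float math that misbehaves if it runs while the x87 registers are still in MMX state.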


桃酥萝莉 2025-01-05 18:17:17


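Mixing MMX-style integer operations with regular (x87) floating-point math can produce weird results, because both operate on the same registers: _mm_cvtps_pi16 is an SSE intrinsic, but its __m64 result lives in an MMX register, and the MMX registers are aliased onto the x87 FPU register stack used by ordinary x87 float code. If you call:

_mm_empty()

after the MMX instruction and before the next floating-point code, the FPU is reset into a correct state. Microsoft has Guidelines for When to Use EMMS.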
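In the question's loop, FloatRand4 does scalar float math at the top of every iteration, so the first iteration runs before any MMX instruction and the second runs right after one; that matches "works on the first iteration, fails on the next". A minimal sketch of the fix therefore issues EMMS at the end of each iteration. Assumptions: the hypothetical name Float_S16LE_emms, the dither call is left out because FloatRand4's body is not shown, and _mm_loadu_ps is used so the input need not be 16-byte aligned. It needs a compiler that provides the __m64 intrinsics (GCC/Clang on x86 or x86-64, or MSVC targeting 32-bit x86; MSVC does not provide them for x64).

```cpp
#include <mmintrin.h>   // __m64, _mm_empty (EMMS)
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_mul_ps, _mm_cvtps_pi16
#include <cstdint>
#include <cstring>

// Hypothetical variant of the question's Float_S16LE, without the dither step.
unsigned int Float_S16LE_emms(const float *data, unsigned int samples, uint8_t *dest)
{
  const __m128 mul = _mm_set_ps1(static_cast<float>(INT16_MAX));
  const unsigned int even = samples & ~0x3u;  // whole groups of 4; the tail is not converted

  for (unsigned int i = 0; i < even; i += 4)
  {
    // In the question, FloatRand4(...) runs here: scalar float math.
    __m128 in = _mm_mul_ps(_mm_loadu_ps(data + i), mul);
    __m64 con = _mm_cvtps_pi16(in);            // MMX result: register file now in MMX state
    std::memcpy(dest + 2 * i, &con, sizeof con); // 4 x int16 = 8 bytes
    _mm_empty();                               // EMMS before the next iteration's float math
  }
  return even;
}
```

If the dither values were generated without scalar float code inside the loop, a single _mm_empty() after the loop would be enough.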


知你几分 2025-01-05 18:17:17

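_mm_load_ps requires a 16-byte-aligned address, but float *data is only guaranteed to be 4-byte aligned, not 16-byte aligned => use _mm_loadu_ps.
The memcpy may cancel out the advantage gained from SSE; you should use a store instruction for the __m64 instead, but again, watch the alignment.
If an unaligned stream or store of an __m64 is not possible, I would either keep the value in an __m128i and do a masked write with _mm_maskmoveu_si128, or store those 8 bytes by hand.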

http://msdn.microsoft.com/en-us/library/bytwczae.aspx

