Fixed-point multiplication in a known range

Posted 2024-09-10 06:02:55

I'm trying to multiply A*B in 16-bit fixed point while keeping as much accuracy as possible. A is a 16-bit unsigned integer; B is stored as an integer scaled by 1000 and is always between 0.001 and 9.999. It's been a while since I dealt with problems like this, so:

  • I know I can just do A*B/1000 after moving to 32-bit variables, then strip the result back to 16 bits
  • I'd like to make it faster than that
  • I'd like to do all the operations without moving to 32-bit (since I've got 16-bit multiplication only)

Is there any easy way to do that?

Edit: A will be between 0 and 4000, so all possible results are in the 16-bit range too.
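A quick check of that claim, writing B_{1000} for the integer the user enters (so B = B_{1000}/1000). The final result does fit in 16 bits, but only as an unsigned value, and the intermediate product A·B_{1000} still needs 26 bits, which is why the multiply-then-divide order cannot stay in 16 bits:

```latex
0 \le A \le 4000,\quad 1 \le B_{1000} \le 9999
\;\Rightarrow\;
A \cdot B_{1000} \le 39\,996\,000,\qquad 2^{25} = 33\,554\,432 < 39\,996\,000 < 2^{26} = 67\,108\,864

\left\lfloor \frac{A \cdot B_{1000}}{1000} \right\rfloor \le 39\,996,\qquad 2^{15}-1 = 32\,767 < 39\,996 < 2^{16} = 65\,536
```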

Edit: B comes from the user, who sets it digit by digit in an X.XXX mask; that's why the operation is /1000.

Answer by 錯遇了你, 2024-09-17 06:02:55

No, you have to go to 32 bits. In general, the product of two 16-bit numbers needs up to 32 bits to hold the full result.

You should check the instruction set of the CPU you're working on, because most multiply instructions on 16-bit machines can return the full 32-bit result directly (often split across two 16-bit registers).

This matters because the following portable code:

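#include <stdint.h>

uint16_t testfunction(uint16_t a, uint16_t b)
{
  uint32_t A32 = a;  /* plain int may be only 16 bits on a 16-bit CPU */
  uint32_t B32 = b;

  return (uint16_t)(A32 * B32 / 1000);  /* up to 39996: fits unsigned, not signed, 16 bits */
}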

would force the compiler to do a 32-bit by 32-bit multiply. On your machine this could be very slow, or even be done in multiple steps using only 16-bit multiplies.

A little inline assembly, or better still a compiler intrinsic, could speed this up a lot.
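To illustrate what "multiple steps using only 16-bit multiplies" can look like, here is a minimal sketch of schoolbook multiplication on 8-bit halves, assuming the only hardware multiply is 16x16 to 16 bits. The name `mul_u16_widen` is hypothetical, and the additions are written with 32-bit types for readability; on a real 16-bit CPU they would become add-with-carry sequences.

```c
#include <stdint.h>

/* Hypothetical helper: full 16x16 -> 32-bit unsigned product built only from
   multiplications whose operands are 8 bits and whose results fit in 16 bits. */
static uint32_t mul_u16_widen(uint16_t a, uint16_t b)
{
    uint16_t a_hi = a >> 8, a_lo = a & 0xFFu;
    uint16_t b_hi = b >> 8, b_lo = b & 0xFFu;

    /* Each partial product is at most 255 * 255 = 65025, so it fits in 16 bits. */
    uint16_t hh = (uint16_t)(a_hi * b_hi);
    uint16_t hl = (uint16_t)(a_hi * b_lo);
    uint16_t lh = (uint16_t)(a_lo * b_hi);
    uint16_t ll = (uint16_t)(a_lo * b_lo);

    /* Recombine: a*b = hh * 2^16 + (hl + lh) * 2^8 + ll */
    return ((uint32_t)hh << 16) + (((uint32_t)hl + lh) << 8) + ll;
}
```

Four small multiplies plus shifts and adds replace one widening multiply, and the division by 1000 of the 32-bit product is still needed afterwards, which is why a native 16x16 to 32 instruction or intrinsic pays off.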

Here is an example for the Texas Instruments C64x+ DSP, which has such intrinsics:

unsigned short test(short a, short b)
{
  int product = _mpy(a, b); // signed 16x16 multiply, returns the 32-bit product
  return product / 1000;    // up to 39996: fits unsigned short, not short
}

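Another thought: you're dividing by 1000. Was that constant your choice? It would be much faster to use a power of two as the base for your fixed-point numbers, and 1024 is close. Why don't you do:

  return ((uint32_t)a * b) / 1024;

instead? Because the operands are unsigned, the compiler can replace the division with a right shift by 10 bits, which ought to be much faster than dividing by 1000 or using reciprocal-multiplication tricks. Note that B would then have to be stored scaled by 1024 rather than 1000, so the user's X.XXX input needs converting once, when it is entered.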
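A minimal sketch of the power-of-two variant, assuming B is still entered as X.XXX (an integer 1..9999 scaled by 1000) and converted once at input time. The names `milli_to_q10` and `scale_q10` are hypothetical, and the ranges in the comments come from the question's edits (A up to 4000).

```c
#include <stdint.h>

/* Hypothetical helper: convert B from scale 1000 to scale 1024, rounding to
   nearest. Runs once when the user enters B, so its 32-bit divide is off the
   hot path. 9999 maps to 10239, which fits in 16 bits. */
static uint16_t milli_to_q10(uint16_t b_milli)
{
    return (uint16_t)(((uint32_t)b_milli * 1024u + 500u) / 1000u);
}

/* Hot path: A * B with B scaled by 1024. The product still needs up to
   26 bits, but the unsigned division by 1024 becomes a shift by 10. */
static uint16_t scale_q10(uint16_t a, uint16_t b_q10)
{
    uint32_t product = (uint32_t)a * b_q10;
    return (uint16_t)(product >> 10);
}
```

For example, `scale_q10(4000, milli_to_q10(9999))` gives 39996, the same as 4000 * 9999 / 1000. The trade-off: rounding B to a multiple of 1/1024 introduces an error of up to 0.5/1024 in B, so for A up to 4000 the result can differ from the exact truncated A*B/1000 by up to 2 counts.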
