Fixed-point multiplication in a known range
I'm trying to multiply A*B in 16-bit fixed point while keeping as much accuracy as possible. A is a 16-bit unsigned integer; B is an integer that gets divided by 1000, so its value is always between 0.001 and 9.999. It's been a while since I dealt with problems like this, so:
- I know I can just do A*B/1000 after moving to 32-bit variables and then strip the result back to 16 bits, but I'd like something faster than that.
- I'd like to do all the operations without moving to 32-bit (since I've only got 16-bit multiplication).
Is there any easy way to do that?
Edit: A will be between 0 and 4000, so all possible results are in the 16-bit range too.
Edit: B comes from the user, set digit by digit in an X.XXX mask; that's why the operation is /1000.
Comments (1)
No, you have to go to 32-bit. In general, the product of two 16-bit numbers needs up to 32 bits. Even with A limited to 4000 and B to 9999, the intermediate product can reach 39,996,000, which does not fit in 16 bits.
You should check the instruction set of the CPU you're working on, because most multiply instructions on 16-bit machines have an option to return the result directly as a 32-bit integer.
This would help you a lot, because computing A*B/1000 in plain C on operands widened to 32 bits would force the compiler to do a 32-bit * 32-bit multiply. On your machine this could be very slow, or even be done in multiple steps using only 16-bit multiplies.
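For reference, the straightforward version discussed above might look like the following. This is only a minimal sketch: the function name `mul_div1000` and the parameter `b_milli` (B expressed in thousandths) are made up for illustration and do not come from the original post.

```c
#include <stdint.h>

/* Hypothetical helper: a is 0..4000, b_milli is B in thousandths (1..9999). */
uint16_t mul_div1000(uint16_t a, uint16_t b_milli)
{
    /* Widen one operand first so the multiply is carried out in 32 bits;
       the product can reach 4000 * 9999 = 39,996,000. */
    uint32_t product = (uint32_t)a * b_milli;

    /* Truncating divide; the result is at most 39,996, so it fits in 16 bits. */
    return (uint16_t)(product / 1000u);
}
```

Whether the compiler emits a single 16 x 16 -> 32 multiply or a full 32-bit multiply routine for the cast above depends on the target and the optimizer, so it is worth checking the generated assembly.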
A little bit of inline assembly or, even better, a compiler intrinsic could speed things up a lot.
The Texas Instruments C64x+ DSP is one example of a target that has such intrinsics.
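As a rough sketch of what an intrinsic-based multiply could look like, the wrapper below uses the TI C6000 compiler's `_mpyu` intrinsic (unsigned 16 x 16 -> 32 multiply) when the `_TMS320C6X` macro is defined, and falls back to portable C elsewhere. The wrapper names `mul16x16_u32` and `scale_by_milli` are hypothetical, and the intrinsic and macro should be checked against your compiler's documentation before relying on them.

```c
#include <stdint.h>

/* Hypothetical wrapper: unsigned 16 x 16 -> 32-bit multiply. */
uint32_t mul16x16_u32(uint16_t a, uint16_t b)
{
#if defined(_TMS320C6X)
    /* Assumption: TI C6000 compiler, where _mpyu multiplies the low
       16 bits of each operand as unsigned values. */
    return _mpyu(a, b);
#else
    /* Portable fallback: widen one operand so the full product is kept. */
    return (uint32_t)a * b;
#endif
}

/* A * B / 1000 with B given in thousandths (hypothetical helper). */
uint16_t scale_by_milli(uint16_t a, uint16_t b_milli)
{
    return (uint16_t)(mul16x16_u32(a, b_milli) / 1000u);
}
```

Keeping the intrinsic behind a small wrapper means the rest of the code stays portable and only one function has to change per target.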
Another thought: you're dividing by 1000. Was that constant your choice? It would be much faster to use a power of two as the base for your fixed-point numbers, and 1024 is close. Why don't you compute A*B/1024 instead? The compiler could optimize this into a right shift by 10 bits, which ought to be much faster than doing reciprocal-multiplication tricks.
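One way the power-of-two idea could be combined with user input in X.XXX form is to convert B from thousandths to units of 1/1024 once, when the user changes it, and then use only a multiply and a shift afterwards. The sketch below assumes exactly that; the names `milli_to_q10` and `mul_q10` are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical one-time conversion, done when the user edits B:
   thousandths (1..9999) -> units of 1/1024, rounded to nearest. */
uint16_t milli_to_q10(uint16_t b_milli)
{
    return (uint16_t)(((uint32_t)b_milli * 1024u + 500u) / 1000u);
}

/* Per-value multiply: the divide by 1024 becomes a 10-bit right shift.
   With a <= 4000 and b_q10 <= 10239 the result fits in 16 bits. */
uint16_t mul_q10(uint16_t a, uint16_t b_q10)
{
    return (uint16_t)(((uint32_t)a * b_q10) >> 10);
}
```

The trade-off is accuracy: rounding B to the nearest 1/1024 can shift the result by a couple of counts when A is near 4000, so this only helps if that error is acceptable.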