Estimating square roots
I am writing an iPhone app that needs to calculate the square root of a number about 2000 times every 1/30th of a second. sqrt() works fine on a computer, but the frame rate drops to around 10 FPS on an iPhone or iPad, and I have already optimized the rest of the code. I have heard that this can be sped up dramatically by estimating the square root, but I cannot find any code that does this. I only need one or two decimal places of precision. Any suggestions on how to do this, or other ways to speed things up, would be appreciated.
Thanks!
Unless you actually need the square root itself, compare squared values instead: square the value you are comparing against rather than taking the square root of the other one.
Squaring is much faster (and more accurate) than taking a square root, if you only need comparisons. This is the way most games do it.
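As an illustration of the comparison trick, here is a minimal sketch of a distance check that never calls sqrt(). The Point type and the within_radius name are made up for this example; it assumes the radius is non-negative.

```c
/* Hypothetical 2D point type, for illustration only. */
typedef struct { float x, y; } Point;

/* Returns 1 if a and b are closer together than `radius`, 0 otherwise.
   For non-negative d and r, d < r is equivalent to d*d < r*r,
   so the square root is never needed. */
static int within_radius(Point a, Point b, float radius)
{
    float dx = a.x - b.x;
    float dy = a.y - b.y;
    return dx * dx + dy * dy < radius * radius;
}
```

For points (0, 0) and (3, 4), whose true distance is 5, the check succeeds for a radius of 5.5 and fails for a radius of 4.5.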
Do you know the range of values that you are trying to find the square root of? Say you have values ranging from 0 to 10. You can then precalculate an array:
Then during runtime you take the number that you want the sqrt of, convert that to an integer (so for example 3.123 becomes 3) and use that as an index (3) to look up the precalculated value.
Of course if you want finer resolution you can just increase the number of items in your array.
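The array itself is not shown above, so the following is only a sketch of what such a table could look like for the 0 to 10 range. The names sqrt_table and lookup_sqrt are assumptions, and the input is assumed to be a finite number.

```c
/* sqrt(i) for i = 0..10, rounded to three decimals. */
static const float sqrt_table[11] = {
    0.0f, 1.0f, 1.414f, 1.732f, 2.0f, 2.236f,
    2.449f, 2.646f, 2.828f, 3.0f, 3.162f
};

/* Truncates x to an integer index and returns the precomputed root.
   Out-of-range inputs are clamped to the ends of the table. */
static float lookup_sqrt(float x)
{
    if (x < 0.0f)  x = 0.0f;
    if (x > 10.0f) x = 10.0f;
    int i = (int)x;              /* 3.123 becomes 3 */
    return sqrt_table[i];
}
```

With one entry per integer the result is coarse (3.9 still maps to the entry for 3), which is why a larger table, indexed by for example (int)(x * 100), may be needed for one or two decimals of precision.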
First off, are you certain that the square root is actually the bottleneck? Have you profiled? 2000 square roots every 1/30th of a second is not that many, even on a phone. The ARM documentation quotes 33 cycles for a single-precision square root and 60 cycles for double precision; even at 60 cycles each, a 600 MHz processor can do 10 million square roots per second (more if the instruction is pipelined at all).
If you have profiled and the square root really is the bottleneck, you will want to use the NEON vrsqrte.f32 instruction. It is quite fast and gives you the approximate reciprocal square roots of four floating-point numbers simultaneously. You can then use the vmul.f32 instruction to get approximate square roots (though for many uses the reciprocal is more useful than the square root itself).
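From C, the NEON instructions mentioned in this answer are usually reached through the intrinsics in arm_neon.h rather than hand-written assembly. The sketch below is one possible shape, not a tuned implementation: approx_sqrt4 is a made-up name, the extra vrsqrtsq_f32 refinement step is an addition that the answer does not mention, and the non-NEON branch exists only so the code also builds on other targets.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Writes an approximation of sqrt(in[i]) to out[i] for i = 0..3.
   Inputs are assumed to be positive: for 0 the reciprocal estimate
   is infinite and 0 * infinity yields NaN. */
static void approx_sqrt4(const float in[4], float out[4])
{
#if HAVE_NEON
    float32x4_t x = vld1q_f32(in);
    float32x4_t r = vrsqrteq_f32(x);                     /* rough 1/sqrt(x) */
    r = vmulq_f32(r, vrsqrtsq_f32(vmulq_f32(x, r), r));  /* one Newton-Raphson step */
    vst1q_f32(out, vmulq_f32(x, r));                     /* x * 1/sqrt(x) = sqrt(x) */
#else
    /* Portable stand-in (Heron's iteration) for builds without NEON. */
    for (int i = 0; i < 4; ++i) {
        float g = in[i] > 1.0f ? in[i] : 1.0f;
        for (int k = 0; k < 16; ++k)
            g = 0.5f * (g + in[i] / g);
        out[i] = g;
    }
#endif
}
```

If the raw estimate is accurate enough for the application, the refinement line can be dropped, leaving just the estimate and the multiply described above.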
How accurate do you want your estimate to be? If you know how close the estimate needs to be to the real sqrt, then Newton's method is your friend.
Do you know the range of values that are passed to sqrt? If so you can make up a look up table that is precomputed at startup (or even read from disk at startup depending on what turns out to be faster). Find the closest in the table to your input and you get your estimate.
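For the Newton's method suggestion, a minimal sketch might look like this. The function name, the starting guess, and the iteration cap are choices made for this example; the tolerance parameter is how the caller states the accuracy it needs.

```c
/* Newton's method for sqrt(a): repeat x = (x + a/x) / 2 until the
   change between iterations drops below `tolerance`.
   Returns 0 for non-positive input. */
static float newton_sqrt(float a, float tolerance)
{
    if (a <= 0.0f)
        return 0.0f;

    /* Starting at or above the true root makes the sequence decrease
       monotonically towards it. */
    float x = a > 1.0f ? a : 1.0f;

    for (int i = 0; i < 32; ++i) {       /* cap, in case tolerance is tiny */
        float next = 0.5f * (x + a / x);
        float step = x - next;
        if (step < 0.0f)
            step = -step;
        x = next;
        if (step < tolerance)
            break;
    }
    return x;
}
```

With a tolerance of 0.01 and an input of 2 this stops after three iterations. A better starting guess (for example from a small lookup table) reduces the iteration count further.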
Maybe this is for you:
Fast inverse square root
If this method doesn't provide the accuracy you need, there are also a lot of other iterative methods that let you trade speed against accuracy:
Methods of computing square roots
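The "fast inverse square root" trick is usually written along the following lines. This is a sketch with made-up function names, using the widely quoted constant 0x5f3759df and assuming 32-bit IEEE 754 floats; memcpy is used for the bit reinterpretation to stay within defined C behaviour.

```c
#include <stdint.h>
#include <string.h>

/* Approximates 1/sqrt(x) for positive x: an integer operation on the
   float's bit pattern gives a first guess, then one Newton-Raphson
   step refines it. */
static float fast_rsqrt(float x)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);            /* reinterpret float bits as integer */
    i = 0x5f3759dfu - (i >> 1);          /* initial guess */
    memcpy(&y, &i, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);
}

/* sqrt(x) = x * (1 / sqrt(x)) */
static float fast_sqrt(float x)
{
    return x * fast_rsqrt(x);
}
```

With the single refinement step the relative error stays under about 0.2%, which is in the range of the one or two decimals asked for when the inputs are small. Whether it beats sqrtf() on a given device is something only profiling can tell.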
The easiest change you can make on an iPhone is to use sqrtf() instead of sqrt(). Single precision float math is much faster than double precision, especially on devices of 3GS vintage and newer.
If you need the square root to calculate a Pythagoras triangle (sqrt(x*x + y*y)), and both x and y are nonnegative, then a very fast approximation to that is
This has a maximum error of 5.7%. Watch out for branch misprediction in min() and max() though.
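The formula itself is missing from the text above. One approximation that matches the stated 5.7% bound is max(x, y) + min(x, y) / 3; treat this as an assumption rather than the answerer's exact expression. A sketch, with a made-up function name:

```c
/* Approximates sqrt(x*x + y*y) for non-negative x and y as
   max(x, y) + min(x, y) / 3.
   The worst case is x == y, where the result is about 5.7% too low. */
static float approx_hypot(float x, float y)
{
    float hi = x > y ? x : y;
    float lo = x > y ? y : x;
    return hi + lo / 3.0f;
}
```

For x = 3 and y = 4 this gives 4 + 1 = 5, which happens to be exact; for x = y = 1 it gives about 1.333 instead of 1.414.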
A quick Google search turns up all sorts of links.
http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Approximations_that_depend_on_IEEE_representation
http://www.azillionmonkeys.com/qed/sqroot.html
If you have a "normal" positive float or double, not an int, and want to use a table look-up method, you can do two separate table look ups, one for the exponent (re-biased), and one for a few bits of the mantissa (shift and mask bitfield extraction), and then multiply the two table look up results together.
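The two-table idea can be sketched as follows for 32-bit IEEE 754 floats. All names are hypothetical, and the choice of 6 mantissa bits is an assumption that gives a relative error of roughly 0.4%; more bits give a bigger table and a smaller error. The tables are built once at startup with a slow helper so the sketch has no library dependencies.

```c
#include <stdint.h>
#include <string.h>

#define MANT_BITS 6                      /* assumed: top 6 mantissa bits */
#define MANT_SIZE (1u << MANT_BITS)

static float exp_table[256];             /* sqrt(2^(e - 127)) per biased exponent e */
static float mant_table[MANT_SIZE];      /* sqrt(1.m) at the middle of each bucket */

/* Startup-only helper (Heron's iteration); speed does not matter here. */
static double slow_sqrt(double a)
{
    double g = a > 1.0 ? a : 1.0;
    for (int i = 0; i < 60; ++i)
        g = 0.5 * (g + a / g);
    return g;
}

/* Must be called once before table_sqrt(). */
static void init_sqrt_tables(void)
{
    const double root2 = slow_sqrt(2.0);
    double up = 1.0, down = 1.0;

    exp_table[127] = 1.0f;               /* exponent 0: sqrt(2^0) = 1 */
    for (int k = 1; k <= 127; ++k) {
        up *= root2;
        down /= root2;
        exp_table[127 + k] = (float)up;
        exp_table[127 - k] = (float)down;
    }
    for (uint32_t j = 0; j < MANT_SIZE; ++j)
        mant_table[j] = (float)slow_sqrt(1.0 + (j + 0.5) / MANT_SIZE);
}

/* Approximate sqrt for normal, positive floats only
   (no zero, denormals, infinities or NaN). */
static float table_sqrt(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t e = (bits >> 23) & 0xFFu;                         /* biased exponent */
    uint32_t m = (bits >> (23 - MANT_BITS)) & (MANT_SIZE - 1u); /* top mantissa bits */
    return exp_table[e] * mant_table[m];
}
```

Because x = 1.m × 2^E, its root is sqrt(1.m) × sqrt(2^E), which is exactly the product of the two lookups; indexing the exponent table by the biased exponent handles odd and even exponents alike.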