Are __CIasin and arcsine much slower than sine in .NET?
I've been running some profile tests of a slow area of code. This is with Visual Studio 2008 and .NET 2 (fully patched). About 32% of my computation is used by the Haversine formula. This requires two sines, two cosines, a square root, and an arcsine, all using the standard .NET Math library (i.e. Math.Sin, Math.Cos, Math.Asin, Math.Sqrt). I've been able to easily cache the cosines, resulting in a roughly 25-30% speedup of the Haversine function.
In the profile I'm seeing __CIasin_pentium4 and __CIasin, neither of which turns up much on Google apart from things like stack dumps that people have posted.
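For reference, here is a minimal sketch of the Haversine calculation with the cosines cached per end point, as described above. It is written in Python for brevity; the `math` calls correspond directly to Math.Sin, Math.Asin and Math.Sqrt. The `Point` class, the `haversine_m` name and the mean Earth radius of 6,371,000 m are assumptions made for illustration, not the original code.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


class Point:
    """A location whose latitude cosine is computed once and reused."""

    def __init__(self, lat_deg, lon_deg):
        self.lat = math.radians(lat_deg)
        self.lon = math.radians(lon_deg)
        self.cos_lat = math.cos(self.lat)  # cached per end point


def haversine_m(p, q):
    """Great-circle distance in metres between two Point objects."""
    s_lat = math.sin((q.lat - p.lat) / 2.0)  # first sine
    s_lon = math.sin((q.lon - p.lon) / 2.0)  # second sine
    # Both cosines come from the cache, so no cosine is evaluated per pair.
    h = s_lat * s_lat + p.cos_lat * q.cos_lat * s_lon * s_lon
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(h))
```

With the cosines cached, each pair of points still costs two sines, one square root and one arcsine.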
The pentium4 variant grabs about twice as many samples (both inclusive and exclusive). I'm assuming this is an arc sine, but is it really so much more expensive than a sine? There is no sign of a sine in the profile even though twice as many will be computed.
Are both of these functions arcsines, or is one a sine? If not, what do they represent?
Yes, I've seen various articles and posts on the Internet and here about fast sines. I really do need the accuracy of a computed sine rather than a lookup table or truncated Taylor series. I'm using the Haversine to compute and/or compare distances on the Earth's surface. 10m accuracy (the minimum IMHO for my app) equates to about 1/640000 radians.
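The 1/640000 figure can be checked with a couple of lines. This sketch assumes a mean Earth radius of 6,371,000 m; `metres_to_radians` is a name invented for the example.

```python
EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def metres_to_radians(distance_m):
    """Angle subtended at the Earth's centre by a surface distance."""
    return distance_m / EARTH_RADIUS_M


angle = metres_to_radians(10.0)
print(angle)        # roughly 1.57e-06 radians
print(1.0 / angle)  # roughly 637100, i.e. about 1/640000 of a radian
```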
One thought for speed is to multiply out the trigonometric identities. Although this would result in more trig functions, they would become dependent on individual end points only and hence become cacheable. Another is to unwrap the arcsine and the square root for my comparisons. I think the latter has a lot of scope for improvement; however, at the moment I am trying to understand what is taking the processing time and exactly what the __CIasin functions represent.
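One way the "multiply out the identities" idea could look is sketched below, using sin((a - b)/2) = sin(a/2)cos(b/2) - cos(a/2)sin(b/2) so that every trig call depends on a single end point. The `CachedPoint` class and `haversine_term` function are hypothetical names, and this is only one possible expansion, not necessarily the one the question has in mind.

```python
import math


class CachedPoint:
    """Holds every trig value the pairwise formula needs for one end point."""

    def __init__(self, lat_deg, lon_deg):
        lat = math.radians(lat_deg)
        lon = math.radians(lon_deg)
        self.cos_lat = math.cos(lat)
        self.sin_hlat = math.sin(lat / 2.0)
        self.cos_hlat = math.cos(lat / 2.0)
        self.sin_hlon = math.sin(lon / 2.0)
        self.cos_hlon = math.cos(lon / 2.0)


def haversine_term(p, q):
    """Value under the square root, computed with no trig calls per pair."""
    # sin((lat_q - lat_p) / 2) via the angle-difference identity
    s_lat = q.sin_hlat * p.cos_hlat - q.cos_hlat * p.sin_hlat
    # sin((lon_q - lon_p) / 2) via the same identity
    s_lon = q.sin_hlon * p.cos_hlon - q.cos_hlon * p.sin_hlon
    return s_lat * s_lat + p.cos_lat * q.cos_lat * s_lon * s_lon
```

This trades five trig calls per point (done once) for zero per pair, leaving only the square root and arcsine. The subtractions cancel heavily for nearby points, so it would be worth checking the result against the accuracy requirement before relying on it.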
Comments (2)
It looks like the Pentium FPU has native instructions for sine and cosine (fsin and fcos), but not for arcsine. Hence the __CIasin functions that I am seeing are probably the .NET runtime's software implementation of arcsine, which I understand uses a Taylor series. This would explain the big difference in speed, and why asin shows up in the profile but sin does not (nor cos or sqrt, for that matter, since these are native instructions too).
I coded x86 FPUs directly long ago. So long ago that I think it must have been an 8087. Anyway, the only trig instruction present in those days was a partial tangent!
So the next job in the optimization is to unwrap the arcsine and square root from the Haversine where possible. The results are used for simple greater than/smaller than comparisons (sorting, etc.) and to compare against "fixed" values. In both cases, it should be possible to unwrap these. E.g. the fixed value becomes square( sin( fixed ) ), and is compared against what was inside the sqrt.
I still think the trig identities might be a useful optimization, but they would definitely complicate the code and introduce the possibility of errors.
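A minimal sketch of that unwrapping, assuming the distance is defined as 2R·asin(sqrt(h)) with R = 6,371,000 m, so that a fixed distance limit d becomes sin²(d / 2R). Because asin and sqrt are both increasing on [0, 1], comparing or sorting on h gives the same answer as comparing or sorting on the distance. All function names here are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def haversine_term(lat1, lon1, lat2, lon2):
    """Value under the square root; all angles in radians."""
    s_lat = math.sin((lat2 - lat1) / 2.0)
    s_lon = math.sin((lon2 - lon1) / 2.0)
    return s_lat * s_lat + math.cos(lat1) * math.cos(lat2) * s_lon * s_lon


def distance_m(lat1, lon1, lat2, lon2):
    """Full Haversine distance, needed only when an actual length is wanted."""
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(haversine_term(lat1, lon1, lat2, lon2)))


def threshold_term(max_distance_m):
    """Convert a fixed distance limit into the equivalent haversine term.

    Valid for limits between 0 and half the Earth's circumference.
    """
    s = math.sin(max_distance_m / (2.0 * EARTH_RADIUS_M))
    return s * s


def is_within(lat1, lon1, lat2, lon2, limit_term):
    """Distance test with no asin or sqrt: compare haversine terms directly."""
    return haversine_term(lat1, lon1, lat2, lon2) <= limit_term
```

The limit is converted once with `threshold_term` and reused for every comparison; for sorting, `haversine_term` can serve directly as the sort key.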
Yes, definitely unwrap the sqrt and the arcsine. Inverse trigonometric functions are almost always slower than their forward counterparts because the forward trig functions are usually implemented in the FPU.