分析 Java 中的本机方法 - 奇怪的结果
我一直在使用 Yourkit 8.0 来分析在 Mac OS X(10.5.7、Apple JDK 1.6.0_06-b06-57)下运行的数学密集型应用程序,并注意到 CPU 分析结果中有一些奇怪的行为。
例如,我使用采样进行了一次分析运行,结果报告应用程序 10 分钟运行时间的 40% 花费在 StrictMath.atan 方法中。 我觉得这很令人费解,但我相信了它的字面意思,并花了一些时间用极其简单的多项式拟合来替换 atan。
当我再次运行该应用程序时,它花费的时间几乎与之前完全相同(10 分钟) - 但我的 atan 替代品在分析结果中没有出现。 相反,其他主要热点的运行时间百分比只是增加来弥补。
总结一下:
StrictMath.atan 的结果(本机方法)
总时长:10 分钟
方法一:20%
方法二:20%
方法三:20%
StrictMath.atan:40%
简化、纯 Java atan 的结果
总时长:10 分钟
方法一:33%
方法2:33%
方法 3:33%
(方法 1、2、3 不执行任何 atan 调用)
知道这种行为是怎么回事吗? 我使用 EJ-Technologies 的 JProfiler 得到了相同的结果。 似乎 JDK 分析 API 报告的本机方法结果不准确,至少在 OS X 下是这样。
I have been using Yourkit 8.0 to profile a mathematically-intensive application running under Mac OS X (10.5.7, Apple JDK 1.6.0_06-b06-57), and have noticed some strange behavior in the CPU profiling results.
For instance - I did a profiling run using sampling, which reported that 40% of the application's 10-minute runtime was spent in the StrictMath.atan method. I found this puzzling, but I took it at it's word and spent a bit of time replacing atan with an extremely simply polynomial fit.
When I ran the application again, it took almost exactly the same time as before (10 minutes) - but my atan replacement showed up nowhere in the profiling results. Instead, the runtime percentages of the other major hotspots simply increased to make up for it.
To summarize:
RESULTS WITH StrictMath.atan (native method)
Total runtime: 10 minutes
Method 1: 20%
Method 2: 20%
Method 3: 20%
StrictMath.atan: 40%
RESULTS WITH simplified, pure Java atan
Total runtime: 10 minutes
Method 1: 33%
Method 2: 33%
Method 3: 33%
(Methods 1,2,3 do not perform any atan calls)
Any idea what is up with this behavior? I got the same results using EJ-Technologies' JProfiler. It seems like the JDK profiling API reports inaccurate results for native methods, at least under OS X.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
发生这种情况的原因是采样时间不一致。 例如,如果一个方法使用了相当多的时间,但执行时间并不长,则采样可能会错过它。 另外,我认为垃圾收集在示例期间永远不会发生,但如果某些代码导致大量垃圾收集,它可能会极大地导致速度减慢,而不会出现在示例中。
在类似的情况下,我发现运行两次非常有帮助,一次是跟踪,一次是采样。 如果某个方法同时出现在两者中,则它可能会使用大量 CPU,否则它很可能只是采样过程的产物。
This can happen because of inconsistencies of when samples are taken. So for example, if a method uses a fair amount of time, but doesn't take very long to execute, it is possible for the sampling to miss it. Also, I think garbage collection never happens during a sample, but if some code causes a lot of garbage collection it can greatly contribute to a slowdown without appearing in the sample.
In similar situation I've found it very helpful to run twice, once with tracing as well as once with sampling. If a method appears in both it is probably using a lot of CPU, otherwise it could well just be an artifact of the sampling process.
由于您使用的是 Mac,因此您可以尝试 Apple 的 Shark profiler(从ADC)具有 Java 支持,Apple 的性能团队已在该工具上投入了大量时间。
正如 Nick 指出的那样,如果采样间隔足够接近函数的执行时间,并且探查器很少检查函数实际执行的时间,则采样可能会产生误导。 我不知道 YourKit 是否支持这一点,但在 Shark 中,您可以将采样间隔更改为默认的 10 毫秒以外的值,看看结果是否有很大不同。
还有一个单独的调用跟踪模式,它将记录每个函数输入/返回 - 这完全避免了别名错误的可能性,但会收集大量数据和更高的开销,如果您的应用程序正在进行任何类型的实时处理,这可能很重要。
Since you're using a Mac, you might try Apple's Shark profiler (free download from ADC) which has Java support and Apple's performance group has put a fair amount of time into the tool.
As Nick pointed out, sampling can be misleading if the sample interval is close enough to the function's execution time and the profiler rarely checks when the function is actually executing. I don't know whether YourKit supports this but in Shark you can change the sampling interval to something other than the default 10ms and see if the results are substantially different.
There's also a separate call-tracing mode which will record every function enter/return - this completely avoids the possibility of aliasing errors but collects a ton of data and higher overhead, which could matter if your app is doing any sort of real-time processing.
您可能想查看传递到这三个方法中的参数。 时间可能花在生成返回值或创建大量临时对象的方法上。
You may want to look at the parameters that are being passed into the three methods. It may be that the time is being spent generating return values or in methods that are creating a lot of temporary objects.
我发现 YourKit 极大地夸大了调用子方法的成本(我认为是由于其日志记录方法)。 如果您只遵循配置文件给您的建议,您最终只会合并功能而没有真正的收益,因为 HotSpot 通常在这方面做得很好。
因此,我强烈建议完全在分析器之外测试批次,以更好地了解更改是否真的有益(这可能看起来很明显,但这花费了我一些开发时间)。
I find YourKit greatly exaggerates the cost of calling sub-methods (due to its logging method, I assume). If you only follow the advice that the profile gives you you'll end up just merging functions with no real gain as the HotSpot usually does excellently for this.
Therefore, I'd highly advise to test batches completely outside profilers too, to get a better idea whether changes are really beneficial (it may seem obvious but this cost me some development time).
值得注意的是,如果 Java 方法足够小,则可以内联,但本机方法的内联规则不同。 如果一个方法是内联的,它不会出现在分析器中(当然也不会出现在 YourKit 中)
It worth noting that Java methods can be inlined if they are small enough, however native methods are inlined under different rules. If a method is inlined it doesn't appear in the profiler (certainly not YourKit anyway)
分析器可以是这样的。
这是方法我用过。
每次都有效。
这就是原因。
Profilers can be like that.
This is the method I use.
Works every time.
And this is why.