
Effective code instrumentation?

Posted 2024-08-23 07:14:19

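All too often I read statements about some new framework and its "benchmarks". My question is a general one, but it comes down to these specific points:

What approach should a developer take to effectively instrument code to measure performance?
When reading about benchmarks and performance tests, what are some red flags to watch out for that might not represent real results?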

不念旧人 2024-08-30 07:14:19

There are two methods of measuring performance: using code instrumentation and using sampling.

The commercial profilers I used in the past (Hi-Prof, Rational Quantify, AQTime) used code instrumentation (some of them could also use sampling), and in my experience this gives the best, most detailed results. Rational Quantify in particular allows you to zoom in on results, focus on subtrees, remove complete call trees to simulate an improvement, ...

The downside of these instrumenting profilers is that they:

tend to be slow (your code runs about 10 times slower)
take quite some time to instrument your application
don't always correctly handle exceptions in the application (in C++)
can be hard to set up if you have to disable the instrumentation of DLLs (we had to disable instrumentation for Oracle DLLs)

The instrumentation also sometimes skews the times reported for low-level functions like memory allocations, critical sections, ...
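
To make the overhead concrete, here is a minimal sketch in Python. It is not one of the profilers named above: it assumes Python's standard `cProfile` module as a stand-in for an instrumenting profiler, and the `tiny`/`workload` functions are hypothetical. It times the same workload with and without instrumentation, so you can see how a cheap, frequently called function gets penalized.

```python
import cProfile
import time


def tiny(x):
    # Deliberately cheap: the per-call instrumentation cost dominates here.
    return x + 1


def workload(n):
    # Hypothetical workload made of many calls to a low-level function.
    total = 0
    for i in range(n):
        total += tiny(i)
    return total


def time_plain(n):
    """Run the workload without any profiler attached."""
    start = time.perf_counter()
    result = workload(n)
    return result, time.perf_counter() - start


def time_instrumented(n):
    """Run the same workload under cProfile (an instrumenting profiler)."""
    profiler = cProfile.Profile()
    start = time.perf_counter()
    result = profiler.runcall(workload, n)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 200_000
    plain_result, plain_s = time_plain(n)
    prof_result, prof_s = time_instrumented(n)
    print(f"plain: {plain_s:.4f}s  instrumented: {prof_s:.4f}s  "
          f"ratio: {prof_s / plain_s:.1f}x")
```

The ratio you see depends on the machine and the Python version, so treat it only as an indication that timings taken inside and outside a profiler are not comparable.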

The free profilers I use (Very Sleepy, Luke Stackwalker) rely on sampling, which makes it much easier to do a quick performance test and see where the problem lies. These free profilers don't have the full functionality of the commercial profilers (although I submitted the "focus on subtree" functionality for Very Sleepy myself), but since they are fast, they can be very useful.

At this time, my personal favorite is Very Sleepy, with Luke Stackwalker coming second.

In both cases (instrumenting and sampling), my experience is that:

It is very difficult to compare the results of profilers over different releases of your application. If you have a performance problem in your release 2.0, profile your release 2.0 and try to improve it, rather than looking for the exact reason why 2.0 is slower than 1.0.
You must never compare the profiling results with the timing results (real time, CPU time) of an application that is run outside the profiler. If your application consumes 5 seconds of CPU time outside the profiler, and the profiler reports 10 seconds when the application runs inside it, there's nothing wrong. Don't think that your application actually takes 10 seconds.
That's why you must consistently check results in the same environment. Consistently compare results of your application when run outside the profiler, or when run inside the profiler. Don't mix the results.
Also use a consistent environment and system. If you get a faster PC, your application could still run slower, e.g. because the screen is larger and more needs to be updated on screen. If you move to a new PC, retest the last one or two releases of your application on it so you get an idea of how times scale to the new PC.
This also means: use fixed data sets and check your improvements on these data sets. An improvement in your application may speed it up on data set X but slow it down on data set Y. In some cases this may be acceptable.
Discuss with the testing team beforehand what results you want to obtain (see Oded's answer on my own question "What's the best way to 'indicate/numerate' performance of an application?").
Realize that a faster application can still use more CPU time than a slower application, if the faster one uses multi-threading and the slower one doesn't. Discuss with the testing team (as said before) what needs to be measured and what doesn't (in the multi-threading case: real time instead of CPU time).
Realize that many small improvements can add up to one big improvement. If you find 10 parts of your application that each take 3% of the time and you can reduce each of them to 1%, you save 10 × 2 = 20 percentage points, so the total run time drops by 20% (a speedup of 1.25×).
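
The points above about fixed data sets and about real time versus CPU time can be sketched roughly as follows. This is only an illustration in Python, assuming the standard `time.perf_counter()` (wall-clock time) and `time.process_time()` (CPU time of the current process); the function names `make_dataset`, `measure`, `compute_only` and `compute_and_wait` are hypothetical.

```python
import random
import time


def make_dataset(seed, size):
    # Fixed seed: every run measures exactly the same input.
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]


def measure(func, data, repeats=5):
    """Return (best wall-clock seconds, best CPU seconds) over several runs."""
    best_wall = float("inf")
    best_cpu = float("inf")
    for _ in range(repeats):
        wall0 = time.perf_counter()
        cpu0 = time.process_time()
        func(data)
        cpu = time.process_time() - cpu0
        wall = time.perf_counter() - wall0
        best_wall = min(best_wall, wall)
        best_cpu = min(best_cpu, cpu)
    return best_wall, best_cpu


def compute_only(data):
    # Pure computation: wall-clock and CPU time stay close together.
    return sorted(data)


def compute_and_wait(data):
    # The sleep stands in for I/O or any other blocking wait.
    time.sleep(0.05)
    return sorted(data)


if __name__ == "__main__":
    data = make_dataset(seed=42, size=100_000)
    for func in (compute_only, compute_and_wait):
        wall, cpu = measure(func, data)
        print(f"{func.__name__:18s} wall={wall:.4f}s cpu={cpu:.4f}s")
```

For the function that waits, the CPU time stays small while the wall-clock time includes the wait, which is why it matters to agree up front on which of the two is being reported.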


毁我热情 2024-08-30 07:14:19

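It depends on what you are trying to do.

1) If you want to maintain general timing information so that you can stay alert to regressions, the various instrumenting profilers are the way to go. Make sure they measure all kinds of time, not just CPU time.

2) If you want to find ways to make the software faster, that is a distinctly different problem.
You should put the emphasis on the finding, not on the measuring.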

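For this, you need something that samples the call stack, not just the program counter (across multiple threads, if necessary). That rules out profilers like gprof.
Importantly, it should sample on wall-clock time, not CPU time, because you are just as likely to lose time to I/O as to computation. This rules out some profilers.
It should be able to take samples only when you care, for example not while waiting for user input. This also rules out some profilers.
Finally, and very importantly, there is the summary you get.
It is essential to get the percentage of time per line.
The percentage of time used by a line is the percentage of stack samples that contain that line.
Don't settle for function-only timings, even with a call graph.
This rules out still more profilers.
(Forget about "self time", and forget about invocation counts. Those are seldom useful and often misleading.)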
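
As a rough sketch of what "sample the call stack on wall-clock time and report per-line percentages" could look like, here is a small Python example. It is not how any of the profilers mentioned here are implemented. It assumes CPython's `sys._current_frames()` (an implementation-specific function), and the names `sample_stacks`, `report`, `slow_io`, `fast_math` and `job` are hypothetical.

```python
import collections
import sys
import threading
import time


def sample_stacks(target, interval=0.005):
    """Run target() in a worker thread and sample its call stack on wall-clock time."""
    worker = threading.Thread(target=target)
    line_hits = collections.Counter()
    samples = 0
    worker.start()
    while worker.is_alive():
        frame = sys._current_frames().get(worker.ident)
        if frame is not None:
            # Count each (function, line) once per sample, even under recursion.
            seen = set()
            while frame is not None:
                seen.add((frame.f_code.co_name, frame.f_lineno))
                frame = frame.f_back
            line_hits.update(seen)
            samples += 1
        time.sleep(interval)  # wall-clock pacing, so blocked time is sampled too
    worker.join()
    return samples, line_hits


def report(samples, line_hits, top=5):
    # Percentage for a line = share of stack samples that contain that line.
    for (name, lineno), hits in line_hits.most_common(top):
        print(f"{100.0 * hits / samples:5.1f}%  {name}:{lineno}")


def slow_io():
    time.sleep(0.2)  # blocking wait: CPU-time sampling would not see this


def fast_math():
    sum(i * i for i in range(20_000))


def job():
    slow_io()
    fast_math()


if __name__ == "__main__":
    samples, line_hits = sample_stacks(job)
    report(samples, line_hits)
```

Because the sampler paces itself with wall-clock sleeps, the line calling `slow_io()` shows up on most samples even though it burns almost no CPU, and the line in `job` that calls it gets the same share, which is the inclusive, per-line view described above.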

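What you are after is accuracy in finding the problems, not accuracy in measuring them. That is a very important point. (You don't need a large number of samples, though it does no harm. The harm is in your head, when it makes you think about measuring rather than about what the program is doing.)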

One good tool for this is RotateRight's Zoom profiler. Personally, I rely on manual sampling.