Beyond Stack Sampling: C++ Profilers

Posted 2024-10-06 14:21:35

A Hacker's Tale

The date is 12/02/10. The days before Christmas are dripping away, and I've pretty much hit a major roadblock as a Windows programmer. I've been using AQTime, I've tried Sleepy, Shiny, and Very Sleepy, and as we speak, VTune is installing. I've tried to use the VS2008 profiler, and it's been positively punishing as well as often insensible. I've used the random pause technique. I've examined call trees. I've fired off function traces. But the sad, painful fact of the matter is that the app I'm working with is over a million lines of code, with probably another million lines' worth of third-party apps.

I need better tools. I've read the other topics. I've tried out each profiler listed in each topic. There simply has to be something better than these junky and expensive options, or ludicrous amounts of work for almost no gain. To further complicate matters, our code is heavily threaded, and runs a number of Qt Event loops, some of which are so fragile that they crash under heavy instrumentation due to timing delays. Don't ask me why we're running multiple event loops. No one can tell me.

Are there any options more along the lines of Valgrind in a Windows environment?
Is there anything better than the long swath of broken tools I've already tried?
Is there anything designed to integrate with Qt, perhaps with a useful display of events in queue?

A full list of the tools I tried, with the ones that were really useful in italics:

  • AQTime: Rather good! Has some trouble with deep recursion, but the call graph is correct in these cases, and can be used to clear up any confusion you might have. Not a perfect tool, but worth trying out. It might suit your needs, and it certainly was good enough for me most of the time.
  • Random Pause attack in debug mode: Not enough information enough of the time. A good tool, but not a complete solution.
  • Parallel Studios: The nuclear option. Obtrusive, weird, and crazily powerful. I think you should hit up the 30 day evaluation, and figure out if it's a good fit. It's just darn cool, too.
  • AMD Codeanalyst: Wonderful, easy to use, very crash-prone, but I think that's an environment thing. I'd recommend trying it, as it is free.
  • Luke Stackwalker: Works fine on small projects, it's a bit trying to get it working on ours. Some good results though, and it definitely replaces Sleepy for my personal tasks.
  • PurifyPlus: No support for Win-x64 environments, most prominently Windows 7. Otherwise excellent. A number of my colleagues in other departments swear by it.
  • VS2008 Profiler: Produces output in the 100+gigs range in function trace mode at the required resolution. On the plus side, produces solid results.
  • GProf: Requires GCC to be even moderately effective.
  • VTune: VTune's W7 support borders on criminal. Otherwise excellent.
  • PIN: I'd need to hack up my own tool, so this is sort of a last resort.
  • Sleepy/Very Sleepy: Useful for smaller apps, but failing me here.
  • EasyProfiler: Not bad if you don't mind a bit of manually injected code to indicate where to instrument.
  • Valgrind: *nix only, but very good when you're in that environment.
  • OProfile: Linux only.
  • Proffy: They shoot wild horses.

Suggested tools that I haven't tried:

  • XPerf:
  • Glowcode:
  • Devpartner:

Notes:
Intel environment at the moment. VS2008, Boost libraries, Qt 4+. And the wretched humdinger of them all: Qt/MFC integration via Trolltech.


Now: Almost two weeks later, it looks like my issue is resolved. Thanks to a variety of tools, including almost everything on the list and a couple of my personal tricks, we found the primary bottlenecks. However, I'm going to keep testing, exploring, and trying out new profilers as well as new tech. Why? Because I owe it to you guys, because you guys rock. It does slow the timeline down a little, but I'm still very excited to keep trying out new tools.

Synopsis
Among many other problems, a number of components had recently been switched to the incorrect threading model, causing serious hang-ups due to the fact that the code underneath us was suddenly no longer multithreaded. I can't say more because it violates my NDA, but I can tell you that this would never have been found by casual inspection or even by normal code review. Without profilers, callgraphs, and random pausing in conjunction, we'd still be screaming our fury at the beautiful blue arc of the sky. Thankfully, I work with some of the best hackers I've ever met, and I have access to an amazing 'verse full of great tools and great people.

Gentlefolk, I appreciate this tremendously, and only regret that I don't have enough rep to reward each of you with a bounty. I still think this is an important question to get a better answer to than the ones we've got so far on SO.

As a result, each week for the next three weeks, I'll be putting up the biggest bounty I can afford, and awarding it to the answer with the nicest tool that I think isn't common knowledge. After three weeks, we'll hopefully have accumulated a definitive profile of the profilers, if you'll pardon my punning.

Take-away
Use a profiler. They're good enough for Ritchie, Kernighan, Bentley, and Knuth. I don't care who you think you are. Use a profiler. If the one you've got doesn't work, find another. If you can't find one, code one. If you can't code one, or it's a small hang up, or you're just stuck, use random pausing. If all else fails, hire some grad students to bang out a profiler.


A Longer View
So, I thought it might be nice to write up a bit of a retrospective. I opted to work extensively with Parallel Studios, in part because it is actually built on top of the PIN Tool. Having had academic dealings with some of the researchers involved, I felt that this was probably a mark of some quality. Thankfully, I was right. While the GUI is a bit dreadful, I found IPS to be incredibly useful, though I can't comfortably recommend it for everyone. Critically, there's no obvious way to get line-level hit counts, something that AQT and a number of other profilers provide, and which I've found very useful for examining the rate of branch selection, among other things. On the whole, I've enjoyed using AQTime as well, and I've found their support to be really responsive. Again, I have to qualify my recommendation: a lot of their features don't work that well, and some of them are downright crash-prone on Win7x64. XPerf also performed admirably, but is agonizingly slow at the sampling detail required to get good reads on certain kinds of applications.

Right now, I'd have to say that I don't think there's a definitive option for profiling C++ code in a W7x64 environment, but there are certainly options that simply fail to perform any useful service.

Comments (19)

我们只是彼此的过ke 2024-10-13 14:21:36

I use xperf/ETW for all of my profiling needs. It has a steep learning curve but is incredibly powerful. If you are profiling on Windows then you must know xperf. I frequently use this profiler to find performance problems in my code and in other people's code.

In the configuration that I use it:

  • xperf grabs CPU samples every millisecond from every core that is executing code. The sampling rate can be increased to 8 kHz, and the samples include user-mode and kernel code. This allows finding out what a thread is doing while it is running.
  • xperf records every context switch (allowing for perfect reconstruction of how much time each thread uses), plus call stacks for when threads are switched in, plus call stacks for which thread readied another thread, allowing tracing of wait chains and finding out why a thread is not running.
  • xperf records all file I/O from all processes.
  • xperf records all disk I/O from all processes.
  • xperf records which window is active, the CPU frequency, CPU power state, UI delays, etc.
  • xperf can also record all heap allocations from one process, all virtual allocations from all processes, and much more.

That's a lot of data, all on one timeline, for all processes. No other profiler on Windows can do that.

I have blogged extensively about how to use xperf/ETW. These blog posts, and some professional-quality training videos, can be found here:
http://randomascii.wordpress.com/2014/08/19/etw-training-videos-available-now/

If you want to find out what might happen if you don't use xperf, read these blog posts:
http://randomascii.wordpress.com/category/investigative-reporting/
These are tales of performance problems I have found in other people's code that should have been found by the developers. They include mshtml.dll being loaded into the VC++ compiler, a denial of service in VC++'s find-in-files, thermal throttling in a surprising number of customer machines, slow single-stepping in Visual Studio, a 4 GB allocation in a hard-disk driver, a PowerPoint performance bug, and more.

街角卖回忆 2024-10-13 14:21:36

I just finished the first usable version of CxxProf, a portable manual instrumented profiling library for C++.

It fulfills the following goals:

  • Easy integration
  • Easily remove the lib during compile time
  • Easily remove the lib during runtime
  • Support for multithreaded applications
  • Support for distributed systems
  • Keep impact on a minimum

These points were taken from the project wiki; have a look there for more details.

Disclaimer: I'm the main developer of CxxProf.
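CxxProf's own API is documented on its wiki and is not reproduced here. As a rough illustration of what "manual instrumentation that can be removed at compile time or at runtime" tends to look like, here is a minimal sketch using hypothetical names (`ProfileRegistry`, `ScopeTimer`, `PROFILE_SCOPE`, `PROFILING_DISABLED`). An RAII timer records elapsed time per block name into a mutex-protected registry, a runtime flag turns recording off, and defining `PROFILING_DISABLED` compiles the macro away entirely.

```cpp
#include <atomic>
#include <chrono>
#include <map>
#include <mutex>
#include <string>

// Hypothetical manual-instrumentation registry (NOT CxxProf's actual API).
// Per-block totals are guarded by a mutex so several threads can record at once.
class ProfileRegistry {
public:
    struct Entry {
        long long calls = 0;
        std::chrono::nanoseconds total{0};
    };

    static ProfileRegistry& instance() {
        static ProfileRegistry registry;  // initialization is thread-safe since C++11
        return registry;
    }

    // Runtime switch: timers created while this is false record nothing.
    std::atomic<bool> enabled{true};

    void add(const std::string& name, std::chrono::nanoseconds elapsed) {
        std::lock_guard<std::mutex> lock(mutex_);
        Entry& e = entries_[name];
        ++e.calls;
        e.total += elapsed;
    }

    Entry get(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(name);
        return it == entries_.end() ? Entry{} : it->second;
    }

private:
    std::mutex mutex_;
    std::map<std::string, Entry> entries_;
};

// RAII timer: measures from construction to destruction of the enclosing scope.
class ScopeTimer {
public:
    explicit ScopeTimer(const char* name)
        : name_(name),
          active_(ProfileRegistry::instance().enabled.load()),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopeTimer() {
        if (active_) {
            ProfileRegistry::instance().add(
                name_, std::chrono::duration_cast<std::chrono::nanoseconds>(
                           std::chrono::steady_clock::now() - start_));
        }
    }

    ScopeTimer(const ScopeTimer&) = delete;
    ScopeTimer& operator=(const ScopeTimer&) = delete;

private:
    const char* name_;
    bool active_;
    std::chrono::steady_clock::time_point start_;
};

// Compile-time switch: define PROFILING_DISABLED to make every PROFILE_SCOPE
// expand to a no-op, so no timer code is generated at all.
#ifndef PROFILING_DISABLED
#define PROFILE_CONCAT_INNER(a, b) a##b
#define PROFILE_CONCAT(a, b) PROFILE_CONCAT_INNER(a, b)
#define PROFILE_SCOPE(name) ScopeTimer PROFILE_CONCAT(profile_scope_, __LINE__)(name)
#else
#define PROFILE_SCOPE(name) ((void)0)
#endif
```

In use, one would put something like `PROFILE_SCOPE("load_document");` at the top of a suspect function and read the totals back from `ProfileRegistry::instance()` at shutdown. A real library adds reporting, nesting, and a low-overhead storage path, which this sketch deliberately leaves out.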

我还不会笑 2024-10-13 14:21:36

Just to throw it out, even though it's not a full-blown profiler: if all you're after is hung event loops that take a long time processing an event, an ad-hoc tool is a simple matter in Qt. That approach could easily be expanded to keep track of how long each event took to process, what those events were, and so on. It's not a universal profiler, but an event-loop-centric one.

In Qt, all cross-thread signal-slot calls are delivered via the event loop, as are timers, network and serial port notifications, and all user interaction. Thus, observing the event loops is a big step towards understanding where the application is spending its time.
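The ad-hoc tool referred to above is not reproduced here. One common way to catch a hung event loop, sketched below under stated assumptions, is a heartbeat watchdog. The watched thread stamps a timestamp each time it gets back to its event loop (in Qt, perhaps from a `QTimer` slot running in that thread; that wiring is assumed and not shown). A separate monitor thread periodically checks whether the stamp has gone stale. `LoopWatchdog` is a hypothetical name, and the class uses only the C++ standard library so it can be tested without Qt.

```cpp
#include <atomic>
#include <chrono>

// Hypothetical heartbeat watchdog for one event loop.
// - The event-loop thread calls beat() whenever it gets to run (e.g. from a
//   periodic timer in that thread; the Qt wiring is an assumption).
// - A monitor thread calls stalled() periodically; true means the loop has
//   not beaten within `threshold`, i.e. some event handler is taking too long.
class LoopWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    explicit LoopWatchdog(std::chrono::milliseconds threshold)
        : threshold_(threshold),
          last_beat_(Clock::now().time_since_epoch().count()) {}

    // Called from the watched thread; a lock-free store keeps it cheap.
    void beat(Clock::time_point now = Clock::now()) {
        last_beat_.store(now.time_since_epoch().count(), std::memory_order_relaxed);
    }

    // Called from the monitor thread.
    bool stalled(Clock::time_point now = Clock::now()) const {
        const Clock::time_point last{Clock::duration{last_beat_.load(std::memory_order_relaxed)}};
        return now - last > threshold_;
    }

private:
    std::chrono::milliseconds threshold_;
    std::atomic<Clock::rep> last_beat_;  // tick count of the most recent beat
};
```

When `stalled()` first returns true, that is the moment to log something useful, such as the type of event currently being processed or, in a debugger, the stack of the stuck thread. The explicit time-point parameters exist only so the logic can be checked deterministically.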

君勿笑 2024-10-13 14:21:36

DevPartner, originally developed by NuMega and now distributed by Micro Focus, was once the solution of choice for profiling and code analysis (memory and resource leaks, for example).
I haven't tried it recently, so I cannot assure you it will help you; but I once had excellent results with it, so it is an alternative I am considering re-installing as part of our code quality process (they provide a 14-day trial).

朕就是辣么酷 2024-10-13 14:21:36

Your OS is Win7, but can't the program also run under XP?
How about profiling it under XP? The results should give you hints for Win7.

许你一世情深 2024-10-13 14:21:36

There are lots of profilers listed here and I've tried a few of them myself - however I ended up writing my own based on this:

http://code.google.com/p/high-performance-cplusplus-profiler/

It does, of course, require that you modify the code base, but it's perfect for narrowing down bottlenecks and should work on all x86s. It could be a problem on multi-core boxes, since it uses rdtsc; however, this is purely for indicative timing anyway, so I find it sufficient for my needs.
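For readers unfamiliar with the rdtsc approach, here is a minimal sketch of the idea. This is not code from the linked project. The sketch reads the CPU time-stamp counter before and after a block and subtracts. The compiler intrinsic `__rdtsc()` comes from `<intrin.h>` on MSVC and `<x86intrin.h>` on GCC/Clang; on other targets the sketch falls back to `std::chrono::steady_clock`. `read_ticks`, `ticks_for`, and `PROFILER_HAVE_RDTSC` are hypothetical names. As the answer says, the numbers are only indicative: on older multi-core or frequency-scaling CPUs the counter may not be synchronized across cores or tick at a constant rate, and ticks are not seconds.

```cpp
#include <chrono>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define PROFILER_HAVE_RDTSC 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>
#define PROFILER_HAVE_RDTSC 1
#endif

// Returns a raw tick count. On x86 this is the time-stamp counter: very cheap
// to read, but possibly unsynchronized across cores on older multi-core CPUs.
// Elsewhere it falls back to steady_clock nanoseconds.
inline std::uint64_t read_ticks() {
#ifdef PROFILER_HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now().time_since_epoch()).count());
#endif
}

// Ticks spent in f(). Only meaningful for relative comparisons between runs
// on the same machine; the value is not converted to seconds.
template <class F>
std::uint64_t ticks_for(F&& f) {
    const std::uint64_t start = read_ticks();
    f();
    return read_ticks() - start;
}
```

Pinning the measuring thread to a single core is one common way to sidestep the cross-core issue, at the cost of perturbing the program being measured.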

骄兵必败 2024-10-13 14:21:36

I use Orbit profiler: easy, open source, and powerful! https://orbitprofiler.com/

A君 2024-10-13 14:21:35

First:

Time sampling profilers are more robust than CPU sampling profilers. I'm not extremely familiar with Windows development tools so I can't say which ones are which. Most profilers are CPU sampling.

A CPU sampling profiler grabs a stack trace every N instructions.
This technique will reveal portions of your code that are CPU-bound, which is awesome if that is the bottleneck in your application, but not so great if your application threads spend most of their time fighting over a mutex.

A time sampling profiler grabs a stack trace every N microseconds.
This technique will zero in on "slow" code, whether the cause is CPU-bound, blocking-I/O-bound, mutex-bound, or cache-thrashing sections of code. In short, whatever piece of code is slowing your application will stand out.

So use a time sampling profiler if at all possible, especially when profiling threaded code.
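The distinction can be seen without any profiler. The sketch below (hypothetical helper names `measure` and `blocked_wait`) times the same call two ways. Wall-clock time is what a time-sampling profiler effectively spreads its samples over. `std::clock()` process CPU time is closer to what a CPU-sampling profiler sees. A call that blocks (here a sleep standing in for blocking I/O or a contended mutex) uses plenty of wall time but almost no CPU time, so a CPU sampler would barely notice it. One caveat: MSVC's `std::clock()` returns elapsed wall time rather than CPU time, so on Windows the CPU figure from this sketch is not meaningful.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

struct Timing {
    double wall_ms;  // elapsed real time: what a time-sampling profiler spreads samples over
    double cpu_ms;   // process CPU time: roughly what a CPU-sampling profiler sees
};

// Run f once and report both wall-clock and CPU time for it.
template <class F>
Timing measure(F&& f) {
    const std::clock_t cpu0 = std::clock();
    const auto wall0 = std::chrono::steady_clock::now();
    f();
    const auto wall1 = std::chrono::steady_clock::now();
    const std::clock_t cpu1 = std::clock();
    return Timing{
        std::chrono::duration<double, std::milli>(wall1 - wall0).count(),
        1000.0 * static_cast<double>(cpu1 - cpu0) / CLOCKS_PER_SEC};
}

// Stand-in for code that waits rather than computes (blocking I/O, a
// contended mutex): the thread is descheduled, so it burns almost no CPU.
inline void blocked_wait(std::chrono::milliseconds d) {
    std::this_thread::sleep_for(d);
}
```

On a POSIX system, measuring `blocked_wait(std::chrono::milliseconds(100))` should report roughly 100 ms of wall time and close to 0 ms of CPU time. That gap is exactly the time a CPU-only sampler cannot attribute.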

Second:

Sampling profilers generate gobs of data. The data is extremely useful, but there is often too much to be easily useful. A profile data visualizer helps tremendously here. The best tool I've found for profile data visualization is gprof2dot. Don't let the name fool you, it handles all kinds of sampling profiler output (AQtime, Sleepy, XPerf, etc). Once the visualization has pointed out the offending function(s), jump back to the raw profile data to get better hints on what the real cause is.

The gprof2dot tool generates a dot graph description that you then feed into a graphviz tool. The output is basically a callgraph with functions color coded by their impact on the application.

A few hints to get gprof2dot to generate nice output.

  • I use a --skew of 0.001 on my graphs so I can easily see the hot code paths. Otherwise the int main() dominates the graph.
  • If you're doing anything crazy with C++ templates you'll probably want to add --strip. This is especially true with Boost.
  • I use OProfile to generate my sampling data. To get good output I need to configure it to load the debug symbols from my 3rd-party and system libraries. Be sure to do the same; otherwise you'll see that the CRT is taking 20% of your application's time when what's really going on is that malloc is trashing the heap and eating up 15%.
南烟 2024-10-13 14:21:35

What happened when you tried random pausing? I use it all the time on a monster app. You said it did not give enough information, and you've suggested you need high resolution. Sometimes people need a little help in understanding how to use it.

What I do, under VS, is configure the stack display so it doesn't show me the function arguments, because that makes the stack display totally unreadable, IMO.

Then I take about 10 samples by hitting "pause" during the time it's making me wait. I use ^A, ^C, and ^V to copy them into notepad, for reference. Then I study each one, to try to figure out what it was in the process of trying to accomplish at that time.

If it was trying to accomplish something on 2 or more samples, and that thing is not strictly necessary, then I've found a live problem, and I know roughly how much fixing it will save.
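The "roughly how much fixing it will save" step is simple arithmetic. If an unnecessary activity shows up on k of n pause samples, it is consuming roughly a fraction f ≈ k/n of the wall-clock time. Removing it would speed things up by about 1/(1 - f). The sketch below (hypothetical names `PauseEstimate` and `estimate_from_pauses`) just encodes that. With so few samples the fraction is only a rough estimate, which fits the answer's point that precise percentages don't matter.

```cpp
#include <limits>

struct PauseEstimate {
    double fraction;  // share of time the activity appears to take: k / n
    double speedup;   // expected overall speedup if it is removed: 1 / (1 - fraction)
};

// k = number of pause samples on which the unnecessary activity was seen,
// n = total number of pause samples taken (assumes n > 0 and 0 <= k <= n).
inline PauseEstimate estimate_from_pauses(int k, int n) {
    const double f = static_cast<double>(k) / n;
    const double speedup = (k == n) ? std::numeric_limits<double>::infinity()
                                    : 1.0 / (1.0 - f);
    return PauseEstimate{f, speedup};
}
```

For example, an activity seen on 3 of 10 samples accounts for roughly 30% of the time, so eliminating it should make that phase about 1.43 times faster.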

There are things you don't really need to know: precise percentages are not important, and what goes on inside 3rd-party code is not important, because you can't do anything about either. What you can do something about is the rich set of call points, in code you can modify, displayed on each stack sample. That's your happy hunting ground.

Examples of the kinds of things I find:

  • During startup, it can be about 30 layers deep, in the process of trying to extract internationalized character strings from DLL resources. If the actual strings are examined, it can easily turn out that the strings don't really need to be internationalized, like they are strings the user never actually sees.

  • During normal usage, some code innocently sets a Modified property in some object. That object comes from a superclass that captures the change and triggers notifications that ripple throughout the entire data structure, manipulating the UI and creating and destroying objects in ways that are hard to foresee. This can happen a lot: the unexpected consequences of notifications.

  • Filling in a worksheet row-by-row, cell-by-cell. It turns out if you build the row all at once, from an array of values, it's a lot faster.

P.S. If you're multi-threaded, when you pause it, all threads pause. Take a look at the call stack of each thread. Chances are, only one of them is the real culprit, and the others are idling.

信仰 2024-10-13 14:21:35

I've had some success with AMD CodeAnalyst.

抚笙 2024-10-13 14:21:35

Do you have an MFC OnIdle function? In the past I had to fix a near-real-time app that was dropping serial packets at 19.2K, a speed a Pentium D should have been able to keep up with. The OnIdle function was what was killing things. I'm not sure if Qt has that concept, but I'd check for that too.

澜川若宁 2024-10-13 14:21:35

Re the VS Profiler -- if it's generating such large files, perhaps your sampling interval is too frequent? Try lowering it, as you probably have enough samples anyway.

And ideally, make sure you're not collecting samples until you're actually exercising the problem area. So start with collection paused, get your program to do its "slow activity", then start collection. You only need at most 20 seconds of collection. Stop collection after this.

This should help reduce your sample file sizes, and only capture what is necessary for your analysis.

山川志 2024-10-13 14:21:35

I have successfully used PurifyPlus for Windows. Although it is not cheap, IBM provides a slightly crippled trial version. All you need for profiling with Quantify are PDB files and linking with /FIXED:NO. The only drawback: no support for Win7/64.

﹉夏雨初晴づ 2024-10-13 14:21:35

Easyprofiler - I haven't seen it mentioned here yet, so I'm not sure if you've looked at it already. It takes a slightly different approach to gathering metric data. A drawback of its compile-time profiling approach is that you have to make changes to the code base. Thus you'll need to have some idea of where the slowness might be and insert profiling code there.

Going by your latest comments though, it sounds like you're at least making some headway. Perhaps this tool might provide some useful metrics for you. If nothing else it has some really purdy charts and pictures :P

时光暖心i 2024-10-13 14:21:35

Two more tool suggestions.

Luke Stackwalker has a cute name (even if it's trying a bit hard for my taste), it won't cost you anything, and you get the source code. It claims to support multi threaded programs, too. So it is surely worth a spin.

http://lukestackwalker.sourceforge.net/

Also Glowcode, which I've had pointed out to me as worth using:

http://www.glowcode.com/

Unfortunately I haven't done any PC work for a while, so I haven't tried either of these. I hope the suggestions are of help anyway.

昔日梦未散 2024-10-13 14:21:35

Check out XPerf.

This is a free, non-invasive, and extensible profiler offered by Microsoft. It was developed by Microsoft to profile Windows.

不美如何 2024-10-13 14:21:35

If you're suspicious of the event loop, could you override QCoreApplication::notify() and do some manual profiling (one or two maps of senders/events to counts/time)?

I'm thinking that you first log the frequency of event types, then examine those events more carefully (which object sends it, what it contains, etc.). Signals across threads are queued implicitly, so they end up in the event loop (as do explicitly queued connections, obviously).

We've done it to trap and report exceptions in our event handlers, so really, every event goes through there.

Just an idea.
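Here is a minimal sketch of the bookkeeping such an override might do. It assumes events are keyed by the integer value of `QEvent::type()` and keeps only counts and times per type, not senders. `EventStats`, `EventTypeStats`, and `MyApp` are hypothetical names. The class uses only the standard library so it can be built and tested without Qt. The commented-out `notify()` override shows one way it might be wired into an application subclass; it is an assumption, not code from the answer. It is mutex-protected because `notify()` is called from every thread that delivers events.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Per-event-type statistics: delivery count, total and worst handler time.
struct EventTypeStats {
    std::uint64_t count = 0;
    std::chrono::nanoseconds total{0};
    std::chrono::nanoseconds worst{0};
};

class EventStats {
public:
    // Thread-safe, because notify() runs in every thread with an event loop.
    void record(int eventType, std::chrono::nanoseconds elapsed) {
        std::lock_guard<std::mutex> lock(mutex_);
        EventTypeStats& s = stats_[eventType];
        ++s.count;
        s.total += elapsed;
        s.worst = std::max(s.worst, elapsed);
    }

    // Snapshot sorted by total handler time, most expensive event type first.
    std::vector<std::pair<int, EventTypeStats>> byTotalTime() const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::pair<int, EventTypeStats>> out(stats_.begin(), stats_.end());
        std::sort(out.begin(), out.end(), [](const auto& a, const auto& b) {
            return a.second.total > b.second.total;
        });
        return out;
    }

private:
    mutable std::mutex mutex_;
    std::map<int, EventTypeStats> stats_;
};

// Hypothetical wiring in a QApplication subclass (requires Qt, not compiled here):
// bool MyApp::notify(QObject* receiver, QEvent* event) {
//     const auto t0 = std::chrono::steady_clock::now();
//     const bool handled = QApplication::notify(receiver, event);
//     stats_.record(static_cast<int>(event->type()),
//                   std::chrono::duration_cast<std::chrono::nanoseconds>(
//                       std::chrono::steady_clock::now() - t0));
//     return handled;
// }
```

Dumping `byTotalTime()` periodically gives the "frequency of event types" first pass. Extending the key to include the receiver's class name would give the "which object" follow-up.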

挽清梦 2024-10-13 14:21:35

Edit: I see now you mentioned this in your first post. Dammit, I never thought I'd be that guy.

You can use Pin to instrument your code with finer granularity. I think Pin would let you create a tool to count how many times you enter a function or how many clock ticks you spend there, roughly emulating something like VTune or CodeAnalyst. Then you could strip down which functions get instrumented until your timing issues go away.

心欲静而疯不止 2024-10-13 14:21:35

I can tell you what I use everyday.

a) AMD Code Analyst

  • It is easy, and it will give you a quick overview of what is happening. It will be ok for most of the time.
  • With AMD CPUs, it will tell you about the CPU pipeline, but you only need this if you have heavy loops, like in graphics engines, video codecs, etc.

b) VTune.

  • It is very well integrated into VS2008.

  • After you know the hotspots, you need to sample not only time but also other things like cache misses and memory usage. This is very important. Set up a sampling session and edit its properties. I always sample for time, memory reads/writes, and cache misses (three different runs).

But more than the tool, you need to get experience with profiling. And that means understanding how the CPU/Memory/PCI works... so, this is my 3rd option

c) Unit testing

This is very important if you are developing a big application that needs high performance. If you cannot split the app into pieces, it will be difficult to track CPU usage. I don't test all the cases and classes, but I have hard-coded executions and input files that exercise important features.

My advice is using random sampling in several small tests, and try to standardise a profile strategy.
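As one way to make such hard-coded executions repeatable (an assumption about how one might do it, not the answerer's actual setup), a tiny timing helper can run a fixed workload several times and report the median. The median is less sensitive to one-off outliers than a single run. `median_runtime` is a hypothetical name.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` `runs` times and returns the median wall-clock duration
// (the upper median when `runs` is even). The median is less affected by
// one-off outliers such as page faults, scheduler hiccups, or a cold cache
// on the first run than the mean is.
template <class F>
std::chrono::nanoseconds median_runtime(F&& work, std::size_t runs) {
    if (runs == 0) return std::chrono::nanoseconds{0};
    std::vector<std::chrono::nanoseconds> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0));
    }
    // nth_element places the median in the middle without a full sort.
    const auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

A call might look like `median_runtime([&] { run_scenario("big_input.dat"); }, 9)`, where `run_scenario` is a hypothetical stand-in for whatever fixed test case you want to track from build to build. Running the same scenarios under a sampling profiler then gives comparable profiles over time.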
