Crash-reporting watchdog for when my application locks up on customer machines
I'm working with a somewhat unreliable (Qt/Windows) application partly written for us by a third party (just trying to shift the blame there). Their latest version is more stable. Sort of. We're getting fewer reports of crashes, but we're getting lots of reports of it just hanging and never coming back. The circumstances are varied, and with the little information we can gather, we haven't been able to reproduce the problems.
So ideally, I'd like to create some sort of watchdog which notices that the application has locked up, and offers to send a crash report back to us. Nice idea, but there are problems:
How does the watchdog know the process has hung? Presumably we instrument the application to periodically say "all OK" to the watchdog, but where do we put that call so that it's guaranteed to happen frequently enough, yet isn't likely to be on a code path that the app ends up on when it's locked up?
What information should the watchdog report when a hang or crash happens? Windows has a decent debug API, so I'm confident that all the interesting data is accessible, but I'm not sure what would be useful for tracking down the problems.
You want a combination of a minidump (use DrWatson to create these if you don't want to add your own mini-dump generation code) and userdump to trigger a minidump creation on a hang.
The thing about automatically detecting a hang is that it's difficult to decide when something's hung and when it's just slow or blocked by an IO wait. I personally prefer to allow the user to crash the app deliberately when they think it's hung. Apart from being a lot easier (my apps don't tend to hang often, if at all :) ), it also helps them to "be part of the solution". They like that.
Firstly, check out the classic bugslayer article concerning crashdumps and symbols, which also has some excellent information regarding what's going on with these things.
Second, get userdump, which allows you to create the dumps, along with the instructions for setting it up to generate dumps.
When you have the dump, open it in WinDBG, and you will be able to inspect the entire program state - including threads and callstacks, registers, memory and parameters to functions. I think you'll be particularly interested in using the "~*kp" command in Windbg to get the callstack of every thread, and the "!locks" command to show all locking objects. I think you'll find that the hang will be due to a deadlock of synchronisation objects, which will be difficult to track down as all threads tend to wait on a WaitForSingleObject call, but look further down the callstacks to see the application threads (rather than 'framework' threads like background notifications and network routines). Once you've narrowed them down, you can see what calls were being made, possibly add some logging instrumentation to the app to try and give you more information ready for the next time it fails.
Good luck.
P.S. A quick Google search reminded me of this: Debugging deadlocks. (CDB is the command-line equivalent of WinDbg.)
You can use ADPlus from Microsoft's Debugging Tools for Windows to identify the hangs. It will attach to your process and create a dump (mini or full) when the process hangs or crashes.
WinDbg is portable, and does not have to be installed (you do have to configure the symbols, though). You can create a special installation that launches your app from a batch file, which also runs ADPlus after your app starts (ADPlus is a command-line tool, so you should be able to find a way to incorporate it somehow).
BTW, if you do find a way to recognize the hang internally and are able to crash the process, you can register with Windows Error Reporting so that the crash dump will be sent to you (should the user allow it).
I think a separate app to do the watchdogging is likely to produce more problems than it solves. I'd suggest that instead, you first create handlers to generate minidumps when the app crashes, then add a watchdog thread to the application, which will DELIBERATELY crash if the app goes off the rails. The advantage to the watchdog thread (vs a different app) is that it should be easier for the watchdog to know for sure that the app has gone off the rails.
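A minimal sketch of the in-process watchdog thread described above, using only the C++ standard library so it is not tied to any particular Windows API. The class name `HangWatchdog`, its members, and the callback are hypothetical and chosen for illustration; in a real application the callback would be the place to deliberately crash or write a minidump, which is left out here.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical in-process watchdog; names are illustrative, not from Qt or Windows.
class HangWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    HangWatchdog(std::chrono::milliseconds timeout,
                 std::chrono::milliseconds pollInterval,
                 std::function<void()> onHang)
        : timeout_(timeout),
          pollInterval_(pollInterval),
          onHang_(std::move(onHang)),
          lastBeat_(Clock::now().time_since_epoch().count()) {
        assert(onHang_ && "a hang callback is required");
        // Start the thread only after every other member is initialized.
        worker_ = std::thread([this] { run(); });
    }

    ~HangWatchdog() {
        stop_.store(true);
        worker_.join();
    }

    HangWatchdog(const HangWatchdog&) = delete;
    HangWatchdog& operator=(const HangWatchdog&) = delete;

    // Called by the monitored thread to say "all OK".
    void heartbeat() {
        lastBeat_.store(Clock::now().time_since_epoch().count());
    }

    // True once the hang callback has been invoked.
    bool fired() const { return fired_.load(); }

private:
    void run() {
        while (!stop_.load()) {
            std::this_thread::sleep_for(pollInterval_);
            const Clock::duration silence =
                Clock::now().time_since_epoch() - Clock::duration(lastBeat_.load());
            // Invoke the callback at most once, the first time the silence is too long.
            if (silence > timeout_ && !fired_.exchange(true)) {
                onHang_();
            }
        }
    }

    const std::chrono::milliseconds timeout_;
    const std::chrono::milliseconds pollInterval_;
    const std::function<void()> onHang_;
    std::atomic<Clock::rep> lastBeat_;
    std::atomic<bool> stop_{false};
    std::atomic<bool> fired_{false};
    std::thread worker_;
};
```

One plausible way to use this in a Qt application is to call `heartbeat()` from a repeating timer on the GUI thread, so the beats stop whenever the event loop stalls. The timeout would need to be generous, because a short one cannot tell a hung app from one that is merely slow or waiting on IO.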
Once you have the MiniDumps, you can poke around to find out the app's state when it dies. This should give you enough clues to figure out the problem, or at least where to look next.
There's some stuff at CodeProject about MiniDumps, which could be a useful example. MSDN has more information about them as well.
Don't bother with a watchdog. Subscribe to Microsoft's Windows Error Reporting (winqual.microsoft.com). They'll collect the stack traces for you. In fact, it's quite likely they're already doing so today; they just don't share them until you sign up.