Getting the correct stack for an exception
I'm working on an application that has a software watchdog. If some thread locks up, or is in a wait for more than a minute, the watchdog causes an exception in order to pull down the application and restart it.
So I'm looking at dump files whose faulting stack points at the watchdog thread. I need to identify the real faulting thread.
I'm looking for general advice and a possible strategy for using WinDbg to identify the real faulting thread.
Comments (3)
A dump file does not contain the information you are looking for (CPU usage history), but you can list all threads and examine their stacks. This will most probably help, because the thread that caused the shutdown should be blocked in a wait or something similar.
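Reading every stack by hand gets tedious in a process with many threads. As a rough sketch, the Python below takes the saved text output of a command that prints all thread stacks (for example `~*kc`) and shortlists the threads whose top frames sit in a blocking wait. The output layout it expects, the `WAIT_MARKERS` list, and the sample text are assumptions for illustration; adjust them to what your debugger version actually prints.

```python
import re

# Thread header line, e.g. "   1  Id: 1a2c.2f10 Suspend: 1 Teb: 7ffdd000 Unfrozen",
# optionally prefixed by "." (current thread) or "#" (thread that raised the exception).
THREAD_HEADER = re.compile(r"^[.#]?\s*(\d+)\s+Id:\s*[0-9a-fA-F]+\.[0-9a-fA-F]+")
# Two-digit hex frame number that some debugger versions print before each frame.
FRAME_NUMBER = re.compile(r"^[0-9a-fA-F]{2}\s+")

# Assumed list of call-site fragments that indicate a blocking wait; extend as needed.
WAIT_MARKERS = (
    "NtWaitForSingleObject",
    "NtWaitForMultipleObjects",
    "NtWaitForAlertByThreadId",
    "RtlEnterCriticalSection",
    "NtDelayExecution",
)


def parse_stacks(text):
    """Return {thread_number: [frames, top frame first]} from saved stack output."""
    stacks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            stacks[current] = []
        elif current is not None and line and not line.startswith("#"):
            # Skip column headers such as "# Call Site"; keep only the call site text.
            stacks[current].append(FRAME_NUMBER.sub("", line))
    return stacks


def blocked_threads(stacks, depth=3):
    """Return {thread_number: marker} for threads with a wait marker in their top frames."""
    hits = {}
    for thread, frames in stacks.items():
        for frame in frames[:depth]:
            marker = next((m for m in WAIT_MARKERS if m in frame), None)
            if marker:
                hits[thread] = marker
                break
    return hits


# Made-up sample output; module and function names are hypothetical.
SAMPLE = """\
.  0  Id: 1a2c.1b30 Suspend: 1 Teb: 7ffdf000 Unfrozen
 # Call Site
00 myapp!Watchdog::RaiseFailure
01 myapp!Watchdog::Run
   1  Id: 1a2c.2f10 Suspend: 1 Teb: 7ffdd000 Unfrozen
 # Call Site
00 ntdll!NtWaitForSingleObject
01 KERNELBASE!WaitForSingleObjectEx
02 myapp!Worker::WaitForReply
"""

if __name__ == "__main__":
    print(blocked_threads(parse_stacks(SAMPLE)))
```

Idle worker threads also sit in waits, so the result is only a shortlist. The interesting entries are the ones where application code appears just below the wait.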
Application Verifier is a good tool: http://msdn.microsoft.com/en-us/library/ms220948%28v=vs.90%29.aspx
Application Verifier slows down the code and provides a lot of commands for analysing issues.
In general, to find issues in a release build, you need to take a dump, load the matching PDBs, and investigate from there. You should archive the PDB files for each release. If you have the PDB files, you can easily get the stack trace with a debugging tool. If you do not have them, ship a new build, keep its PDBs, and get a minidump back from the customer (if your customer is flexible).
The watchdog should know which thread didn't respond in time, since it's the one checking. You mention that there is an event for each thread. Perhaps the watchdog thread's stack still has some information on it that can help you deduce which thread was unresponsive?
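If the watchdog code can be changed, one option is to have it record which thread was overdue before it raises, so the failure itself names the culprit. The application in the question is presumably native code; the Python below is only a language-neutral sketch of the idea, and every name in it (`Watchdog`, `check_in`, `WatchdogTimeout`) is hypothetical. In a native application the equivalent would be storing the stale thread's ID in a local or global variable that can be read from the dump.

```python
import threading
import time


class WatchdogTimeout(RuntimeError):
    """Raised by the watchdog; the message names the unresponsive threads."""


class Watchdog:
    def __init__(self, timeout_seconds=60.0):
        self.timeout_seconds = timeout_seconds
        self._lock = threading.Lock()
        self._last_seen = {}  # thread name -> time of its last check-in

    def check_in(self, name, now=None):
        """Called by a monitored thread to signal that it is still making progress."""
        stamp = time.monotonic() if now is None else now
        with self._lock:
            self._last_seen[name] = stamp

    def stale_threads(self, now=None):
        """Return the names of threads that have not checked in within the timeout."""
        now = time.monotonic() if now is None else now
        with self._lock:
            return sorted(
                name for name, seen in self._last_seen.items()
                if now - seen > self.timeout_seconds
            )

    def verify(self, now=None):
        """Raise WatchdogTimeout naming the overdue threads, if there are any."""
        stale = self.stale_threads(now)
        if stale:
            raise WatchdogTimeout("unresponsive threads: " + ", ".join(stale))
```

The `now` parameter exists only so the logic can be exercised without waiting a real minute. Production code would call `check_in` and `verify` without it.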
If not, then just dump all the stack traces with a ~*kc command and see if you can find anything suspicious.
Keep in mind that a dump is a snapshot of the application's state. That means that, probability-wise, no thread is likely to be caught in a function that runs infrequently or only briefly, so a thread sitting in such a function deserves a closer look.
One trick (for relatively deterministic applications) is to get a few dumps of the application running in a non-hung state. Then you'll know what the stack traces SHOULD look like. When you examine the dump of the hung process, certain threads' stacks should jump out at you.
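The baseline comparison can be partly automated. The sketch below assumes each dump has already been reduced to a mapping of thread number to a list of frames (top frame first). It flags the threads in the hung dump whose top frames never occur in any healthy dump. The function names, the `depth` cut-off, and the sample stacks are all made up for illustration.

```python
def signature(frames, depth=4):
    """Reduce a stack (top frame first) to a comparable tuple of its top frames."""
    return tuple(frames[:depth])


def unusual_threads(healthy_dumps, hung_dump, depth=4):
    """Return thread numbers in hung_dump whose signature appears in no healthy dump.

    Each dump is a dict mapping thread number -> list of frames, top frame first.
    Thread numbers are not compared across dumps, only stack signatures.
    """
    known = {
        signature(frames, depth)
        for dump in healthy_dumps
        for frames in dump.values()
    }
    return sorted(
        thread for thread, frames in hung_dump.items()
        if signature(frames, depth) not in known
    )


# Hypothetical stacks from two healthy dumps and one dump of the hung process.
HEALTHY = [
    {
        0: ["ntdll!NtWaitForMultipleObjects", "myapp!Watchdog::Run"],
        1: ["ntdll!NtRemoveIoCompletion", "myapp!Worker::Run"],
    },
    {
        0: ["ntdll!NtWaitForMultipleObjects", "myapp!Watchdog::Run"],
        1: ["ntdll!NtRemoveIoCompletion", "myapp!Worker::Run"],
        2: ["ntdll!NtRemoveIoCompletion", "myapp!Worker::Run"],
    },
]
HUNG = {
    0: ["myapp!Watchdog::RaiseFailure", "myapp!Watchdog::Run"],
    1: ["ntdll!NtRemoveIoCompletion", "myapp!Worker::Run"],
    2: ["ntdll!NtWaitForSingleObject", "myapp!Database::Lock", "myapp!Worker::Run"],
}

if __name__ == "__main__":
    print(unusual_threads(HEALTHY, HUNG))
```

The watchdog thread is expected to show up in the result, since it is in the middle of raising the failure. Any other flagged thread is a candidate for the real culprit.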