Diagnosing application hangs in a production .NET desktop application
I have trouble. One of the users of an application I'm developing is occasionally, but regularly, experiencing an application hang.
When this happens, we find an entry with a source of "Application Hang" in the machine's Event Log, with the informative message "Hanging application [my app], version [the right version], hang module hungapp, version 0.0.0.0, hang address 0x00000000."
I'm logging all unhandled exceptions that my application throws, and there aren't any entries in my log files when this happens.
My current working hypothesis is that this hang is occurring during the application's call to an unsafe legacy API. This wouldn't astonish me; I've been working with this API for years and while I haven't seen it hang before, it's genuinely crappy code. Also, the user's reporting that the program seems to hang at random times. I don't think this is really true. Not that I don't believe her, but that the code that talks to the legacy API is running inside a method called by a BackgroundWorker. If the background thread were making the application hang, this could very much look to the user like it were happening randomly.
So, I have two questions, one specific, one general.
The specific question: I would expect that if a method running on a non-UI thread were to hang, it would just kill the thread. Would it actually kill the whole application?
The general question:
I'm already logging all unhandled exceptions. My program's already set up to use tracing (though I'm going to need to add instrumentation code to trace activity in suspect methods). Are there other things I should be doing? Are there diagnostic tools that allow some kind of post-crash analysis when a .NET application hangs? Are there mechanisms inside the .NET framework that I can invoke to capture more (and more usable) data?
EDIT: On closer examination of my code, I'm remembering that all of its usage of BackgroundWorker goes through a utility class I implemented that wraps the called method in an exception handler. This handler logs the exception and then returns it as a property of the utility object. The completion event handler in the UI thread re-throws the exception (less than ideal, since I lose the call stack, but it's already been logged), causing the UI's main exception handler to report the exception in a message box and then terminate the app.
Since none of that is happening, I'm pretty confident that there's no exception being thrown in the background thread. Well, no .NET exception, anyway.
Further followup:
Mercifully, I've now gotten enough data from the users to be certain that the hang isn't occurring inside the legacy API. This means it's clearly something I'm doing wrong, which means that I can fix it, so, win. It also means that I can isolate the problem through tracing, which is another win. I'm very happy with the answers I got to this question; I'm even happier that I probably don't need them for this problem.
Also: PostSharp is outstanding. If you need to add instrumentation code to an existing application, you almost certainly should be using it.
Comments (7)
In answer to your specific question, when a background/worker thread blocks or hangs, the effect on the rest of the application would depend a lot on the synchronization happening between the threads in the app. There's no particular reason why it would necessarily hang the whole app, but it's entirely possible that it would.
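To make the synchronization point concrete, here is a minimal C# sketch, not taken from the asker's code. It assumes a hypothetical shared lock (`SharedLock`) that both a worker thread and the UI thread use; the worker's `releaseWorker.Wait()` merely stands in for a legacy call that never returns. The "UI" side uses `Monitor.TryEnter` with a timeout so the demo can report the problem instead of freezing, which is what a plain `lock` statement would do.

```csharp
using System;
using System.Threading;

public static class HangDemo
{
    // Hypothetical lock shared by the worker and the UI thread.
    private static readonly object SharedLock = new object();

    // Returns true if the "UI" thread could not take the lock within the
    // timeout, i.e. it would have frozen had it used a plain lock statement.
    public static bool UiThreadWouldBlock(TimeSpan timeout)
    {
        using (var workerHoldsLock = new ManualResetEventSlim(false))
        using (var releaseWorker = new ManualResetEventSlim(false))
        {
            var worker = new Thread(() =>
            {
                lock (SharedLock)
                {
                    workerHoldsLock.Set();
                    // Stand-in for a legacy call that never returns.
                    releaseWorker.Wait();
                }
            });
            worker.IsBackground = true;
            worker.Start();

            // Make sure the worker owns the lock before the "UI" tries it.
            workerHoldsLock.Wait();

            bool acquired = Monitor.TryEnter(SharedLock, timeout);
            if (acquired)
            {
                Monitor.Exit(SharedLock);
            }

            // Let the simulated hang end so the demo can exit cleanly.
            releaseWorker.Set();
            worker.Join();
            return !acquired;
        }
    }

    public static void Main()
    {
        bool blocked = UiThreadWouldBlock(TimeSpan.FromMilliseconds(200));
        Console.WriteLine(blocked
            ? "UI thread would hang: the worker is holding the shared lock"
            : "UI thread acquired the lock");
    }
}
```

If the worker shares no locks, wait handles, or blocking `Invoke` calls with the UI thread, a stuck worker by itself should leave the UI responsive.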
One possible way to diagnose this would be to generate a dump of the process while it's hung (assuming someone is around to notice when it happens). This can be done using MiniDumpWriteDump, from dbghelp.dll. It's fairly straightforward to write a simple tool that can dump a process (based on its pid), which could be provided to the customer experiencing the issue. Since this is a managed app, a full memory dump is preferable (MiniDumpWithFullMemory), but a normal dump should still have some useful info. Once you have the dump, you can use WinDbg or your post-mortem debugger of choice to see what might be going on.
If you go this route, this MSDN article is a good starting point for managed dump debugging.
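A rough sketch of such a dump tool in C#, calling MiniDumpWriteDump through P/Invoke, might look like the following. The class name `DumpTool`, the command-line shape, and the trimmed-down `MiniDumpType` enum are illustrative assumptions; only the two dump-type values used here are declared. It is Windows-only, and it is safest to build the tool with the same bitness (32-bit or 64-bit) as the process being dumped.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;

public static class DumpTool
{
    // Subset of the native MINIDUMP_TYPE values; only what this sketch needs.
    [Flags]
    public enum MiniDumpType : uint
    {
        Normal = 0x00000000,
        WithFullMemory = 0x00000002
    }

    [DllImport("dbghelp.dll", SetLastError = true)]
    private static extern bool MiniDumpWriteDump(
        IntPtr hProcess,
        uint processId,
        SafeHandle hFile,
        MiniDumpType dumpType,
        IntPtr exceptionParam,
        IntPtr userStreamParam,
        IntPtr callbackParam);

    // Writes a dump of the process with the given pid to the given path.
    public static void WriteDump(int pid, string path, MiniDumpType type)
    {
        using (Process process = Process.GetProcessById(pid))
        using (var file = new FileStream(path, FileMode.Create,
                                         FileAccess.ReadWrite, FileShare.None))
        {
            bool ok = MiniDumpWriteDump(
                process.Handle, (uint)pid, file.SafeFileHandle, type,
                IntPtr.Zero, IntPtr.Zero, IntPtr.Zero);
            if (!ok)
            {
                throw new Win32Exception(Marshal.GetLastWin32Error());
            }
        }
    }

    public static int Main(string[] args)
    {
        int pid;
        if (args.Length != 2 || !int.TryParse(args[0], out pid))
        {
            Console.Error.WriteLine("usage: DumpTool <pid> <output.dmp>");
            return 1;
        }

        // Full memory is preferable for managed apps, as noted above.
        WriteDump(pid, args[1], MiniDumpType.WithFullMemory);
        Console.WriteLine("Dump written to " + args[1]);
        return 0;
    }
}
```

The user would run it against the hung process's pid (visible in Task Manager) and send back the resulting .dmp file; full-memory dumps can be large.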
I would suggest adding more detailed logging around the calls you believe are the source of the problem.
If you're on Vista, you can use a new Vista API to have Windows call into your code when your app crashes. This is what's happening when you see MS products like Office/IE say they are "Attempting to recover your data".
Thought 1) Step into the .NET Framework code (from a KB at my work):
If you've installed VS2008 SP1, all you need to do is go to Tools -> Options -> Debugging -> Symbols and add http://referencesource.microsoft.com/symbols
Now when debugging something that's got greyed-out framework code in the call stack, just right-click the call line and choose Load Symbols.
Thought 2) Set up remote debugging: http://msdn.microsoft.com/en-us/library/y7f5zaaa.aspx
If you have an unhandled exception on a thread you control, it will bring down your entire application. There's no way to "handle" this once the thread dies. You might want to look into how you can use the APM (Asynchronous Programming Model) with delegates. This provides a layer of protection from exceptions thrown on other threads, as the exception is captured and brought forward when you call EndInvoke().
As for what else you can do, I second Charlie's answer.
If possible, replace the background worker thread with a SafeThread and see if that catches the suspected exception. If it doesn't, then the exception being thrown is not a CLR exception, and you may be unable to handle it from 'pure' .NET code (SEH from C++ might work, though).
EDIT: OK, that's not it. Maybe this or this might help. Good luck!
Robert, if all these solutions fail you, and you're still thinking the legacy API is the culprit, perhaps the answer is to sandbox the legacy API into its own AppDomain or process.
The .NET 3.5 framework makes this pretty easy to do using the System.AddIn APIs.
I'd suggest attaching WinDbg (yeah, one of those hardcore things) and using SOS (Son Of Strike) and SOSEx to analyze deadlocks (!dlk) or manually check sync blocks (!syncblk) to find mutually waiting locks.