Diagnosing application hangs in a production .NET desktop application

Posted 2024-07-07 02:54:51

I have trouble. One of the users of an application I'm developing is occasionally, but regularly, experiencing an application hang.

When this happens, we find an entry with a source of "Application Hang" in the machine's Event Log, with the informative message "Hanging application [my app], version [the right version], hang module hungapp, version 0.0.0.0, hang address 0x00000000."

I'm logging all unhandled exceptions that my application throws, and there aren't any entries in my log files when this happens.

My current working hypothesis is that this hang is occurring during the application's call to an unsafe legacy API. This wouldn't astonish me; I've been working with this API for years and while I haven't seen it hang before, it's genuinely crappy code. Also, the user's reporting that the program seems to hang at random times. I don't think this is really true. Not that I don't believe her, but that the code that talks to the legacy API is running inside a method called by a BackgroundWorker. If the background thread were making the application hang, this could very much look to the user like it were happening randomly.

So, I have two questions, one specific, one general.

The specific question: I would expect that if a method running on a non-UI thread were to hang, it would just kill the thread. Would it actually kill the whole application?

The general question:

I'm already logging all unhandled exceptions. My program's already set up to use tracing (though I'm going to need to add instrumentation code to trace activity in suspect methods). Are there other things I should be doing? Are there diagnostic tools that allow some kind of post-crash analysis when a .NET application hangs? Are there mechanisms inside the .NET framework that I can invoke to capture more (and more usable) data?

EDIT: On a closer examination of my code, I'm remembering that all of its usage of BackgroundWorker is through a utility class I implemented that wraps the method called in an exception handler. This handler logs the exception and then returns it as a property of the utility object. The completion event handler in the UI thread re-throws the exception (less than ideal, since I lose the call stack, but it's already been logged), causing the UI's main exception handler to report the exception to a message box and then terminate the app.

Since none of that is happening, I'm pretty confident that there's no exception being thrown in the background thread. Well, no .NET exception, anyway.

Further followup:

Mercifully, I've now gotten enough data from the users to be certain that the hang isn't occurring inside the legacy API. This means it's clearly something I'm doing wrong, which means that I can fix it, so, win. It also means that I can isolate the problem through tracing, which is another win. I'm very happy with the answers I got to this question; I'm even happier that I probably don't need them for this problem.

Also: PostSharp is outstanding. If you need to add instrumentation code to an existing application, you almost certainly should be using it.


裸钻 2024-07-14 02:54:51

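In answer to your specific question: when a background/worker thread blocks or hangs, the effect on the rest of the application depends largely on what synchronization takes place between the threads in your application. There is no particular reason why it must hang the whole application, but it is entirely possible that it will.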
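To make the synchronization point concrete, here is a minimal sketch, not taken from the asker's code. It assumes a hypothetical shared lock (`SharedState`) that both a worker thread and the "UI" thread use. The worker takes the lock and then blocks, standing in for a call that never returns. The "UI" thread then fails to get the lock within its timeout, which is what a frozen window would look like if it waited without one.

```csharp
using System;
using System.Threading;

static class HangDemo
{
    // Hypothetical shared lock: stands in for any state both threads synchronize on.
    private static readonly object SharedState = new object();

    // Returns true if the calling ("UI") thread could take the lock within the timeout
    // while a worker thread is stuck holding it.
    public static bool UiCanProceed(TimeSpan timeout)
    {
        using (var workerHasLock = new ManualResetEventSlim(false))
        using (var releaseWorker = new ManualResetEventSlim(false))
        {
            var worker = new Thread(() =>
            {
                lock (SharedState)
                {
                    workerHasLock.Set();
                    // Stand-in for a call that never returns (e.g. a stuck native API).
                    releaseWorker.Wait();
                }
            });
            worker.IsBackground = true;
            worker.Start();

            // Wait until the worker really holds the lock.
            workerHasLock.Wait();

            // A real UI thread using lock(...) here would block indefinitely;
            // the timeout lets this demo observe that and carry on.
            bool acquired = Monitor.TryEnter(SharedState, timeout);
            if (acquired)
            {
                Monitor.Exit(SharedState);
            }

            // Unblock the worker so the demo can finish cleanly.
            releaseWorker.Set();
            worker.Join();
            return acquired;
        }
    }

    public static void Main()
    {
        bool ok = UiCanProceed(TimeSpan.FromMilliseconds(200));
        Console.WriteLine(ok
            ? "UI thread proceeded"
            : "UI thread would hang: worker holds the lock");
    }
}
```

If the two threads share no lock, wait handle, or blocking call such as `Control.Invoke`, a stuck worker simply stays stuck and the UI keeps running, which is the other half of the point above.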

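One possible way to diagnose this is to generate a dump of the process when it hangs (assuming someone is around to notice when it happens). This can be done with MiniDumpWriteDump from dbghelp.dll. It is fairly straightforward to write a simple tool that dumps a process (based on its pid), which you could give to the customer experiencing the problem. Since this is a managed application, a full memory dump (MiniDumpWithFullMemory) is preferable, but a normal dump should still contain some useful information. Once you have the dump, you can use WinDbg or your post-mortem debugger of choice to see what might be going on.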
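A rough sketch of such a tool in C#, calling MiniDumpWriteDump through P/Invoke, might look like the following. The class name `DumpTool` and the command-line shape are assumptions made for illustration. The sketch also assumes the tool is built for the same bitness as the target process and runs with enough rights to open it.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;

static class DumpTool
{
    // MINIDUMP_TYPE flag from dbghelp.h: include all accessible process memory.
    private const uint MiniDumpWithFullMemory = 0x00000002;

    [DllImport("dbghelp.dll", SetLastError = true)]
    private static extern bool MiniDumpWriteDump(
        IntPtr hProcess,
        uint processId,
        SafeHandle hFile,
        uint dumpType,
        IntPtr exceptionParam,
        IntPtr userStreamParam,
        IntPtr callbackParam);

    // Hypothetical usage: DumpTool <pid> <output.dmp>
    public static int Main(string[] args)
    {
        int pid;
        if (args.Length != 2 || !int.TryParse(args[0], out pid))
        {
            Console.Error.WriteLine("usage: DumpTool <pid> <output.dmp>");
            return 1;
        }

        using (Process target = Process.GetProcessById(pid))
        using (var file = new FileStream(args[1], FileMode.Create,
                                         FileAccess.ReadWrite, FileShare.None))
        {
            // No exception, user stream, or callback information is supplied.
            bool ok = MiniDumpWriteDump(
                target.Handle, (uint)target.Id, file.SafeFileHandle,
                MiniDumpWithFullMemory, IntPtr.Zero, IntPtr.Zero, IntPtr.Zero);

            if (!ok)
            {
                Console.Error.WriteLine(
                    "MiniDumpWriteDump failed, error " + Marshal.GetLastWin32Error());
                return 2;
            }
        }
        return 0;
    }
}
```

The user would look up the hung process's pid in Task Manager, run the tool against it before killing the application, and send back the resulting .dmp file.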

If you choose to go down this route, this MSDN article on debugging managed dumps is a good starting point.
