Unhandled exception in a Rad Studio debugger thread

Posted on 2024-11-16 10:35:55

I have a large application that recently started exhibiting rather strange behavior when running in a debugger. First, the basics:

OS: Windows 7 64-bit.
Application: Multithreaded VCL app with many dlls, bpls, and other components.
Compiler/IDE: Embarcadero RAD Studio 2010.

The observed symptom is this: While the debugger is attached to my application, certain tasks cause the application to crash. The particulars are even more perplexing: my application stops with a Windows message saying, "YourApplication has stopped working," and it helpfully offers to send a minidump to Microsoft.

It should be noted: the application doesn't crash when the debugger is not attached. Also, the debugger doesn't indicate any exceptions or other issues while the application is running.

Setting and stepping through breakpoints seems to affect the point at which the application crashes, but I suspect that is a symptom of debugging a thread other than the problematic one.

These crashes also occur on my colleagues' computers, with the same behavior I observe. This leads me not to suspect a failed installation of something on my computer in particular. My colleagues experiencing the issue are also running Windows 7 64-bit. None of my colleagues is free of the issue.

I've collected and analyzed a number of full dumps from the crashes. I discovered that the failure was actually happening in the same place each time. Here is the exception data from the dumps (it is always the same, except of course the ThreadId):

Exception Information

ThreadId:         0x000014C0
Code:             0x4000001F Unknown (4000001F)
Address:          0x773F2507
Flags:            0x00000000
NumberParameters: 0x00000001
    0x00000000

Google reveals that Code 0x4000001F is actually STATUS_WX86_BREAKPOINT. Microsoft unhelpfully describes it as "An exception status code that is used by the Win32 x86 emulation subsystem."
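As a rough illustration of what the code itself says, the value can be split into the standard NTSTATUS bit fields (severity, customer bit, facility, code). This is only a sketch: `NtStatusFields` and `decode_ntstatus` are names made up for this example and are not part of any Windows header.

```cpp
#include <cstdint>

// Illustrative helper type (hypothetical name): the fields of a 32-bit NTSTATUS.
struct NtStatusFields {
    std::uint32_t severity;  // bits 31-30: 0 success, 1 informational, 2 warning, 3 error
    bool customer;           // bit 29: set for codes not defined by Microsoft
    std::uint32_t facility;  // bits 27-16
    std::uint32_t code;      // bits 15-0
};

// Splits a raw status value into its fields.
constexpr NtStatusFields decode_ntstatus(std::uint32_t status) {
    return NtStatusFields{
        (status >> 30) & 0x3u,
        ((status >> 29) & 0x1u) != 0,
        (status >> 16) & 0xFFFu,
        status & 0xFFFFu
    };
}
```

Decoded this way, 0x4000001F is an informational status (severity 1) with facility 0 and code 0x1F, whereas the ordinary STATUS_BREAKPOINT (0x80000003) is a warning (severity 2) with code 3. Neither carries the error severity (3) that an access violation such as 0xC0000005 has.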

Here are the stack details (which don't seem to vary):

0x773F2507: ntdll.dll+0x000A2507: RtlQueryCriticalSectionOwner + 0x000000E8
0x773F3DAB: ntdll.dll+0x000A3DAB: RtlQueryProcessLockInformation + 0x0000020D
0x773D2ED9: ntdll.dll+0x00082ED9: RtlUlonglongByteSwap + 0x00005C69
0x773F3553: ntdll.dll+0x000A3553: RtlpQueryProcessDebugInformationRemote + 0x00000044
0x74F73677: kernel32.dll+0x00013677: BaseThreadInitThunk + 0x00000012
0x77389F02: ntdll.dll+0x00039F02: RtlInitializeExceptionChain + 0x00000063
0x77389ED5: ntdll.dll+0x00039ED5: RtlInitializeExceptionChain + 0x00000036

It is worth noting that there appears to be a function epilog at 0x773F24ED, which rather suggests that RtlQueryCriticalSectionOwner is a red herring: the faulting address lies beyond the end of that function. Likewise, a function epilog casts doubt on RtlQueryProcessLockInformation. The 0x5C69 offset casts doubt on RtlUlonglongByteSwap, since a byte-swap routine is unlikely to be that large. The other symbols look legit, though.
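The arithmetic behind that suspicion can be sketched as follows. Each dump line gives an absolute address, an offset from the module base, and a displacement from the nearest named symbol, so the module base and the symbol's start address both fall out by subtraction. The names `Frame`, `module_base`, `symbol_start` and `past_function_end` are hypothetical helpers for this example only.

```cpp
#include <cstdint>

// One stack frame as printed in the dump (32-bit process).
struct Frame {
    std::uint32_t address;              // absolute address, e.g. 0x773F2507
    std::uint32_t module_offset;        // offset from the module base
    std::uint32_t symbol_displacement;  // offset from the nearest named symbol
};

// Load address of the module containing the frame.
constexpr std::uint32_t module_base(Frame f) {
    return f.address - f.module_offset;
}

// Address at which the named symbol starts.
constexpr std::uint32_t symbol_start(Frame f) {
    return f.address - f.symbol_displacement;
}

// True when a known function end (such as an epilog) lies between the symbol's
// start and the frame address, so the frame probably belongs to a different,
// unnamed function that merely follows the named one.
constexpr bool past_function_end(Frame f, std::uint32_t epilog_address) {
    return epilog_address >= symbol_start(f) && epilog_address < f.address;
}

// Top frame from the dump above.
constexpr Frame kCrashFrame{0x773F2507u, 0x000A2507u, 0x000000E8u};
```

For the top frame this gives a module base of 0x77350000 and a symbol start of 0x773F241F. The epilog at 0x773F24ED sits between that start and the faulting address, which is what one would expect if the symbolizer only knows exported names and picks the nearest preceding one.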

Specifically, RtlpQueryProcessDebugInformationRemote looks legitimate. Some people on the internet (http://www.cygwin.com/ml/cygwin-talk/2006-q2/msg00050.html) seem to think that it is created by the debugger to collect debug information. That theory seems sound to me, since it only seems to appear when the debugger is attached.

As always, when something breaks, something changed that broke it. In this case, that something is dynamically loading a new dll. I can cause the crash to stop happening by not dynamically loading the particular dll. I'm not convinced that the dll loading is related, but here are the details, just in case:

The dll source is C. Here are the compile options that are not set to the default:

Language Compliance: ANSI
Merge duplicate strings: True
Read-only strings: True
PCH usage: Do not use
Dynamic RTL: False

(The Project Options say False is default for Dynamic RTL, though it was set to True when I created the dll project.)

The dll is loaded with LoadLibrary and freed with FreeLibrary. All seems to be fine with the loading and unloading of the module. However, shortly after the library is unloaded (with FreeLibrary), the aforementioned thread crashes the program. For debugging, I removed all actual calls to the library (including, for more testing, DllMain). No combination of calls or no calls, DllMain or no DllMain, or anything else seemed to alter the behavior of the crash in any way. Simply loading and unloading the dll provokes the crash later on.

Furthermore, changing the dll to use the Dynamic RTL also causes the debugger thread crash to cease. This is undesirable because the compiled dll really should be usable without CodeGear Runtime available. Also, dll size is important. The C code contained in the dll does not make use of any libraries. (It includes no headers, even standard library headers. No malloc/free, no printf, no nothin'. It contains only functions that depend exclusively on their inputs and do not require dynamic allocation.) It is also undesirable because "fixing" a bug by changing stuff until it works without understanding why it works is really never a good plan. (It tends to lead to bug recurrences and strange coding practices. But really, at this point, if I can't find anything else, I may admit defeat on this count.)

And finally, my issue may be related to one of these issues:

Any ideas or suggestions would be appreciated.

浪推晚风 2024-11-23 10:35:55

I solved the above mentioned problem by using a modified version of the PatchINT3 workaround, which was published in 2007 for BDS 2006:

procedure PatchINT3;
const
  INT3: Byte = $CC;
  NOP: Byte = $90;
var
  NTDLL: THandle;
  BytesWritten: DWORD;
  Address: PByte;
begin
  if Win32Platform <> VER_PLATFORM_WIN32_NT then
    Exit;
  NTDLL := GetModuleHandle('NTDLL.DLL');
  if NTDLL = 0 then
    Exit;
  Address := GetProcAddress(NTDLL, 'RtlQueryCriticalSectionOwner');
  if Address = nil then
    Exit;
  Inc(Address, $E8);
  try
    if Address^ <> INT3 then
      Exit;

    if WriteProcessMemory(GetCurrentProcess, Address, @NOP, 1, BytesWritten)
      and (BytesWritten = 1) then
      FlushInstructionCache(GetCurrentProcess, Address, 1);
  except
    //Do not panic if you see an EAccessViolation here, it is perfectly harmless!
    on EAccessViolation do
      ;
  else
    raise;
  end;
end;

Call this routine once after you have loaded the DLL in your thread. The patch fixes a user breakpoint in ntdll.dll version 6.1.7601.17725 and changes it to a NOP.

If there is no user breakpoint (INT3 (=$CC) opcode) at the expected address, the patch routine does nothing and exits.
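The check-then-patch guard described here can be shown in isolation on an ordinary byte buffer, without touching ntdll or any Windows API. This is only a sketch of the guard logic under that simplification; `patch_int3_at` is a hypothetical name, and the real routine writes to code memory through WriteProcessMemory rather than through a plain pointer.

```cpp
#include <cstddef>
#include <cstdint>

const std::uint8_t kInt3 = 0xCC;  // x86 breakpoint opcode
const std::uint8_t kNop  = 0x90;  // x86 no-op opcode

// Replaces the byte at `offset` with NOP only if it currently holds INT3.
// Returns true when a byte was changed, false when nothing was touched.
bool patch_int3_at(std::uint8_t* code, std::size_t size, std::size_t offset) {
    if (code == nullptr || offset >= size) {
        return false;  // nothing to inspect
    }
    if (code[offset] != kInt3) {
        return false;  // unexpected byte: leave the code alone
    }
    code[offset] = kNop;
    return true;
}
```

Because the routine refuses to write unless it finds the expected opcode, calling it against a different ntdll build, or calling it a second time, leaves the bytes unchanged.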

Hope that helps,
Andreas

Footnote
The original source of PatchINT3 can be found here:
http://coding.derkeiler.com/Archive/Delphi/borland.public.delphi.non-technical/2007-01/msg04431.html

Footnote2
The same function in C++:

void PatchINT3()
{
   unsigned char INT3   = 0xCC;
   unsigned char NOP    = 0x90;

   if (Win32Platform != VER_PLATFORM_WIN32_NT)
   {
      return;
   }

   HMODULE ntdll = GetModuleHandleW(L"NTDLL.DLL");
   if (ntdll == NULL)
   {
      return;
   }

   unsigned char *address = (unsigned char*)GetProcAddress(ntdll,
      "RtlQueryCriticalSectionOwner");
   if (address == NULL)
   {
      return;
   }

   address += 0xE8;

   try
   {
      if (*address != INT3)
      {
         return;
      }

      unsigned long bytes_written = 0;
      if (WriteProcessMemory(GetCurrentProcess(), address, &NOP, 1,
         &bytes_written) && (bytes_written == 1))
      {
         FlushInstructionCache(GetCurrentProcess(), address, 1);
      }
   }
   catch (EAccessViolation &e)
   {
      //Do not panic if you see an EAccessViolation
      //here, it is perfectly harmless!
   }
   catch(...)
   {
      throw;
   }
}

め七分饶幸 2024-11-23 10:35:55

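Just a thought...

Maybe you need to narrow in on the crashing thread. The state you observe seems to be somewhat later than the actual error.

First, your stack trace looks incomplete to me. What is the base root of that thread's stack? Where does this thread originate?

Also, in the VS debugger it is possible to break on an exception (Debug -> Exceptions... -> [Add]). All threads are then frozen at the moment the exception occurs. I don't know RAD, but the trick for doing this programmatically seems to be WaitForDebugEvent().

I may be wrong, but I think the bug is most probably in the debugger rather than in your code. In that case an ugly workaround is perfectly forgivable, IMHO. Good luck!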

城歌 2024-11-23 10:35:55

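I can't answer this because I can't see the code...

But...

1) In Borland C++, at least the C++ in BDS, there may be a demonstrable problem with the realloc function in the multithreaded library. Does your C++ code use realloc?

2) The stack you show may well have been reached because your code actually hit a "CALL BAD_ADRESS", and that can happen because of a bug in your own code. In other words, some function in the DLL you load may be doing something that overwrites executable code in your program with garbage, and when the now-garbage section runs, it crashes.

Another way this can happen is if something in the C++ dll modifies the stack below where it is running, and your code hits that later.

3) Check the CPU flag settings of the DLL. Borland libraries sometimes use conflicting CPU flags on entry, and you may need to save and restore them before calling the DLL. For example, if you call a VST plugin made in C++ from Delphi and don't set the flags properly, you can get a subsequent divide-by-zero error from a VST plugin that was compiled with that exception turned off.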
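The save-and-restore idea in point 3 can be sketched in standard C++ with `<cfenv>`, which captures the floating-point environment before a call and puts it back afterwards. This is a generic illustration, not Borland-specific code: `FpEnvGuard`, `call_preserving_fp_env` and `misbehaving_library_call` are hypothetical names, and whether the saved environment includes the exception masks is left to the implementation (on common x86 toolchains it does). In Delphi the equivalent is usually done with Get8087CW and Set8087CW.

```cpp
#include <cfenv>

// RAII guard: remembers the floating-point environment on construction
// and restores it on destruction.
class FpEnvGuard {
public:
    FpEnvGuard() { std::fegetenv(&saved_); }
    ~FpEnvGuard() { std::fesetenv(&saved_); }
    FpEnvGuard(const FpEnvGuard&) = delete;
    FpEnvGuard& operator=(const FpEnvGuard&) = delete;

private:
    std::fenv_t saved_;
};

// Hypothetical stand-in for a DLL entry point that changes the
// floating-point settings and does not put them back.
void misbehaving_library_call() {
    std::fesetround(FE_UPWARD);
}

// Calls fn and restores the caller's floating-point environment afterwards.
template <typename Fn>
void call_preserving_fp_env(Fn fn) {
    FpEnvGuard guard;
    fn();
}
```

Wrapping every call into the foreign library this way keeps its settings from leaking into the host application, at the cost of one save and one restore per call.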
