COM 故障排除应用程序死锁

发布于 2024-08-24 03:20:48 字数 1236 浏览 8 评论 0原文

我正在尝试对间歇性死锁的 COM+ 应用程序进行故障排除。上次锁定时，我能够获取 dllhost 进程的用户模式转储并使用 WinDbg 对其进行分析。检查完所有线程和锁后，一切都归结为该线程拥有的关键部分：

ChildEBP RetAddr  Args to Child              
0deefd00 7c822114 77e6bb08 000004d4 00000000 ntdll!KiFastSystemCallRet
0deefd04 77e6bb08 000004d4 00000000 0deefd48 ntdll!ZwWaitForSingleObject+0xc
0deefd74 77e6ba72 000004d4 00002710 00000000 kernel32!WaitForSingleObjectEx+0xac
0deefd88 75bb22b9 000004d4 00002710 00000000 kernel32!WaitForSingleObject+0x12
0deeffb8 77e660b9 000a5cc0 00000000 00000000 comsvcs!PingThread+0xf6
0deeffec 00000000 75bb21f1 000a5cc0 00000000 kernel32!BaseThreadStart+0x34

它正在等待的对象是一个事件：

0:016> !handle 4d4 f
Handle 000004d4
  Type          Event
  Attributes    0
  GrantedAccess 0x1f0003:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         QueryState,ModifyState
  HandleCount   2
  PointerCount  4
  Name          <none>
  No object specific information available

据我所知，该事件永远不会收到信号，导致线程挂起并保持在此过程中启动其他几个线程。有没有人对弄清楚发生了什么事情的后续步骤有任何建议？

现在，看到该方法被称为 PingThread，它是否有可能尝试 ping 已经死锁的进程中的另一个线程？

更新
这实际上是 Oracle 10.2.0.1 客户端中的一个错误。尽管如此，我仍然对如何在没有在 Oracle 错误数据库中找到错误的情况下解决这个问题的想法感兴趣。

原文

I'm trying to troubleshoot a COM+ application that deadlocks intermittently. The last time it locked up, I was able to take a usermode dump of the dllhost process and analyze it using WinDbg. After inspecting all the threads and locks, it all boils down to a critical section owned by this thread:

ChildEBP RetAddr  Args to Child              
0deefd00 7c822114 77e6bb08 000004d4 00000000 ntdll!KiFastSystemCallRet
0deefd04 77e6bb08 000004d4 00000000 0deefd48 ntdll!ZwWaitForSingleObject+0xc
0deefd74 77e6ba72 000004d4 00002710 00000000 kernel32!WaitForSingleObjectEx+0xac
0deefd88 75bb22b9 000004d4 00002710 00000000 kernel32!WaitForSingleObject+0x12
0deeffb8 77e660b9 000a5cc0 00000000 00000000 comsvcs!PingThread+0xf6
0deeffec 00000000 75bb21f1 000a5cc0 00000000 kernel32!BaseThreadStart+0x34

The object it's waiting on is an event:

0:016> !handle 4d4 f
Handle 000004d4
  Type          Event
  Attributes    0
  GrantedAccess 0x1f0003:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         QueryState,ModifyState
  HandleCount   2
  PointerCount  4
  Name          <none>
  No object specific information available

As far as I can tell, the event never gets signaled, causing the thread to hang and hold up several other threads in the process. Does anyone have any suggestions for next steps in figuring out what's going on?

Now, seeing as the method is called PingThread, is it possible that it's trying to ping another thread in the process that's already deadlocked?

UPDATE
This actually turned out to be a bug in the Oracle 10.2.0.1 client. Although, I'm still interested in ideas on how I could have figured this out without finding the bug in Oracle's bug database.

分享到QQ

分享到微博