Why does fclose hang/deadlock? (Windows)

Posted on 2024-11-08 20:47:47

I have a directory change monitor process that reads updates from files within a set of directories. I have another process (a test program) that performs small writes to a lot of files in those directories. Figure about 100 directories with 10 files in each, and about 500 files being modified per second.

After running for a while, the directory monitor process hangs on a call to fclose() in a method that is basically tailing the file. In this method, I fopen() the file, check that the handle is valid, do a few seeks and reads, and then call fclose(). These reads are all performed by the same thread in the process. After the hang, the thread never progresses.
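For reference, the pattern described above looks roughly like the sketch below. This is not the actual monitor code; the helper name `read_from_offset` and its parameters are hypothetical and only illustrate the fopen / seek / read / fclose sequence.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not the real monitor code): open the file, seek to
 * `offset`, read at most `cap` bytes into `buf`, then close the file.
 * Returns the number of bytes read, or -1 if the open or the seek fails. */
long read_from_offset(const char *path, long offset, char *buf, size_t cap)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return -1;                  /* handle is not valid */
    }
    if (fseek(fp, offset, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }
    /* fread never writes more than `cap` bytes, so `buf` must have at least
     * `cap` bytes of storage. Note that it does not NUL-terminate. */
    size_t n = fread(buf, 1, cap, fp);
    fclose(fp);                     /* the call that hangs in the real process */
    return (long)n;
}
```

A caller would remember the last offset it processed for each file and pass it back in on the next change notification.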

I couldn't find any good information on why fclose() might deadlock instead of returning some kind of error code. The documentation does mention _fclose_nolock(), but it doesn't seem to be available to me (Visual Studio 2003).

The hang occurs for both debug and release builds. In a debug build, I can see that fclose() calls _free_base(), which hangs before returning. Some kind of call into kernel32.dll => ntdll.dll => KernelBase.dll => ntdll.dll is spinning. Here's the assembly from ntdll.dll that loops indefinitely:

77CEB83F  cmp         dword ptr [edi+4Ch],0 
77CEB843  lea         esi,[ebx-8] 
77CEB846  je          77CEB85E 
77CEB848  mov         eax,dword ptr [edi+50h] 
77CEB84B  xor         dword ptr [esi],eax 
77CEB84D  mov         al,byte ptr [esi+2] 
77CEB850  xor         al,byte ptr [esi+1] 
77CEB853  xor         al,byte ptr [esi] 
77CEB855  cmp         byte ptr [esi+3],al 
77CEB858  jne         77D19A0B 
77CEB85E  mov         eax,200h 
77CEB863  cmp         word ptr [esi],ax 
77CEB866  ja          77CEB815 
77CEB868  cmp         dword ptr [edi+4Ch],0 
77CEB86C  je          77CEB87E 
77CEB86E  mov         al,byte ptr [esi+2] 
77CEB871  xor         al,byte ptr [esi+1] 
77CEB874  xor         al,byte ptr [esi] 
77CEB876  mov         byte ptr [esi+3],al 
77CEB879  mov         eax,dword ptr [edi+50h] 
77CEB87C  xor         dword ptr [esi],eax 
77CEB87E  mov         ebx,dword ptr [ebx+4] 
77CEB881  lea         eax,[edi+0C4h] 
77CEB887  cmp         ebx,eax 
77CEB889  jne         77CEB83F 

Any ideas what might be happening here?


Comments (2)

牵你手 2024-11-15 20:47:47

I posted this as a comment, but I realize this could be an answer in its own right...

Based on the disassembly, my guess is you've overwritten some internal heap structure maintained by ntdll, and it is looping forever iterating through a linked list.

In particular, at the start of the loop, the current list node seems to be in ebx. At the end of the loop, the expected last node (or terminator, if you like) is in eax. These look like circular lists whose last node is the same as the first, with the terminator's address computed by lea eax,[edi+0C4h]; the value at [edi+4Ch] is only compared against zero, so it looks like a flag rather than a pointer. Probably cmp ebx, eax never compares equal, because heap corruption has introduced a cycle in the list that no longer leads back to the terminator.
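To illustrate the kind of walk I mean, here is a minimal sketch in C. The `struct node` layout and the function `walk_circular` are hypothetical and are not ntdll's real data structures; the step limit is an addition for demonstration only, since the real loop has no such bound.

```c
#include <stddef.h>

/* Hypothetical node type; NOT ntdll's real heap layout. */
struct node {
    struct node *next;
};

/* Walk a circular list starting after the sentinel `head` until the walk
 * arrives back at `head`. Returns the number of nodes visited, or -1 if
 * more than `max_steps` nodes are seen, which suggests the links no longer
 * lead back to the sentinel (for example after memory corruption). */
long walk_circular(const struct node *head, long max_steps)
{
    long steps = 0;
    const struct node *cur = head->next;

    while (cur != head) {           /* plays the role of cmp ebx,eax / jne */
        if (++steps > max_steps) {
            return -1;              /* without this bound, the loop spins forever */
        }
        cur = cur->next;            /* plays the role of mov ebx,[ebx+4] */
    }
    return steps;
}
```

With an intact three-node ring the walk ends after visiting the two non-sentinel nodes. If one link is overwritten so that it points back into the middle of the ring, the sentinel is never reached again, which matches a thread that spins without ever returning.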

I don't think this has anything to do with locks; otherwise we would see some atomic instructions (e.g. lock cmpxchg, xchg, etc.) or calls to other synchronization functions.

所谓喜欢 2024-11-15 20:47:47

I ran into the same problem with a file close function. In my case, I solved it by calling the close function directly inside the body of another function instead of wrapping it in a function of its own.

I was also suspicious of
(1) duplicated file names and (2) Windows scheduling (file I/O not being completed before the next task or thread started). Windows scheduling and multithreading happen behind the scenes, so this is hard to verify, but I had a similar issue when I tried to save a lot of data as ASCII in a loop. Saving as binary solved it in that case.

My environment, IDE: Visual Studio 2015, OS: Windows 7, language: C++
