Why does fclose hang/deadlock? (Windows)
I have a directory change monitor process that reads updates from files within a set of directories. I have another process (a test program) that performs small writes to a lot of files in those directories. Figure about 100 directories with 10 files in each, and about 500 files being modified per second.
After running for a while, the directory monitor process hangs on a call to fclose() in a method that is basically tailing the file. In this method, I fopen() the file, check that the handle is valid, do a few seeks and reads, and then call fclose(). These reads are all performed by the same thread in the process. After the hang, the thread never progresses.
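For reference, a minimal sketch in portable C of the kind of tailing routine described above. The original code was not posted, so the function name tail_read, its parameters, and the truncation handling are all hypothetical. The one point it is meant to show is that the read length is clamped to the buffer size, so the routine itself cannot write past the end of buf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original post): read up to cap-1 bytes
   that were appended to `path` after byte offset *pos, store them
   NUL-terminated in buf, and advance *pos.
   Returns the number of bytes read, or -1 on error. */
static long tail_read(const char *path, long *pos, char *buf, size_t cap)
{
    FILE *fp;
    long end;
    size_t want, got;

    assert(path != NULL && pos != NULL && buf != NULL);
    if (cap == 0)
        return -1;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;                       /* handle is not valid */

    /* Find the current end of the file. */
    if (fseek(fp, 0L, SEEK_END) != 0 || (end = ftell(fp)) < 0) {
        fclose(fp);
        return -1;
    }
    if (end < *pos)
        *pos = 0;                        /* file was truncated: start over */

    /* Clamp the read so it can never overrun buf (one byte kept for NUL). */
    want = (size_t)(end - *pos);
    if (want > cap - 1)
        want = cap - 1;

    if (fseek(fp, *pos, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }
    got = fread(buf, 1, want, fp);
    buf[got] = '\0';
    *pos += (long)got;

    if (fclose(fp) != 0)
        return -1;
    return (long)got;
}
```

A caller would keep one offset per watched file and call tail_read(path, &offset, buf, sizeof buf) each time a change notification arrives for that file.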
I couldn't find any good information on why fclose() might deadlock instead of returning some kind of error code. The documentation does mention _fclose_nolock(), but it doesn't seem to be available to me (Visual Studio 2003).
The hang occurs for both debug and release builds. In a debug build, I can see that fclose() calls _free_base(), which hangs before returning. Some kind of call into kernel32.dll => ntdll.dll => KernelBase.dll => ntdll.dll is spinning. Here's the assembly from ntdll.dll that loops indefinitely:
77CEB83F cmp dword ptr [edi+4Ch],0
77CEB843 lea esi,[ebx-8]
77CEB846 je 77CEB85E
77CEB848 mov eax,dword ptr [edi+50h]
77CEB84B xor dword ptr [esi],eax
77CEB84D mov al,byte ptr [esi+2]
77CEB850 xor al,byte ptr [esi+1]
77CEB853 xor al,byte ptr [esi]
77CEB855 cmp byte ptr [esi+3],al
77CEB858 jne 77D19A0B
77CEB85E mov eax,200h
77CEB863 cmp word ptr [esi],ax
77CEB866 ja 77CEB815
77CEB868 cmp dword ptr [edi+4Ch],0
77CEB86C je 77CEB87E
77CEB86E mov al,byte ptr [esi+2]
77CEB871 xor al,byte ptr [esi+1]
77CEB874 xor al,byte ptr [esi]
77CEB876 mov byte ptr [esi+3],al
77CEB879 mov eax,dword ptr [edi+50h]
77CEB87C xor dword ptr [esi],eax
77CEB87E mov ebx,dword ptr [ebx+4]
77CEB881 lea eax,[edi+0C4h]
77CEB887 cmp ebx,eax
77CEB889 jne 77CEB83F
Any ideas what might be happening here?
Comments (2)
I posted this as a comment, but I realize this could be an answer in its own right...
Based on the disassembly, my guess is that you've overwritten some internal heap structure maintained by ntdll, and it is looping forever while iterating through a linked list.
In particular, at the start of the loop the current list node seems to be in ebx. At the end of the loop, the expected last node (or terminator, if you like; it looks as though these are circular lists whose last node is the same as the first, with the address of that node, edi+0C4h, computed by the lea) is contained in eax. Probably the result of cmp ebx, eax is never equal, because there is some cycle in the list introduced by heap corruption.
I don't think this has anything to do with locks; otherwise we would see some atomic instructions (e.g. lock cmpxchg, xchg, etc.) or calls to other synchronization functions.
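The loop in the disassembly can be paraphrased in C to show why a damaged link makes it spin. This is only a sketch: the struct is modeled on a doubly linked Windows-style list entry, the names entry and walk_bounded are hypothetical, and the step limit is added purely for illustration (the real loop has no such limit, which is why it never returns).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type modeled on a doubly linked list entry.
   On 32-bit x86 the second pointer sits at offset 4, matching
   the mov ebx,dword ptr [ebx+4] in the disassembly. */
struct entry {
    struct entry *flink;   /* forward link, offset 0 on 32-bit x86 */
    struct entry *blink;   /* backward link, offset 4 on 32-bit x86 */
};

/* Walk a circular list backwards until we arrive back at `head`, the
   sentinel. Returns the number of nodes visited, or -1 if more than
   `limit` steps were taken, which suggests the links no longer lead
   back to `head` (for example after a heap overwrite). */
static long walk_bounded(const struct entry *head, long limit)
{
    long steps = 0;
    const struct entry *cur;

    assert(head != NULL);
    cur = head->blink;
    while (cur != head) {          /* cmp ebx,eax / jne at the loop's end */
        if (++steps > limit)
            return -1;             /* illustration only: give up */
        cur = cur->blink;          /* mov ebx,dword ptr [ebx+4] */
    }
    return steps;
}
```

With an intact two-node list the walk returns 2. If one node's link is overwritten so that two nodes point at each other, the walk never reaches the sentinel, and without the limit it would loop forever, which matches the observed hang.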
I had the same problem with a file close function. In my case, I solved it by placing the close call inside the body of another function instead of giving it a function of its own.
I was also suspicious of:
(1) the file name being duplicated; (2) Windows scheduling (file I/O hadn't completed before the next task's thread was started. Windows scheduling and multithreading happen behind the curtain, so this is hard to verify, but I had a similar issue when I tried to save a lot of data as ASCII in a loop. Saving as binary solved it in that case.)
My environment: IDE: Visual Studio 2015, OS: Windows 7, language: C++