What causes NUL characters to be written to a file just before a computer crashes?
We have an application running on several thousand identical machines: same OS, same hardware, same application installation. On very rare occasions, a machine locks up. Alt+Tab, Ctrl+Alt+Del, and the application itself are all unresponsive. When we inspect our application's log file afterwards, we find a series of null characters written at the end, as the last data before the crash.
I'm hoping to use this fact as a means to debug the lockup. My guess is that the number of null characters matches the space that was allocated for my log statement, but the content was never actually written to disk. I'm also guessing that a disk I/O problem occurred, preventing the write and, of course, causing the OS lockup. I can't confirm any of this. So I guess my question is: have you ever seen a condition like this, how did it occur, and how might you go about troubleshooting it?
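One way to test the guess about the length of the null run is to measure it on each affected log file and compare it with the size of a typical log statement. The following is only a minimal sketch in Python; the function name `trailing_nul_run` and the chunk size are made up for illustration and are not part of the application described above.

```python
import os


def trailing_nul_run(path, chunk_size=64 * 1024):
    """Return (nul_count, offset) for the run of NUL bytes at the end of a file.

    nul_count is the number of consecutive 0x00 bytes at the end of the file,
    and offset is the byte position where that run starts. Hypothetical helper,
    written for illustration only.
    """
    size = os.path.getsize(path)
    pos = size
    count = 0
    with open(path, "rb") as f:
        # Walk backwards through the file one chunk at a time.
        while pos > 0:
            start = max(0, pos - chunk_size)
            f.seek(start)
            chunk = f.read(pos - start)
            stripped = chunk.rstrip(b"\x00")
            count += len(chunk) - len(stripped)
            if stripped:
                # Found a non-NUL byte, so the trailing run ends here.
                break
            pos = start
    return count, size - count
```

If the counts collected from several crashed machines cluster around the length of a single log line (or a multiple of the buffer size used by the logger), that would support the guess; if they cluster around a disk cluster size instead, the file system is the more likely explanation.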
Comments (2)
NTFS does not journal data (only metadata), so things like that can happen. The reason is simply that at the time of the crash/hang, the metadata (file size, data block allocation) had been committed, but the data (data block contents) had not. Unfortunately this is normal behavior with NTFS and will not give you any insight into the problem causing the hang.
So the answer is: a crash at the "right" time can cause this.
BTW: The same thing can of course happen with FAT/FAT32.
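If the goal is to make the last log lines more likely to survive a hang, one common approach is to flush and sync after each write, so the data blocks reach the disk soon after the metadata does. Below is a minimal sketch in Python, assuming a plain text log file; the helper name `append_durably` is hypothetical. It does not diagnose the hang, it only narrows the window in which a crash leaves nulls instead of text, and it will slow down heavy logging.

```python
import os


def append_durably(path, line):
    """Append one line to a log file and ask the OS to push it to disk.

    Hypothetical helper for illustration; not the application's real logger.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
        f.flush()              # move data from Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to write its cached data to disk
```

A hang in the middle of the write can still lose that one line, but earlier lines that were already synced should be intact.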
I've seen this type of thing happen, and I think you're looking in the right general direction.
When this happens, I assume you're able to pinpoint the exact hardware? After a failure I'd recommend running memtest (http://www.memtest.org/).
I've seen this sort of thing with power supplies, bad disk controllers, etc. You can go insane trying to track them down.
It seems like you're going about this the right way. See if you can find a way to make the problem happen more quickly; when it happens, run memtest and run chkdsk /R (and check the event log for controller errors while it runs).
Any chance you could get a kernel debugger attached?
Any chance %SystemRoot%\memory.dmp was produced?