File server retains file existence/lock when the client machine is powered off
Our app runs on client server A and creates a file on the Windows Server 2008 R2 file server using:
// Delphi fragment: the flags ask for write-through I/O and for the file to be
// deleted automatically when the last handle to it is closed.
CreateFile(LockFileName,
           GENERIC_READ or GENERIC_WRITE,
           FILE_SHARE_READ,
           nil,
           CREATE_ALWAYS,
           FILE_FLAG_WRITE_THROUGH or FILE_FLAG_DELETE_ON_CLOSE,
           0);
The client is testing a disaster scenario by powering off 'server A' and leaving it off.
They are reporting that our app running on 'server B', using the same filename and the same code fragment above, fails (i.e. the file continues to exist) for at least 15 minutes until, we believe, they browse to the folder containing the file in Windows Explorer, at which point the file is deleted automatically.
Is anyone aware of how this is supposed to behave in this situation, where the creating server has gone away? Should the handles be released and the file removed automatically? And why does looking at the file cause it to be deleted?
Interestingly, on another supposedly similar setup the issue does not occur.
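For reference, here is roughly how the call is wrapped on our side, with error reporting added so a failure shows up as a specific Win32 error rather than just "the file continues to exist". This is a simplified sketch, not our production code; the program and function names and the UNC path are made up for this post.

program LockProbe;
{$APPTYPE CONSOLE}
uses
  Windows;

// Simplified sketch: open the lock file exactly as shown above and report why
// the open failed.  On 'server B' the interesting failure is
// ERROR_SHARING_VIOLATION, i.e. the file server still believes the dead
// 'server A' holds a handle to the file.
function AcquireLock(const LockFileName: string): THandle;
begin
  Result := CreateFile(PChar(LockFileName),
                       GENERIC_READ or GENERIC_WRITE,
                       FILE_SHARE_READ,
                       nil,
                       CREATE_ALWAYS,
                       FILE_FLAG_WRITE_THROUGH or FILE_FLAG_DELETE_ON_CLOSE,
                       0);
  if Result = INVALID_HANDLE_VALUE then
    case GetLastError of
      ERROR_SHARING_VIOLATION:
        Writeln('Sharing violation: another machine still holds the lock file open');
      ERROR_ACCESS_DENIED:
        Writeln('Access denied (can also mean a delete is pending on the file)');
    else
      Writeln('CreateFile failed with error ', GetLastError);
    end;
end;

var
  hLock: THandle;
begin
  hLock := AcquireLock('\\fileserver\share\app.lock');  // placeholder path
  if hLock <> INVALID_HANDLE_VALUE then
  begin
    Writeln('Lock acquired');
    CloseHandle(hLock);  // FILE_FLAG_DELETE_ON_CLOSE removes the file here
  end;
end.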
3 Answers
Eventually yes, but not immediately. As you are running Windows Server 2008 R2 (and thus SMBv2; note that I assume both server and client are running Windows Server 2008 R2), the client will request a durable file handle. According to the SMBv2 specification, sections 3.3.6.2 and 3.3.7.1, the server must start the durable open scavenger timer (set to 16 minutes on Windows Server by default). Once the timer expires, the server must examine all open handles and close those that have not been reclaimed by a client.
In your scenario, of course, an open question is whether the server detects the connection loss to the client at all, since according to your description the client (i.e. the whole server, not just the process) is killed instantly.
Now assume that another client is trying to open the file while the durable timeout is still running, i.e. the server still considers the file to be open by the first client. The server is then supposed to send an oplock break notification (section 2.2.23.1) to the client that initially opened the file. As that client is unable to respond (it has been killed), the server will wait for the oplock break acknowledgment timeout to expire (section 3.3.2.1, 35 seconds by default) before it grants the new client access to the file.
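If that 35-second acknowledgment timeout is what the second client keeps running into, one pragmatic mitigation is to retry the open for a little longer than the timeout before treating the lock as stuck. A minimal sketch, assuming the same CreateFile parameters as in the question and the Windows unit; the TryAcquireLock name, the one-second poll and the overall budget are my own choices, nothing here is mandated by the SMB specification:

// Sketch: keep retrying the open for longer than the default 35 s oplock break
// acknowledgment timeout, so a first client that died without responding does
// not look like a permanent failure.
function TryAcquireLock(const LockFileName: string; TimeoutMs: DWORD): THandle;
var
  Started: DWORD;
begin
  Started := GetTickCount;
  repeat
    Result := CreateFile(PChar(LockFileName),
                         GENERIC_READ or GENERIC_WRITE,
                         FILE_SHARE_READ,
                         nil,
                         CREATE_ALWAYS,
                         FILE_FLAG_WRITE_THROUGH or FILE_FLAG_DELETE_ON_CLOSE,
                         0);
    if Result <> INVALID_HANDLE_VALUE then
      Exit;                                  // lock acquired
    if GetLastError <> ERROR_SHARING_VIOLATION then
      Exit;                                  // unrelated error, give up now
    Sleep(1000);                             // let the server expire the oplock break
  until GetTickCount - Started > TimeoutMs;
end;

Called with, say, TryAcquireLock(LockFileName, 45000) this would ride out the 35-second oplock window, but obviously not the 16-minute durable-handle scavenger window described above.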
There is one other thing to note: the behavior will be different if the second client accesses the file via a local path rather than via a UNC path. In that case the client won't have to wait for the oplock break acknowledgment timeout to occur; Windows will grant it access to the file immediately while it tries to send a close request to the first client.
This is how the system is supposed to behave. As to why you are experiencing the issues described, I cannot tell. I wouldn't be surprised if you had stumbled upon a bug in the file-server implementation of Windows Server 2008 R2. I would try to troubleshoot the issue using the tools mentioned in the other answers (procmon is really nice), and Wireshark helps a lot too.
There is nothing to say there should no longer be any handles when the creating server goes down. In order for a handle to be removed, something has to initiate that removal. If a server abruptly goes down, it cannot remove its handles, so those handles remain open. As far as the server that is still up is concerned, all is good and well, and no file handles should be forcibly closed.
Until you actually try to act upon the file handle. Suddenly, the server notices that the host of the file handle is gone, because it tries to initiate communications with said host. Once it realizes this, the file handle gets forcibly closed.
Thus, to answer your question, this seems like perfectly predictable and expected behavior to me.
The reason file handles get closed immediately in another environment probably has to do with something keeping those servers in constant communication: something is constantly accessing a remote file. That's just a guess, though.
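If that guess is right, the second machine does not have to wait for someone to browse the folder in Explorer: any access to the file over its UNC path should count as "acting upon" it. A speculative sketch, assuming the Windows unit; ProbeStaleLock and the GetFileAttributes probe are only an illustration of touching the file, not a documented cleanup API:

// Speculative sketch: touch the file over its UNC path so the file server has a
// reason to talk to the (dead) owner of the handle; the normal open can then be
// retried.  GetFileAttributes is simply one cheap way to act upon the file.
procedure ProbeStaleLock(const LockFileName: string);
var
  Attrs: DWORD;
begin
  Attrs := GetFileAttributes(PChar(LockFileName));
  if Attrs = DWORD(-1) then                             // INVALID_FILE_ATTRIBUTES
    Writeln('Probe failed with error ', GetLastError)   // file may already be gone
  else
    Writeln('File still visible, attribute flags: ', Attrs);
end;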
Update
Sysinternals, bought out by Microsoft a few years ago, has a great tool called Process Explorer that allows you to search processes' open file handles. This might be of use to you in determining which program(s) are refreshing the file handle(s).
Sysinternals also has Process Monitor, which allows you to see in real-time as programs act upon file handles. This could be another useful program in troubleshooting the issue.
Edit: Oh, and if you really want to have fun, there's Handle, too.
This looks so far like a non-issue to me. Or one that cannot be handled outside of Microsoft's programming AND one that has side effects when handled. Basically you have to account for small disruptions of communication between client and server and optimize network traffic, so the server cannot permanently exchange packets with the client just to see whether the client is still around.
Computer programming must take that into account as far as possible, but timeouts like that are normal unless the client application handles them properly. The main question (totally not answered) is whether this is an issue at all - so far it looks like "standard behavior".
How would the server know?
Possibly it is the reading that triggered a refresh that timed out, so - in the end - this triggered the defined behavior (DELETE_ON_CLOSE).
I would guess that any access to certain elements of the file would trigger this, but the tester did not do anything like that except refresh Explorer.