Graceful file reading, without locking
Whiteboard Overview
The images below are 1000 x 750 px, ~130 kB JPEGs hosted on ImageShack.
Additional Information
I should mention that each user (of the client boxes) will be working straight off the /Foo share. Due to the nature of the business, users will never need to see or work on each other's documents concurrently, so conflicts of this nature will never be a problem. Access needs to be as simple as possible for them, which probably means mapping a drive to their respective /Foo/username sub-directory.
Additionally, no one but my applications (in-house and the ones on the server) will be using the FTP directory directly.
Possible Implementations
Unfortunately, it doesn't look like I can use off-the-shelf tools such as WinSCP, because some other logic needs to be intimately tied into the process.
I figure there are two simple ways for me to accomplish the above on the in-house side.
Method one (slow):

1. Walk the /Foo directory tree every N minutes.
2. Diff with the previous tree using a combination of timestamps (can be faked by file copying tools, but not relevant in this case) and checksums (a rough sketch of this pass follows below).
3. Merge changes with the off-site FTP server.
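A minimal sketch of that walk-and-diff pass, assuming the share lives at D:\Foo (the path is only illustrative) and using last-write time plus an MD5 checksum as the per-file fingerprint:

    // Sketch of method one: walk the tree, fingerprint each file, diff against
    // the previous pass. D:\Foo is an assumed location for illustration.
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Security.Cryptography;

    class TreeDiffer
    {
        // Build a snapshot mapping each file path to "lastWriteTicks:md5".
        static Dictionary<string, string> Fingerprint(string root)
        {
            var snapshot = new Dictionary<string, string>();
            using (var md5 = MD5.Create())
            {
                foreach (var path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
                {
                    using (var stream = File.OpenRead(path))
                    {
                        string hash = BitConverter.ToString(md5.ComputeHash(stream));
                        snapshot[path] = File.GetLastWriteTimeUtc(path).Ticks + ":" + hash;
                    }
                }
            }
            return snapshot;
        }

        // Files that are new or whose fingerprint differs need to go off-site.
        static IEnumerable<string> ChangedSince(Dictionary<string, string> previous,
                                                Dictionary<string, string> current)
        {
            foreach (var entry in current)
            {
                string old;
                if (!previous.TryGetValue(entry.Key, out old) || old != entry.Value)
                    yield return entry.Key;
            }
        }

        static void Main()
        {
            var before = Fingerprint(@"D:\Foo");           // previous pass
            Console.WriteLine("Change some files, then press Enter...");
            Console.ReadLine();
            var after = Fingerprint(@"D:\Foo");            // current pass
            foreach (var path in ChangedSince(before, after))
                Console.WriteLine("Changed: " + path);
        }
    }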
Method two:

1. Register for directory change notifications (e.g., using ReadDirectoryChangesW from the WinAPI, or FileSystemWatcher if using .NET); a sketch of this appears below.
2. Log changes.
3. Merge changes with the off-site FTP server every N minutes.
I'll probably end up using something like the second method due to performance considerations.
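A minimal sketch of method two using FileSystemWatcher, the .NET counterpart to ReadDirectoryChangesW. The D:\Foo path and the five-minute merge interval are assumptions for illustration, and the FTP merge itself is left as a placeholder:

    // Log change notifications, then flush the accumulated set every N minutes.
    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading;

    class ChangeLogger
    {
        // Paths seen as created/changed/renamed since the last merge pass.
        static readonly ConcurrentDictionary<string, DateTime> Pending =
            new ConcurrentDictionary<string, DateTime>();

        static void Main()
        {
            using (var watcher = new FileSystemWatcher(@"D:\Foo"))
            {
                watcher.IncludeSubdirectories = true;
                watcher.NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite | NotifyFilters.Size;

                // Only log the change here; the upload happens later, in batches.
                watcher.Created += (s, e) => Pending[e.FullPath] = DateTime.UtcNow;
                watcher.Changed += (s, e) => Pending[e.FullPath] = DateTime.UtcNow;
                watcher.Renamed += (s, e) => Pending[e.FullPath] = DateTime.UtcNow;
                watcher.EnableRaisingEvents = true;

                // Every N minutes (here: 5), hand the change set to the merge step.
                using (new Timer(_ => MergeWithOffsiteServer(), null,
                                 TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(5)))
                {
                    Console.WriteLine(@"Watching D:\Foo; press Enter to stop.");
                    Console.ReadLine();
                }
            }
        }

        static void MergeWithOffsiteServer()
        {
            foreach (var path in Pending.Keys)
            {
                DateTime seen;
                if (Pending.TryRemove(path, out seen))
                    Console.WriteLine($"Would upload {path} (last change {seen:u})");   // placeholder for the FTP transfer
            }
        }
    }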
Problem
Since this synchronization must take place during business hours, the first problem that arises is during the off-site upload stage.
While I'm transferring a file off-site, I effectively need to prevent the users from writing to the file (e.g., use CreateFile with FILE_SHARE_READ or something) while I'm reading from it. The internet upstream speeds at their office are nowhere near symmetrical to the file sizes they'll be working with, so it's quite possible that they'll come back to the file and attempt to modify it while I'm still reading from it.
Possible Solution
The easiest solution to the above problem would be to create a copy of the file(s) in question elsewhere on the file-system and transfer those "snapshots" without disturbance.
The files (some will be binary) that these guys will be working with are relatively small, probably ≤20 MB, so copying (and therefore temporarily locking) them will be almost instant. The chances of them attempting to write to the file in the same instant that I'm copying it should be close to nil.
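A rough sketch of that snapshot idea, assuming a hypothetical X:\staging area: the source file is only held open (writers denied) for the duration of the copy, and the copy is what gets uploaded.

    // Snapshot approach: copy to a staging area under a brief shared-read open,
    // then upload the copy at leisure. Both paths are assumptions.
    using System;
    using System.IO;

    class Snapshotter
    {
        static string TakeSnapshot(string sourcePath, string stagingDir)
        {
            Directory.CreateDirectory(stagingDir);
            string snapshotPath = Path.Combine(stagingDir, Path.GetFileName(sourcePath));

            // Writers are denied only for the duration of the copy, which for
            // files of roughly 20 MB or less is close to instantaneous locally.
            using (var source = new FileStream(sourcePath, FileMode.Open,
                                               FileAccess.Read, FileShare.Read))
            using (var target = new FileStream(snapshotPath, FileMode.Create,
                                               FileAccess.Write, FileShare.None))
            {
                source.CopyTo(target);
            }

            return snapshotPath;   // upload this copy; the user can modify the original freely
        }

        static void Main()
        {
            string copy = TakeSnapshot(@"D:\Foo\alice\report.doc", @"X:\staging\alice");
            Console.WriteLine("Snapshot ready for upload: " + copy);
        }
    }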
This solution seems kind of ugly, though, and I'm pretty sure there's a better way to handle this type of problem.
One thing that comes to mind is something like a file system filter that takes care of the replication and synchronization at the IRP level, kind of like what some A/Vs do. This is overkill for my project, however.
Questions
This is the first time that I've had to deal with this type of problem, so perhaps I'm thinking too much into it.
I'm interested in clean solutions that don't require going overboard with the complexity of their implementations. Perhaps I've missed something in the WinAPI that handles this problem gracefully?
I haven't decided what I'll be writing this in, but I'm comfortable with: C, C++, C#, D, and Perl.
Comments (2)
After the discussions in the comments my proposal would be like so:
So basically you have your drives:

- C: - Windows Installation
- D: - Share Storage
- X: - Temporary Partition

Then you would have the following services:

- LocalMirrorService - watches D: and copies changes to X: with the same directory structure
- TransferClientService - moves files from X: to the FTP server and removes them from X: afterwards; it could also use multiple threads to transfer several files at once and monitor bandwidth
I would bet that this is the idea you had in mind, but it seems like a reasonable approach as long as you're really good with your application development and you're able to create a solid system that handles most issues.
When a user edits a document in Microsoft Word, for instance, the file will change on the share and may be copied to X: even though the user is still working on it. Within Windows there should be an API to check whether the file handle is still held open by the user; if it is, you can create a hook to watch for when the user actually closes the document, so that all their edits are complete, and only then migrate it to drive X:.

That said, if the user is working on the document and their PC crashes for some reason, the document/file handle may not get released until the document is opened again at a later date, causing issues.
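As a rough illustration of the "is the handle still open?" check alluded to above: one common heuristic is simply to try opening the file with no sharing and see whether that fails. The path below is hypothetical, and this approach cannot tell a crash-orphaned handle apart from a genuinely open document.

    // Heuristic lock check: if we can open the file exclusively, no other
    // process (Word, etc.) still holds a handle to it.
    using System;
    using System.IO;

    class FileLockCheck
    {
        static bool IsReleased(string path)
        {
            try
            {
                using (new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
                    return true;
            }
            catch (IOException)
            {
                return false;   // sharing violation: someone still has it open
            }
        }

        static void Main()
        {
            string path = @"D:\Foo\alice\report.doc";    // hypothetical document on the share
            Console.WriteLine(IsReleased(path)
                ? "Document is closed; safe to copy to X:"
                : "Document still open; try again later");
        }
    }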
For anyone in a similar situation (I'm assuming the person who asked the question implemented a solution long ago), I would suggest an implementation of rsync.
rsync.net's Windows Backup Agent does what is described in method 1, and can be run as a service as well (see "Advanced Usage"). Though I'm not entirely sure if it has built-in bandwidth limiting...
Another (probably better) solution that does have bandwidth limiting is Duplicati. It also properly backs up currently-open or locked files. Uses SharpRSync, a managed rsync implementation, for its backend. Open source too, which is always a plus!