Elegant file reading, without locking

Published 2024-10-18 08:29:27


Whiteboard Overview

The images below are 1000 x 750 px, ~130 kB JPEGs hosted on ImageShack.


Additional Information

I should mention that each user (of the client boxes) will be working straight off the /Foo share. Due to the nature of the business, users will never need to see or work on each other's documents concurrently, so conflicts of this nature will never be a problem. Access needs to be as simple as possible for them, which probably means mapping a drive to their respective /Foo/username sub-directory.

Additionally, no one but my applications (in-house and the ones on the server) will be using the FTP directory directly.


Possible Implementations

Unfortunately, it doesn't look like I can use off the shelf tools such as WinSCP because some other logic needs to be intimately tied into the process.

I figure there are two simple ways for me to accomplish the above on the in-house side.

  1. Method one (slow):

    • Walk the /Foo directory tree every N minutes.

    • Diff with the previous tree using a combination of timestamps (these can be faked by file-copying tools, but that's not relevant in this case) and checksums.

    • Merge changes with off-site FTP server.

  2. Method two:

    • Register for directory change notifications (e.g., using ReadDirectoryChangesW from the WinAPI, or FileSystemWatcher if using .NET).

    • Log changes.

    • Merge changes with off-site FTP server every N minutes.
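Method 1 is easy to prototype. As a rough sketch (in Python purely for illustration, since the language is still undecided; the function names are my own), the scan-and-diff step might look like:

```python
import hashlib
import os

def snapshot(root):
    """Walk a directory tree and map each relative path to
    (mtime, sha256) so successive snapshots can be diffed."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            state[rel] = (os.path.getmtime(full), h.hexdigest())
    return state

def diff(old, new):
    """Return (added, modified, deleted) relative paths between two
    snapshots; a changed checksum counts as modified even when the
    timestamp was faked by a copy tool."""
    added = [p for p in new if p not in old]
    deleted = [p for p in old if p not in new]
    modified = [p for p in new
                if p in old and old[p][1] != new[p][1]]
    return added, modified, deleted
```

Every N minutes you would call `snapshot("/Foo")`, diff against the previous snapshot, and feed the three lists to the FTP merge step.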

I'll probably end up using something like the second method due to performance considerations.
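For method 2, the change log is the piece worth sketching: raw notifications arrive once per write, so repeated saves of the same file should coalesce into a single pending upload per merge cycle. A minimal illustration (Python again, just for illustration; the watcher callback is assumed to call `record` — on Windows the events would come from ReadDirectoryChangesW or FileSystemWatcher):

```python
import threading

class ChangeLog:
    """Collects file-change notifications and coalesces repeated
    events for the same path, so the periodic merge step uploads
    each changed file at most once per cycle."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = set()

    def record(self, path):
        # Called from the watcher callback; duplicate events collapse.
        with self._lock:
            self._pending.add(path)

    def drain(self):
        # Called every N minutes by the merge step; atomically takes
        # the batch so new events land in the next cycle.
        with self._lock:
            batch = sorted(self._pending)
            self._pending.clear()
            return batch
```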


Problem

Since this synchronization must take place during business hours, the first problem that arises is during the off-site upload stage.

While I'm transferring a file off-site, I effectively need to prevent the users from writing to the file (e.g., use CreateFile with FILE_SHARE_READ or something) while I'm reading from it. The internet upstream speeds at their office are nowhere near symmetrical to the file sizes they'll be working with, so it's quite possible that they'll come back to the file and attempt to modify it while I'm still reading from it.


Possible Solution

The easiest solution to the above problem would be to create a copy of the file(s) in question elsewhere on the file-system and transfer those "snapshots" without disturbance.

The files (some will be binary) that these guys will be working with are relatively small, probably ≤20 MB, so copying (and therefore temporarily locking) them will be almost instant. The chances of them attempting to write to the file in the same instant that I'm copying it should be close to nil.
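The snapshot step itself is only a couple of calls. A hedged sketch (Python for illustration; the staging layout simply mirrors the share's relative paths):

```python
import os
import shutil

def stage_snapshot(src_root, rel_path, staging_root):
    """Copy one changed file into a staging area, preserving the
    relative directory structure, so the upload can read the copy
    at leisure while the user keeps working on the original.
    shutil.copy2 preserves timestamps along with the content."""
    src = os.path.join(src_root, rel_path)
    dst = os.path.join(staging_root, rel_path)
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    shutil.copy2(src, dst)
    return dst
```

The brief lock window is just the duration of the copy; the slow FTP transfer then reads only the staged file.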

This solution seems kind of ugly, though, and I'm pretty sure there's a better way to handle this type of problem.

One thing that comes to mind is something like a file system filter that takes care of the replication and synchronization at the IRP level, kind of like what some A/Vs do. This is overkill for my project, however.


Questions

This is the first time that I've had to deal with this type of problem, so perhaps I'm thinking too much into it.

I'm interested in clean solutions that don't require going overboard with the complexity of their implementations. Perhaps I've missed something in the WinAPI that handles this problem gracefully?

I haven't decided what I'll be writing this in, but I'm comfortable with: C, C++, C#, D, and Perl.

2 Answers

Answered 2024-10-25 08:29:27


After the discussions in the comments my proposal would be like so:

  • Create a partition on your data server, about 5 GB for safety.
  • Create a Windows Service project in C# that monitors your data drive/location.
  • When a file has been modified, create a local copy of it, preserving the same directory structure, and place it on the new partition.
  • Create another service that does the following:
    • Monitors bandwidth usage.
    • Monitors file creations on the temporary partition.
    • Transfers several files at a time (using threading) to your FTP server, abiding by the current bandwidth usage, decreasing/increasing the worker threads depending on network traffic.
    • Removes files from the partition once they have transferred successfully.
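The transfer service described above boils down to a worker pool sharing a byte-rate limiter. A rough sketch of that idea (Python purely for illustration; the actual FTP call is left as a pluggable `upload` function, and all names are mine):

```python
import threading
import time
from queue import Empty, Queue

class TokenBucket:
    """Simple byte-rate limiter shared by all upload workers, so
    the aggregate transfer rate stays under a configured cap."""

    def __init__(self, bytes_per_sec):
        self.rate = bytes_per_sec
        self.tokens = bytes_per_sec
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def consume(self, n):
        # Block until n bytes' worth of tokens have accumulated.
        while True:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.rate,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= n:
                    self.tokens -= n
                    return
            time.sleep(0.01)

def transfer_all(files, upload, bucket, workers=4):
    """Upload (path, size) pairs concurrently; each worker pays the
    bucket for the bytes it sends, which caps total bandwidth."""
    q = Queue()
    for item in files:
        q.put(item)
    done = []

    def worker():
        while True:
            try:
                path, size = q.get_nowait()
            except Empty:
                return
            bucket.consume(size)
            upload(path)
            done.append(path)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Adjusting the worker count in response to observed network traffic would sit on top of this, e.g. by resizing the pool between merge cycles.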

So basically you have your drives:

  • C: Windows Installation
  • D: Share Storage
  • X: Temporary Partition

Then you would have the following services:

  • LocalMirrorService - Watches D: and copies to X: with the dir structure
  • TransferClientService - Moves files from X: to the FTP server, removes from X:
    • Also uses multiple threads to move several files at once and monitors bandwidth.

I would bet that this is the idea that you had in mind, but it seems like a reasonable approach as long as you're really good with your application development and you're able to create a solid system that will handle most issues.

When a user edits a document in Microsoft Word, for instance, the file will change on the share, and it may be copied to X: even though the user is still working on it. Within Windows there are APIs to check whether the file handle is still open by the user; if that is the case, you can create a hook to watch for when the user actually closes the document, so that all their edits are complete, and only then migrate the file to drive X:.
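There is no single portable "is this handle still open" call; on Windows the usual trick is to attempt a CreateFile open with no sharing and treat a sharing violation as "in use". As a rough, POSIX-only illustration of the same idea using advisory flock locks (note the caveat: advisory locks only detect writers that also take locks, which Word does not, so on Windows you would test for a sharing violation instead):

```python
import fcntl
import os

def appears_in_use(path):
    """Heuristic: try to take a non-blocking exclusive advisory lock.
    If another descriptor already holds it, assume the file is still
    being edited and skip it this cycle. POSIX-only sketch; the
    Windows equivalent is a no-sharing CreateFile attempt."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```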

That being said, if the user is working on the document and their PC crashes for some reason, the document/file handle may not get released until the document is opened again at a later date, thus causing issues.

Answered 2024-10-25 08:29:27


For anyone in a similar situation (I'm assuming the person who asked the question implemented a solution long ago), I would suggest an implementation of rsync.

rsync.net's Windows Backup Agent does what is described in method 1, and can be run as a service as well (see "Advanced Usage"). Though I'm not entirely sure if it has built-in bandwidth limiting...

Another (probably better) solution that does have bandwidth limiting is Duplicati. It also properly backs up currently-open or locked files. Uses SharpRSync, a managed rsync implementation, for its backend. Open source too, which is always a plus!
