How should I poll a large number of files for changes?
I'd like to poll the file system for any changed, added, or removed files or subdirectories. All changes should be detected quickly, but without putting pressure on the machine. The OS is Windows Vista or later, and the observed part is a local directory.
Typically, I would resort to a FileSystemWatcher, but this led to problems with other programs that tried to watch the same location (most prominently, Windows Explorer). Also, I have heard that FSW is not really reliable, even for local folders and with a large buffer.
The main issue is that the number of files and directories may be very large (my guess is seven digits). Simply checking all files every second noticeably affected my machine.
My next idea was to check a different part of the tree each second to reduce the overall impact, and possibly to add a heuristic, such as checking frequently changed files more often.
I'm wondering if there are patterns for this kind of problem, or if anyone has experience with this situation.
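The partitioned-scan idea from the question could be sketched roughly like this in Python, for illustration only. Each call to `tick()` lists at most `dirs_per_tick` directories, walking the whole tree round-robin over many ticks, and first re-stats every file in a small "hot" set of recently modified files. The class name, `dirs_per_tick`, and `hot_limit` are hypothetical; changes are detected from modification time and size only, and the first full sweep just records a baseline without reporting events.

```python
import os
from collections import deque


class IncrementalScanner:
    """Spread one full-tree sweep over many ticks; re-check hot files every tick."""

    def __init__(self, root, dirs_per_tick=200, hot_limit=1000):
        self.root = root
        self.dirs_per_tick = dirs_per_tick  # directories listed per tick (hypothetical knob)
        self.hot_limit = hot_limit          # max recently changed files tracked
        self.files = {}     # file path -> (mtime_ns, size)
        self.children = {}  # directory path -> set of child paths seen last time
        self.hot = {}       # insertion-ordered dict used as a small LRU set
        self.queue = deque([root])
        self.baseline = True  # the first sweep only records state

    def _emit(self, events, kind, path):
        if not self.baseline:
            events.append((kind, path))

    def _mark_hot(self, path):
        self.hot.pop(path, None)
        self.hot[path] = None
        if len(self.hot) > self.hot_limit:
            del self.hot[next(iter(self.hot))]  # evict the oldest entry

    def _forget(self, path):
        # Drop a deleted path and everything below it (linear scan; fine for a sketch).
        prefix = path + os.sep
        for table in (self.files, self.children, self.hot):
            for p in [p for p in table if p == path or p.startswith(prefix)]:
                del table[p]

    def _check_hot(self, events):
        for p in list(self.hot):
            try:
                st = os.stat(p, follow_symlinks=False)
            except OSError:
                self.hot.pop(p, None)  # gone; the next scan of its parent reports it
                continue
            sig = (st.st_mtime_ns, st.st_size)
            if self.files.get(p) != sig:
                self.files[p] = sig
                self._emit(events, "modified", p)
                self._mark_hot(p)

    def _scan_dir(self, d, events):
        try:
            with os.scandir(d) as it:
                entries = list(it)
        except OSError:
            return  # vanished or unreadable; its parent's next scan reports it
        old = self.children.get(d, set())
        current = set()
        for e in entries:
            try:
                is_dir = e.is_dir(follow_symlinks=False)
                st = None if is_dir else e.stat(follow_symlinks=False)
            except OSError:
                continue  # entry disappeared between listing and stat
            current.add(e.path)
            if is_dir:
                if e.path not in self.children:
                    self.children[e.path] = set()  # known, not listed yet
                    self._emit(events, "added", e.path)
                self.queue.append(e.path)
                continue
            sig = (st.st_mtime_ns, st.st_size)
            prev = self.files.get(e.path)
            if prev is None:
                self._emit(events, "added", e.path)
            elif prev != sig:
                self._emit(events, "modified", e.path)
                self._mark_hot(e.path)
            self.files[e.path] = sig
        for gone in old - current:
            self._emit(events, "deleted", gone)
            self._forget(gone)
        self.children[d] = current

    def tick(self):
        """Do one bounded slice of work and return a list of (kind, path)."""
        events = []
        self._check_hot(events)
        for _ in range(self.dirs_per_tick):
            self._scan_dir(self.queue.popleft(), events)
            if not self.queue:
                self.baseline = False         # a full sweep has finished
                self.queue.append(self.root)  # begin the next sweep
                break
        return events
```

One sweep over D directories takes about D / `dirs_per_tick` ticks, so that parameter trades detection latency for I/O load, while the hot set keeps latency low for files that change often. `tick()` would be called from a timer, for example once per second as the question suggests.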
Answers (3)
We have implemented a similar feature in C#. The FileSystemWatcher was inefficient with large directory trees.
Our alternative was FSNodes, a struct we created using the following Windows API calls:
What we do is static processing: we save a metadata tree on disk and compare the stored directory tree with the freshly loaded one, looking for modified files based on their timestamp (faster) or on the file hash. We can also detect deleted, added, and moved files, and even files that were both moved and modified (also based on the file hash).
This implementation, combined with a daemon that runs it every POLL_TIME, worked for us. Hope it helps.
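The snapshot-and-compare approach described in this answer might look roughly like the following Python sketch. It does not reproduce FSNodes or the Windows API calls; instead it uses `os.walk`, stores the metadata tree as JSON, reuses a file's stored SHA-256 when its mtime and size are unchanged (the cheap timestamp check first), and pairs a deleted path and an added path with identical hashes as a move. The function names, the JSON state file, and `poll_time` (standing in for POLL_TIME) are assumptions. Detecting files that were moved *and* modified would need an extra matching rule the answer does not describe, so this sketch reports them as a deletion plus an addition.

```python
import hashlib
import json
import os
import time


def file_hash(path, chunk=1 << 20):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def take_snapshot(root, old):
    """Return {relative path: [mtime_ns, size, sha256]} for every file under root.

    A file whose mtime and size match `old` keeps its stored hash, so only new
    or touched files are actually read.
    """
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            try:
                st = os.stat(full)
                prev = old.get(rel)
                if prev and prev[0] == st.st_mtime_ns and prev[1] == st.st_size:
                    digest = prev[2]
                else:
                    digest = file_hash(full)
            except OSError:
                continue  # vanished or unreadable; the next poll will see it
            snap[rel] = [st.st_mtime_ns, st.st_size, digest]
    return snap


def diff(old, new):
    """Classify changes; a deleted and an added path with equal hashes is a move."""
    added = {p for p in new if p not in old}
    deleted = {p for p in old if p not in new}
    modified = sorted(p for p in new if p in old and new[p][2] != old[p][2])
    by_hash = {}
    for p in sorted(added):
        by_hash.setdefault(new[p][2], []).append(p)
    moved = []
    for p in sorted(deleted):
        # Files with identical contents (e.g. several empty files) may be paired wrongly.
        candidates = by_hash.get(old[p][2])
        if candidates:
            moved.append((p, candidates.pop(0)))
    for src, dst in moved:
        deleted.discard(src)
        added.discard(dst)
    return {"added": sorted(added), "deleted": sorted(deleted),
            "modified": modified, "moved": moved}


def load_snapshot(state_file):
    try:
        with open(state_file, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}  # first run: no stored tree yet


def poll_once(root, state_file):
    """Load the stored tree, rescan, persist the new tree, return the diff."""
    old = load_snapshot(state_file)
    new = take_snapshot(root, old)
    with open(state_file, "w", encoding="utf-8") as f:
        json.dump(new, f)
    return diff(old, new)


def run_daemon(root, state_file, poll_time=60.0):
    """Poll forever; poll_time plays the role of the answer's POLL_TIME."""
    while True:
        changes = poll_once(root, state_file)
        if any(changes.values()):
            print(changes)
        time.sleep(poll_time)
```

The state file should live outside the watched tree; otherwise every poll would see the file it just wrote. On the very first poll there is no stored tree, so every file is reported as added.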
My best guess would be to use the USN journal, provided it is a local machine, you have administrator privileges, and the partitions are NTFS. The USN journal is extremely fast and reliable. It is a long topic, and this link explains everything:
http://www.microsoft.com/msj/0999/journal/journal.aspx
For *nix environments you can use inotify (https://github.com/rvoicilas/inotify-tools/wiki/), which worked great in my limited research on it. There might be a version out there that works with Windows, which I have less experience with... quick googling led me to a Java clone called jnotify (http://jnotify.sourceforge.net/), which is advertised to work on Windows, so it might be worth trying.
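For the Linux case, here is a minimal Python sketch that calls the kernel's inotify API directly through `ctypes` (inotify-tools are command-line programs built on the same API). It assumes glibc exposes `inotify_init` and `inotify_add_watch`; the mask constants are copied from `<sys/inotify.h>`, and `watch` and `read_events` are hypothetical helper names. Note that an inotify watch covers a single directory, not its subtree, so a tree with many directories needs one watch per directory, bounded by `/proc/sys/fs/inotify/max_user_watches`.

```python
import ctypes
import ctypes.util
import os
import select
import struct

# Linux only: load the C library that provides the inotify wrappers.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.inotify_init.argtypes = []
libc.inotify_init.restype = ctypes.c_int
libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]
libc.inotify_add_watch.restype = ctypes.c_int

# Event bits from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_ISDIR = 0x40000000  # set on events that concern a subdirectory
WATCH_MASK = IN_MODIFY | IN_MOVED_FROM | IN_MOVED_TO | IN_CREATE | IN_DELETE

# struct inotify_event { int wd; uint32_t mask, cookie, len; char name[]; }
EVENT = struct.Struct("iIII")


def watch(path, mask=WATCH_MASK):
    """Create an inotify instance watching one directory; return its fd."""
    fd = libc.inotify_init()
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    if libc.inotify_add_watch(fd, os.fsencode(path), mask) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, os.strerror(err))
    return fd


def read_events(fd, timeout=1.0):
    """Wait up to `timeout` seconds, then return queued events as (mask, name)."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return []
    buf = os.read(fd, 64 * 1024)  # must exceed one event plus NAME_MAX + 1
    events, offset = [], 0
    while offset < len(buf):
        _wd, mask, _cookie, length = EVENT.unpack_from(buf, offset)
        offset += EVENT.size
        # The name is NUL-padded to `length` bytes (empty for the watched dir itself).
        name = os.fsdecode(buf[offset:offset + length].rstrip(b"\0"))
        offset += length
        events.append((mask, name))
    return events
```

A real watcher would loop on `read_events`, add a watch for each newly created subdirectory (those events carry `IN_ISDIR`), and handle `IN_Q_OVERFLOW`, which the kernel reports when its event queue fills up.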