How can I detect only the deleted, changed, and created files on a volume?
I need to know if there is an easy way of detecting only the files that were deleted, modified or created on an NTFS volume.
I have written a program for offsite backup in C++. After the first backup, I check the archive bit of each file to see if there was any change made, and back up only the files that were changed. Also, it backs up from the VSS snapshot in order to prevent file locks.
This works fine on most file systems, but on some with a large number of files and directories the scan takes too long, and the backup often needs more than a day to finish.
I tried using the change journal to detect changes made on an NTFS volume, but it shows a lot of records, most of them for small temporary files that were created and then deleted. Also, I could get the file name, the file reference number, and the parent file reference number, but not the full file path. The parent file reference number is somehow supposed to give you the parent directory path.
EDIT: This needs to run every day, so each scan should record only the changes that took place since the last scan. Or at least there should be a way to ask for the changes since a given date and time.
Comments (4)
You can enumerate all the files on a volume using FSCTL_ENUM_USN_DATA. This is a fast process (my tests returned better than 6000 records per second even on a very old machine, and 20000+ is more typical) and only includes files that currently exist.
The data returned includes the file flags as well as the USNs so you could check for changes whichever way you prefer.
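One way to use the USNs, as a minimal sketch: keep the highest USN seen at the end of a scan, and on the next scan treat any record with a greater USN as changed. The FileRecord struct and the changedSince function are hypothetical stand-ins for the fields of a USN_RECORD, and the sketch assumes each enumerated record carries the USN of the file's most recent change. No Windows API is called here.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the fields of a USN_RECORD
// as returned by FSCTL_ENUM_USN_DATA.
struct FileRecord {
    std::uint64_t fileRef;    // FileReferenceNumber
    std::uint64_t parentRef;  // ParentFileReferenceNumber
    std::int64_t  usn;        // assumed: USN of the file's most recent change
    std::string   name;
};

// Return the records whose USN is newer than the USN saved by the last scan.
std::vector<FileRecord> changedSince(const std::vector<FileRecord>& records,
                                     std::int64_t lastScanUsn) {
    std::vector<FileRecord> changed;
    for (const FileRecord& r : records) {
        if (r.usn > lastScanUsn) changed.push_back(r);
    }
    return changed;
}
```

Because the enumeration only includes files that currently exist, deletions would not show up in this list and need to be detected some other way.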
You will still need to work out the full path for the files by matching the parent IDs with the file IDs of the directories. One approach would be to use a buffer large enough to hold all the file records simultaneously, and search through the records to find the matching parent for each file you need to back up. For large volumes you would probably need to process the directory records into a more efficient data structure, perhaps a hash table.
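The hash table idea could look roughly like the sketch below. It assumes the directory records have already been loaded into a map keyed by file reference number; DirEntry, DirTable, buildPath, and the rootRef parameter are hypothetical names chosen for illustration, not part of any Windows API.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical directory entry: a directory's name and the reference
// number of its parent, taken from the enumerated records.
struct DirEntry {
    std::uint64_t parentRef;
    std::string   name;
};

// Directories keyed by their own file reference number.
using DirTable = std::unordered_map<std::uint64_t, DirEntry>;

// Build "\dir\sub\file" by walking parent references up to rootRef.
// Returns an empty string if a parent is missing or the chain loops.
std::string buildPath(const DirTable& dirs, std::uint64_t rootRef,
                      std::uint64_t parentRef, const std::string& fileName) {
    std::string path = fileName;
    std::size_t hops = 0;
    while (parentRef != rootRef) {
        auto it = dirs.find(parentRef);
        if (it == dirs.end() || ++hops > dirs.size()) return "";
        path = it->second.name + "\\" + path;
        parentRef = it->second.parentRef;
    }
    return "\\" + path;
}
```

Each lookup is a constant-time hash probe on average, so resolving a path costs roughly one probe per directory level rather than a search through all records.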
Alternatively, you can read/reread the records for the parent directories as needed. This would be less efficient, but the performance might still be satisfactory depending on how many files are being backed up. Windows does appear to cache the data returned by FSCTL_ENUM_USN_DATA.
This program searches the C volume for files named test.txt and returns information about any files found, as well as about their parent directories.
Additional notes
As discussed in the comments, you may need to replace MFT_ENUM_DATA with MFT_ENUM_DATA_V0 on versions of Windows later than Windows 7. (This may also depend on what compiler and SDK you are using.)
I'm printing the 64-bit file reference numbers as if they were 32-bit. That was just a mistake on my part. You probably won't be printing them in production code anyway, but FYI.
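For anyone who does want to print them, here is a small sketch of formatting the full 64-bit value. formatFileRef is a hypothetical helper name; the cast to unsigned long long is there so the %llx format specifier matches the argument type on any platform.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Format a 64-bit file reference number as 16 hexadecimal digits.
std::string formatFileRef(std::uint64_t fileRef) {
    char buf[17];  // 16 digits plus the terminating null
    std::snprintf(buf, sizeof buf, "%016llx",
                  static_cast<unsigned long long>(fileRef));
    return buf;
}
```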
The change journal is your best bet. You can use the file reference numbers to match file creation/deletion pairs and thus ignore temporary files, without having to process them any further.
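A rough sketch of that pairing, assuming the journal records for one scan have already been read into memory: JournalRecord and dropTemporaryFiles are hypothetical names, and the two reason constants repeat the values of USN_REASON_FILE_CREATE and USN_REASON_FILE_DELETE from winioctl.h so that the example builds without the Windows headers.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Same values as USN_REASON_FILE_CREATE and USN_REASON_FILE_DELETE in
// winioctl.h, redefined here so the sketch builds anywhere.
constexpr std::uint32_t kReasonFileCreate = 0x00000100;
constexpr std::uint32_t kReasonFileDelete = 0x00000200;

// Hypothetical, simplified journal record.
struct JournalRecord {
    std::uint64_t fileRef;  // FileReferenceNumber
    std::uint32_t reason;   // Reason flags
};

// Drop every record of a file that was both created and deleted
// within the batch; such a file no longer exists and needs no backup.
std::vector<JournalRecord> dropTemporaryFiles(
        const std::vector<JournalRecord>& batch) {
    // First pass: combine all reason flags seen for each file.
    std::unordered_map<std::uint64_t, std::uint32_t> seen;
    for (const JournalRecord& r : batch) seen[r.fileRef] |= r.reason;

    // Second pass: keep records of files without a create/delete pair.
    constexpr std::uint32_t both = kReasonFileCreate | kReasonFileDelete;
    std::vector<JournalRecord> kept;
    for (const JournalRecord& r : batch) {
        if ((seen[r.fileRef] & both) != both) kept.push_back(r);
    }
    return kept;
}
```

A file that was deleted but not created in the same batch is kept, since it existed at the previous scan and its deletion is a real change.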
I think you have to scan the Master File Table to make sense of ParentFileReferenceNumber. Of course, you only need to keep track of directories when doing this. Use a data structure that lets you look up the information quickly, so that you only need to scan the MFT once.
You can use ReadDirectoryChangesW and the related Windows APIs.
I know how to achieve this in Java. It may help if you can call Java code from C++.
In Java you can achieve this using the JNotify API. It also looks for changes in subdirectories.