
How to compare two volumes and list the modified files?

Posted 2024-09-16 12:31:07

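I have two hard-disk volumes (one is a backup image of the other), and I want to compare them and list all the modified files, so that the user can choose which files he or she wants to roll back.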

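Currently I recurse through the new volume and compare each file's timestamp with the corresponding file on the old volume (if it exists there). Obviously this is the wrong approach: it is both time-consuming and wrong!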

Is there an efficient way to do this?

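EDIT:
- I'm using FindFirstFile and similar functions to recurse through the volume and gather information about each file (not very slow, just a few minutes).
- I'm using Volume Shadow Copy for the backup.
- The backup volume is remote, so I can't continuously monitor the actual volume.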
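
For reference, the brute-force comparison described in the question can be sketched as follows. This is a minimal, portable Python sketch that uses `os.walk`/`os.stat` in place of `FindFirstFile`/`FindNextFile`; the names `snapshot` and `diff_snapshots` are hypothetical. Each file is looked up by relative path in a dictionary, so it is compared only with its own counterpart, and new and deleted files are reported alongside those whose size or modification time differs.

```python
import os
from typing import Dict, List, Tuple

# Per-file metadata: (size in bytes, modification time in nanoseconds).
Meta = Tuple[int, int]


def snapshot(root: str) -> Dict[str, Meta]:
    """Walk `root` recursively and map each file's relative path to its metadata."""
    result: Dict[str, Meta] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            result[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return result


def diff_snapshots(old: Dict[str, Meta], new: Dict[str, Meta]) -> Dict[str, List[str]]:
    """Classify paths as added, deleted, or modified (size or mtime differs)."""
    added = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return {"added": added, "deleted": deleted, "modified": modified}
```

Size and timestamp are only a heuristic here: an application can reset a file's timestamp, so a changed file can slip past this check.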


剩一世无双 2024-09-23 12:31:07

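Part of this depends on how the two volumes are duplicated. If they are "true" copies from the file system's point of view (e.g. shadow copies or other block-level copies), you can do a few tricky little things with the USN (update sequence number), which is the general technology others are suggesting you look into. You might want to look at an API such as FSCTL_READ_FILE_USN_DATA, for example. That API lets you compare two different copies of a file (again, assuming they are the same file, with the same file reference number, from a block-level backup). If you want to stay largely stateless, this and similar APIs will help you a lot. My algorithm would look something like this: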

foreach( file in backup_volume ) {
    // look up the same file (same file reference number) on the modified volume
    file_in_modified_volume = try_open_by_id( modified_volume, file.id )
    if (file_in_modified_volume != null) {
        usn_result = compare_usn_values_of_files( file, file_in_modified_volume )
        if (usn_result == equal_to) {
           // file hasn't changed at all
        } else {
           // file has changed (somehow)
        }
    } else {
        // file was deleted (possibly deleted and recreated)
    }
}
// we still don't know about files new in modified_volume
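
The bookkeeping in the algorithm above can be modelled in Python. This sketch assumes you have already collected, for each volume, a mapping from file reference number to that file's current USN (for example via FSCTL_READ_FILE_USN_DATA, as suggested above); that Windows-specific collection step is not shown, and the function name `classify_by_usn` is hypothetical. Following the answer's premise, equal USNs mean the file is unchanged. A second pass covers the gap noted in the last comment by reporting IDs that exist only on the modified volume as new.

```python
from typing import Dict, List


def classify_by_usn(backup: Dict[int, int], modified: Dict[int, int]) -> Dict[str, List[int]]:
    """Compare two {file_reference_number: usn} maps taken from block-level copies.

    Same ID and same USN -> unchanged; same ID, different USN -> changed
    (somehow); ID missing on the modified volume -> deleted (possibly deleted
    and recreated). IDs present only on the modified volume -> new.
    """
    unchanged: List[int] = []
    changed: List[int] = []
    deleted: List[int] = []
    for file_id, usn in backup.items():
        if file_id in modified:  # plays the role of try_open_by_id succeeding
            if modified[file_id] == usn:
                unchanged.append(file_id)
            else:
                changed.append(file_id)
        else:
            deleted.append(file_id)
    # Second pass: files that are new on the modified volume.
    new = [file_id for file_id in modified if file_id not in backup]
    return {
        "unchanged": sorted(unchanged),
        "changed": sorted(changed),
        "deleted": sorted(deleted),
        "new": sorted(new),
    }
```

For example, `classify_by_usn({1: 100, 2: 200, 3: 300}, {1: 100, 2: 250, 4: 400})` reports file 1 as unchanged, 2 as changed, 3 as deleted and 4 as new.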

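All of that said, my experience leads me to believe this will be more complicated than my off-the-cuff explanation suggests. It might be a good starting point, though.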

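If the volumes are not block-level copies of one another, comparing USNs and file IDs will be very difficult, if not impossible. Instead, you will most likely be going by file name, which is difficult if not impossible to do without opening every file: applications can modify timestamps, the sizes and times returned by FindFirstFile/FindNextFile can be out of date, and you have to handle files that were deleted and then recreated, renamed files, and so on.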
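
Going by file name while ignoring timestamps means actually reading the files. A minimal Python sketch, assuming both volumes are reachable as directory trees (the names `relative_files` and `diff_by_name_and_content` are hypothetical): files are matched by relative path and compared with `filecmp.cmp(..., shallow=False)`, which treats different sizes as a difference and otherwise compares the bytes. Because matching is by name only, a rename shows up as one deleted path plus one added path, which is exactly the kind of case the answer warns about.

```python
import filecmp
import os
from typing import Dict, List


def relative_files(root: str) -> List[str]:
    """Return every file under `root` as a path relative to `root`."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def diff_by_name_and_content(old_root: str, new_root: str) -> Dict[str, List[str]]:
    """Match files by relative path and compare their contents, not their timestamps."""
    old_files = set(relative_files(old_root))
    new_files = set(relative_files(new_root))
    modified = [
        rel
        for rel in sorted(old_files & new_files)
        # shallow=False: sizes differ -> different; equal sizes -> bytes compared
        if not filecmp.cmp(os.path.join(old_root, rel),
                           os.path.join(new_root, rel), shallow=False)
    ]
    return {
        "added": sorted(new_files - old_files),
        "deleted": sorted(old_files - new_files),
        "modified": modified,
    }
```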

So it is important to know how much control you have over the environment.



梅窗月明清似水 2024-09-23 12:31:07

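Rather than waiting until the changes have happened and then scanning the whole disk to find the (usually few) changed files, I would set up a program that monitors changes as they happen, using ReadDirectoryChangesW. That lets you build the list of files with a minimum of fuss.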
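
Whatever receives the notifications still has to turn a stream of events into a list of changed files. The Python sketch below shows only that bookkeeping. It assumes each notification has already been decoded from the `FILE_NOTIFY_INFORMATION` records that ReadDirectoryChangesW fills in, as an `(action, relative_path)` pair using the documented `FILE_ACTION_*` codes; the ReadDirectoryChangesW call, the buffer parsing, and the class name `ChangeList` are not from the answer and are assumptions of this sketch.

```python
from typing import Dict, Iterable, List, Tuple

# Action codes as documented for FILE_NOTIFY_INFORMATION.Action.
FILE_ACTION_ADDED = 1
FILE_ACTION_REMOVED = 2
FILE_ACTION_MODIFIED = 3
FILE_ACTION_RENAMED_OLD_NAME = 4
FILE_ACTION_RENAMED_NEW_NAME = 5


class ChangeList:
    """Coalesces directory-change notifications into one status per path."""

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}

    def _created(self, path: str) -> None:
        # Existed at the baseline, was removed, then recreated: report as modified.
        if self.status.get(path) == "removed":
            self.status[path] = "modified"
        else:
            self.status[path] = "added"

    def _deleted(self, path: str) -> None:
        # Created and deleted since the baseline: nothing to roll back.
        if self.status.get(path) == "added":
            del self.status[path]
        else:
            self.status[path] = "removed"

    def feed(self, events: Iterable[Tuple[int, str]]) -> None:
        for action, path in events:
            if action in (FILE_ACTION_ADDED, FILE_ACTION_RENAMED_NEW_NAME):
                self._created(path)
            elif action in (FILE_ACTION_REMOVED, FILE_ACTION_RENAMED_OLD_NAME):
                self._deleted(path)
            elif action == FILE_ACTION_MODIFIED:
                # A file added since the baseline stays "added".
                if self.status.get(path) != "added":
                    self.status[path] = "modified"

    def changed_files(self) -> List[Tuple[str, str]]:
        """Return (path, status) pairs sorted by path."""
        return sorted(self.status.items())
```

This approach needs a process watching the live volume the whole time, which the question's edit says is not possible in that particular setup.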


感性不性感 2024-09-23 12:31:07

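Assuming you aren't comparing each file on the new volume against every file in the snapshot (only against its counterpart), what you're doing is the only way to do it. How would you find out which files have not been modified without looking at all of them?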


苏辞 2024-09-23 12:31:07

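I'm not a Windows programmer.
But shouldn't you have a stat function to retrieve a file's modification time?
Sort the files by modification time.
The files whose modification time is later than the last backup time are the ones you are interested in.

The first time, you can iterate over the backup volume to find the maximum modification time and creation time in the set you're interested in.
I'm assuming the directories of interest on the backup volume are not modified.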
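
In Python, `os.stat` plays the role of `stat`. A minimal sketch of this idea, with hypothetical function names `newest_mtime` and `files_modified_since`: find the newest modification time on the backup volume, then list the files on the live volume modified after it, sorted by modification time. For brevity it tracks only modification times; creation time (`st_birthtime`, where the platform provides it) could be handled the same way.

```python
import os
from typing import List, Tuple


def newest_mtime(root: str) -> float:
    """Return the largest file modification time under `root` (0.0 if there are no files)."""
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            newest = max(newest, os.stat(os.path.join(dirpath, name)).st_mtime)
    return newest


def files_modified_since(root: str, cutoff: float) -> List[Tuple[float, str]]:
    """List (mtime, relative path) for files modified after `cutoff`, oldest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            mtime = os.stat(full).st_mtime
            if mtime > cutoff:
                hits.append((mtime, os.path.relpath(full, root)))
    return sorted(hits)
```

A typical call would be `files_modified_since(live_root, newest_mtime(backup_root))`. Deleted files, and files whose timestamps an application has set back, are not caught this way.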


猫瑾少女 2024-09-23 12:31:07

Without more details about what you're trying to do here, it's hard to say. Still, here are some tips for what I think you're trying to achieve:

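- If you only care about NTFS volumes, I'd suggest looking at the USN/change journal APIs. They have been around since Windows 2000. That way, after the initial inventory, you only need to look at the changes from that point on. This is a good place to start, although it is a very old article: http://www.microsoft.com/msj/0999/journal/journal.aspx
  - Also, with the USN APIs you could skip the hashing step and simply record the information from the journal yourself (this will become clearer when/if you look at those APIs).
- The first time you compare the drive contents, use a hash such as SHA-1 or MD5.
- Store the hashes and other such information in some kind of database, for example SQLite3. Note that this alone can take up a lot of space: a quick look at an audio folder with 40k+ files gives roughly 750 MB of MD5 information.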
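
The hashing and database steps above can be sketched with Python's standard `hashlib` and `sqlite3` modules. This is an illustration under assumptions: the table name `files`, its columns, and the functions `hash_file`, `hash_tree`, `record_baseline` and `changed_since_baseline` are hypothetical, and SHA-1 is used only because the answer mentions it. The first run stores one digest per relative path; later runs re-hash the volume and compare against the stored rows.

```python
import hashlib
import os
import sqlite3
from typing import Dict, List


def hash_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_tree(root: str) -> Dict[str, str]:
    """Map each file's path relative to `root` to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = hash_file(full)
    return digests


def record_baseline(db: sqlite3.Connection, root: str) -> None:
    """Store (path, digest) for every file under `root`, replacing earlier rows."""
    db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, digest TEXT NOT NULL)")
    db.execute("DELETE FROM files")
    db.executemany("INSERT INTO files (path, digest) VALUES (?, ?)", hash_tree(root).items())
    db.commit()


def changed_since_baseline(db: sqlite3.Connection, root: str) -> Dict[str, List[str]]:
    """Re-hash `root` and compare it against the stored digests."""
    stored = dict(db.execute("SELECT path, digest FROM files"))
    current = hash_tree(root)
    return {
        "added": sorted(p for p in current if p not in stored),
        "deleted": sorted(p for p in stored if p not in current),
        "modified": sorted(p for p in current if p in stored and stored[p] != current[p]),
    }
```

Because the comparison uses content digests rather than timestamps, a file whose timestamp was reset by an application is still reported as modified; the cost is reading every file on each run.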