How to compare two volumes and list the modified files?

Posted 2024-09-16 12:31:07

I have two hard-disk volumes (one is a backup image of the other). I want to compare the volumes and list all the modified files, so that the user can select the ones he/she wants to roll back.

Currently I'm recursing through the new volume and comparing each file's timestamps with those of the matching file on the old volume (if the file exists in the old volume). Obviously this is a clumsy approach. It's time-consuming and wrong!

Is there an efficient way to do it?

EDIT:
- I'm using FindFirstFile and related functions to recurse through the volume and gather information about each file (not very slow, just a few minutes).
- I'm using Volume Shadow Copy to back up.
- The backup volume is remote, so I cannot continuously monitor the actual volume.

Comments (5)

剩一世无双 2024-09-23 12:31:07

Part of this depends upon how the two volumes are duplicated; if they are 'true' copies from the file system's point of view (e.g. shadow copies or other block-level copies), you can do a few tricky little things with respect to USN, which is the general technology others are suggesting you look into. You might want to look at an API like FSCTL_READ_FILE_USN_DATA, for example. That API will let you compare two different copies of a file (again, assuming they are the same file with the same file reference number from block-level backups). If you wanted to be largely stateless, this and similar APIs would help you a lot here. My algorithm would look something like this:

foreach( file in backup_volume ) {
    file_in_modified_volume = try_open_by_id( modified_volume, file.file_reference_number )
    if (file_in_modified_volume != null) {
        usn_result = compare_usn_values_of_files( file, file_in_modified_volume )
        if (usn_result == equal_to) {
           // file hasn't changed at all
        } else {
           // file has changed (somehow)
        }
    } else {
        // file was deleted (possibly deleted and recreated)
    }
}
// we still don't know about files new in modified_volume
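To make the pseudocode above concrete, here is a runnable Python model of the same classification. It assumes both volumes have already been enumerated into in-memory dicts keyed by NTFS file reference number, each holding a (path, usn) record; the `FileRecord` type, the `classify` function and that dict layout are hypothetical. The Windows-side work (opening by ID and reading each file's USN with FSCTL_READ_FILE_USN_DATA) is not shown. The sketch also fills in the last step the pseudocode leaves open: IDs that exist only on the modified volume are new files.

```python
from typing import Dict, List, NamedTuple


class FileRecord(NamedTuple):
    """Hypothetical per-file record gathered from one volume."""
    path: str  # path relative to the volume root
    usn: int   # last USN recorded for the file


# Hypothetical snapshot shape: NTFS file reference number -> FileRecord.
Snapshot = Dict[int, FileRecord]


def classify(backup: Snapshot, modified: Snapshot) -> Dict[str, List[str]]:
    """Match files by ID and compare their USNs."""
    result: Dict[str, List[str]] = {
        "unchanged": [], "changed": [], "deleted": [], "new": []}
    for file_id, old in backup.items():
        new = modified.get(file_id)  # stands in for try_open_by_id
        if new is None:
            # No file with this ID on the live volume: deleted. If it was
            # recreated, the copy has a new ID and is reported as "new".
            result["deleted"].append(old.path)
        elif new.usn == old.usn:
            result["unchanged"].append(new.path)
        else:
            # A different USN means the file was touched somehow
            # (data write, rename, attribute change, ...).
            result["changed"].append(new.path)
    # The part the loop cannot see: IDs that exist only on the live volume.
    for file_id, new in modified.items():
        if file_id not in backup:
            result["new"].append(new.path)
    return result
```

Because matching is by ID rather than by name, a renamed file keeps its ID and is reported as changed under its new path, not as a delete plus an add.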

All of that said, my experience leads me to believe that this will be more complicated than my off-the-cuff explanation hints at. This might be a good starting place, though.

If the volumes are not block-level copies of one another, then comparing USNs and file IDs will be very difficult, if not impossible. Instead, you will most likely have to go by file name, which will be difficult if not impossible to do without opening every file (applications can modify timestamps, the sizes and times returned by FindFirstFile/FindNextFile can be out of date, and you have to handle deleted-then-recreated files, renames, etc.).
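For the name-based fallback, a minimal Python sketch might look like this. It assumes both volumes are reachable as ordinary directory trees from one machine (for example a shadow copy exposed as a path); `relative_files` and `diff_by_name` are hypothetical names. It opens and compares file contents rather than trusting timestamps, and since it matches purely by relative path, a rename shows up as one deleted plus one added file.

```python
import filecmp
import os
from typing import Dict, List, Set


def relative_files(root: str) -> Set[str]:
    """Every file under root, as a path relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def diff_by_name(old_root: str, new_root: str) -> Dict[str, List[str]]:
    """Compare two trees by relative path and file contents."""
    old_files = relative_files(old_root)
    new_files = relative_files(new_root)
    modified = []
    for rel in sorted(old_files & new_files):
        # shallow=False makes filecmp compare contents (after a quick size
        # check) instead of accepting matching timestamps as "equal".
        if not filecmp.cmp(os.path.join(old_root, rel),
                           os.path.join(new_root, rel), shallow=False):
            modified.append(rel)
    return {
        "modified": modified,
        "deleted": sorted(old_files - new_files),
        "added": sorted(new_files - old_files),
    }
```

This reads every file that exists on both sides, which is exactly the cost the answer warns about.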

So knowing how much control you have over the environment is pretty important.

梅窗月明清似水 2024-09-23 12:31:07

Instead of waiting until after changes have happened, and then scanning the whole disk to find the (usually few) files that have changed, I'd set up a program to use ReadDirectoryChangesW to monitor changes as they happen. This will let you build a list of files with a minimum of fuss and bother.
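ReadDirectoryChangesW fills a buffer with FILE_NOTIFY_INFORMATION records, each carrying an Action code and a relative FileName. The Windows loop that issues the call and decodes the buffer is not shown here; this Python sketch assumes the records have already been decoded into (action, path) pairs and only shows one way to fold them into a deduplicated change list. The function name and the folding rules (for instance, treating delete-then-recreate as a modification) are assumptions, not something the API prescribes.

```python
from typing import Dict, Iterable, Tuple

# Action code values documented for FILE_NOTIFY_INFORMATION.
FILE_ACTION_ADDED = 0x1
FILE_ACTION_REMOVED = 0x2
FILE_ACTION_MODIFIED = 0x3
FILE_ACTION_RENAMED_OLD_NAME = 0x4
FILE_ACTION_RENAMED_NEW_NAME = 0x5


def fold_changes(events: Iterable[Tuple[int, str]]) -> Dict[str, str]:
    """Collapse a stream of (action, path) notifications into path -> state."""
    changes: Dict[str, str] = {}
    for action, path in events:
        prev = changes.get(path)
        if action in (FILE_ACTION_ADDED, FILE_ACTION_RENAMED_NEW_NAME):
            # Recreating a path that was removed earlier counts as a change.
            changes[path] = "modified" if prev == "removed" else "added"
        elif action in (FILE_ACTION_REMOVED, FILE_ACTION_RENAMED_OLD_NAME):
            if prev == "added":
                # Created and removed within the window: nothing to report.
                del changes[path]
            else:
                changes[path] = "removed"
        elif action == FILE_ACTION_MODIFIED and prev != "added":
            # Writes to a file created in this window leave it as "added".
            changes[path] = "modified"
    return changes
```

Repeated notifications for the same file (a common occurrence with this API) collapse into a single entry, which is what makes the resulting list cheap to present to the user.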

感性不性感 2024-09-23 12:31:07

Assuming you're not comparing each file on the new volume to every file in the snapshot, that's the only way you can do it. How are you going to find which files aren't modified without looking at all of them?

苏辞 2024-09-23 12:31:07

I am not a Windows programmer.
However, shouldn't you have a stat function to retrieve the modification time of a file?
Sort the files based on mod time.
The files having mod time greater than your last backup time are the ones of your interest.

The first time, you can iterate over the backup volume to find the maximum modification time and creation time in the set of files you're interested in.
I am assuming the directories of interest don't get modified in the backup volume.
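A minimal Python sketch of this idea, using `os.stat` as the portable stat function. The function names are hypothetical, and the cutoff is taken as the newest modification time found on the backup volume, as the answer suggests for the first run. It only looks at modification times, so it inherits the weakness raised in another answer here: applications can set timestamps, so some changes may be missed.

```python
import os
from typing import List, Tuple


def newest_mtime(root: str) -> float:
    """Latest modification time found under root (e.g. the backup volume)."""
    return max((os.stat(os.path.join(d, n)).st_mtime
                for d, _dirs, names in os.walk(root) for n in names),
               default=0.0)


def modified_since(root: str, cutoff: float) -> List[Tuple[float, str]]:
    """(mtime, relative path) of files modified after cutoff, newest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            mtime = os.stat(full).st_mtime
            if mtime > cutoff:
                hits.append((mtime, os.path.relpath(full, root)))
    hits.sort(reverse=True)  # sort on mod time, newest first
    return hits
```

Usage: `modified_since(live_root, newest_mtime(backup_root))` returns the candidate files to offer for rollback.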

猫瑾少女 2024-09-23 12:31:07

Without knowing more details about what you're trying to do here, it's hard to say. However, some tips about what I think you're trying to achieve:

  • If you're only concerned about NTFS volumes, I suggest looking into the USN / change journal APIs. They have been around since Windows 2000. This way, after the initial inventory, you need only look at changes from that point on. A good, though very old, starting point is this article: http://www.microsoft.com/msj/0999/journal/journal.aspx
  • Also, using the USN APIs, you could skip the hash step and just record the information from the journal yourself (this will become clearer when/if you look into those APIs).
  • The first time through, when comparing a drive's contents, use a hash such as SHA-1 or MD5.
  • Store hashes and other such information in a database of some sort. For example, SQLite3. Note that this can take up a huge amount of space itself. A quick look at my audio folder with 40k+ files would result in ~750 megs of MD5 information.
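The hash-and-store steps from the list above can be sketched in Python with the standard library's `hashlib` and `sqlite3` modules. The table layout, the function names and the choice of SHA-1 over MD5 are assumptions for illustration. The stored size doubles as a cheap pre-check, so on the second pass a file whose size changed is flagged without being re-read, and only same-size files are hashed again.

```python
import hashlib
import os
import sqlite3
from typing import Dict, List


def file_sha1(path: str, chunk: int = 1 << 20) -> str:
    """SHA-1 of a file, read in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def record_inventory(db: sqlite3.Connection, root: str) -> None:
    """First pass: store (relative path, size, SHA-1) for every file."""
    db.execute("CREATE TABLE IF NOT EXISTS inventory "
               "(path TEXT PRIMARY KEY, size INTEGER, sha1 TEXT)")
    with db:  # commit the whole pass as one transaction
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                db.execute("INSERT OR REPLACE INTO inventory VALUES (?, ?, ?)",
                           (os.path.relpath(full, root),
                            os.path.getsize(full), file_sha1(full)))


def compare_to_inventory(db: sqlite3.Connection,
                         root: str) -> Dict[str, List[str]]:
    """Later pass: compare the tree under root with the stored inventory."""
    stored = {path: (size, sha) for path, size, sha in
              db.execute("SELECT path, size, sha1 FROM inventory")}
    seen, modified, added = set(), [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            seen.add(rel)
            if rel not in stored:
                added.append(rel)
            # Size check first; hash only when the size is unchanged.
            elif (os.path.getsize(full) != stored[rel][0]
                  or file_sha1(full) != stored[rel][1]):
                modified.append(rel)
    return {"modified": sorted(modified), "added": sorted(added),
            "deleted": sorted(set(stored) - seen)}
```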