Incremental backups: how to keep track of file deletions
I have an offsite backup solution, written in C++, that breaks files into blocks and keeps track of the blocks using MD5 hashes in a SQLite3 database. It transfers the blocks, along with the database, to a remote site.
So, when I want to do a restore, it queries the SQLITE3 database and restores the blocks accordingly.
When the first backup runs, it creates a big table called base_backup. Each subsequent file change or new file is added as a new record in a new table. If I want to do a restore, I query the base_backup table plus all the differences and restore the files.
The way the backup runs, it scans all the files in a given folder for the archive bit, and if it is cleared, it verifies whether a record already exists in the database and decides whether to back the file up or not.
Coming to my question: if a file is deleted on the local computer, how do I keep track of it and update the offsite backup accordingly? When I do a restore, I don't want to restore all the garbage files. Is there any way of knowing whether files have been deleted from a folder? I do not want to run a verify check against the database, since it would take too long.
Comments (3)
inotify with IN_DELETE?
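A minimal sketch of what that suggestion could look like, assuming a Linux host (inotify is not available on Windows, where the archive bit lives). The helper names watch_for_deletes and read_deleted are made up for illustration, and treating IN_MOVED_FROM as a deletion is an extra assumption. A watcher like this only sees deletions that happen while it is running, so it would have to run as a long-lived process between backups.

```cpp
#include <sys/inotify.h>
#include <unistd.h>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: opens a non-blocking inotify instance that reports
// entries removed from `dir` (not recursive). Returns the fd, or -1 on failure.
int watch_for_deletes(const std::string& dir) {
    int fd = inotify_init1(IN_NONBLOCK);
    if (fd < 0) return -1;
    // Assumption: a file moved out of the directory counts as deleted too.
    if (inotify_add_watch(fd, dir.c_str(), IN_DELETE | IN_MOVED_FROM) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Hypothetical helper: drains the queued events and returns the names of
// the entries that were removed since the last call.
std::vector<std::string> read_deleted(int fd) {
    std::vector<std::string> names;
    alignas(inotify_event) char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0) break;  // nothing more queued (EAGAIN) or an error
        for (char* p = buf; p < buf + n;) {
            auto* ev = reinterpret_cast<inotify_event*>(p);
            if ((ev->mask & (IN_DELETE | IN_MOVED_FROM)) && ev->len > 0)
                names.emplace_back(ev->name);
            p += sizeof(inotify_event) + ev->len;  // events are variable-length
        }
    }
    return names;
}
```

The backup process could call read_deleted before each run and record the returned names as deletions, instead of comparing the whole database against the folder.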
Create a service to monitor the directory (use FindFirstChangeNotification or ReadDirectoryChangesW).
You could add a new piece of information to your database which lists which files existed during the last backup. Then, even if a file had not changed, a new (small) entry would be made during the backup, indicating that it still existed.
When restoring a backup from a given date in the past, select only the files with entries showing that they existed during the most recent backup taken on or before that date.
For example, a pair of tables like this might work:
Notice that path/to/file2 does not appear in backup #2, as it was not in the directory during the backup (it must have been deleted). If somebody wants to restore files as they existed on March 15th, you look at the table of backup indices, see that backup #1 was the most recent one at that point, and look up all paths that existed in backup #1 in the paths table.
So basically, you are deferring the decision about whether a file was deleted to the restore operation, rather than making it during the backup operation.
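The lookup described above could be sketched roughly as follows. This is an in-memory stand-in, not the actual schema: the BackupCatalog type and its member names are hypothetical, dates are assumed to be ISO "YYYY-MM-DD" strings so that they sort chronologically, and in the real system the two maps would be two SQLite tables (backup id to date, and backup id to path).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the pair of tables: a backup index and a
// per-backup list of the paths that existed when that backup ran.
struct BackupCatalog {
    std::map<std::string, int> id_by_date;             // backup date -> backup id
    std::map<int, std::set<std::string>> paths_by_id;  // backup id -> paths present

    // Called once per backup run with every path found in the folder,
    // whether or not the file changed.
    void record(int id, const std::string& date, std::set<std::string> paths) {
        id_by_date[date] = id;
        paths_by_id[id] = std::move(paths);
    }

    // Paths to restore for `date`: the ones listed by the most recent
    // backup taken on or before that date. Deleted files are simply absent.
    std::set<std::string> paths_as_of(const std::string& date) const {
        auto it = id_by_date.upper_bound(date);  // first backup after `date`
        if (it == id_by_date.begin()) return {}; // no backup that early
        --it;
        return paths_by_id.at(it->second);
    }
};
```

With backup #1 recorded before March 15th listing path/to/file1 and path/to/file2, and backup #2 recorded afterwards listing only path/to/file1, paths_as_of for March 15th returns both paths, while a later date returns only path/to/file1. The cost is one small row per file per backup, which is the trade-off this answer accepts in exchange for not verifying the database against the folder.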