Cryptographically secure backups
Until now I have been backing up my computer to an external drive with rsync. The backup data consists of tens of thousands of small files and hundreds of big ones (Maildir email messages and episodes of my favorite series). The problem is that if a single sector of the backup disk fails, a single message may end up corrupted, which I find intolerable.
I have thought of an alternative that works as follows. There are three trees: the file tree, consisting of the data I wish to back up; the backup tree, containing a copy of the file tree at a given moment in time; and a hash tree, which contains file hashes and metadata hashes of the backup tree. A hash of the whole hash tree is also kept. Prior to a backup, the hash of the hash tree is checked. A failure here invalidates the whole backup. If the check succeeds, the shape of the hash tree is compared to the shape of the backup tree and the metadata hashes are verified, to ensure the backup tree is consistent in both shape and metadata. If it is not, the individual culprits can be listed. After this, the rsync backup traversal is performed. Whenever rsync updates a file, its new content hash and metadata hash are computed and inserted into the hash tree. Whenever rsync deletes a file, that file is removed from the hash tree. At the end, the hash of the hash tree is recomputed and stored.
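To make the idea concrete, here is a minimal sketch of the hash tree and its top-level hash. It rests on several assumptions that are not part of the scheme as described: SHA-256 is the hash, the "tree" is flattened into a dictionary keyed by relative path, only regular files are covered (symlinks and empty directories are ignored), and the metadata hash covers just permission bits, size and modification time. The names `build_hash_tree` and `root_hash` are made up for illustration.

```python
import hashlib
import os
import stat


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def metadata_hash(path):
    """Hash a few metadata fields (an assumed selection: mode, size, mtime)."""
    st = os.lstat(path)
    fields = (stat.S_IMODE(st.st_mode), st.st_size, st.st_mtime_ns)
    return hashlib.sha256(repr(fields).encode()).hexdigest()


def build_hash_tree(root):
    """Map each regular file's relative path to (content hash, metadata hash)."""
    tree = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Simplification: skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            rel = os.path.relpath(full, root)
            tree[rel] = (file_hash(full), metadata_hash(full))
    return tree


def root_hash(tree):
    """Hash the whole tree in sorted path order so the result is deterministic."""
    h = hashlib.sha256()
    for rel in sorted(tree):
        content, meta = tree[rel]
        line = f"{rel}\0{content}\0{meta}\n"
        h.update(line.encode("utf-8", "surrogateescape"))
    return h.hexdigest()
```

After an rsync pass, only the entries for updated or deleted files would need to change before `root_hash` is recomputed. A real implementation would probably hash per directory (a Merkle tree) so that recomputing the top-level hash does not touch every entry.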
This process is useful because the hashes are computed from correct data: even if a file in the file tree is corrupted after it has been inserted into the hash tree, the inconsistency does not invalidate the backup (or future backups). The most important property, however, is that even if an attacker corrupts the backup medium in any way they like, the information stored there will be trusted if and only if it is correct, unless the attacker has broken the hash algorithm. Also, data sent to the backup or restored from it can be verified incrementally.
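The "list the individual culprits" step could look roughly like the sketch below. It assumes the expected hashes come from a copy the attacker cannot modify (for example, one kept off the backup medium); if the hashes live only on the same disk, whoever can rewrite the files can rewrite the hashes too. SHA-256 and the name `find_culprits` are assumptions made for illustration.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_culprits(backup_root, expected):
    """Compare a backup tree against trusted hashes.

    expected maps relative paths to hex content hashes.
    Returns three sorted lists: (missing, unexpected, corrupted).
    """
    actual = set()
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            actual.add(os.path.relpath(full, backup_root))

    known = set(expected)
    missing = sorted(known - actual)       # recorded, but gone from the backup
    unexpected = sorted(actual - known)    # in the backup, but never recorded
    corrupted = sorted(
        rel for rel in actual & known
        if sha256_file(os.path.join(backup_root, rel)) != expected[rel]
    )
    return missing, unexpected, corrupted
```

Because each file is checked on its own, the same comparison can be applied to a single file during a restore, which is one way to read "verified incrementally".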
My question is: is there a reasonable implementation of such a backup scheme? My searches suggest that the available backup tools either do full or differential backups (tar-based, for instance) or fail to provide a cryptographic correctness guarantee (rsync).
If nothing like this exists, maybe I will write one, but I would like to avoid reinventing the wheel.
Comments (4)
What you're talking about sounds a lot like Git. I think it would pretty much do what you're describing. Just implement the process of "backing up" as git commit. You can then restore to any previous version with git checkout. It is amazingly storage efficient and extremely fast at transferring content, which would probably save you a lot of time on your backups. As a bonus, it's free, portable and already debugged!
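The resemblance comes from Git being content-addressed: every stored object is named by a hash of its contents. As a small illustration, the sketch below reproduces the ID Git gives a file's contents (a "blob") in a repository that uses the default SHA-1 object format. The function name `git_blob_id` is made up for illustration.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the object ID Git assigns to a blob in a SHA-1 repository.

    Git hashes a short header ("blob <size>" followed by a NUL byte)
    and then the raw file contents.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Since trees and commits are hashed the same way over the IDs they reference, a commit ID plays much the same role as the "hash of the hash tree" in the question.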
This sounds almost identical to how the Mercurial storage system works. The 'rsync command' would be implemented using Mercurial's push, which is remarkably network efficient.
If I had to solve the problem, I'd take a RAID array (to prevent corruption) of drives that use built-in AES encryption, and then use whatever backup method I am used to.
Git-Annex is the proper solution to this problem given available tools. It is an extension to git which allows robust support for files which are arbitrarily large, synchronizes between datastores automatically, has an optional graphical user interface, tracks how many backups you have and precisely what files are stored where, and allows you to set rules for how it should manage different content. You can also customize what cryptographic hashes are used to validate the integrity of the content.
For drive backups, git-annex interoperates with bup, which has more features tuned towards regular backups of entire systems.