Is there a reverse incremental backup solution with built-in redundancy (e.g. par2)?

I'm setting up a home server primarily for backup use. I have about 90GB of personal data that must be backed up in the most reliable manner, while still preserving disk space. I want to have full file history so I can go back to any file at any particular date.

Full weekly backups are not an option because of the size of the data. Instead, I'm looking along the lines of an incremental backup solution. However, I'm aware that a single corruption in a set of incremental backups makes the entire series (beyond a point) unrecoverable. Thus simple incremental backups are not an option.

I've researched a number of solutions to the problem. First, I would use reverse-incremental backups so that the latest version of the files would have the least chance of loss (older files are not as important). Second, I want to protect both the increments and backup with some sort of redundancy. Par2 parity data seems perfect for the job. In short, I'm looking for a backup solution with the following requirements:

  • Reverse incremental (to save on disk space and prioritize the most recent backup)
  • File history (kind of a broader category including reverse incremental)
  • Par2 parity data on increments and backup data
  • Preserve metadata
  • Efficient with bandwidth (bandwidth saving; no copying the entire directory over for each increment). Most incremental backup solutions should work this way.

This would (I believe) ensure file integrity and relatively small backup sizes. I've looked at a number of backup solutions already but they have a number of problems:

  • Bacula - Simple normal incremental backups
  • bup - incremental and implements par2 but isn't reverse incremental and doesn't preserve metadata
  • duplicity - incremental, compressed, and encrypted but isn't reverse incremental
  • dar - incremental and par2 is easy to add, but isn't reverse incremental and no file history?
  • rdiff-backup - almost perfect for what I need but it doesn't have par2 support

So far I think that rdiff-backup seems like the best compromise but it doesn't support par2. I think I can add par2 support to backup increments easily enough since they aren't modified each backup but what about the rest of the files? I could generate par2 files recursively for all files in the backup but this would be slow and inefficient, and I'd have to worry about corruption during a backup and old par2 files. In particular, I couldn't tell the difference between a changed file and a corrupt file, and I don't know how to check for such errors or how they would affect the backup history. Does anyone know of any better solution? Is there a better approach to the issue?
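To show what I mean for the increments, here is a rough sketch of the kind of wrapper I'm imagining (the repository path and the 10% redundancy figure are placeholders, and it assumes the par2cmdline tool is on the PATH). Because rdiff-backup never rewrites an increment after it is created, a later par2 verify failure on one of these files can only mean corruption, which sidesteps the changed-vs-corrupt ambiguity at least for the increments:

```python
#!/usr/bin/env python3
"""Sketch: add par2 parity to rdiff-backup increments after each run.

Assumptions: the par2cmdline tool is on PATH and the backup repository
lives at BACKUP_ROOT (a hypothetical path). Increments are written once
and never modified, so parity only needs to be created for new files;
existing .par2 files are left alone.
"""
import os
import subprocess

BACKUP_ROOT = "/mnt/backup"  # hypothetical repository path
INCREMENTS = os.path.join(BACKUP_ROOT, "rdiff-backup-data", "increments")
REDUNDANCY = "-r10"  # 10% recovery data; tune to taste

for dirpath, _dirnames, filenames in os.walk(INCREMENTS):
    for name in filenames:
        if name.endswith(".par2"):
            continue  # skip parity files themselves
        target = os.path.join(dirpath, name)
        parity = target + ".par2"
        if os.path.exists(parity):
            continue  # this increment is already protected
        # "par2 create" writes <target>.par2 plus recovery volumes
        subprocess.run(["par2", "create", REDUNDANCY, parity, target],
                       check=True)
```

This still leaves the mirror of the current files unprotected, which is exactly the part I don't have a good answer for.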

Thanks for reading through my difficulties and for any input you can give me. Any help would be greatly appreciated.

Comments (2)

你的心境我的脸 2025-01-11 21:58:27

http://www.timedicer.co.uk/index

Uses rdiff-backup as the engine. I've been looking at it, but that requires me to set up a "server" using Linux or a virtual machine.

Personally, I use WinRAR to make pseudo-incremental backups (it actually makes a full backup of recent files) run daily by a scheduled task. It is similarly a "push" backup.

It's not a true incremental (or reverse-incremental) backup, but it saves different versions of files based on when they were last updated. I mean, it saves the version for today, yesterday, and the previous days, even if the file is identical. You can set the archive bit to save space, but I don't bother anymore as all I back up are small spreadsheets and documents.

RAR has its own parity or recovery record that you can set by size or percentage. I use 1% (one percent).

It can preserve metadata; I personally skip the high-resolution timestamps.

It can be efficient since it compresses the files.

Then all I have to do is send the file to my backup. I have it copied to a different drive and to another computer in the network. No need for a true server, just a share. You can't do this for too many computers though as Windows workstations have a 10 connection limit.

So for my purposes, which may fit yours, the daily task backs up files that have been updated in the last 7 days. Then I have another scheduled backup, run once a month or every 30 days, that backs up files updated in the last 90 days.
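If it helps, the daily job boils down to something like the sketch below. The paths, archive name, and exact switch spellings are assumptions on my part, so double-check them against `rar /?` on your WinRAR version before relying on it:

```python
#!/usr/bin/env python3
"""Sketch of the WinRAR 'pseudo-incremental' job described above.

Assumptions: rar.exe is on PATH, SOURCE and DEST are hypothetical paths,
and the switch spellings (-rr, -tn, -ag, -r) should be verified against
your WinRAR version with "rar /?".
"""
import subprocess

SOURCE = r"C:\Users\me\Documents"   # hypothetical data to protect
DEST = r"\\backupbox\share\docs"    # hypothetical share or second drive

def run_backup(newer_than: str) -> None:
    # -r      recurse into subfolders
    # -rr1%   add a ~1% recovery record (RAR's built-in parity)
    # -agYYYY-MM-DD  stamp the current date into the archive name
    # -tn<t>  only add files modified within the given window
    subprocess.run([
        "rar", "a", "-r", "-rr1%", "-agYYYY-MM-DD", f"-tn{newer_than}",
        DEST + r"\docs.rar", SOURCE,
    ], check=True)

run_backup("7d")     # daily task: files changed in the last 7 days
# run_backup("90d")  # monthly task: files changed in the last 90 days
```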

But I use Windows, so if you're actually setting up a Linux server, you might check out the Time Dicer.

写下不归期 2025-01-11 21:58:27

Since nobody was able to answer my question, I'll write up a few possible solutions I found while researching the topic. In short, I believe the best solution is rdiff-backup to a ZFS filesystem (a minimal setup sketch follows the list below). Here's why:

  • ZFS checksums all blocks stored and can easily detect errors.
  • If you have ZFS set to mirror your data, it can recover the errors by copying from the good copy.
  • This takes up less space than full backups, even though the data is copied twice.
  • The odds of an error in both the original and the mirror are tiny.
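A minimal sketch of what that setup could look like. The device names, pool name, and paths are placeholders; it is wrapped in Python only to keep the examples in one language, and the underlying commands are the usual zpool / rdiff-backup invocations:

```python
#!/usr/bin/env python3
"""Sketch: rdiff-backup into a mirrored ZFS pool.

Assumptions: two spare disks at /dev/sdb and /dev/sdc, a pool named
"backup" (mounted at /backup by default), and /home as the source
directory -- all hypothetical.
"""
import subprocess

def sh(*cmd):
    subprocess.run(cmd, check=True)

# One-time setup: a two-way mirror, so every block exists twice and a
# checksum error can be healed from the good copy.
sh("zpool", "create", "backup", "mirror", "/dev/sdb", "/dev/sdc")

# Each backup run: rdiff-backup keeps the latest tree as plain files
# plus reverse increments under rdiff-backup-data/.
sh("rdiff-backup", "/home", "/backup/home")

# Periodically (e.g. weekly from cron): walk every block, verify
# checksums, and repair from the mirror where needed.
sh("zpool", "scrub", "backup")
```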

Personally I am not using this solution, as ZFS is a little tricky to get working on Linux. Btrfs looks promising but hasn't yet been proven stable by years of use. Instead, I'm going with the cheaper option of simply monitoring hard drive SMART data. Hard drives should do some error checking/correcting themselves, and by monitoring this data I can see whether that process is working properly. It's not as good as additional filesystem parity, but better than nothing.
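A rough sketch of how such a SMART check could be scripted with smartmontools follows; the device path and attribute names are placeholders, and attribute tables vary between drive vendors, so treat it only as a starting point:

```python
#!/usr/bin/env python3
"""Sketch: poll SMART data with smartmontools and flag worrying attributes.

Assumptions: smartctl is installed, the drive is /dev/sda (placeholder),
and the attribute names below exist on this particular drive -- output
formats differ between vendors.
"""
import subprocess

DRIVE = "/dev/sda"  # placeholder device
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable")

# Overall health verdict (PASSED/FAILED).
health = subprocess.run(["smartctl", "-H", DRIVE],
                        capture_output=True, text=True)
print(health.stdout)

# Raw attribute table; non-zero raw values on the watched attributes
# usually mean the drive's own error correction is losing ground.
attrs = subprocess.run(["smartctl", "-A", DRIVE],
                       capture_output=True, text=True)
for line in attrs.stdout.splitlines():
    if any(name in line for name in WATCHED):
        print(line)
```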

A few more notes that might be interesting to people looking into reliable backup development:

  • par2 seems to be dated and buggy software. zfec seems like a much faster, modern alternative. A discussion on the bup list happened a while ago: https://groups.google.com/group/bup-list/browse_thread/thread/a61748557087ca07
  • It's safer to calculate parity data before even writing to disk, i.e. don't write to disk, read it back, and then calculate parity data; do it from RAM, and check against the original for additional reliability (see the sketch below). This might only be possible with zfec, since par2 is too slow.
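To make the second point concrete, here is a small illustration of the ordering: the integrity data is derived from the in-memory buffer, the file is written, and the on-disk copy is verified against that buffer before any recovery data is trusted. The file names are placeholders, and the final step uses the par2 CLI only because its invocation is familiar; per the note above, zfec would be the faster substitute in the same spot.

```python
#!/usr/bin/env python3
"""Sketch: derive integrity data from RAM, not from a read-back of disk.

Placeholders throughout: the data source, the output path, and the final
parity command (par2 is shown only because its CLI is familiar; zfec
would be the faster substitute).
"""
import hashlib
import subprocess

data = b"...the backup payload, already in memory..."  # placeholder
digest = hashlib.sha256(data).hexdigest()  # computed before any write

path = "increment.bin"  # placeholder output file
with open(path, "wb") as fh:
    fh.write(data)
    fh.flush()

# Verify the on-disk copy against the in-memory original, so a write
# error cannot silently become the "reference" version.
with open(path, "rb") as fh:
    assert hashlib.sha256(fh.read()).hexdigest() == digest, "write was corrupted"

# Only now generate recovery data for the verified file.
subprocess.run(["par2", "create", "-r10", path + ".par2", path], check=True)
```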