Amazon EBS: snapshots as incremental backups

Posted 2024-11-16 22:24:35


I'm working on an automated mechanism for our EBS volumes to be backed up on a daily basis.

I know quite well the steps to create a new snapshot. Apparently it's all quite simple, you have an EBS volume which you can snapshot, and you can restore the snapshot anytime. Fine.

But my concern is the size of the snapshots. I know these snapshots are stored compressed in S3, and we're going to be charged based on their size. If we have large amounts of data, the invoice will grow significantly with each backup we make.

However, according to Amazon's pages, these snapshots are incremental. That would solve my problem, as the daily backup would only upload the data that has changed since the last snapshot. But this leads me to the next question: if the backup is incremental and we're only uploading the modified data, where is the original data stored? (i.e. the first snapshot, which obviously couldn't have been done incrementally...)

Unfortunately, I haven't been able to find this information anywhere in Amazon's documentation.

Does anybody have experience with snapshots and their billing?

I'd appreciate any help, thanks!


Comments (1)

风蛊 2024-11-23 22:24:35


I don't think you'll find detailed documentation on how the snapshots are implemented; it's not something I have come across. They do have documentation for "Projecting Costs". However, I think that if you know how it works, you can intuit the bill and feel more at ease with it.

Note that these snapshots are not "incremental" in the way we may have come to understand that term from the DOS operating system. In DOS, the "archive" bit was set when a file was modified, and an "incremental" backup copied only the files that had their "archive" bit set. The backup process would clear the archive attribute, so a future edit to the file would cause it to be backed up "incrementally" once again.

With snapshots, each block of the volume is flagged if it is modified; it's not done on a file-by-file basis. After the first snapshot, only blocks that have been flagged as modified are backed up, just like "incremental" backups in DOS. But that's where the similarities end, because for each block that it doesn't have to copy, it doesn't just skip it; it writes a pointer to where the last (unchanged) copy of the data is.

In the first snapshot you make of a volume, the data is broken up into blocks. From Amazon: "Volume data is broken up into chunks before being transferred to Amazon S3. While the size of the chunks could change through future optimizations, the number [...] can be estimated by dividing the size of the data that has changed since the last snapshot by 4MB."
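The quoted estimate is easy to work through yourself. A minimal sketch, using the 4 MB chunk size Amazon states and a hypothetical amount of changed data:

```python
# Estimate the number of chunks a snapshot will transfer, per Amazon's
# stated rule of thumb: changed data divided into roughly 4 MB chunks.
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, per Amazon's documentation

def estimated_chunks(changed_bytes: int) -> int:
    # Round up: a partially filled chunk still has to be transferred.
    return -(-changed_bytes // CHUNK_SIZE)

# Hypothetical example: 1 GiB of blocks changed since the last snapshot.
changed = 1 * 1024 ** 3
print(estimated_chunks(changed))  # 1 GiB / 4 MiB = 256 chunks
```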

The next snapshot you make consists of data for only those blocks that have changed, and pointers to the blocks that haven't changed. Those pointers point to blocks of data in the previous snapshot.

The next snapshot (n) is made by recording data for each block changed since the previous snapshot (n-1), along with pointers for the blocks that haven't changed since the previous snapshot (n-1). These pointers point to corresponding blocks in the previous snapshot, which may contain data or another pointer to its previous snapshot. Eventually, every pointer ends up at a block of real data (one that hasn't changed since that snapshot was created).
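A toy model can make the pointer chasing concrete. This is only a sketch of the idea described above, not how EBS is actually implemented; the block contents are hypothetical:

```python
# Toy model of incremental snapshots: each snapshot maps block index ->
# either real data or a pointer to the previous snapshot. Illustrative
# only; NOT Amazon's implementation.

POINTER = object()  # sentinel meaning "unchanged, see previous snapshot"

def take_snapshot(volume, prev_volume):
    """Record data for changed blocks, pointers for unchanged ones."""
    snap = {}
    for i, block in enumerate(volume):
        if prev_volume is None or block != prev_volume[i]:
            snap[i] = block          # changed: store the data itself
        else:
            snap[i] = POINTER        # unchanged: point at snapshot n-1
    return snap

def read_block(snapshots, n, i):
    """Follow pointers back through the chain until real data is found."""
    while snapshots[n][i] is POINTER:
        n -= 1
    return snapshots[n][i]

# Three states of a 4-block volume over time (hypothetical data).
v0 = ["a0", "b0", "c0", "d0"]
v1 = ["a0", "b1", "c0", "d0"]   # block 1 modified
v2 = ["a0", "b1", "c2", "d0"]   # block 2 modified

snaps = []
snaps.append(take_snapshot(v0, None))  # first snapshot: all real data
snaps.append(take_snapshot(v1, v0))    # data for block 1, pointers elsewhere
snaps.append(take_snapshot(v2, v1))    # data for block 2, pointers elsewhere

# Restoring from the latest snapshot chases pointers back to real data.
print([read_block(snaps, 2, i) for i in range(4)])  # ['a0', 'b1', 'c2', 'd0']
```

Note that restoring from snapshot 2 transparently pulls block 0 and block 3 all the way from the first snapshot, which is why the first snapshot's data never needs to be re-uploaded.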

Now let's say you decide to delete snapshot (x). Snapshot (x) has snapshots made before it (x-1) and after it (x+1). Amazon replaces the pointers in snapshot (x+1) with pointers and data from snapshot (x) (the one being deleted). As a result, any actual data in snapshot (x) is copied to snapshot (x+1), unless it already has its own copy of more recent data for that block.
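Continuing the toy model, deleting an intermediate snapshot can be sketched as a merge: any real data in the deleted snapshot is pushed forward into the next snapshot wherever the next snapshot only held a pointer. Again, this is an illustration of the mechanism described, not AWS's actual code:

```python
# Toy illustration of deleting an intermediate snapshot (x): its real
# data is merged forward into snapshot (x+1) wherever (x+1) only had a
# pointer. Purely illustrative; NOT Amazon's implementation.

POINTER = object()  # sentinel meaning "unchanged, see previous snapshot"

def delete_snapshot(snapshots, x):
    """Merge snapshot x into x+1, then drop x from the chain."""
    if x + 1 < len(snapshots):
        nxt = snapshots[x + 1]
        for i, entry in snapshots[x].items():
            # If x+1 has its own (newer) data for block i, keep it.
            # Otherwise it inherits whatever x held: data or a pointer.
            if nxt[i] is POINTER:
                nxt[i] = entry
    del snapshots[x]

# Hypothetical chain over a 3-block volume:
snaps = [
    {0: "a0", 1: "b0", 2: "c0"},         # snapshot 0: full data
    {0: POINTER, 1: "b1", 2: POINTER},   # snapshot 1: block 1 changed
    {0: POINTER, 1: POINTER, 2: "c2"},   # snapshot 2: block 2 changed
]

delete_snapshot(snaps, 1)
# Snapshot 2 inherited "b1" from the deleted snapshot, so restoring
# from it still yields a0, b1, c2; block 0 still points at snapshot 0.
print(snaps[1][1], snaps[1][2])  # b1 c2
```

This is why deleting a snapshot can free less storage than its listed size suggests: the blocks a later snapshot still depends on simply move forward rather than disappearing.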

This is how snapshots work, where the data is stored, and why the size of the snapshots is manageable. You can see from this that deleting a snapshot destroys only your ability to bring back the volume as it was at the point in time when that snapshot was created, without destroying the ability to use your other snapshots. Unlike simple, traditional "incremental" backups that don't use pointers, the snapshots that remain are updated as needed to maintain their usefulness when one of the snapshots they depend on is deleted. This is why it makes sense that Amazon charges more for intelligent snapshot storage than for simple copies of EBS volumes. Finally, it's understandable that it's difficult to predict how much snapshot storage is going to cost, since it is so dynamic.
