How do Amazon RDS backups/snapshots actually work?
I am an Amazon RDS customer and am experiencing daily Amazon RDS write latency spikes, corresponding roughly to the backup window. I also see spikes at the end of a snapshot (case in point: running a snapshot takes approximately 1 hour, and in the final 5 minutes, write latency spikes). I am running a multi-AZ m1.large deployment.
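One way to put a number on "corresponding roughly to the backup window" is to compare average write latency inside and outside the window. The following is only a minimal sketch: the sample values, the 03:00 to 04:00 window, and the helper names `in_window` and `compare_latency` are all made up for illustration, and real samples would have to come from whatever monitoring export you have.

```python
from datetime import datetime, time
from statistics import mean


def in_window(ts, start, end):
    """Return True if the time of day of ts falls in [start, end).

    Handles windows that cross midnight (start later than end).
    """
    t = ts.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def compare_latency(samples, start, end):
    """Return (mean latency inside the window, mean latency outside it).

    samples is a list of (datetime, latency_seconds) pairs.
    A side with no samples is reported as None.
    """
    inside = [v for ts, v in samples if in_window(ts, start, end)]
    outside = [v for ts, v in samples if not in_window(ts, start, end)]
    return (
        mean(inside) if inside else None,
        mean(outside) if outside else None,
    )


# Hypothetical samples (latency in seconds) and a hypothetical backup window.
samples = [
    (datetime(2012, 1, 1, 2, 30), 0.004),
    (datetime(2012, 1, 1, 3, 10), 0.020),
    (datetime(2012, 1, 1, 3, 50), 0.060),
    (datetime(2012, 1, 1, 5, 0), 0.005),
]
inside_avg, outside_avg = compare_latency(samples, time(3, 0), time(4, 0))
print("inside window:", inside_avg, "outside window:", outside_avg)
```

If the inside average is consistently several times the outside average across many days, that supports the backup window as the cause rather than coincidental load.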
Is there anyone on Stack who can explain how Amazon RDS backup is actually working? I've read the Amazon RDS docs, and as far as I can tell, Amazon RDS is not behaving according to spec. Specifically, these backup/snapshot operations should be hitting my replica, and therefore not causing any downtime/performance hit, or so I thought.
I can distill my problem into six questions:
- What is technically happening during a snapshot and a backup, and how are they different? (If you answer this question, please tell me whether you can empirically confirm your answer or are simply quoting documentation to me.)
- Is a spike in write latency to be expected during the backup window on a multi-AZ deployment?
- Is a spike in write latency to be expected at the end of a snapshot on a multi-AZ deployment?
- Would my write latency spike be even higher if I were not multi-AZ?
- Architecturally, would I be able to avoid these write latency spikes if I rolled my own database running on two m1.large EC2 instances?
- Are there any configurations I can use that would avoid these write latency spikes while still hosting my DB with RDS, or am I effectively at the mercy of Amazon?
Bonus question: where and how do you host your MySQL database?
I can say that I have been generally happy with RDS except for these daily write latency issues. I love the built-in database monitoring, and it was fairly simple to set up and get going.
Thanks!
Comments (2)
We also run several RDS instances, in addition to MySQL on some machines that we manage ourselves. I can't comment specifically, as I'm not an Amazon engineer, but here are several things I've learned that might explain what you're seeing:
Although Amazon does not share the backend details 100%, we strongly suspect that they are using their EBS system to back RDS databases.
This article helps explain EBS limitations and snapshot functionality: http://blog.rightscale.com/2008/08/20/amazon-ebs-explained/ Again, while it's not explicit, it would make sense for Amazon to be using this infrastructure to provide RDS services.
Typically, a MySQL backup, in contrast to a snapshot, involves using a tool like mysqldump to create a file of SQL statements that will then reproduce the database. The database does not need to be frozen to do this. With an EBS backend, the best practice is to freeze the database (pause all transactions) while you are snapshotting to avoid data corruption.
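To make the mysqldump side of that contrast concrete, here is a rough sketch of assembling a dump command that does not freeze the database. It relies on mysqldump's `--single-transaction` option, which gives a consistent dump for InnoDB tables only. The host, user, database, and output file names are hypothetical, and credentials are assumed to come from a MySQL option file rather than the command line. The command is only built and printed here, not executed.

```python
def build_dump_command(host, user, database, outfile):
    """Build a mysqldump argument list for a logical (SQL statement) backup.

    --single-transaction reads from one consistent snapshot, so writes can
    continue during the dump. This only holds for transactional engines
    such as InnoDB.
    """
    return [
        "mysqldump",
        "--single-transaction",
        f"--host={host}",
        f"--user={user}",
        f"--result-file={outfile}",
        database,
    ]


# Hypothetical names, for illustration only.
cmd = build_dump_command(
    host="mydb.example.com",
    user="backup_user",
    database="appdb",
    outfile="appdb.sql",
)
print(" ".join(cmd))
# To actually run it, one option is subprocess.run(cmd, check=True).
```

A volume-level snapshot has no equivalent of this option, which is why the freeze (or some other way of quiescing writes) comes up for EBS-style backups.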
As for the spikes you're seeing at the end of the backup window: if replication is paused by Amazon during the snapshot of your replica, the replica would then need to "catch up" on transactions once the snapshot was complete. This would cause a latency spike.
Replication across a multi-AZ deployment is inherently slower than a single-AZ deployment. That is the price you pay for better redundancy.
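The catch-up idea can be illustrated with a toy backlog model. This is purely a sketch of the hypothesis above, not a description of how Amazon actually implements it: the function name, the rates, and the pause interval are all invented numbers.

```python
def simulate_backlog(write_rate, apply_rate, pause_start, pause_end, minutes):
    """Return the replica's pending-transaction backlog at the end of each minute.

    write_rate: transactions the primary produces per minute
    apply_rate: transactions the replica can apply per minute when not paused
    pause_start, pause_end: replication is paused for minutes in [start, end)
    """
    backlog = 0
    history = []
    for minute in range(minutes):
        backlog += write_rate  # new transactions arrive every minute
        paused = pause_start <= minute < pause_end
        if not paused:
            backlog -= min(backlog, apply_rate)  # replica drains what it can
        history.append(backlog)
    return history


# Invented numbers: replication paused for minutes 2, 3 and 4.
history = simulate_backlog(
    write_rate=100, apply_rate=150, pause_start=2, pause_end=5, minutes=12
)
print(history)
# → [0, 0, 100, 200, 300, 250, 200, 150, 100, 50, 0, 0]
```

In this model the backlog peaks when the pause ends and takes several more minutes to drain, which would be consistent with a spike concentrated at the end of the snapshot rather than spread across it.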
Amazon has revealed the basic architecture that they use in Multi-AZ deployments. This may help people make decisions:
https://aws.amazon.com/blogs/database/amazon-rds-under-the-hood-multi-az/