What is your disaster recovery plan?

Posted 2024-07-12 05:38:33

Comments (4)

凶凌 2024-07-19 05:38:33

Update: I just saw your comment. Amazon EC2 with log shipping is definitely the way to go. Don't use mirroring, because that normally assumes the standby database is always available. Changing your DNS should not take more than half an hour if you set your TTL to that. That gives you time to apply any pending logs. You might turn the server on once a week or so just to apply pending logs (or less often, to avoid racking up hourly costs).


Your primary hosting location should have redundancy at all levels:

  • Multiple internet connections,
  • Multiple firewalls set to failover,
  • Multiple clustered web servers,
  • Multiple clustered database servers,
  • If you store files, use a SAN or Amazon S3,
  • Every server should have some form of RAID depending on the server's purpose,
  • Every server can have multiple PSUs connected to separate power sources/breakers,
  • External and internal server monitoring software,
  • Power generator that automatically turns on when the power goes out, and a backup generator for good measure.

That'll keep you running at your primary location in the event of most failure scenarios.
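As a rough illustration of why redundancy at every level helps, here is a minimal sketch of the usual availability arithmetic. It assumes that failures are independent and that any one copy in a layer is enough to keep that layer up, which real hardware only approximates. The function names and every availability figure are hypothetical, not measurements.

```python
from math import prod


def redundant_availability(unit_availability: float, copies: int) -> float:
    """Availability of a layer made of `copies` independent units,
    where the layer is up as long as at least one unit is up."""
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - unit_availability) ** copies


def site_availability(layer_availabilities) -> float:
    """Availability of a site that needs every layer up at the same time."""
    return prod(layer_availabilities)


# Hypothetical figures: (availability of one unit, number of units).
layers = {
    "internet link": (0.995, 2),
    "firewall": (0.999, 2),
    "web server": (0.99, 3),
    "database server": (0.995, 2),
}

per_layer = {
    name: redundant_availability(unit, copies)
    for name, (unit, copies) in layers.items()
}
overall = site_availability(per_layer.values())

for name, value in per_layer.items():
    print(f"{name}: {value:.6f}")
print(f"whole site: {overall:.6f}")
```

The whole-site figure is the product of the layers, so the weakest layer dominates; that is one way to decide where an extra unit buys the most.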

Then set up a single server at a remote location, keep it updated using log shipping, and include it in your deployment script (after your normal production servers are updated). A colocated server on the other side of the country does nicely for this. To minimize the downtime of switching to the secondary location, keep the TTL on your DNS records as low as you are comfortable with.
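To get a feel for how the TTL and the pending-log backlog interact during a switch, here is a small sketch. It assumes the DNS change and the log restore start together once the outage has been detected, so the slower of the two sets the pace. The function name and the sample durations are hypothetical.

```python
from datetime import timedelta


def estimated_switch_time(
    dns_ttl: timedelta,
    log_restore: timedelta,
    detection: timedelta = timedelta(0),
) -> timedelta:
    """Rough upper bound on the time until every client reaches a ready standby.

    Assumes the DNS update and the restore of pending logs run in parallel
    after the outage is detected, so the longer of the two dominates.
    """
    return detection + max(dns_ttl, log_restore)


# Hypothetical scenario: 30 minute TTL, 10 minutes of pending logs to apply,
# 5 minutes to notice the outage and decide to fail over.
window = estimated_switch_time(
    dns_ttl=timedelta(minutes=30),
    log_restore=timedelta(minutes=10),
    detection=timedelta(minutes=5),
)
print(f"estimated switch time: {window}")
```

Under these assumptions, lowering the TTL below the log restore time stops helping, because clients would then reach the standby before it is ready.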

Of course, that much hardware gets expensive, so you'll need to determine what it is worth to you to avoid being down for 1 second, 1 minute, 10 minutes, etc., and adjust accordingly.

街道布景 2024-07-19 05:38:33

It all depends on what your downtime requirements are. If you have to be back up in seconds in order not to lose your multi-billion dollar business, you'll do things very differently than if you have a site that makes you maybe $1000/month and whose revenue won't be noticeably affected if it's down for a day.

I know that's not a particularly helpful answer, but this is a big area, with a lot of variables, and without more information it's almost impossible to recommend something that's actually going to work for your situation (since we don't really know what your situation is).
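The $1000/month example can be turned into a quick back-of-the-envelope figure. This sketch assumes revenue is spread evenly over time and uses an average month of 730 hours (8760 hours per year divided by 12); the function name is hypothetical, and real revenue is rarely this uniform.

```python
HOURS_PER_MONTH = 8760 / 12  # average month, 730 hours


def revenue_lost(monthly_revenue: float, outage_hours: float) -> float:
    """Revenue lost during an outage, assuming revenue accrues evenly over time."""
    if monthly_revenue < 0 or outage_hours < 0:
        raise ValueError("inputs must be non-negative")
    return monthly_revenue * outage_hours / HOURS_PER_MONTH


# One day of downtime for the $1000/month site from the answer above.
small_site_loss = revenue_lost(monthly_revenue=1000, outage_hours=24)
print(f"one day down at $1000/month: about ${small_site_loss:.2f}")
```

Running the same function with your own revenue figure gives a first ceiling on what a day of avoided downtime is worth, before counting reputation or contractual penalties.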

野味少女 2024-07-19 05:38:33

The starting point for a rock-solid DR strategy is to work out the true cost to the business of your server/platform downtime.

The following article will get you started along the right lines.

https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1038783.html

If you require further guidelines, good old Google can provide plenty more reading.

A project of this nature requires you to collaborate with your key business decision makers and you will need to communicate to them what the associated costs of downtime are and what the business impact would be. You will likely need to collaborate with several business units in order to gather the required information. Collectively you then need to come to a decision as to what is considered acceptable downtime for your business. Only then can you devise a DR strategy to accommodate these requirements.

You may also find that this exercise highlights shortcomings in your platform's current configuration with regard to high availability, which may need to be reviewed as a side project.

The key point to take away from all of this is that the acceptable period of downtime is not for the DBA alone to decide. The DBA's role is to provide the information and expert knowledge needed to reach a realistic decision. Your task is then to implement a strategy that meets the business requirements.

Don't forget to test your DR strategy by running a test scenario, both to validate your recovery times and to practice the process. Should the time come when you need to implement your DR strategy, you will likely be under pressure, your phone will be ringing frequently, and people will be hovering around you like mosquitoes. Having already honed and practiced your DR response, you can be confident in taking control of the situation, and implementing the recovery will be a smooth process.
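One way to make a drill measurable is to time each recovery step and compare the total with the downtime the business agreed to accept. The sketch below is only an outline: the step names are hypothetical placeholders that do nothing, and in a real drill each callable would perform or wait on an actual recovery action.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_drill(
    steps: List[Tuple[str, Callable[[], None]]],
    acceptable_downtime_seconds: float,
) -> Dict[str, object]:
    """Run each recovery step in order, timing it, and report whether
    the total stayed within the agreed acceptable downtime."""
    timings: Dict[str, float] = {}
    drill_start = time.monotonic()
    for name, step in steps:
        step_start = time.monotonic()
        step()
        timings[name] = time.monotonic() - step_start
    total = time.monotonic() - drill_start
    return {
        "timings": timings,
        "total_seconds": total,
        "within_target": total <= acceptable_downtime_seconds,
    }


# Hypothetical placeholder steps; replace with real recovery actions.
drill_steps = [
    ("start standby server", lambda: None),
    ("apply pending logs", lambda: None),
    ("switch DNS", lambda: None),
]

report = run_drill(drill_steps, acceptable_downtime_seconds=1800)
for name, seconds in report["timings"].items():
    print(f"{name}: {seconds:.3f}s")
print(f"total: {report['total_seconds']:.3f}s, within target: {report['within_target']}")
```

Keeping the per-step timings from each drill shows which step to shorten first if the total exceeds the agreed figure.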

Good luck with your project.

潜移默化 2024-07-19 05:38:33

I haven't worked with other third-party tools, but I have used CloudEndure, and judging by the replica you get, it is a really high-end product. Replication happens at very short intervals, which makes your replica very reliable. But I can see you don't need your site back up within seconds, so it may help to ask for a price quote or to see whether you can get away with a different vendor.
