Cross-colo failover design: DNS-level failover?

Posted 2024-07-11 04:16:14

I'm interested in cross-colo failover strategies for web applications, so that if the main site fails, users land seamlessly at the failover site in another colo.

The application side of things looks to be mostly figured out with a master-slave database setup between the colos and services designed to recover and be able to pick up mid-stream. I'm trying to figure out the strategy for moving traffic from the main site to the fail-over site. DNS failover, even with low TTLs, seems to carry a fair bit of latency.

What strategies would you recommend for quickly moving traffic between colos, assuming the servers at the main colo are unreachable?

If you have other interesting experience / words of wisdom about cross-colo failover I'd love to hear those as well.

Comments (3)

甜柠檬 2024-07-18 04:16:14

DNS based mechanisms are troublesome, even if you put low TTLs in your zone files.

The reason for this is that many applications (e.g. MSIE) maintain their own caches which ignore the TTL. Other software will do a single gethostbyname() or equivalent call and store the result until the program is restarted.
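For software you control, the "resolve once and keep the result" problem can be avoided by looking the name up again on every connection attempt. The following is only a minimal sketch of that idea; `resolve` and `connect_fresh` are hypothetical helper names, and the injectable `resolver` and `connector` parameters exist purely so the logic can be exercised without a network.

```python
import socket
from typing import Callable, List, Tuple

Address = Tuple[str, int]


def resolve(host: str, port: int) -> List[Address]:
    """Ask the system resolver on every call; nothing is cached in this process."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry's last element is the socket address; keep (ip, port) only.
    return [info[4][:2] for info in infos]


def connect_fresh(
    host: str,
    port: int,
    resolver: Callable[[str, int], List[Address]] = resolve,
    connector: Callable = socket.create_connection,
    timeout: float = 3.0,
):
    """Re-resolve the name, then try each returned address in order."""
    last_error = None
    for address in resolver(host, port):
        try:
            return connector(address, timeout)
        except OSError as exc:
            # This address is unreachable; fall through to the next one.
            last_error = exc
    raise ConnectionError(f"no address for {host} accepted a connection") from last_error
```

This does nothing for browsers or third-party clients that pin an address, which is the poster's point: the site operator cannot fix those caches from the server side.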

Worse still, many ISPs' recursive DNS servers are known to ignore TTLs below their own preferred minimum and impose their own higher TTLs.
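As a rough illustration of why a low TTL does not bound failover time, here is a back-of-the-envelope estimate. The function name and all numbers are hypothetical, and the model assumes the recursive resolver simply raises any TTL below its own floor to that floor.

```python
def worst_case_dns_failover_seconds(
    detection_seconds: int,
    authoritative_ttl: int,
    resolver_min_ttl: int,
) -> int:
    """Upper bound on how long a client behind one resolver keeps the old address.

    Assumes the record was cached just before the outage and that the resolver
    clamps TTLs upward to its own minimum.
    """
    effective_ttl = max(authoritative_ttl, resolver_min_ttl)
    return detection_seconds + effective_ttl


# Hypothetical figures: 30 s to detect the outage and update the zone,
# a 60 s TTL in the zone file, and a resolver that enforces a 300 s floor.
print(worst_case_dns_failover_seconds(30, 60, 300))  # 330
```

Even this bound ignores applications that cache the address until restart, for which there is no upper limit at all.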

Ultimately, if the site is to run from both data centers without changing its IP address, then you need to look at arrangements for "multihoming" via global BGP4 route announcements.

With multihoming you need to get at least a /24 netblock of "provider independent" (aka "PI") IP address space, and then have that only be announced to the global routing table from the backup site if the main site goes offline.
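The "announce from the backup site only when the main site is offline" rule could be driven by a small watchdog at the backup site. The sketch below shows only the decision logic, with hysteresis so a single failed probe does not move traffic. Everything in it is an assumption for illustration: the class name, the thresholds, the documentation prefix `203.0.113.0/24` standing in for the real PI block, and the command strings, which imitate the text commands a BGP speaker such as ExaBGP accepts from a helper process and should be checked against whatever router or daemon is actually in use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackupAnnouncer:
    prefix: str = "203.0.113.0/24"  # placeholder for the real PI netblock
    fail_threshold: int = 3         # consecutive failures before announcing
    recover_threshold: int = 5      # consecutive successes before withdrawing
    announced: bool = False
    _failures: int = 0
    _successes: int = 0

    def observe(self, primary_healthy: bool) -> Optional[str]:
        """Feed one health-check result; return a command only on a state change."""
        if primary_healthy:
            self._successes += 1
            self._failures = 0
            if self.announced and self._successes >= self.recover_threshold:
                self.announced = False
                return f"withdraw route {self.prefix} next-hop self"
        else:
            self._failures += 1
            self._successes = 0
            if not self.announced and self._failures >= self.fail_threshold:
                self.announced = True
                return f"announce route {self.prefix} next-hop self"
        return None
```

One design caveat: a probe run from the backup site cannot tell "the main site is down" apart from "the link between the sites is down", so in practice the health signal would probably need to come from more than one vantage point.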

彩扇题诗 2024-07-18 04:16:14

As for DNS, I like to point to "Why DNS Based Global Server Load Balancing Doesn't Work". For everything else, use BGP.

Designing networks to load balance using BGP is still not an easy task, and I am certainly not an expert on it. It's also more complex than Wikipedia can tell you, but there are a couple of interesting articles on the web that detail how it can be done:

There is always more if you search for BGP and load balancing. There are also a couple of whitepapers on the net that describe how Akamai does its global load balancing (I believe that's BGP too), which are always interesting to read and learn from.

Beyond the obvious things you can achieve with software and hardware, you might also want to ask your ISP/provider/colo whether they can set this up for you.

Also, no offense regarding your choice of colo (who's the provider?), but most places should be set up to deal with downtime and the like; they should not require you to take action. Of course floods or aliens can always strike, but in that case I guess there are more important issues. :-)
