How does CouchDB replication handle failed/restored servers?
Consider the following scenario:
3 EC2 instances located in:
- US-WEST
- Ireland
- Tokyo
Each instance is a dedicated CouchDB server. Each CouchDB server is set up to run continuous, bi-directional replication with every other server.
Now assume that the Ireland server goes offline due to some AWS outage. The US-WEST and Tokyo CouchDB servers will retry X number of times and then eventually fail replication with that server (is this correct?)
Let's say 6 hours go by, AWS gets the region back online, and that server comes back up. I assume US-WEST and Tokyo will ignore the server in Ireland until the Irish CouchDB server re-initiates the bi-directional sync with both of them, a la:
Irish CouchDB _replicator Pseudo-Settings
- replicate[source=localhost,target=us-west]
- replicate[source=us-west,target=localhost]
- replicate[source=localhost,target=tokyo]
- replicate[source=tokyo,target=localhost]
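To make the pseudo-settings concrete, the four entries could be written as `_replicator` documents roughly as follows. This is only a sketch in Python: the host names, the `appdb` database name, the `_id` values, and the helper `bidirectional_docs` are all made up for illustration, and it assumes a CouchDB version that has the `_replicator` database.

```python
import json


def bidirectional_docs(local_db, peers):
    """Return one push and one pull _replicator document per remote peer."""
    docs = []
    for name, remote_url in sorted(peers.items()):
        # Push: local changes flow out to the peer.
        docs.append({"_id": "push-" + name, "source": local_db,
                     "target": remote_url, "continuous": True})
        # Pull: the peer's changes flow back in.
        docs.append({"_id": "pull-" + name, "source": remote_url,
                     "target": local_db, "continuous": True})
    return docs


# Hypothetical host names and database name, for illustration only.
PEERS = {
    "us-west": "http://us-west.example.com:5984/appdb",
    "tokyo": "http://tokyo.example.com:5984/appdb",
}

ireland_docs = bidirectional_docs("appdb", PEERS)
for doc in ireland_docs:
    print(json.dumps(doc, sort_keys=True))
```

Each document corresponds to one line of the pseudo-settings, so the Ireland node ends up with two documents per peer.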
Q1: Is my understanding of Couch's replication failure/recovery correct?
Q2: What if there is a network failure that fixes itself an hour later (specifically: there is no server restart forcing the DB to re-init itself on startup)? How do the respective CouchDB instances react to this? I imagine that US-WEST and Tokyo will forget about Ireland, but will Ireland suddenly start talking with those two servers again, re-initializing the bidirectional, continuous replication?
I am specifically interested in failure recovery in the EC2 environment, so if there is a specific detail to that environment I have missed, please let me know.
Thanks!
Comments (1)
Prior to 1.1, a replication task is not persistent, not even a continuous one. After a disconnection, CouchDB makes a limited number of retry attempts, but eventually the replication stops. When connectivity resumes you will need to initiate replication again. Since replication is idempotent (starting the same replication task twice is the same as starting it once), you can just add a cronjob to start it every minute (or whatever interval seems sane to you). If the task is running already, the attempt returns success (but does not start another replication).
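A minimal sketch of the script such a cronjob could run, in Python with only the standard library. It posts to CouchDB's `/_replicate` endpoint; the host names, the `appdb` database name, and the function names `build_replicate_request` and `trigger` are assumptions for illustration, and a real server may also require credentials.

```python
import json
import urllib.request


def build_replicate_request(couch_url, source, target):
    """Build (but do not send) a POST /_replicate request for a continuous replication."""
    body = json.dumps({"source": source, "target": target, "continuous": True})
    return urllib.request.Request(
        couch_url.rstrip("/") + "/_replicate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger(couch_url, source, target, timeout=30):
    """Send the request. Repeating it is harmless: a running task is not started twice."""
    req = build_replicate_request(couch_url, source, target)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request touches no network, so it can be inspected offline.
req = build_replicate_request(
    "http://localhost:5984", "appdb", "http://us-west.example.com:5984/appdb")
print(req.get_method(), req.full_url)
```

A cron entry would then call `trigger(...)` once per direction and per peer at whatever interval you choose.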
In 1.1, you can create persistent replication tasks by creating a document in the special _replicator database. CouchDB will retry the replication if the server crashes or the connection is interrupted. NOTE: 1.1.0 eventually gives up; in the next release (1.1.1) we allow infinite retries.
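As a rough illustration, storing such a document is an ordinary document write to the `_replicator` database. The Python sketch below only builds the request; the document ID `pull-from-tokyo`, the host names, the `appdb` database name, and the helper `build_replicator_put` are hypothetical, and writing to `_replicator` will likely need admin credentials on a secured server.

```python
import json
import urllib.parse
import urllib.request


def build_replicator_put(couch_url, doc_id, source, target):
    """Build (but do not send) a PUT that stores a persistent replication document."""
    doc = {"source": source, "target": target, "continuous": True}
    url = (couch_url.rstrip("/") + "/_replicator/"
           + urllib.parse.quote(doc_id, safe=""))
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical values: pull changes from a Tokyo peer into the local database.
put_req = build_replicator_put(
    "http://localhost:5984", "pull-from-tokyo",
    "http://tokyo.example.com:5984/appdb", "appdb")
print(put_req.get_method(), put_req.full_url)
```

Passing `put_req` to `urllib.request.urlopen` would send it. Because the document lives in a database, the task definition survives a server restart, which is what makes the replication persistent.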
As CouchDB is designed from the ground up to support multi-master replication, you won't be surprised to hear that it handles interruptions to connectivity very well. The changes that occurred during the interruption are rapidly found and replicated.