Mnesia online recovery from a network partition

Comments (3)

谈情不如逗狗 2024-07-21 15:21:27

After some experimentation I've discovered the following:

  • Mnesia considers the network to be partitioned if two nodes disconnect and then reconnect without an Mnesia restart in between (one way to observe this is sketched after this list).
  • This is true even if no Mnesia read/write operations occur during the time of the disconnection.
  • Mnesia itself must be restarted in order to clear the partitioned network event - you cannot force_load_table after the network is partitioned.
  • Only Mnesia needs to be restarted in order to clear the network partitioned event. You don't need to restart the entire node.
  • Mnesia resolves the network partitioning by having the newly restarted Mnesia node overwrite its table data with data from another Mnesia node (the startup table load algorithm).
  • Generally nodes will copy tables from the node that has been up the longest (this was the behaviour I saw; I haven't verified that it is explicitly coded for rather than a side effect of something else). If you disconnect a node from a cluster, make writes in both partitions (the disconnected node and its old peers), shut down all nodes and start them all back up again with the disconnected node first, the disconnected node will be considered the master and its data will overwrite all the other nodes. There is no table comparison/checksumming/quorum behaviour.
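
A minimal sketch (my own illustration, not part of the answer) of how to observe the condition described above: after mnesia:subscribe(system), a reconnect following a disconnect is reported as an inconsistent_database system event, even when nothing was read or written while the nodes were apart.

    -module(partition_watch).
    -export([start/0]).

    %% Subscribe to Mnesia system events and log partition notifications.
    start() ->
        {ok, _Node} = mnesia:subscribe(system),
        loop().

    loop() ->
        receive
            {mnesia_system_event, {inconsistent_database, Context, Node}} ->
                %% Context is typically running_partitioned_network when the
                %% partition is noticed at reconnect time.
                error_logger:warning_msg(
                  "Mnesia reports a partitioned network (~p) involving ~p~n",
                  [Context, Node]),
                loop();
            _Other ->
                loop()
        end.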

So to answer my question, one can perform semi-online recovery by executing mnesia:stop(), mnesia:start() on the nodes in the partition whose data you decide to discard (which I'll call the losing partition). Executing the mnesia:start() call will cause the node to contact the nodes on the other side of the partition. If you have more than one node in the losing partition, you may want to set the master nodes for table loading to nodes in the winning partition (sketched below) - otherwise I think there is a chance it will load tables from another node in the losing partition and thus return to the partitioned network state.
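
A hedged sketch of that recovery procedure, run on each node of the losing partition; WinningNodes is an assumed list of node names from the partition whose data you want to keep.

    -module(partition_recover).
    -export([rejoin/1]).

    %% Run on a losing-partition node. WinningNodes are nodes whose data wins.
    rejoin(WinningNodes) ->
        %% Make the next table load come from the winning partition, so this
        %% node does not reload from another losing-side node and re-enter the
        %% partitioned state.
        ok = mnesia:set_master_nodes(WinningNodes),
        %% Restarting Mnesia (not the whole Erlang node) clears the partition
        %% event and triggers the startup table load, overwriting local copies.
        stopped = mnesia:stop(),
        ok = mnesia:start(),
        Tabs = mnesia:system_info(local_tables),
        ok = mnesia:wait_for_tables(Tabs, 60000),
        %% Assumption: drop the master-node override once recovered.
        ok = mnesia:set_master_nodes([]).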

Unfortunately mnesia provides no support for merging/reconciling table contents during the startup table load phase, nor does it provide for going back into the table load phase once started.

A merge phase would be suitable for ejabberd in particular, as the node would still have user connections and thus know which user records it owns/should be the most up-to-date for (assuming one user connection per cluster). If a merge phase existed, the node could filter user-data tables, save all records for connected users, load tables as usual and then write the saved records back to the mnesia cluster (a rough sketch of that workaround follows).
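
The merge the answer wishes for can be approximated by hand around the restart. This is a purely illustrative sketch; the table name and the list of connected users' keys are assumptions, not real ejabberd APIs.

    -module(user_merge).
    -export([save_connected/2, write_back/1]).

    %% Before restarting Mnesia: keep the records this node believes are
    %% freshest because those users are connected here. Tab and Keys are
    %% assumptions (e.g. a session/user-data table and its primary keys).
    save_connected(Tab, Keys) ->
        lists:append([mnesia:dirty_read(Tab, Key) || Key <- Keys]).

    %% After the startup table load has overwritten local data: write the
    %% saved records back into the rejoined cluster.
    write_back(SavedRecords) ->
        mnesia:transaction(
          fun() -> lists:foreach(fun mnesia:write/1, SavedRecords) end).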

染年凉城似染瑾 2024-07-21 15:21:27

Sara's answer is great; also have a look at the article about CAP. The Mnesia developers sacrificed P for CA. If you need P, then you should choose which part of CAP you want to sacrifice and then pick another storage system, for example CouchDB (sacrifices C) or Scalaris (sacrifices A).

挽清梦 2024-07-21 15:21:27

It works like this. Imagine the sky full of birds. Take pictures until you have captured all the birds.
Place the pictures on the table. Map the pictures over each other, so you see every bird exactly once. Do you see every bird? OK. Then you know that, at that time, the system was stable.
Record what all the birds sound like (messages) and take some more pictures. Then repeat.

If you have a node split, go back to the latest common stable snapshot and try** to replay what happened after that. :)

It's better described in
"Distributed Snapshots: Determining Global States of Distributed Systems"
K. MANI CHANDY and LESLIE LAMPORT

** I think there is a problem deciding whose clock to follow when trying to replay what happened
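
Not from the answer, but Mnesia does ship a facility for the consistent-snapshot part of this idea: checkpoints and backups. A hedged sketch (the backup file path is an assumption):

    -module(snapshot).
    -export([take/1, restore/1]).

    %% Activate a checkpoint spanning Tabs, dump it to a backup file, then
    %% release the checkpoint.
    take(Tabs) ->
        {ok, Name, _Nodes} =
            mnesia:activate_checkpoint([{max, Tabs},
                                        {ram_overrides_dump, true}]),
        Result = mnesia:backup_checkpoint(Name, "/tmp/mnesia.snapshot"),
        ok = mnesia:deactivate_checkpoint(Name),
        Result.

    %% Roll tables back to the snapshot's contents. Replaying what happened
    %% after the snapshot (the hard part noted above) is left to the caller.
    restore(File) ->
        mnesia:restore(File, [{default_op, recreate_tables}]).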
