In a mnesia cluster, which node is queried?

Posted on 2024-07-16 16:31:33

Let's say you have a mnesia table replicated on nodes A and B. If on node C, which does not contain a copy of the table, I call mnesia:change_config(extra_db_nodes, [NodeA, NodeB]) and then call mnesia:dirty_read(user, bob), how does node C choose which node's copy of the table to run the query on?

Comments (2)

左秋 2024-07-23 16:31:33

According to my own research, the answer to the question is: it will choose the most recently connected node. I would be grateful if anyone points out errors; mnesia is a really complex system!

As Dan Gudmundsson pointed out on the mailing list, the algorithm for selecting the remote node to query is defined in mnesia_lib:set_remote_where_to_read/2. It is the following:

set_remote_where_to_read(Tab, Ignore) ->
    Active = val({Tab, active_replicas}),
    Valid =
       case mnesia_recover:get_master_nodes(Tab) of
           [] ->  Active;
           Masters -> mnesia_lib:intersect(Masters, Active)
       end,
    Available = mnesia_lib:intersect(val({current, db_nodes}), Valid -- Ignore),
    DiscOnlyC = val({Tab, disc_only_copies}),
    Prefered  = Available -- DiscOnlyC,
    if
       Prefered /= [] ->
           set({Tab, where_to_read}, hd(Prefered));
       Available /= [] ->
           set({Tab, where_to_read}, hd(Available));
       true ->
           set({Tab, where_to_read}, nowhere)
    end.

So it takes the list of active_replicas (i.e. the list of candidates), optionally narrows it to the master nodes for the table, removes the nodes to be ignored (for whatever reason), narrows it to the currently connected nodes, and then selects in the following order:

  1. The first node that does not hold the table as disc_only_copies
  2. Otherwise, the first available node

If no node is available at all, where_to_read is set to nowhere.
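To make the selection order concrete, here is a minimal sketch that restates the logic of the function quoted above as a pure function. The module name where_to_read_sketch, the function pick/5, and its arguments are hypothetical; they stand in for values that mnesia reads from its own internal state, and this is not mnesia's code.

```erlang
-module(where_to_read_sketch).
-export([pick/5]).

%% Hypothetical pure restatement of the quoted selection logic.
%%   Active   - active replicas of the table, in mnesia's stored order
%%   Masters  - master nodes for the table ([] when none are set)
%%   Ignore   - nodes to skip
%%   Running  - currently running db nodes
%%   DiscOnly - nodes holding the table as disc_only_copies
pick(Active, Masters, Ignore, Running, DiscOnly) ->
    Valid = case Masters of
                [] -> Active;
                _  -> [N || N <- Active, lists:member(N, Masters)]
            end,
    %% Drop ignored and unreachable nodes, keeping the candidate order.
    Available = [N || N <- Valid -- Ignore, lists:member(N, Running)],
    case {Available -- DiscOnly, Available} of
        {[First | _], _}  -> First;    % first non-disc_only_copies replica
        {[], [First | _]} -> First;    % otherwise any available replica
        {[], []}          -> nowhere   % no replica can be read
    end.
```

With Active = [b, a] and nothing filtered out, pick/5 returns b, the head of the list. If b holds the table only as disc_only_copies, it returns a instead.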

The most important part is in fact the active_replicas list, since it determines the order of the nodes in the list of candidates.

The active_replicas list is built by remote calls to mnesia_controller:add_active_replica/* made from newly connected nodes to old nodes (i.e. those that were in the cluster before). This boils down to the function add/1, which adds the item at the head of the list.
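The effect of adding at the head can be illustrated with a small sketch. The module active_replicas_sketch and both functions are hypothetical, and dropping an earlier entry for the same node is an assumption made here to keep the list free of duplicates; it is not taken from the answer.

```erlang
-module(active_replicas_sketch).
-export([add/2, join_order/1]).

%% Put Node at the head of the list.
%% Assumption: an earlier entry for the same node is dropped first.
add(Node, Replicas) ->
    [Node | lists:delete(Node, Replicas)].

%% Fold a sequence of joins (oldest first) into the resulting list.
join_order(Joins) ->
    lists:foldl(fun add/2, [], Joins).
```

If node a joined before node b, join_order([a, b]) yields [b, a], so the most recently connected node ends up first in the candidate list.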

Hence the answer to the question is: it will choose the most recently connected node.
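One way to check which node was actually picked is to ask mnesia for the where_to_read item of the table. The following is a sketch meant for the Erlang shell on node C, assuming mnesia is already started there; the node names 'a@host' and 'b@host' are placeholders, not names from the question.

```erlang
%% Placeholder node names; substitute the real names of nodes A and B.
NodeA = 'a@host',
NodeB = 'b@host',
%% Connect node C to the nodes that hold the table.
{ok, _Connected} = mnesia:change_config(extra_db_nodes, [NodeA, NodeB]),
%% Ask mnesia which node reads of the user table are routed to.
ReadNode = mnesia:table_info(user, where_to_read),
io:format("reads of user go to ~p~n", [ReadNode]).
```

If this prints nowhere, no replica of the table is readable from node C at that moment.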

Note:
To inspect the list of active replicas on a given node, you can use this (dirty hack) code:

[ {T,X} || {{T,active_replicas}, X} <- ets:tab2list(mnesia_gvar) ].

又怨 2024-07-23 16:31:33

Well, node C would need to contact either node A or node B in order to do a query. Thus node C will have to decide itself which table copy to execute the query on.

If you need more than this, you would either need some algorithm that decides which node to query, or you could replicate the table on node C as well (which of the two typically depends on the characteristics you want or need).

If node A and node B form, or are part of, a database cluster, a good start is probably round robin (or random selection, as you suggest).
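As an illustration of this suggestion, here is a minimal sketch of choosing the node in application code. The module replica_choice_sketch and its functions are hypothetical. Note that read_on/3 runs the read on a node chosen by the caller via rpc:call/4, so it sidesteps whatever node mnesia itself would have read from.

```erlang
-module(replica_choice_sketch).
-export([random_node/1, rotate/1, read_on/3]).

%% Pick one node uniformly at random from a non-empty list.
random_node(Nodes) when Nodes =/= [] ->
    lists:nth(rand:uniform(length(Nodes)), Nodes).

%% Round robin: return the head and the rotated list to use next time.
rotate([Node | Rest]) ->
    {Node, Rest ++ [Node]}.

%% Run a dirty read on an explicitly chosen node.
read_on(Node, Tab, Key) ->
    rpc:call(Node, mnesia, dirty_read, [Tab, Key]).
```

For example, {Node, Next} = replica_choice_sketch:rotate([NodeA, NodeB]) gives NodeA first and a list that will yield NodeB on the following call; the caller has to keep Next between calls.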
