Erlang: strange behaviour of a distributed application
I'm playing with distributed Erlang applications.
Configuration and ideas are taken from:
http://www.erlang.org/doc/pdf/otp-system-documentation.pdf, section 9.9, Distributed Applications
- We have 3 nodes: n1@a2-X201, n2@a2-X201, n3@a2-X201
- We have an application wd that does some useful work :)
Configuration files:
- wd1.config - for the first node:
[{kernel,
  [{distributed, [{wd, 5000, ['n1@a2-X201', {'n2@a2-X201', 'n3@a2-X201'}]}]},
   {sync_nodes_mandatory, ['n2@a2-X201', 'n3@a2-X201']},
   {sync_nodes_timeout, 5000}]},
 {sasl,
  [%% All reports go to this file
   {sasl_error_logger, {file, "/tmp/wd_n1.log"}}]}].
- wd2.config - for the second node:
[{kernel,
  [{distributed, [{wd, 5000, ['n1@a2-X201', {'n2@a2-X201', 'n3@a2-X201'}]}]},
   {sync_nodes_mandatory, ['n1@a2-X201', 'n3@a2-X201']},
   {sync_nodes_timeout, 5000}]},
 {sasl,
  [%% All reports go to this file
   {sasl_error_logger, {file, "/tmp/wd_n2.log"}}]}].
- The config for node n3 looks similar.
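Following the same pattern, wd3.config for the third node would presumably look like this (a sketch inferred from wd1/wd2, not copied from the original setup):

[{kernel,
  [{distributed, [{wd, 5000, ['n1@a2-X201', {'n2@a2-X201', 'n3@a2-X201'}]}]},
   {sync_nodes_mandatory, ['n1@a2-X201', 'n2@a2-X201']},
   {sync_nodes_timeout, 5000}]},
 {sasl,
  [%% All reports go to this file
   {sasl_error_logger, {file, "/tmp/wd_n3.log"}}]}].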
Now start erlang in 3 separate terminals:
- erl -sname n1@a2-X201 -config wd1 -pa $WD_EBIN_PATH -boot start_sasl
- erl -sname n2@a2-X201 -config wd2 -pa $WD_EBIN_PATH -boot start_sasl
- erl -sname n3@a2-X201 -config wd3 -pa $WD_EBIN_PATH -boot start_sasl
Start the application on each of the Erlang nodes:
* application:start(wd).
(n1@a2-X201)1> application:start(wd).
=INFO REPORT==== 19-Jun-2011::15:42:51 ===
wd_plug_server starting... PluginId: 4 Path: "/home/a2/src/erl/data/SIG" FileMask: "(?i)(.*)\\.SIG$"
ok
(n2@a2-X201)1> application:start(wd).
ok
(n2@a2-X201)2>
(n3@a2-X201)1> application:start(wd).
ok
(n3@a2-X201)2>
At this point everything is OK. As described in the Erlang documentation, the application is running at node n1@a2-X201.
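One way to double-check which node is actually running the application is to ask every connected node for its running applications; a quick sketch, not part of the original session:

%% run from any connected shell: the node currently running wd reports true
[{N, lists:keymember(wd, 1, rpc:call(N, application, which_applications, []))}
 || N <- [node() | nodes()]].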
Now kill node n1. The application is migrated to n2:
(n2@a2-X201)2>
=INFO REPORT==== 19-Jun-2011::15:46:28 ===
wd_plug_server starting... PluginId: 4 Path: "/home/a2/src/erl/data/SIG" FileMask: "(?i)(.*)\\.SIG$"
Continue our game: kill node n2.
Once more the system works fine. We now have our application at node n3:
(n3@a2-X201)2>
=INFO REPORT==== 19-Jun-2011::15:48:18 ===
wd_plug_server starting... PluginId: 4 Path: "/home/a2/src/erl/data/SIG" FileMask: "(?i)(.*)\\.SIG$"
Now restore nodes n1 and n2.
So:
Erlang R14B (erts-5.8.1) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.8.1 (abort with ^G)
(n1@a2-X201)1>

Eshell V5.8.1 (abort with ^G)
(n2@a2-X201)1>
Nodes n1 and n2 are back.
It looks like I now have to restart the application manually.
* Let's do it at node n2 first:
(n2@a2-X201)1> application:start(wd).
- Looks like it has hung ...
- Now restart it at n1:
(n1@a2-X201)1> application:start(wd).
=INFO REPORT==== 19-Jun-2011::15:55:43 ===
wd_plug_server starting... PluginId: 4 Path: "/home/a2/src/erl/data/SIG" FileMask: "(?i)(.*)\\.SIG$"
ok
(n1@a2-X201)2>
It works. And node n2 has also returned ok:
Eshell V5.8.1 (abort with ^G)
(n2@a2-X201)1> application:start(wd).
ok
(n2@a2-X201)2>
At node n3 we see:
=INFO REPORT==== 19-Jun-2011::15:55:43 ===
    application: wd
    exited: stopped
    type: temporary
In general everything looks OK, as described in the documentation, except for the delay in starting the application at node n2.
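For reference, when the application moves between nodes like this, the distributed application controller starts it with a StartType of {takeover, Node} or {failover, Node} rather than normal, and the application callback can react to that. A minimal sketch, assuming a callback module wd_app and a top supervisor wd_sup (neither name is shown in the post):

-module(wd_app).
-behaviour(application).
-export([start/2, stop/1]).

start(normal, _Args) ->
    wd_sup:start_link();
start({takeover, FromNode}, _Args) ->
    %% wd was running on FromNode and this node is taking it over
    error_logger:info_msg("wd taking over from ~p~n", [FromNode]),
    wd_sup:start_link();
start({failover, FromNode}, _Args) ->
    %% FromNode died; note that {failover, Node} is only passed when the
    %% application defines start_phases, otherwise a failover looks like normal
    error_logger:info_msg("wd failing over from ~p~n", [FromNode]),
    wd_sup:start_link().

stop(_State) ->
    ok.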
Now kill node n1 once more:
(n1@a2-X201)2>
User switch command
 --> q
[a2@a2-X201 releases]$
Oops ... everything hangs. The application was not restarted on another node.
Actually, while I was writing this post I realised that sometimes everything is OK and sometimes I have a problem.
Any ideas why there could be problems when restoring the "primary" node and then killing it one more time?
As explained over at Learn You Some Erlang (scroll to the bottom), distributed applications only work well when started as part of a release, not when you start them manually with application:start.
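Roughly, that means describing the system in a .rel file, generating a boot script with systools, and booting each node with it instead of calling application:start/1 by hand. A sketch with made-up version numbers:

%% wd_rel.rel
{release, {"wd_rel", "1"}, {erts, "5.8.1"},
 [{kernel, "2.14.1"},
  {stdlib, "1.17.1"},
  {sasl, "2.1.9.2"},
  {wd, "1.0"}]}.

%% generate wd_rel.boot, then start each node with:
%%   erl -sname n1@a2-X201 -config wd1 -boot wd_rel
systools:make_script("wd_rel", [local]).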
The oddity you're seeing is likely due to you restarting your application entirely on nodes n1/n2 while n3 is still running under the initial application initialisation.
If your application starts any system-wide processes and uses their pids rather than names registered with global, pg or pg2, for example, then you may end up working with two sets of global state.
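As a sketch of the difference (assuming wd_plug_server is a gen_server; the request name is made up): register the server under a global name and resolve the name at call time, instead of handing its pid around.

%% register the server cluster-wide instead of passing its pid around
gen_server:start_link({global, wd_plug_server}, wd_plug_server, [], []).

%% callers look the name up at call time, so they always reach the live instance
gen_server:call({global, wd_plug_server}, get_status).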
If this is the case, the recommended approach is to focus on adding and removing nodes from an existing application rather than restarting the application in its entirety. This way nodes leave and join an existing set of initialised values.
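One way to sketch that: a long-lived process subscribes to node up/down events and adds or removes nodes from the application's working set, instead of stopping and restarting the whole application. The module below is purely illustrative:

-module(wd_node_watcher).
-behaviour(gen_server).
-export([start_link/0, init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    ok = net_kernel:monitor_nodes(true),   %% subscribe to nodeup/nodedown messages
    {ok, []}.

handle_info({nodeup, Node}, State) ->
    %% a node (re)joined: hand it its share of the already-initialised state
    error_logger:info_msg("node ~p joined~n", [Node]),
    {noreply, State};
handle_info({nodedown, Node}, State) ->
    %% a node left: reassign whatever it was responsible for
    error_logger:info_msg("node ~p left~n", [Node]),
    {noreply, State}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.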