Using mpirun on OS X on a single machine

Posted 2024-10-20 07:28:21

I have trouble using mpirun in single-machine mode on OS X. When running my program using mpirun -np 5 my_program I get the following error output:

[...-MacBook-Pro.local:85936] [0,0,0] ORTE_ERROR_LOG: Timeout in file /SourceCache/openmpi/openmpi-8/openmpi/orte/mca/pls/base/pls_base_orted_cmds.c at line 275
[...-MacBook-Pro.local:85936] [0,0,0] ORTE_ERROR_LOG: Timeout in file /SourceCache/openmpi/openmpi-8/openmpi/orte/mca/pls/rsh/pls_rsh_module.c at line 1158
[...-MacBook-Pro.local:85936] [0,0,0] ORTE_ERROR_LOG: Timeout in file /SourceCache/openmpi/openmpi-8/openmpi/orte/mca/errmgr/hnp/errmgr_hnp.c at line 90
mpirun noticed that job rank 1 with PID 85940 on node ...-MacBook-Pro.local exited on signal 6 (Abort trap). 
2 additional processes aborted (not shown)

Apparently, by default mpirun uses rsh for connecting to machines. I tried using ssh instead, but it didn't help:

mpirun --mca pls_rsh_agent ssh -np 5 my_program

Then, I tried using the shared-memory (sm) BTL, which didn't help either:

mpirun --mca btl self,sm -np 5 my_program

Finally, I tried using a machine file to specify that I only want to use localhost, which didn't help either:

mpirun -np 5 -machinefile machinefile.local my_program

Here, machinefile.local only contains localhost on the (single) first line.
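
For reference, a machine file like the one described can be created as sketched below. The second file, machinefile.slots, is a hypothetical variant that is not part of the original post; it uses the slots= keyword that Open MPI hostfiles accept to state how many processes the host may run.

```shell
# Create the one-line machine file described above.
printf 'localhost\n' > machinefile.local

# Hypothetical variant (an assumption, not from the original post):
# declare five slots on localhost so that -np 5 fits on the one host.
printf 'localhost slots=5\n' > machinefile.slots

# Either file would then be passed to the launcher, for example:
#   mpirun -np 5 -machinefile machinefile.local my_program
```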

In all of the above cases, I get the above timeout error.

Also, I verified that my Mac OS X firewall wasn't running and that I could ssh into my machine.

Comments (2)

〃安静 2024-10-27 07:28:21

So it looks like you're using a version of OpenMPI from fink, is that right? Do you still have the original 1.2.x MPI in /usr/bin and /usr/lib? The first place to look for weird launching issues is conflicting versions of the MPI libraries.

First try something simple like /usr/bin/mpirun -np 5 hostname, and then do the same thing with your fink mpirun, wherever it is installed: /path/to/fink/mpirun -np 5 hostname. This checks that both MPI launchers work on a non-MPI program. Then do an ldd on my_program (OS X has no ldd; otool -L my_program is the equivalent): which libraries is it linking to? Use the appropriate mpirun for those libraries, and see if that works.
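
The checks above could be scripted roughly as follows. This is a sketch under assumptions, not a definitive diagnostic: the helper names find_on_path and list_libs and the PROGRAM variable are made up for illustration, and the script only finds launchers that are on PATH, so a fink install outside PATH would have to be run by hand.

```shell
#!/bin/sh
# Sketch: run a non-MPI command through every mpirun on PATH, then show
# which MPI libraries the program is linked against.

# Print every executable called "$1" found on PATH, one per line.
find_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Print the shared libraries of binary "$1": otool -L on OS X, ldd elsewhere.
list_libs() {
    if command -v otool >/dev/null 2>&1; then
        otool -L "$1"
    elif command -v ldd >/dev/null 2>&1; then
        ldd "$1"
    else
        echo "neither otool nor ldd found" >&2
        return 1
    fi
}

# Try each launcher on a non-MPI program. stdin is redirected so that
# mpirun cannot swallow the remaining lines of the launcher list.
find_on_path mpirun | while IFS= read -r launcher; do
    echo "== $launcher"
    "$launcher" -np 5 hostname </dev/null || echo "launcher failed: $launcher"
done

# PROGRAM is a placeholder for the MPI binary from the question.
PROGRAM=${PROGRAM:-./my_program}
if [ -f "$PROGRAM" ]; then
    list_libs "$PROGRAM" | grep -i mpi || echo "no MPI libraries listed for $PROGRAM"
fi
```

If the library paths printed for my_program sit under a different prefix than the mpirun that was used to launch it, that mismatch is the version conflict described above.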

感受沵的脚步 2024-10-27 07:28:21

Check your firewall and make sure it allows mpirun to establish inbound and outbound connections.
