mpirun error with oneAPI under Slurm (and PBS) on an old cluster

Posted on 2025-01-31 02:06:53

I recently installed Intel oneAPI, including the C compiler, the Fortran compiler and the MPI library, and compiled VASP with it.

Before getting to the question, I should mention two workarounds I needed while installing VASP:

  1. glibc 2.14: the cluster is an old machine with glibc 2.12, whereas oneAPI needs 2.14. So I compiled glibc 2.14 and exported the library path: export LD_LIBRARY_PATH="~/mysoft/glibc214/lib:$LD_LIBRARY_PATH"
  2. ld 2.24: the cluster's ld is version 2.20, but a newer version is needed. So I installed binutils 2.24.

The cluster has one master node connected to 30 compute nodes. I run calculations in three ways:

  1. When I run the calculation on the master, it works fine.
  2. When I log in to a node manually with the rsh command, the calculation on that node also runs without problems.
  3. Usually, though, I submit the job script from the master (with Slurm or PBS) and the calculation runs on the nodes. In that case I get the following error message:
[[email protected]] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:159): check exit codes error
[[email protected]] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:212): poll for event error
[[email protected]] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:1062): error waiting for event
[[email protected]] HYD_print_bstrap_setup_error_message (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:1015): error setting up the bootstrap proxies
[[email protected]] Possible reasons:
[[email protected]] 1. Host is unavailable. Please check that all hosts are available.
[[email protected]] 2. Cannot launch hydra_bstrap_proxy or it crashed on one of the hosts. Make sure hydra_bstrap_proxy is available on all hosts and it has right permissions.
[[email protected]] 3. Firewall refused connection. Check that enough ports are allowed in the firewall and specify them with the I_MPI_PORT_RANGE variable.
[[email protected]] 4. pbs bootstrap cannot launch processes on remote host. You may try using -bootstrap option to select alternative launcher.
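
Reason 4 in the log points at the `-bootstrap` option of `mpirun`. A minimal sketch of what trying it could look like is shown here; it only prints the command line rather than running it. The helper name `mpirun_cmd`, the rank count and the `./vasp_std` binary name are assumptions for illustration, and `rsh` is picked only because manual rsh logins to the nodes are reported to work.

```shell
#!/bin/sh
# Sketch: build (and print, not execute) an mpirun command line that
# forces a specific bootstrap launcher instead of the scheduler default.
# mpirun_cmd is a hypothetical helper, not part of Intel MPI.
mpirun_cmd() {
    launcher=$1   # assumed launcher name, e.g. rsh or ssh
    nranks=$2     # number of MPI ranks
    shift 2       # remaining arguments: the program and its options
    echo "mpirun -bootstrap $launcher -np $nranks $*"
}

# Example values only; adjust to the actual job.
mpirun_cmd rsh 16 ./vasp_std
```

In a Slurm or PBS job script, the printed line would take the place of the plain `mpirun` call. Intel MPI also reads the launcher from the `I_MPI_HYDRA_BOOTSTRAP` environment variable, which may be more convenient than editing every script.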

I only get this error with code compiled with oneAPI, not with code compiled with Intel® Parallel Studio XE. Do you have any idea what causes it? Any response would be highly appreciated.

Best,

Léon

Comments (1)

一杯敬自由 2025-02-07 02:06:53

Could it be a permissions error with the Slurm agent not having the correct permissions or library path?
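
One way to test that idea is to run a check inside the batch job itself, where the environment can differ from an interactive login. The sketch below is a hypothetical helper, `check_lib_path` (not part of Slurm, PBS or Intel MPI), that flags `LD_LIBRARY_PATH` entries the dynamic loader cannot use on the current host: directories that do not exist there, and entries that begin with a literal `~`, which neither the shell (inside double quotes) nor the loader expands.

```shell
#!/bin/sh
# Hypothetical helper: report unusable entries in a colon-separated
# library search path. Returns 0 if every entry is usable, 1 otherwise.
check_lib_path() {
    # $1: path list to check; defaults to the current LD_LIBRARY_PATH
    path_list=${1-${LD_LIBRARY_PATH-}}
    rc=0
    old_ifs=$IFS
    IFS=:
    for dir in $path_list; do
        case $dir in
            '')   continue ;;                                  # skip empty entries
            '~'*) echo "unexpanded tilde: $dir"; rc=1 ;;       # literal ~ is never expanded by the loader
            *)    [ -d "$dir" ] || { echo "missing directory: $dir"; rc=1; } ;;
        esac
    done
    IFS=$old_ifs
    return $rc
}
```

Calling `check_lib_path` at the top of the Slurm or PBS job script, before `mpirun`, prints nothing when every entry resolves on the compute node. Any output there, with none on the master, would support the library-path explanation.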
