"Unable to find the specified executable file" when trying to use mpirun with Julia

Posted on 2025-01-15 16:45:01

I am trying to run my Julia code on multiple nodes of a cluster that uses Moab and Torque as the scheduler and resource manager.
In an interactive session where I requested 3 nodes, I load the julia and openmpi modules and run:

mpirun -np 72 --hostfile $PBS_NODEFILE -display-allocation julia --project=.  "./estimation/test.jl"

mpirun does recognize my 3 nodes, since it displays:


======================   ALLOCATED NODES   ======================
        comp-bc-0383: slots=24 max_slots=0 slots_inuse=0 state=UP
        comp-bc-0378: slots=24 max_slots=0 slots_inuse=0 state=UNKNOWN
        comp-bc-0372: slots=24 max_slots=0 slots_inuse=0 state=UNKNOWN
=================================================================

However, it then returns this error message:

--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job.  This error was first reported for process
rank 48; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
      line parameter option (remember that mpirun interprets the first
      unrecognized command line token as the executable).

Node:       comp-bc-0372
Executable: /opt/aci/sw/julia/1.5.3_gcc-4.8.5-ips/bin/julia
--------------------------------------------------------------------------

What could be causing this? Is it having trouble accessing julia from the other nodes? (I think so, because the code runs with -np X as long as X <= 24, which is the number of slots on one node; as soon as X >= 25, it fails.)

Comments (1)

彩虹直至黑白 2025-01-22 16:45:01

Here is a good manual on how to work with modules and mpirun: UsingMPIstacksWithModules.

To sum up what the manual says:

It should be highlighted that modules are nothing more than a structured way to manage your environment variables; so whatever hurdles apply to modules apply equally to environment variables.

What you need is to export the environment variables in your mpirun command with -x PATH -x LD_LIBRARY_PATH. To see whether this worked, you can then run:

mpirun -np 72 --hostfile $PBS_NODEFILE -display-allocation -x PATH -x LD_LIBRARY_PATH which julia
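
If which julia prints a path for every rank, the same two -x flags can be added to the original command. A minimal sketch that reuses the command line from the question; launch, NP and SCRIPT are hypothetical names introduced here for illustration, and the optional prefix argument exists only to allow a dry run:

```shell
#!/bin/sh
# Sketch: forward PATH and LD_LIBRARY_PATH to every rank, then start Julia.
# NP and SCRIPT are hypothetical names; their values come from the question.
NP=72
SCRIPT="./estimation/test.jl"

launch() {
    # "$@" is an optional prefix: `launch echo` prints the command line,
    # plain `launch` executes it.
    "$@" mpirun -np "$NP" --hostfile "$PBS_NODEFILE" -display-allocation \
        -x PATH -x LD_LIBRARY_PATH \
        julia --project=. "$SCRIPT"
}

# Dry run (prints the command):        launch echo
# Real run, inside the interactive job: launch
```

Printing the command first makes it easy to confirm that $PBS_NODEFILE is set before anything is launched.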

Also, you should consider giving the full path of the file you want to run, so /path/to/estimation/test.jl instead of ./estimation/test.jl, since your working directory is not the same on every node. (In general, it is always safer to use full paths.)
By using full paths, you should also be able to use /path/to/julia (that is, the output of which julia) instead of just julia; that way you should not need to export the environment variables.
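
The full-path variant could look roughly like the sketch below. It assumes the Julia install and the project directory are on a filesystem that every allocated node mounts at the same location, which is not confirmed in the question. abs_cmd is a hypothetical helper that only prints the command line; since --project=. is also a relative path, the sketch makes it absolute too:

```shell
#!/bin/sh
# Sketch: build the mpirun command from absolute paths only.
# Assumption: both paths are valid on every node (shared filesystem).
# abs_cmd is a hypothetical helper; it prints the command, it does not run it.
abs_cmd() {
    julia_bin=$1      # absolute path to julia, e.g. the output of `which julia`
    project_dir=$2    # absolute path of the project root
    echo mpirun -np 72 --hostfile "$PBS_NODEFILE" \
        "$julia_bin" --project="$project_dir" \
        "$project_dir/estimation/test.jl"
}

# Usage, from the project root inside the job:
#   abs_cmd "$(command -v julia)" "$(pwd)"
```

The printed line can be checked and then pasted into the interactive session.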
