"Unable to find the specified executable file" when trying to use mpirun with Julia
I am trying to run my Julia code on multiple nodes of a cluster that uses Moab and Torque as the scheduler and resource manager.
In an interactive session where I requested 3 nodes, I load the julia and openmpi modules and run:
mpirun -np 72 --hostfile $PBS_NODEFILE -display-allocation julia --project=. "./estimation/test.jl"
mpirun does successfully recognize my 3 nodes, since it displays:
====================== ALLOCATED NODES ======================
comp-bc-0383: slots=24 max_slots=0 slots_inuse=0 state=UP
comp-bc-0378: slots=24 max_slots=0 slots_inuse=0 state=UNKNOWN
comp-bc-0372: slots=24 max_slots=0 slots_inuse=0 state=UNKNOWN
=================================================================
However, after that it returns an error message:
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 48; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).
Node: comp-bc-0372
Executable: /opt/aci/sw/julia/1.5.3_gcc-4.8.5-ips/bin/julia
--------------------------------------------------------------------------
What could be the cause of this? Is it because mpirun has trouble accessing julia from the other nodes? (I think this is the case because the code runs with -np X as long as X <= 24, which is the number of slots on one node; as soon as X >= 25, it fails to run.)
Comments (1)
Here is a good manual on how to work with modules and mpirun: UsingMPIstacksWithModules.

To sum up what is written in the manual: you need to export the environment variables in your mpirun command with -x PATH -x LD_LIBRARY_PATH. To see if this worked, you can then run a quick test.

Also, you should consider giving the whole path of the file you want to run, so /path/to/estimation/test.jl instead of ./estimation/test.jl, since your working directory is not the same on every node. (In general, it is always safer to use whole paths.)

By using whole paths, you should also be able to use /path/to/julia (that is, the output of which julia) instead of only julia; this way you should not need to export the environment variables.
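
The two suggestions above can be combined into one launch line. The following is only a sketch under stated assumptions, not the exact command from the linked manual: the `run` helper, the `DRY_RUN` switch, and the variables `JULIA_BIN`, `SCRIPT`, `PROJECT_DIR`, `HOSTFILE` and `NP` are hypothetical names introduced for illustration, and `/path/to/julia` is a placeholder. With the default `DRY_RUN=1` the script only prints the command it would launch, so it can be inspected before use.

```shell
#!/bin/sh
# Sketch: build an mpirun command that exports PATH and LD_LIBRARY_PATH
# to the remote nodes and uses absolute paths for julia and the script.

# Assumption: fall back to a placeholder when julia is not on PATH.
JULIA_BIN=${JULIA_BIN:-$(command -v julia || echo /path/to/julia)}
# Assumption: mpirun is launched from the project directory.
PROJECT_DIR=${PROJECT_DIR:-$PWD}
SCRIPT=${SCRIPT:-$PROJECT_DIR/estimation/test.jl}
# Torque sets PBS_NODEFILE inside a job; "hostfile" is a stand-in otherwise.
HOSTFILE=${PBS_NODEFILE:-hostfile}
NP=${NP:-72}

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# -x VAR asks Open MPI's mpirun to export VAR to the launched processes.
run mpirun -np "$NP" --hostfile "$HOSTFILE" \
    -x PATH -x LD_LIBRARY_PATH \
    "$JULIA_BIN" "--project=$PROJECT_DIR" "$SCRIPT"
```

Setting `DRY_RUN=0` in the environment makes the script launch the job instead of printing it. If the absolute path printed for julia does not exist on every node (for example because the software directory is not mounted there), exporting the variables will not help, and that is worth checking on the failing node first.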