Perl: how to add an interrupt handler so one can control code executed by mpirun via system()?
We use a cluster with Perceus (warewulf) software to do some computing. This software package has a wwmpirun program (a Perl script) that prepares a hostfile and executes mpirun:
# ...
system("$mpirun -hostfile $tmp_hostfile -np $mpirun_np @ARGV");
# ...
We use this script to run a math program (CODE) on several nodes. CODE is normally supposed to be stopped with Ctrl+C, which brings up a short menu with options: status, stop, and halt. However, when running under MPI, pressing Ctrl+C kills CODE outright, with loss of data.
The developers of CODE suggest a workaround: the program can be stopped by creating a file named stop%s, where %s is the name of the task file being executed by CODE. This lets us stop the run, but we cannot get the status of the calculation. Sometimes a run takes a really long time, and getting this function back would be much appreciated.
Do you think the problem is in CODE or in mpirun?
Can one find a way to communicate with CODE executed by mpirun?
UPDATE1
In a single (non-MPI) run, one gets the status of the calculation by pressing Ctrl+C and choosing the status option in the provided menu by entering s. CODE prints status information to STDOUT and continues the calculation.
Comments (1)
"We cannot get status of calculation" - what does that mean? Do you expect to get the status somehow but are not getting it? Or is the software not designed to give you status?
Your system call doesn't redirect standard error/output anywhere. Is that where the status is supposed to appear? If so, capture it by opening a pipe, or by redirecting to a log and having the wrapper read the log.
Also, you're not processing the return code by evaluating the return value of the system call - that may be another way the program communicates.
Your Ctrl+C problem might be because Ctrl+C is caught by the Perl wrapper, which dies, instead of by CODE, which has a nice Ctrl+C interrupt handler. The solution might be to add an interrupt handler to the mpirun wrapper - see Perl Cookbook Recipe 16.18 for $SIG{INT}, or http://www.wellho.net/resources/ex.php4?item=p216/sigint ; you may want to have the Perl wrapper catch Ctrl+C and send the INT signal to the CODE it launched.
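
A minimal sketch of that last suggestion, under stated assumptions: run_forwarding_int is a hypothetical helper name (it is not part of wwmpirun), and the command it runs here is just another perl process standing in for mpirun. It replaces system() with fork/exec so the wrapper knows the child's PID, installs a $SIG{INT} handler that passes the signal on, and decodes the wait status so the return code is not thrown away. Two caveats apply. Perl's system() already ignores SIGINT in the parent while the child runs, and a Ctrl+C typed at a terminal goes to the whole foreground process group, so explicit forwarding mostly matters when the signal is sent to the wrapper alone. Whether mpirun then relays SIGINT to CODE in a way that keeps its menu usable depends on the MPI implementation and is not verified here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of wwmpirun: start a command as a child
# process and pass any SIGINT the wrapper receives on to that child.
sub run_forwarding_int {
    my @cmd = @_;

    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;

    if ($pid == 0) {
        # Child: become the command (list form, so no shell is involved).
        exec { $cmd[0] } @cmd
            or do { warn "exec '$cmd[0]' failed: $!\n"; exit 127 };
    }

    # Parent: forward SIGINT instead of taking the default action (dying).
    # local $! keeps the handler from disturbing the EINTR check below.
    local $SIG{INT} = sub { local $!; kill 'INT', $pid };

    # A signal can interrupt waitpid; retry until the child is reaped.
    my $reaped;
    do { $reaped = waitpid($pid, 0) } while ($reaped == -1 && $!{EINTR});
    die "waitpid failed: $!\n" unless $reaped == $pid;

    # Decode the wait status the same way as after system().
    return +{ exit => $? >> 8, signal => $? & 127 };
}

# Harmless demonstration: the child is another perl that exits with code 3.
my $status = run_forwarding_int($^X, '-e', 'exit 3');
print "exit=$status->{exit} signal=$status->{signal}\n";
```

In wwmpirun, the system(...) line could then become something like run_forwarding_int($mpirun, '-hostfile', $tmp_hostfile, '-np', $mpirun_np, @ARGV), with the returned exit code and signal number checked afterwards. Passing the arguments as a list also avoids the shell re-parsing @ARGV.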