#dask: Local abort before MPI_INIT completed successfully

Posted on 2025-02-09 21:18:30 · 1235 characters · 0 views · 0 comments

I ran the following command:

mpirun --hostfile /home/user/share/hostlist.txt -np 4 /home/user/share/mpi-dask/venv/bin/dask-mpi --scheduler-file ~/dask-scheduler.json

I got the following output:

*** An error occurred in MPI_Init_thread  
*** on a NULL communicator  
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,  
***    and potentially your MPI job)  
[rpi40000:14497] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

2022-06-23 06:40:12,321 - distributed.nanny - INFO - Worker process 14497 exited with status 1  
2022-06-23 06:40:12,324 - distributed.nanny - WARNING - Restarting worker  
^C[rpi40000:14416] PMIX ERROR: UNREACHABLE in file ../../../src/server/pmix_server.c at line 2795
[rpi40000:14416] 8 more processes have sent help message help-orte-runtime.txt /   orte_init:startup:internal-failure  
[rpi40000:14416] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages  
[rpi40000:14416] 5 more processes have sent help message help-orte-runtime / orte_init:startup:internal-failure  
[rpi40000:14416] 5 more processes have sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
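The log above hints that setting the MCA parameter "orte_base_help_aggregate" to 0 shows every help / error message instead of a summary count. A minimal sketch of rebuilding the same command with that parameter is below, assuming Open MPI's `mpirun --mca <name> <value>` syntax. The helper name `build_mpirun_command` is hypothetical and not part of dask-mpi or Open MPI. This only makes the output more verbose; it is not a fix for the MPI_Init_thread failure.

```python
import os
import shlex


def build_mpirun_command(hostfile, nprocs, dask_mpi, scheduler_file,
                         aggregate_help=False):
    """Return the mpirun argument list used in the question.

    Hypothetical helper for illustration only. With aggregate_help=False
    it adds the MCA parameter named in the log so that all help / error
    messages are printed rather than aggregated.
    """
    cmd = ["mpirun"]
    if not aggregate_help:
        # Parameter name and value come from the hint in the log output.
        cmd += ["--mca", "orte_base_help_aggregate", "0"]
    cmd += [
        "--hostfile", hostfile,
        "-np", str(nprocs),
        dask_mpi,
        # Expand "~" here because quoting would stop the shell from doing it.
        "--scheduler-file", os.path.expanduser(scheduler_file),
    ]
    return cmd


# Paths and process count are the ones from the question.
cmd = build_mpirun_command(
    "/home/user/share/hostlist.txt",
    4,
    "/home/user/share/mpi-dask/venv/bin/dask-mpi",
    "~/dask-scheduler.json",
)

# Print a shell-ready command line; nothing is executed here.
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd)` on a machine where `mpirun` is available.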
