在 Windows XP 上运行 OpenMPI

发布于 2024-09-02 15:52:45 字数 3200 浏览 2 评论 0原文

我正在尝试构建一个基于Windows XP 的简单集群。我成功编译了 OpenMPI-1.4.2,并且 mpiccompi_info 等工具也可以工作,但我无法让我的 mpirun 正常工作。我能看到的唯一输出是

Z:\>orterun --hostfile z:\hosts.txt -np 2 hostname
[host0:04728] Failed to initialize COM library. Error code = -2147417850
[host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\openmpi-1.4.2
\orte\mca\ess\hnp\ess_hnp_module.c at line 218
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_plm_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\openmpi-1.4.2
\orte\runtime\orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\..\..\openmpi
-1.4.2\orte\tools\orterun\orterun.c at line 543

z:\hosts.txt 如下所示:

host0
host1

Z: 是一个可供 host0 和 host1 使用的共享网络驱动器。

我的问题是什么以及如何解决它?


更新: 好的,这个问题似乎已经解决了。在我看来,WideCap 驱动程序和/或软件组件会导致出现此错误。 “干净”的机器成功运行本地任务。不管怎样,我仍然无法在至少 2 台机器上运行任务,我收到以下消息:

Z:\>mpirun --hostfile z:\hosts.txt -np 2 hostname
connecting to host1
username:MAIN\cluster
password:********
Save Credential?(Y/N) y
[host0:04728] This feature hasn't been implemented yet.
[host0:04728] Could not connect to namespace cimv2 on node host1. Error code =-2147217400
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error.
More information may be available above.
--------------------------------------------------------------------------

我用 google 搜索了一下并按照此处所述执行了所有操作: http://www.open-mpi.org/community/lists/users/2010/03/12355.php 但我仍然遇到同样的错误。有人可以帮助我吗?


更新2: 错误代码 -2147217400 可能是 WMI 错误 WBEM_E_INVALID_PARAMETER (0x80041008),当传递到 WMI 调用的参数之一不正确时,就会发生该错误。这是否意味着问题出在 OpenMPI 源代码本身?或者可能是因为我在从源代码构建 OpenMPI 时使用了错误/过时的 wincred.hcredui.lib

I'm trying to build a simple cluster based on Windows XP. I compiled OpenMPI-1.4.2 successfully, and tools like mpicc and ompi_info work too, but I can't get my mpirun working properly. The only output I can see is

Z:\>orterun --hostfile z:\hosts.txt -np 2 hostname
[host0:04728] Failed to initialize COM library. Error code = -2147417850
[host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\openmpi-1.4.2
\orte\mca\ess\hnp\ess_hnp_module.c at line 218
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_plm_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\openmpi-1.4.2
\orte\runtime\orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\..\..\openmpi
-1.4.2\orte\tools\orterun\orterun.c at line 543

Where z:\hosts.txt appears as follows:

host0
host1

Z: is a shared network drive available to both host0 and host1.

What my problem is and how do I fix it?


Upd:
Ok, this problem seems to be fixed. It seems to me that WideCap driver and/or software components causes this error to appear. A "clean" machine runs local task successfully. Anyway, I still cannot run a task within at least 2 machines, I'm getting following message:

Z:\>mpirun --hostfile z:\hosts.txt -np 2 hostname
connecting to host1
username:MAIN\cluster
password:********
Save Credential?(Y/N) y
[host0:04728] This feature hasn't been implemented yet.
[host0:04728] Could not connect to namespace cimv2 on node host1. Error code =-2147217400
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error.
More information may be available above.
--------------------------------------------------------------------------

I googled a little and did all the things as described here: http://www.open-mpi.org/community/lists/users/2010/03/12355.php but I'm still getting the same error. Can anyone help me?


Upd2:
Error code -2147217400 might be WMI error WBEM_E_INVALID_PARAMETER (0x80041008) which occures when one of the parameters passed to the WMI call is not correct. Does this mean that the problem is in OpenMPI source code itself? Or maybe it's because of wrong/outdated wincred.h and credui.lib I used while building OpenMPI from the source code?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文