多线程 MPI 进程突然终止

发布于 2024-08-24 11:52:38 字数 2109 浏览 13 评论 0原文

我正在编写一个 MPI 程序(Visual Studio 2k8 + MSMPI),它使用 Boost::thread 为每个 MPI 进程生成两个线程,并且遇到了我无法追踪的问题。

当我使用以下命令运行程序时:mpiexec -n 2 program.exe,其中一个进程突然终止:

job aborted:
[ranks] message

[0] terminated

[1] process exited without calling finalize

---- error analysis -----

[1] on winblows
program.exe ended prematurely and may have crashed. exit code 0xc0000005


---- error analysis -----

我不知道为什么第一个进程突然终止,并且无法弄清楚如何跟踪下原因。即使我在所有操作结束时将零级进程放入无限循环中,也会发生这种情况......它会突然死亡。我的主要函数如下所示:

int _tmain(int argc, _TCHAR* argv[])
{
    /* Initialize the MPI execution environment. */
    MPI_Init(0, NULL);

    /* Create the worker threads. */
    boost::thread masterThread(&Master);
    boost::thread slaveThread(&Slave);

    /* Wait for the local test thread to end. */
    masterThread.join();
    slaveThread.join();

    /* Shutdown. */
    MPI_Finalize();
    return 0;
}

masterslave 函数在结束之前执行一些任意工作。我可以确认主线程至少已达到其操作的结尾。从属线程始终是执行中止之前未完成的线程。使用打印语句,似乎从属线程实际上并没有遇到任何错误......它正在愉快地移动,只是在崩溃中被取出。

那么,有没有人有任何想法:
a) 可能是什么原因造成的?
b) 我应该如何调试它?

非常感谢!

编辑:

发布主/从功能的最小版本。请注意,该程序的目标纯粹是为了演示目的......所以它没有做任何有用的事情。本质上,主线程将虚拟有效负载发送到其他 MPI 进程的从属线程。

void Master()
{   
    int  myRank;
    int  numProcs;
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    /* Create a message with numbers 0 through 39 as the payload, addressed 
     * to this thread. */
    int *payload= new int[40];
    for(int n = 0; n < 40; n++) {
        payload[n] = n;
    }

    if(myRank == 0) {
        MPI_Send(payload, 40, MPI_INT, 1, MPI_ANY_TAG, MPI_COMM_WORLD);
    } else {
        MPI_Send(payload, 40, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD);
    }

    /* Free memory. */
    delete(payload);
}

void Slave()
{
    MPI_Status status;
    int *payload= new int[40];
    MPI_Recv(payload, 40, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

    /* Free memory. */
    delete(payload);
}

I'm writing an MPI program (Visual Studio 2k8 + MSMPI) that uses Boost::thread to spawn two threads per MPI process, and have run into a problem I'm having trouble tracking down.

When I run the program with: mpiexec -n 2 program.exe, one of the processes suddenly terminates:

job aborted:
[ranks] message

[0] terminated

[1] process exited without calling finalize

---- error analysis -----

[1] on winblows
program.exe ended prematurely and may have crashed. exit code 0xc0000005


---- error analysis -----

I have no idea why the first process is suddenly terminating, and can't figure out how to track down the reason. This happens even if I put the rank zero process into an infinite loop at the end of all of it's operations... it just suddenly dies. My main function looks like this:

int _tmain(int argc, _TCHAR* argv[])
{
    /* Initialize the MPI execution environment. */
    MPI_Init(0, NULL);

    /* Create the worker threads. */
    boost::thread masterThread(&Master);
    boost::thread slaveThread(&Slave);

    /* Wait for the local test thread to end. */
    masterThread.join();
    slaveThread.join();

    /* Shutdown. */
    MPI_Finalize();
    return 0;
}

Where the master and slave functions do some arbitrary work before ending. I can confirm that the master thread, at the very least, is reaching the end of it's operations. The slave thread is always the one that isn't done before the execution gets aborted. Using print statements, it seems like the slave thread isn't actually hitting any errors... it's happily moving along and just get's taken out in the crash.

So, does anyone have any ideas for:
a) What could be causing this?
b) How should I go about debugging it?

Thanks so much!

Edit:

Posting minimal versions of the Master/Slave functions. Note that the goal of this program is purely for demonstration purposes... so it isn't doing anything useful. Essentially, the master threads send a dummy payload to the slave thread of the other MPI process.

void Master()
{   
    int  myRank;
    int  numProcs;
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    /* Create a message with numbers 0 through 39 as the payload, addressed 
     * to this thread. */
    int *payload= new int[40];
    for(int n = 0; n < 40; n++) {
        payload[n] = n;
    }

    if(myRank == 0) {
        MPI_Send(payload, 40, MPI_INT, 1, MPI_ANY_TAG, MPI_COMM_WORLD);
    } else {
        MPI_Send(payload, 40, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD);
    }

    /* Free memory. */
    delete(payload);
}

void Slave()
{
    MPI_Status status;
    int *payload= new int[40];
    MPI_Recv(payload, 40, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

    /* Free memory. */
    delete(payload);
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

┈┾☆殇 2024-08-31 11:52:38

您必须使用 mpi 运行时的线程安全版本。
阅读MPI_Init_thread

you have to use thread safe version of mpi runtime.
read up on MPI_Init_thread.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文