Strange double values in MPI communication - memory problem?

Posted 2024-10-31 06:16:31

This is a follow-up to this question: I think I've solved the problem the original question asked about, but now I have some other problems.

I have some MPI code that does a matrix transpose using point-to-point non-blocking communication (MPI_Isend and MPI_Irecv). I am working with doubles, and all of my MPI calls use MPI_DOUBLE as the type. However, I seem to be getting some strange memory issues, the main one being 'nonsense' numbers in my output. For example:

Test Process (2, 1): 68.000000 78.000000 
Test Process (2, 1): 387323398486945739062068424931898425134839058804189460794109462554519403357109477747039490936107027309191462010675537134594564349232145421118587860238537662203953149049188364045280831238661272720084252520359127715290869606638545797120.000000 881150864511763756676254370742733018389256944202962553716402946507192139671624750374865205489904045881646541419557063427368973644261533211221769931916194052019466643963904.000000 
Test Process (2, 1): 78.000000 88.000000 

I'd guess there is a memory problem somewhere: either I'm reading memory as a double when it doesn't contain one, or I'm writing a double to memory that isn't meant to hold one. Any idea how I can go about debugging this?
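To illustrate the kind of misread I suspect, here is a minimal sketch in plain C (no MPI involved). The helper name bits_as_double is made up for this example, and it assumes 64-bit IEEE 754 doubles. It only shows how bytes that were never a meaningful double can turn into an enormous number when read as one; it is not taken from my actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a raw 64-bit pattern as if it were a double,
 * which is what happens when a buffer slot holds something else. */
double bits_as_double(uint64_t bits)
{
    double d;
    assert(sizeof d == sizeof bits);  /* assumes 64-bit IEEE 754 doubles */
    memcpy(&d, &bits, sizeof d);      /* copies the bytes, no conversion */
    return d;
}
```

Under those assumptions, the pattern 0x4051000000000000 reads back as the ordinary value 68.0, while 0x7FE0000000000000 reads back as roughly 9e307, which printf with %f prints as a number hundreds of digits long, much like the values in my output.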

The code is available here, but I'm not expecting detailed analysis of the code, more tips as to how this kind of error can occur using MPI communication, and what I might be able to do to track down the error.

Just to confirm a few things I've tried: it's not a problem with the initialisation of the array. I initialised the array to a known value (999), and that value doesn't appear in the array at the end, so all of the new values (including the crazy ones) must be coming from the MPI communication.
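The check I did by hand could probably be automated along these lines. This is only a sketch: fill_sentinel and first_suspect are hypothetical names, and the plausible range passed as lo and hi is an assumption the caller has to supply (my test data should stay well below 1000, for instance). The idea is to fill the receive buffer with the marker before posting the receives, then scan it once all communication has completed.

```c
#include <math.h>
#include <stddef.h>

/* Fill a buffer with a marker value before posting the receives. */
void fill_sentinel(double *buf, size_t n, double sentinel)
{
    for (size_t k = 0; k < n; ++k)
        buf[k] = sentinel;
}

/* Return the index of the first entry that still holds the sentinel,
 * is not finite, or falls outside [lo, hi]; return -1 if all look sane. */
long first_suspect(const double *buf, size_t n,
                   double sentinel, double lo, double hi)
{
    for (size_t k = 0; k < n; ++k) {
        double v = buf[k];
        if (v == sentinel || !isfinite(v) || v < lo || v > hi)
            return (long)k;
    }
    return -1;
}
```

Knowing the index of the first bad entry should at least narrow down which message, and which offset within it, delivered the bad data.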

Any ideas?

Comments (1)

星星的軌跡 2024-11-07 06:16:31

One potential issue is inconsistent array indexing. At line 223, it appears that i and j might be backwards. (I'm not sure the line numbers match up when viewing this on an iPod touch; it is the loop with the comment "Calculate the offsets".) There, j and i are swapped, rows for columns, compared to the other loops. It appears that you have adjusted that a couple of different ways in the comments, so maybe it is intended. I can't see the whole code very well since I am currently using an iPod touch with a limited view, but that part does seem incorrect.

And the final loop also seems incorrect. In that one j and i are also reversed compared to the other loops.
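To show why swapped indices matter, here is a small sketch in plain C. It assumes the local block is stored as a flat row-major array, which I can't confirm from the posted code, and the names flat_offset and swapped_out_of_bounds are hypothetical. It counts how many element lookups leave the block when i and j are exchanged.

```c
#include <stddef.h>

/* Offset of element (i, j) in a flat row-major array with ncols columns. */
size_t flat_offset(size_t i, size_t j, size_t ncols)
{
    return i * ncols + j;
}

/* Count the (i, j) pairs of an nrows x ncols block for which the
 * swapped lookup flat_offset(j, i, ncols) falls outside the block. */
size_t swapped_out_of_bounds(size_t nrows, size_t ncols)
{
    size_t bad = 0;
    for (size_t i = 0; i < nrows; ++i)
        for (size_t j = 0; j < ncols; ++j)
            if (flat_offset(j, i, ncols) >= nrows * ncols)
                ++bad;
    return bad;
}
```

For a 2 x 5 block, 6 of the 10 swapped lookups land past the end of the array, which is the kind of access that can hand back arbitrary memory as a double. For a square block every swapped lookup stays inside the array, so the result is merely transposed and the mistake is easier to miss.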
