Segmentation fault for MPI job when using more than 1 node
I am currently looking for a solution to an issue relating to managing and maximizing resources I am accessing from a national HPC service.
The service has 2 main queues of relevance: 1) Intel Xeon Gold 6148 Skylake processors (2x20 cores per node), and to a lesser extent 2) Intel Xeon Phi 7210 KNL processors.
When I run my code [which is an ocean physics/biogeochemistry model] on 1 node, I experience no issues. However, when I run on more than 1 node, I get a segmentation fault. In my experience, the only way to avoid this segmentation fault has been to request fewer than the full complement of cores per node. However, this practice is prohibitive. For example, 1 node has 40 cores, and the more nodes I request, the fewer cores per node I can request. The cap I have hit is 6 nodes with 15 cores per node, giving 90 cores overall.
This naturally means that the ambitions I had in terms of speed-up are considerably curtailed.
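For concreteness, here is a minimal sketch of the kind of Slurm batch request that currently works for me; the job name, walltime and executable name are placeholders rather than my actual setup, and I launch with srun here although mpirun behaves the same way:

#!/bin/bash
#SBATCH --job-name=ocean_model        # placeholder job name
#SBATCH --nodes=6                     # requesting more nodes forces even fewer cores per node
#SBATCH --ntasks-per-node=15          # only 15 of the 40 cores per node; the full 40 triggers the segmentation fault
#SBATCH --time=12:00:00               # placeholder walltime

srun ./model.exe                      # placeholder executable name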
I understand that the launch node has unlimited stack space, but that every additional node uses a lower default stack size. Consequently, I tried the command ulimit -s unlimited, but no luck.
[UPDATE, in response to the first comment below from Gilles: I had already tried this suggestion, to no avail.]
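For completeness, this is roughly what the relevant part of the job script looked like when I tried it; the wrapped srun line is one commonly suggested way of applying the limit on every compute node rather than only the launch node, assuming the MPI library bootstraps through Slurm, and the executable name is again a placeholder:

ulimit -s unlimited                                    # raises the stack limit on the launch node only
srun bash -c 'ulimit -s unlimited; exec ./model.exe'   # attempts to raise it on every node before the model starts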
I suspect that some proprietary setting on the HPC service I'm using is throttling things, as the same model run on similar HPC services in other countries is apparently OK and gets the desired scale-up.
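For anyone diagnosing something similar, these read-only queries show what the site actually enforces; output fields vary between Slurm versions, so treat this as a sketch:

scontrol show config | grep -i -e propagate -e limit   # PropagateResourceLimits determines whether ulimit settings reach the compute nodes
scontrol show partition                                # per-partition node, core and memory limits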
I would appreciate any suggestions in relation to slurm configuration and HPC. Due to time constraints, I can't engage in extensive rewriting of the MPI aspects of this code, which has been thoroughly developed by multiple research agencies.
I reiterate that this code works fine on other HPC systems with both slurm and pbs scripting.
1 Answer
I added the following three flags to the list of ifort compiler flags, which seems to have resolved the issue:
-heap-arrays 1000
-parallel
-xSKYLAKE-AVX512
Now I can avail of all the cores on each node. I suspect that adding the array size specification (1000 KB, i.e. roughly 1 MB) to the "-heap-arrays" flag is the most important addition; I had previously used the flag without specifying the size of arrays to store on the heap, but that had made no difference.
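As a concrete illustration, the relevant part of the build now looks roughly like this; the file names, the -O2 level and the mpiifort wrapper are assumptions about my setup, and the three flags listed above are the only actual change:

FCFLAGS="-O2 -heap-arrays 1000 -parallel -xSKYLAKE-AVX512"    # -heap-arrays threshold is in KB
mpiifort $FCFLAGS -c model_kernels.f90                        # placeholder source file
mpiifort $FCFLAGS -o model.exe model_kernels.o model_main.f90 # placeholder executable and sources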