Debugging OpenMPI with Valgrind, and suppressions on OS X

Posted 2024-11-29 11:34:04

I am writing a parallel code in C++ on my OS X (Snow Leopard) laptop, and I am trying to debug it with memchecker. I have successfully built OpenMPI with Valgrind support using:

configure --prefix=/opt/openmpi-1.4.3/ --enable-debug --enable-memchecker --with-valgrind=/opt/valgrind-3.6.0/ FFLAGS=-m64 F90FLAGS=-m64

(Ignore the Fortran flags; they are there because my Fortran compiler comes from GCC.)

When I run my application with

mpirun -np 2 valgrind --suppressions=/opt/openmpi-1.4.3/share/openmpi/openmpi-valgrind.supp --leak-check=yes --dsymutil=yes ./program

I get a whole lot of warnings from Valgrind (most of them from the heap summary at the end). I have included a small snippet of the warnings below. What I take from them is that Valgrind detects memory leaks and uninitialised values in the MPI library, but I'm not really interested in those; I want warnings from the code I write. I already run Valgrind with the suppression file provided by OpenMPI, but evidently that is not enough. How can I easily ignore all the other warnings detected in the OpenMPI distribution? Is there a suppression file for debugging OpenMPI with Valgrind on OS X, or do you know any cunning trick?

The first warning is

 ==1531==    Syscall param writev(vector[...]) points to uninitialised byte(s)
 ==1531==    at 0x1014E16E2: writev (in /usr/lib/libSystem.B.dylib)
 ==1531==    by 0x101AEA4C5: mca_oob_tcp_peer_send (in /opt/openmpi-1.4.3/lib/openmpi/mca_oob_tcp.so)
 ==1531==    by 0x101AF0B88: mca_oob_tcp_send_nb (in /opt/openmpi-1.4.3/lib/openmpi/mca_oob_tcp.so)
 ==1531==    by 0x101AC7F48: orte_rml_oob_send (in /opt/openmpi-1.4.3/lib/openmpi/mca_rml_oob.so)
 ==1531==    by 0x101AC8AA1: orte_rml_oob_send_buffer (in /opt/openmpi-1.4.3/lib/openmpi/mca_rml_oob.so)
 ==1531==    by 0x101B3489E: allgather (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so) 
 ==1531==    by 0x101B3525D: modex (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
 ==1531==    by 0x1000A48E6: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100001AF2: main (main.cpp:34)
 ==1531==  Address 0x101a8911b is 107 bytes inside a block of size 256 alloc'd
 ==1531==    at 0x10002DB2D: realloc (vg_replace_malloc.c:525)
 ==1531==    by 0x1012240B6: opal_dss_buffer_extend (in /opt/openmpi-1.4.3/lib/libopen-pal.0.dylib)
 ==1531==    by 0x101225CF7: opal_dss_copy_payload (in /opt/openmpi-1.4.3/lib/libopen-pal.0.dylib)
 ==1531==    by 0x101B347CA: allgather (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
 ==1531==    by 0x101B3525D: modex (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
 ==1531==    by 0x1000A48E6: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100001AF2: main (main.cpp:34)
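Valgrind can write a suppression for each error it reports when run with `--gen-suppressions=all`, and since it accepts several `--suppressions=` options, those entries can go into a file of your own passed next to `openmpi-valgrind.supp`. The Python sketch below only illustrates what such an entry looks like for the `writev` warning above: it takes the function names from the `at`/`by` lines of the first stack and writes them as `fun:` frames. The helpers `frames_from_log` and `make_suppression`, the rule name, and the choice to stop at `ompi_mpi_init` are illustrative assumptions, not part of Valgrind or OpenMPI.

```python
import re

# Matches the "at 0x...: name" / "by 0x...: name" frame lines of a Valgrind report.
FRAME_RE = re.compile(r"\b(?:at|by) 0x[0-9A-Fa-f]+: (\S+)")


def frames_from_log(lines):
    """Return the function names of the first stack in `lines`, innermost first."""
    names = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            names.append(match.group(1))
        elif names:
            break  # the first non-frame line after a stack ends it
    return names


def make_suppression(name, kind, frames, extra=None, stop_at=None):
    """Format a suppression entry (hypothetical helper).

    kind    -- e.g. "Memcheck:Param" or "Memcheck:Leak"
    extra   -- the extra line some kinds need, e.g. the syscall param name
    stop_at -- keep frames up to and including this function, drop its callers
    """
    if stop_at is not None:
        frames = frames[: frames.index(stop_at) + 1]
    lines = ["{", "   " + name, "   " + kind]
    if extra is not None:
        lines.append("   " + extra)
    lines += ["   fun:" + frame for frame in frames]
    lines.append("}")
    return "\n".join(lines)


report = """\
==1531==    Syscall param writev(vector[...]) points to uninitialised byte(s)
==1531==    at 0x1014E16E2: writev (in /usr/lib/libSystem.B.dylib)
==1531==    by 0x101AEA4C5: mca_oob_tcp_peer_send (in /opt/openmpi-1.4.3/lib/openmpi/mca_oob_tcp.so)
==1531==    by 0x101AF0B88: mca_oob_tcp_send_nb (in /opt/openmpi-1.4.3/lib/openmpi/mca_oob_tcp.so)
==1531==    by 0x101AC7F48: orte_rml_oob_send (in /opt/openmpi-1.4.3/lib/openmpi/mca_rml_oob.so)
==1531==    by 0x101AC8AA1: orte_rml_oob_send_buffer (in /opt/openmpi-1.4.3/lib/openmpi/mca_rml_oob.so)
==1531==    by 0x101B3489E: allgather (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
==1531==    by 0x101B3525D: modex (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
==1531==    by 0x1000A48E6: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==    by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==    by 0x100001AF2: main (main.cpp:34)
==1531==  Address 0x101a8911b is 107 bytes inside a block of size 256 alloc'd
""".splitlines()

print(make_suppression("ORTE OOB writev", "Memcheck:Param", frames_from_log(report),
                       extra="writev(vector[...])", stop_at="ompi_mpi_init"))
```

Cutting the entry at `ompi_mpi_init` keeps it from depending on `MPI_Init` and `main`, so the same rule should also cover the warning when `MPI_Init` is called from elsewhere in the program.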

After execution, a small snippet of the heap summary looks like this:

 ==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,950 of 2,194
 ==1531==    at 0x10002D915: malloc (vg_replace_malloc.c:236)
 ==1531==    by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100077C96: create_comm (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x10007798A: ompi_attr_create_predefined (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000737CF: ompi_attr_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000A4840: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100001AF2: main (main.cpp:34)

...

 ==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,952 of 2,194
 ==1531==    at 0x10002D915: malloc (vg_replace_malloc.c:236)
 ==1531==    by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x10014CEC5: PMPI_Keyval_create (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1065ACFE6: ???
 ==1531==    by 0x10658867B: ???
 ==1531==    by 0x10017A591: module_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100179985: mca_io_base_file_select (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100089D55: ompi_file_open (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000E1ED1: MPI_File_open (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531== 
 ==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,953 of 2,194
 ==1531==    at 0x10002D915: malloc (vg_replace_malloc.c:236)
 ==1531==    by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x10014CEC5: PMPI_Keyval_create (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1065A6210: ???
 ==1531==    by 0x106597149: ???
 ==1531==    by 0x106596AAB: ???
 ==1531==    by 0x1065AD14C: ???
 ==1531==    by 0x10658867B: ???
 ==1531==    by 0x10017A591: module_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
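Another way to focus on your own code is to triage the heap summary rather than suppress it. The Python sketch below splits the summary into leak records and flags a record as coming from OpenMPI when the innermost frame outside Valgrind's malloc replacements lies under the install prefix, assumed here to be `/opt/openmpi-1.4.3/` as in the configure line. `leak_records` and `from_mpi` are hypothetical helpers and the rule is a heuristic, not a Valgrind feature: it also hides leaks of MPI objects that your own code created through MPI and never freed (for example a communicator or file handle), so treat it as a triage aid rather than a replacement for suppressions.

```python
import re

FRAME_RE = re.compile(r"\b(?:at|by) 0x[0-9A-Fa-f]+: ")
LOC_RE = re.compile(r"\(([^()]*)\)\s*$")  # trailing "(in /path/lib.dylib)" or "(file.c:12)"
RECORD_RE = re.compile(r"bytes in [\d,]+ blocks are ")
SEP_RE = re.compile(r"==\d+==")  # a bare "==PID==" line separates records


def leak_records(lines):
    """Split heap-summary lines into leak records, each a list of frame locations."""
    records, current = [], None
    for line in lines:
        if RECORD_RE.search(line):
            current = []
            records.append(current)
        elif SEP_RE.fullmatch(line.strip()):
            current = None
        elif current is not None and FRAME_RE.search(line):
            loc = LOC_RE.search(line)
            current.append(loc.group(1) if loc else None)  # None for "???" frames
    return records


def from_mpi(locations, prefix="/opt/openmpi-1.4.3/"):
    """Heuristic: is the innermost frame outside Valgrind's allocator under `prefix`?"""
    for loc in locations:
        if loc is not None and loc.startswith("vg_replace_malloc"):
            continue  # malloc/realloc/operator new as replaced by Valgrind
        return loc is not None and loc.startswith("in " + prefix)
    return False


summary = """\
==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,950 of 2,194
==1531==    at 0x10002D915: malloc (vg_replace_malloc.c:236)
==1531==    by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==    by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==    by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==    by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==    by 0x100077C96: create_comm (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==    by 0x10007798A: ompi_attr_create_predefined (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==    by 0x1000737CF: ompi_attr_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==    by 0x1000A4840: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==    by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==    by 0x100001AF2: main (main.cpp:34)
""".splitlines()

for locations in leak_records(summary):
    # prints: OpenMPI 11 frames
    print("OpenMPI" if from_mpi(locations) else "user code", len(locations), "frames")
```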


Comments (2)

护你周全 2024-12-06 11:34:04

I can't speak to Open MPI's behavior under Valgrind, but MPICH2 should be better about this. If you don't specifically need Open MPI as your MPI implementation, then you can easily configure MPICH2 to avoid problems with Valgrind.

半衬遮猫 2024-12-06 11:34:04

You can add additional suppressions for Valgrind yourself. These will take care of the first set of warnings that you posted:

{
  ORTE OOB suppression rule
  Memcheck:Param
  writev(vector[...])
  fun:writev
  ...
  fun:mca_oob_tcp_peer_send
  fun:mca_oob_tcp_send_nb
  fun:orte_rml_oob_send
  fun:orte_rml_oob_send_buffer
  ...
  fun:ompi_mpi_init
}

{
  OMPI init leak
  Memcheck:Leak
  fun:malloc
  ...
  fun:ompi_mpi_init
}

{
  OMPI init leak
  Memcheck:Leak
  fun:realloc
  ...
  fun:ompi_mpi_init
}

{
  OMPI init leak
  Memcheck:Leak
  fun:calloc
  ...
  fun:ompi_mpi_init
}
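To check by hand whether a rule will match a given report, it helps to know how Valgrind compares them: the frame lines of a suppression are matched against the error's stack starting from the innermost frame, the stack may be longer than the rule, `...` stands for zero or more frames, and `*`/`?` act as wildcards inside a `fun:` name. The Python sketch below models only that much (it rejects `obj:` lines and ignores the rest of the suppression syntax); `frames_match` and `glob_match` are hypothetical helpers, not part of Valgrind. The stacks are the function names from the question's reports.

```python
import re


def glob_match(name, pattern):
    """Valgrind-style name match: '*' is any run of characters, '?' any one."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, name) is not None


def frames_match(pattern, stack):
    """True if the suppression frame lines in `pattern` match the top of `stack`.

    stack -- function names from an error report, innermost frame first.
    """
    if not pattern:
        return True  # frames below the rule's last line are not constrained
    head, rest = pattern[0], pattern[1:]
    if head == "...":
        # "..." stands for zero or more frames: try every possible split point.
        return any(frames_match(rest, stack[i:]) for i in range(len(stack) + 1))
    if not head.startswith("fun:"):
        raise ValueError("only 'fun:' and '...' lines are handled: " + head)
    return (bool(stack) and glob_match(stack[0], head[len("fun:"):])
            and frames_match(rest, stack[1:]))


# Stacks copied from the question (innermost frame first).
init_leak = ["malloc", "opal_obj_new", "opal_obj_new_debug",
             "ompi_attr_create_keyval_impl", "ompi_attr_create_keyval",
             "create_comm", "ompi_attr_create_predefined", "ompi_attr_init",
             "ompi_mpi_init", "MPI_Init", "main"]
file_open_leak = ["malloc", "opal_obj_new", "opal_obj_new_debug",
                  "ompi_attr_create_keyval_impl", "ompi_attr_create_keyval",
                  "PMPI_Keyval_create", "???", "???", "module_init",
                  "mca_io_base_file_select", "ompi_file_open", "MPI_File_open"]
writev_err = ["writev", "mca_oob_tcp_peer_send", "mca_oob_tcp_send_nb",
              "orte_rml_oob_send", "orte_rml_oob_send_buffer", "allgather",
              "modex", "ompi_mpi_init", "MPI_Init", "main"]

leak_rule = ["fun:malloc", "...", "fun:ompi_mpi_init"]
print(frames_match(leak_rule, init_leak))       # True
print(frames_match(leak_rule, file_open_leak))  # False: no ompi_mpi_init frame
# A rule naming a frame the report lacks fails; "..." in its place tolerates it.
print(frames_match(["fun:writev", "fun:mca_oob_tcp_msg_send_handler", "..."], writev_err))  # False
print(frames_match(["fun:writev", "...", "fun:ompi_mpi_init"], writev_err))  # True
```

The second result shows that rules ending in `fun:ompi_mpi_init` leave the `MPI_File_open` leak records alone; those would need a rule of their own, for example one ending in `fun:MPI_File_open` instead.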