OpenMPI debugging with Valgrind and suppressions on OS X
I am writing parallel code in C++ on my OS X (Snow Leopard) laptop, and I am trying to debug it with memchecker. I have successfully built OpenMPI with Valgrind support using:
configure --prefix=/opt/openmpi-1.4.3/ --enable-debug --enable-memchecker --with-valgrind=/opt/valgrind-3.6.0/ FFLAGS=-m64 F90FLAGS=-m64
(Ignore the Fortran flags; they are there because my Fortran compiler comes from GCC.)
When I run my application with
mpirun -np 2 valgrind --suppressions=/opt/openmpi-1.4.3/share/openmpi/openmpi-valgrind.supp --leak-check=yes --dsymutil=yes ./program
I get a lot of warnings from Valgrind, most of them in the heap summary at the end. I have included a small snippet of the warnings below. As far as I can tell, Valgrind is detecting memory leaks and uninitialised values inside the MPI library, but I am not really interested in those. I want warnings about the code I write. I already run Valgrind with the suppression file provided by OpenMPI, but evidently that is not enough. How can I easily ignore all the other warnings that originate in the OpenMPI distribution? Is there a suppression file for debugging OpenMPI with Valgrind on OS X, or do you know any cunning trick?
The first warning is
==1531== Syscall param writev(vector[...]) points to uninitialised byte(s)
==1531== at 0x1014E16E2: writev (in /usr/lib/libSystem.B.dylib)
==1531== by 0x101AEA4C5: mca_oob_tcp_peer_send (in /opt/openmpi-1.4.3/lib/openmpi/mca_oob_tcp.so)
==1531== by 0x101AF0B88: mca_oob_tcp_send_nb (in /opt/openmpi-1.4.3/lib/openmpi/mca_oob_tcp.so)
==1531== by 0x101AC7F48: orte_rml_oob_send (in /opt/openmpi-1.4.3/lib/openmpi/mca_rml_oob.so)
==1531== by 0x101AC8AA1: orte_rml_oob_send_buffer (in /opt/openmpi-1.4.3/lib/openmpi/mca_rml_oob.so)
==1531== by 0x101B3489E: allgather (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
==1531== by 0x101B3525D: modex (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
==1531== by 0x1000A48E6: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100001AF2: main (main.cpp:34)
==1531== Address 0x101a8911b is 107 bytes inside a block of size 256 alloc'd
==1531== at 0x10002DB2D: realloc (vg_replace_malloc.c:525)
==1531== by 0x1012240B6: opal_dss_buffer_extend (in /opt/openmpi-1.4.3/lib/libopen-pal.0.dylib)
==1531== by 0x101225CF7: opal_dss_copy_payload (in /opt/openmpi-1.4.3/lib/libopen-pal.0.dylib)
==1531== by 0x101B347CA: allgather (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
==1531== by 0x101B3525D: modex (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
==1531== by 0x1000A48E6: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100001AF2: main (main.cpp:34)
After execution, a small snippet of the heap summary looks like this:
==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,950 of 2,194
==1531== at 0x10002D915: malloc (vg_replace_malloc.c:236)
==1531== by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100077C96: create_comm (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x10007798A: ompi_attr_create_predefined (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1000737CF: ompi_attr_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1000A4840: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100001AF2: main (main.cpp:34)
...
==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,952 of 2,194
==1531== at 0x10002D915: malloc (vg_replace_malloc.c:236)
==1531== by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x10014CEC5: PMPI_Keyval_create (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1065ACFE6: ???
==1531== by 0x10658867B: ???
==1531== by 0x10017A591: module_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100179985: mca_io_base_file_select (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100089D55: ompi_file_open (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1000E1ED1: MPI_File_open (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531==
==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,953 of 2,194
==1531== at 0x10002D915: malloc (vg_replace_malloc.c:236)
==1531== by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x10014CEC5: PMPI_Keyval_create (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
==1531== by 0x1065A6210: ???
==1531== by 0x106597149: ???
==1531== by 0x106596AAB: ???
==1531== by 0x1065AD14C: ???
==1531== by 0x10658867B: ???
==1531== by 0x10017A591: module_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
Comments (2)
I can't speak to Open MPI's behavior under Valgrind, but MPICH2 should be better about this. If you don't specifically need Open MPI as your MPI implementation, then you can easily configure MPICH2 to avoid problems with Valgrind.
You can add additional suppressions yourself for valgrind. These will take care of the first set of warnings that you posted:
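The suppression entries that originally followed this answer are not preserved here. As a rough sketch only, entries of this kind could be written from the stack traces in the question. The entry names and the file name extra-openmpi.supp are hypothetical, and the frame lists are assumptions taken from the posted traces rather than from a tested run. A `...` line is Valgrind's frame-level wildcard, which matches any number of intermediate frames.

```
# extra-openmpi.supp (hypothetical file name)

# Uninitialised bytes passed to writev by OpenMPI's out-of-band TCP layer
{
   openmpi_oob_tcp_writev_uninit
   Memcheck:Param
   writev(vector[...])
   fun:writev
   fun:mca_oob_tcp_peer_send
   fun:mca_oob_tcp_send_nb
}

# Blocks allocated during MPI initialisation and never freed by the library
{
   openmpi_init_leak
   Memcheck:Leak
   fun:malloc
   ...
   fun:ompi_mpi_init
}
```

Valgrind accepts more than one --suppressions option, so a file like this can be passed next to the one shipped with OpenMPI, for example by adding --suppressions=extra-openmpi.supp to the valgrind command line from the question. Running Valgrind with --gen-suppressions=all makes it print a ready-made suppression entry after each reported error, which is usually a more reliable starting point than writing frames by hand.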