Mixing and matching GCC and Intel compilers: correctly linking OpenMP



I have a scientific C++ application that is parallelized with OpenMP and typically compiled with GCC/8.2.0. The application further depends on gsl and fftw, the latter also using OpenMP. The application uses a C API to access a Fortran library that is likewise parallelized with OpenMP and can use either Intel's MKL or openblas as a backend. The library is preferably compiled with the Intel/19.1.0 toolchain. I have successfully compiled, linked, and tested everything using GCC/8.2.0 and openblas (as a baseline). However, tests on minimal examples suggest that MKL with the Intel toolchain would be faster, and speed is important for my use case.

icc --version gives me: icc (ICC) 19.1.0.166 20191121; the operating system is CentOS 7. Bear in mind that I'm on a cluster and have limited control over what I can install. Software is centrally managed with spack, and environments are loaded by specifying a compiler layer (only one at a time).

I have considered several approaches for getting the Intel/MKL libraries into my code:

  1. Compile the C++ and Fortran code with the Intel toolchain. While that is probably the tidiest solution, the compiler throws "internal error: 20000_7001" for a particular file with an OMP include. I could not find documentation for that particular error code and have not gotten feedback from Intel either (https://community.intel.com/t5/Intel-C-Compiler/Compilation-error-internal-error-20000-7001/m-p/1365208#M39785). I allocated > 80 GB of memory for compilation, as I have seen the compiler break down before when resources were limited. Maybe someone here has seen that error code?

  2. Compile the C++ and Fortran code with GCC/8.2.0 but link dynamically against the Intel-compiled MKL as the backend of the Fortran library. I managed to do that from the GCC/8.2.0 layer by extending LIBRARY_PATH and LD_LIBRARY_PATH to where MKL lives on the cluster (a sketch of the environment and link line I mean is given right after this list). It seems that only GNU OMP is linked and MKL is found. Profiling shows that the CPU load is quite low (but higher than with the GCC/8.2.0 + openblas binary). The execution time of my program improves by ~30%. However, in at least one case I got this runtime error when running the binary on 20 cores: libgomp: Thread creation failed: Resource temporarily unavailable.

  3. Stick with GCC/8.2.0 for my C++ code and link dynamically against the precompiled Fortran library, which was itself compiled against Intel/MKL using Intel OMP. This approach turned out to be tricky. As with approach (2), I loaded the GCC environment and manually extended LD_LIBRARY_PATH. A minimal example that is not itself OMP-parallelized worked beautifully out of the box. However, even though I also managed to compile and link my full C++ program, I got an immediate runtime error as soon as an OMP call occurred inside the Fortran library.
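For reference, here is a minimal sketch of the environment setup and link line I mean for approach (2), i.e. a GCC-built binary using MKL with the GNU OpenMP runtime. The source file name and the exact library list are placeholders, and the MKL flags follow the standard recipe for gcc with GNU threading (lp64, dynamic linking) rather than my literal build script:

# Point the toolchain at the cluster's MKL installation.
export MKLROOT=/cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl
export LIBRARY_PATH=$MKLROOT/lib/intel64:$LIBRARY_PATH
export LD_LIBRARY_PATH=$MKLROOT/lib/intel64:$LD_LIBRARY_PATH

# Compile with GNU OpenMP as usual ...
g++ -O2 -fopenmp -c main.cpp -o main.o

# ... and link MKL through its GNU threading layer, so that only libgomp is pulled in.
g++ -fopenmp main.o -o app \
    -L$MKLROOT/lib/intel64 -Wl,--no-as-needed \
    -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core \
    -lgomp -lpthread -lm -ldl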

Here is the ldd output of the compiled C++ binary:

linux-vdso.so.1 => (0x00007fff2d7bb000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ab227c25000)
libgsl.so.25 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgsl.so.25 (0x00002ab227e41000)
libgslcblas.so.0 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgslcblas.so.0 (0x00002ab228337000)
libfftw3.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3.so.3 (0x00002ab228595000)
libz.so.1 => /lib64/libz.so.1 (0x00002ab228a36000)
libfftw3_omp.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3_omp.so.3 (0x00002ab228c4c000)
libxtb.so.6 => /cluster/project/igc/iridiumcc/intel-19.1.0/xtb/build/libxtb.so.6 (0x00002ab228e53000)
libstdc++.so.6 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libstdc++.so.6 (0x00002ab22a16d000)
libm.so.6 => /lib64/libm.so.6 (0x00002ab22a4f1000)
libgomp.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgomp.so.1 (0x00002ab22a7f3000)
libgcc_s.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgcc_s.so.1 (0x00002ab22aa21000)
libc.so.6 => /lib64/libc.so.6 (0x00002ab22ac39000)
/lib64/ld-linux-x86-64.so.2 (0x00002ab227a01000)
libmkl_intel_lp64.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002ab22b007000)
libmkl_intel_thread.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_thread.so (0x00002ab22bb73000)
libmkl_core.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_core.so (0x00002ab22e0df000)
libifcore.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libifcore.so.5 (0x00002ab2323ff000)
libimf.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libimf.so (0x00002ab232763000)
libsvml.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libsvml.so (0x00002ab232d01000)
libirng.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libirng.so (0x00002ab234688000)
libiomp5.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libiomp5.so (0x00002ab2349f2000)
libintlc.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libintlc.so.5 (0x00002ab234de2000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002ab235059000)

I did some research and found interesting discussions here and in Intel's documentation regarding crashes caused by mixing two different OMP implementations:

Telling GCC to *not* link libgomp so it links libiomp5 instead
https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/optimization-and-programming-guide/openmp-support/openmp-library-support/using-the-openmp-libraries.html
http://www.nacad.ufrj.br/online/intel/Documentation/en_US/compiler_c/main_cls/optaps/common/optaps_par_compat_libs_using.htm

I followed the guidelines provided for the Intel OpenMP compatibility libraries. Compilation of my C++ code was done from the GCC environment using the -fopenmp flag, as always. At the linking stage (g++), I took the linker command I usually use but replaced -fopenmp with -L/cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64 -liomp5 -lpthread. The resulting binary runs like a charm and is roughly twice as fast as my original build (GCC/openblas).
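To make the two-step build explicit, here is a minimal sketch of what I mean; the source file name and the library list are placeholders for my actual project, and the only real change is at link time, where -fopenmp is dropped in favour of Intel's OpenMP runtime:

# Compile with OpenMP pragmas enabled, exactly as before.
g++ -O2 -fopenmp -c main.cpp -o main.o

# Link against libiomp5 instead of letting g++ pull in libgomp via -fopenmp.
g++ main.o -o app \
    -lgsl -lgslcblas -lfftw3 -lfftw3_omp -lxtb \
    -L/cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64 \
    -liomp5 -lpthread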

Here is the ldd output of the resulting binary:

linux-vdso.so.1 =>  (0x00007ffd7eb9a000)
libiomp5.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libiomp5.so (0x00002b4fb08da000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b4fb0cca000)
libgsl.so.25 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgsl.so.25 (0x00002b4fb0ee6000)
libgslcblas.so.0 => /cluster/apps/gcc-8.2.0/gsl-2.6-x4nsmnz6sgpkm7ksorpmc2qdrkdxym22/lib/libgslcblas.so.0 (0x00002b4fb13dc000)
libfftw3.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3.so.3 (0x00002b4fb163a000)
libz.so.1 => /lib64/libz.so.1 (0x00002b4fb1adb000)
libfftw3_omp.so.3 => /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3_omp.so.3 (0x00002b4fb1cf1000)
libxtb.so.6 => /cluster/project/igc/iridiumcc/intel-19.1.0/xtb/build/libxtb.so.6 (0x00002b4fb1ef8000)
libstdc++.so.6 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libstdc++.so.6 (0x00002b4fb3212000)
libm.so.6 => /lib64/libm.so.6 (0x00002b4fb3596000)
libgcc_s.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgcc_s.so.1 (0x00002b4fb3898000)
libc.so.6 => /lib64/libc.so.6 (0x00002b4fb3ab0000)
/lib64/ld-linux-x86-64.so.2 (0x00002b4fb06b6000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b4fb3e7e000)
libgomp.so.1 => /cluster/spack/apps/linux-centos7-x86_64/gcc-4.8.5/gcc-8.2.0-6xqov2fhvbmehix42slain67vprec3fs/lib64/libgomp.so.1 (0x00002b4fb4082000)
libmkl_intel_lp64.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002b4fb42b0000)
libmkl_intel_thread.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_intel_thread.so (0x00002b4fb4e1c000)
libmkl_core.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64/libmkl_core.so (0x00002b4fb7388000)
libifcore.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libifcore.so.5 (0x00002b4fbb6a8000)
libimf.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libimf.so (0x00002b4fbba0c000)
libsvml.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libsvml.so (0x00002b4fbbfaa000)
libirng.so => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libirng.so (0x00002b4fbd931000)
libintlc.so.5 => /cluster/apps/intel/parallel_studio_xe_2020_r0/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64/libintlc.so.5 (0x00002b4fbdc9b000)

Unlike in approach (2), the binary is linked against both libiomp5 and libgomp. I suspect that I get references to libgomp because I link against libfftw3_omp, which was compiled with GCC/8.2.0. I find it quite puzzling that ldd seems to list exactly the same libraries as for my first attempt with approach (3); only the order seems to have changed (libiomp5 before libgomp).
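To see where the libgomp dependency actually comes from, I check which OpenMP runtimes each shared object requires (the binary name app is a placeholder):

# Runtimes pulled in by the final binary (direct and transitive dependencies).
ldd ./app | grep -E 'gomp|iomp'

# Direct dependencies of the GCC-built FFTW OpenMP wrapper only.
readelf -d /cluster/apps/gcc-8.2.0/fftw-3.3.9-w5zvgavdpyt5z3ryppx3uwbfg27al4v6/lib/libfftw3_omp.so.3 | grep NEEDED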

While I am quite happy to have finally gotten a working binary, I have some questions I could not resolve on my own:

  • do you interpret Intel's documentation and the previous SO post as I do, and agree that the Intel OpenMP compatibility libraries are applicable in my case and that I have used the correct workflow? Or do you think approach (3) is a recipe for disaster in the future?

  • do any of you have more experience with Intel's C++ compiler and have you seen the error code described in approach (1)? (see update below)

  • do you think it is worth investigating whether I can get rid of libgomp entirely, for example by manually linking against an Intel-compiled libfftw3_omp that depends only on libiomp5? (see update below)

  • do you have an explanation for why thread creation fails in some cases with approach (2)? (see the sketch of thread-count settings right after this list)
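Regarding the last point, my unverified suspicion is oversubscription: my own libgomp threads each entering MKL regions that spawn libiomp5 threads, until the per-user thread/process limit on the node is hit. This is only a sketch of what I experiment with when running on 20 cores, not a confirmed fix:

ulimit -u                   # per-user process/thread limit on the compute node
export OMP_NUM_THREADS=20   # threads for my own OpenMP regions
export MKL_NUM_THREADS=1    # keep MKL serial inside regions that are already parallel
export OMP_DYNAMIC=false    # do not let the runtime adjust team sizes on its own
./app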

Thank you very much in advance!

// Update: In the meantime I managed to tweak approach (3) by not linking against GCC/8.2.0-compiled gsl and fftw but using Intel/19.1.0-compiled gsl and fftw instead. The resulting binary is similar in speed to what I had before; however, it links only against libiomp5.so, which seems like the cleaner solution to me.
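For completeness, this is roughly how I build the Intel variants of fftw and gsl; the install prefixes are placeholders and the configure switches are the packages' standard autotools options, so treat this as a sketch rather than my exact recipe:

# fftw with the Intel compiler, including the OpenMP wrapper library (libfftw3_omp)
CC=icc ./configure --prefix=$HOME/sw/intel/fftw-3.3.9 \
    --enable-shared --enable-openmp --enable-threads
make -j8 && make install

# gsl with the Intel compiler
CC=icc ./configure --prefix=$HOME/sw/intel/gsl-2.6 --enable-shared
make -j8 && make install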

// Update: Manually excluding compiler optimizations in CMakeLists.txt for the files that throw internal errors (CMake: how to disable optimization of a single *.cpp file?) gave me a working binary, albeit with linker warnings.
