I am creating a custom library (written in C++) that does some numerical stuff with ARPACK-NG. The function is wrapped in pybind11 to provide access to the method from Python in a package. I observe strange behavior.
Overview of the issue
When NumPy is imported before my method is called, a segfault occurs.
import numpy as np
from mylib import mymethod
mymethod() # Segfault
The same happens if the import order is swapped.
from mylib import mymethod
import numpy as np
mymethod() # Segfault
When NumPy is imported after my method is called, everything works fine.
from mylib import mymethod
mymethod() # Works fine
import numpy as np
# Further calls to NumPy or my library also work.
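Because the outcome depends on import order within a single process, each ordering is easiest to compare in a fresh interpreter. The following is a minimal sketch, not part of the original report: the helper name run_isolated and the CASES table are hypothetical, and it assumes a POSIX system where a process killed by a signal reports a negative return code.

```python
import signal
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a fresh interpreter and classify how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet], capture_output=True, text=True
    )
    # On POSIX, a child killed by a signal has returncode == -signal_number.
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    return "ok" if proc.returncode == 0 else f"exit {proc.returncode}"


# The three orderings from the question (mylib is the author's package).
CASES = {
    "numpy first": "import numpy as np\nfrom mylib import mymethod\nmymethod()",
    "mylib first": "from mylib import mymethod\nimport numpy as np\nmymethod()",
    "numpy last": "from mylib import mymethod\nmymethod()\nimport numpy as np",
}

if __name__ == "__main__":
    for name, snippet in CASES.items():
        print(f"{name}: {run_isolated(snippet)}")
```

Running every case in its own interpreter keeps a library loaded by one ordering from influencing the next.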
GDB Trace
The backtrace looks like this.
#0 0x00007fffec59d2ef in mkl_blas.cdotc () from /home/myname/.conda/envs/mylib/lib/./libmkl_intel_lp64.so.1
#1 0x00007ffff7281974 in cneupd_ () from /home/myname/.conda/envs/mylib/lib/libarpack.so.2
#2 0x00007ffff72af228 in cneupd_c () from /home/myname/.conda/envs/mylib/lib/libarpack.so.2
#3 0x00007ffff76b25cd in void complex_symmetric_runner<float>(double const&) ()
from /home/myname/Documents/mylib/build/lib.linux-x86_64-3.9/mylib/libmylib.so
#4 0x00007ffff76b102b in mymethod() ()
Replicable example
The test code is essentially the same as the C++ example provided by ARPACK-NG, with the main method replaced by mymethod(). The minimal binding code is:
#include<pybind11/pybind11.h>
// ARPACK-NG C++ example code goes here. The main method is replaced with mymethod so it can be called from pybind11.
void mymethod(){
// ...Contents of the main() function in the example ARPACK-NG code...
}
PYBIND11_MODULE(mylib, m){
m.def("mymethod", mymethod);
}
My guess at what the issue is.
I believe there is some issue with NumPy and MKL's initialization, similar to this issue. From what I've gathered, NumPy links to MKL dynamically through mkl_rt (libmkl_rt.so), as shown in the NumPy config below.
import numpy
numpy.show_config()
blas_mkl_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/home/myname/.conda/envs/mylib/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/home/myname/.conda/envs/mylib/include']
blas_opt_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/home/myname/.conda/envs/mylib/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/home/myname/.conda/envs/mylib/include']
lapack_mkl_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/home/myname/.conda/envs/mylib/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/home/myname/.conda/envs/mylib/include']
lapack_opt_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/home/myname/.conda/envs/mylib/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/home/myname/.conda/envs/mylib/include']
Supported SIMD extensions in this NumPy install:
baseline = SSE,SSE2,SSE3
found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_KNM,AVX512_SKX,AVX512_CNL
My library links to BLAS dynamically through ARPACK-NG's shared library and, per the GDB trace, ends up in libmkl_intel_lp64.so. This is confusing, however, because when I run ldd /home/myname/.conda/envs/mylib/lib/libarpack.so.2 there is no mention of libarpack.so linking to MKL.
linux-vdso.so.1 (0x0000697945afd000)
libblas.so.3 => /home/myname/.conda/envs/mylib/lib/./libblas.so.3 (0x0000697945200000)
libgfortran.so.4 => /home/myname/.conda/envs/mylib/lib/./libgfortran.so.4 (0x0000697945972000)
libm.so.6 => /usr/lib/libm.so.6 (0x00006979450db000)
libc.so.6 => /usr/lib/libc.so.6 (0x0000697944ed1000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x000069794596d000)
libquadmath.so.0 => /home/myname/.conda/envs/mylib/lib/./libquadmath.so.0 (0x0000697944e97000)
libgcc_s.so.1 => /home/myname/.conda/envs/mylib/lib/./libgcc_s.so.1 (0x0000697944e82000)
/usr/lib64/ld-linux-x86-64.so.2 (0x0000697945aff000)
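ldd reports the name a library was linked against, not what the file behind that name really is. If libblas.so.3 in this environment happens to be a symlink to another BLAS implementation (an assumption, not something the output above confirms), resolving the path would show it. A minimal sketch, with the hypothetical helper resolve_library and the path taken from the ldd output:

```python
import os


def resolve_library(path):
    """Return the real file a shared-library path points to, following symlinks."""
    return os.path.realpath(path)


if __name__ == "__main__":
    # Path copied from the ldd output above; adjust for your environment.
    candidate = "/home/myname/.conda/envs/mylib/lib/libblas.so.3"
    real = resolve_library(candidate)
    print(candidate, "->", real)
    if "mkl" in os.path.basename(real).lower():
        print("libblas.so.3 resolves to an MKL library")
```

If the resolved name contains "mkl", that would be consistent with the MKL frame in the backtrace even though ldd never mentions MKL by name.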
If I were to guess what is happening, NumPy checks whether some BLAS library is already loaded when it is imported. If my code is called first, libblas.so is loaded by it and NumPy happens to use that. However, if NumPy is imported first, it loads MKL as the BLAS library, which somehow interferes with libarpack.so.
Is my assessment correct and is there a way to solve this problem?
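One way to test the guess above is to look at which BLAS-related shared objects are mapped into the Python process at each step. The following is a sketch under stated assumptions: it is Linux-only (it reads /proc/self/maps), the function names are hypothetical, and the name pattern is illustrative rather than exhaustive.

```python
import re

# Substrings that mark a library as BLAS-related (illustrative; extend as needed).
PATTERN = re.compile(r"(mkl|blas|lapack|arpack)", re.IGNORECASE)


def blas_libraries_from_maps(maps_text):
    """Extract sorted BLAS-related shared-object paths from /proc/<pid>/maps text."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (pathname may be absent).
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if ".so" in path and PATTERN.search(path.rsplit("/", 1)[-1]):
                found.add(path)
    return sorted(found)


def loaded_blas_libraries():
    """Return the BLAS-related libraries currently mapped into this process."""
    with open("/proc/self/maps") as fh:
        return blas_libraries_from_maps(fh.read())


if __name__ == "__main__":
    for path in loaded_blas_libraries():
        print(path)
```

Calling loaded_blas_libraries() before and after each of `import numpy` and `from mylib import mymethod` shows which library was loaded first and whether MKL is present when mymethod() runs.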
Comments (1)
As far as I can tell, what I believed to be the root cause of the issue is correct. I have arrived at a solution that, while not completely satisfactory, nonetheless solves the problem: create the Anaconda environment with the nomkl package (i.e. conda create -n mylib_nomkl nomkl python=3.9 numpy). NumPy will then no longer try to swap the BLAS out from under ARPACK-NG.