SWIG C++ to Python with multiple extensions in one module on Ubuntu (vs. macOS)

Posted 2025-01-09 02:23:40

I am trying to run C++ code that is wrapped with SWIG into Python 3.8 modules on a remote computing cluster where I don't have root access. I wrote the code on my own computer (macOS Monterey), while the cluster runs Ubuntu 20.04. The code uses the libraries quadedge, nlopt and gsl. At every step I perform an optimisation with nlopt and an integration with gsl. This only works on my own Mac. On the Ubuntu cluster there is what looks like undefined behaviour at runtime: my rate of change when integrating is zero in around 50% of the integration steps, whereas on the Mac everything works flawlessly.
To install my code as Python modules, I use a setup.py. The procedure and module structure I currently use is the following:

  • install quadedge
  • install polygeo (own module that depends on quadedge)
  • install plantdevelopment (own module that depends on polygeo)

Both polygeo and plantdevelopment contain two extensions in one module. My setup.py for polygeo on my Mac looks like this:

from setuptools import setup, find_packages, Extension
from setuptools.command.build_py import build_py as _build_py
import os
import glob

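class build_py(_build_py):
    # Run build_ext first so the SWIG-generated .py wrappers exist
    # before build_py collects the package's Python files.
    def run(self):
        self.run_command("build_ext")
        return super().run()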

polygeoDir = 'polygeo/'
surfnloptDir = 'polygeo/surfnlopt/'
includeDir = 'polygeo/include/'
srcDir = 'polygeo/src/'

headerFiles = glob.glob(includeDir + '*.hpp')
srcFiles = glob.glob(srcDir + '*.cpp')
srcFiles.append(polygeoDir + 'polygeo.i')

surfnloptHeaderFiles = glob.glob(surfnloptDir + 'include/*.hpp')
surfnloptSrcFiles = glob.glob(surfnloptDir + 'src/*.cpp')
surfnloptSrcFiles.append(surfnloptDir + 'surfnlopt.i')

allHeaders = []
allHeaders += headerFiles
allHeaders += surfnloptHeaderFiles

polygeoExt = Extension('_polygeo',
                       sources=srcFiles,
                       include_dirs=[includeDir],
                       library_dirs=[],
                       libraries=[],
                       swig_opts=['-c++'],
                       extra_compile_args=['-std=c++11'],)

surfnloptExt = Extension('surfnlopt._surfnlopt',
                         sources=surfnloptSrcFiles,
                         include_dirs=[includeDir,
                                       surfnloptDir + '/include'],
                         library_dirs=[],
                         libraries=['m', 'nlopt'],
                         swig_opts=['-c++'],
                         extra_compile_args=['-std=c++11'],)

setup(name='polygeo',
      version='1.0',
      packages=find_packages(),
      ext_package='polygeo',
      ext_modules=[polygeoExt,
                   surfnloptExt],
      install_requires=['quadedge'],
      headers=allHeaders,
      cmdclass = {'build_py' : build_py},
      )

Interestingly, on my Mac it does not seem necessary to add extra paths or anything, but on the remote I have to edit the .bash_profile to set the necessary include paths etc.
On top of that, on Ubuntu I also have to add extra_link_args=['/home/.local/lib/python3.8/site-packages/quadedge/_quadedge.cpython-38-x86_64-linux-gnu.so'] to each of the extensions, pointing to the .so that I already generated with the setup.py for quadedge.
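The hard-coded path above could also be looked up at build time. The following is only a minimal sketch, not part of my actual setup.py: the helper name extension_path is made up for illustration, and the module name quadedge._quadedge is inferred from the path shown above.

```python
import importlib.util
from importlib.machinery import EXTENSION_SUFFIXES


def extension_path(module_name):
    """Return the file of an installed compiled extension module, or None.

    Hypothetical helper for illustration only.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # The parent package is missing or the module has no usable spec.
        return None
    if spec is None or not spec.origin:
        return None
    # Only accept real extension files (.so / .pyd), not .py or built-ins.
    if not spec.origin.endswith(tuple(EXTENSION_SUFFIXES)):
        return None
    return spec.origin


# Assumed module name, taken from the .so path in the post.
quadedge_so = extension_path("quadedge._quadedge")
extra_link_args = [quadedge_so] if quadedge_so else []
print(extra_link_args)
```

If quadedge is not installed in the interpreter that runs the build, the list simply stays empty, so the result should be checked before it is passed to Extension.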

I also need to add the source files of the polygeo extension to the source files of surfnlopt, i.e. sources=srcFiles+surfnloptSrcFiles. Otherwise importing the modules in Python throws a "symbols not found" error, because surfnlopt depends on the source files of polygeo. The same happens if I don't include the quadedge .so, as surfnlopt also needs functions from quadedge.
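To see which symbols a built extension still expects from elsewhere, its undefined dynamic symbols can be listed. This is a rough sketch that assumes GNU binutils nm is on the PATH (as on the Ubuntu cluster). The function names are hypothetical, and C++ names appear mangled in the output.

```python
import shutil
import subprocess


def parse_undefined(nm_output):
    """Extract symbol names from `nm -D --undefined-only` output."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries look like "U <name>"; weak entries ("w", "v")
        # and defined symbols are skipped.
        if len(parts) >= 2 and parts[-2] == "U":
            symbols.append(parts[-1])
    return symbols


def undefined_symbols(shared_object):
    """Run nm on a shared object; return [] if nm is missing or fails."""
    nm = shutil.which("nm")
    if nm is None:
        return []
    result = subprocess.run(
        [nm, "-D", "--undefined-only", shared_object],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_undefined(result.stdout)
```

Calling undefined_symbols() on the installed _surfnlopt .so and filtering the result for polygeo or quadedge names would show which symbols must be provided by another library at import time.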

For installing plantdevelopment my script looks almost the same. However, there I have to include polygeo.so instead of quadedge.so, and I have to add swig_opts=['-c++','-I'+polygeoIncDir], which points to the directory where the header files from the polygeo setup end up.

The first thing I was wondering is whether it causes problems to have the extensions depend on previously installed ones, since plantdevelopment then also depends on polygeo and its extensions.
Could adding the source files to the extensions again have an impact when running the code, due to ambiguity/duplication? Could this be a memory leak?
So far I have tried the following:

  • tried to have as few includes/sources as possible to avoid overdefinition in the setup.py
  • tried to install Python locally on the Ubuntu remote with Homebrew and pyenv, with all software (including the compiler) at the same version as on the Mac
  • tried out Python 3.10, as there apparently used to be a memory leak issue at some point
  • tried to install everything on the Mac without root privileges (still works flawlessly)
  • tried to put ALL files into one single module (which doesn't work due to abstract classes and the C++ code structure)
  • tested that the C++ code works properly without Python, to make sure it's not about the C libraries (unfortunately I need the Python API for the project and it's not my choice)
  • used the guppy memory profiler: some of the runs (~50%) actually work properly as soon as I just import it and print the heap status, but there seems to be a lot of randomness...

In my program I use guppy the following way:

from guppy import hpy

heap = hpy()

print("Heap Status At Starting : ")
heap_status1 = heap.heap()
print("Heap Size : ", heap_status1.size, " bytes\n")
print(heap_status1)


heap.setref()

print("\nHeap Status After Setting Reference Point : ")
heap_status2 = heap.heap()
print("Heap Size : ", heap_status2.size, " bytes\n")
print(heap_status2)

I can't really think of a reason why just importing guppy and printing the heap info should change the outcome of my optimization/integration.

Could this be an issue with the different OS?
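To compare the two machines more systematically, the build settings that Python hands to setuptools can be printed on both and diffed. This is a minimal sketch using only the standard library. The selection of keys is my own choice, and some values may be None depending on the platform.

```python
import platform
import sys
import sysconfig


def build_config():
    """Collect interpreter and compiler settings used for building extensions."""
    keys = ("CC", "CXX", "CFLAGS", "LDSHARED", "EXT_SUFFIX")
    config = {key: sysconfig.get_config_var(key) for key in keys}
    config["python"] = sys.version.split()[0]
    config["platform"] = platform.platform()
    return config


# Print one setting per line so the output of both machines can be diffed.
for key, value in sorted(build_config().items()):
    print(f"{key:10} {value}")
```

Running this with the same interpreter that runs setup.py on the Mac and on the cluster shows whether the compiler, flags or linker command differ.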

Or maybe the code structure is problematic for Python: building everything on top of each other and having dependencies between extensions in a single module?
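One way to inspect what is actually loaded after importing the modules is to list the shared objects mapped into the Python process. This sketch assumes Linux, since it reads /proc/self/maps, so it applies to the Ubuntu cluster and not to the Mac. The function name is hypothetical.

```python
import os


def loaded_shared_objects(maps_path="/proc/self/maps"):
    """Return the sorted paths of .so files mapped into this process (Linux)."""
    if not os.path.exists(maps_path):
        return []
    paths = set()
    with open(maps_path) as maps:
        for line in maps:
            # Fields: address, perms, offset, dev, inode, pathname.
            fields = line.split(None, 5)
            if len(fields) == 6:
                path = fields[5].strip()
                if ".so" in os.path.basename(path):
                    paths.add(path)
    return sorted(paths)


# Run this after importing polygeo / plantdevelopment in the same process.
for path in loaded_shared_objects():
    if any(name in path for name in ("quadedge", "polygeo", "nlopt", "gsl")):
        print(path)
```

This only shows which files are mapped. It does not show that the polygeo sources were compiled into both _polygeo and _surfnlopt, which would have to be checked on the .so files themselves.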

I'd be happy for any kind of help and will of course provide any further necessary info.
