Why does a native program run fine when executed directly, but fail with a segmentation fault when submitted through Condor?
I have a third-party library that I'm attempting to incorporate into a simulation. We have the static library (.a), along with all of its runtime dependencies (shared objects). I've created a very simple application (in C) that is linked against the library. All it does is call an initialization function that is part of the third-party library's API, and then exit. When I run it directly from the command line, it works fine. If I submit the executable to our Condor grid, it fails with a segmentation fault in strncpy (libc.so.6). I've forced Condor to run the executable only on a particular machine, and if I run it directly on that machine, it works fine.
I'm mostly a Java programmer, with a limited amount of native coding experience. I'm familiar with tools such as nm, ldd, and catchsegv to the point where I can run them, but I don't really know where to start looking for the issue.
I've run ldd directly on the executing machine, and via a script submitted through condor, along with my executable. ldd reports the same files in both cases.
I don't understand why it works when run directly but fails when run by Condor. The process that ultimately executes the program, condor_startd, starts as root and changes its effective uid to that of the submitter. Perhaps this has something to do with it?
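Since ldd reports the same libraries in both cases, the remaining differences are likely in the process context rather than in the binaries. A minimal sketch of a helper that records that context so the two runs can be compared; the function name `dump_process_context` is hypothetical, and the code assumes a POSIX system:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

extern char **environ;

/* qsort comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b) {
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Hypothetical helper: write the real and effective uid, followed by a
   sorted copy of the environment, to `out`. Returns the number of
   environment entries written, or -1 if memory allocation fails. */
int dump_process_context(FILE *out) {
    assert(out != NULL);
    fprintf(out, "uid=%ld euid=%ld\n", (long)getuid(), (long)geteuid());

    /* Count the entries; environ may be NULL if the environment is empty. */
    size_t n = 0;
    if (environ != NULL) {
        while (environ[n] != NULL) n++;
    }
    if (n == 0) return 0;

    /* Sort a copy of the pointer array so the output diffs cleanly. */
    char **sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) return -1;
    memcpy(sorted, environ, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_str);

    for (size_t i = 0; i < n; i++) fprintf(out, "%s\n", sorted[i]);
    free(sorted);
    return (int)n;
}
```

Calling `dump_process_context(stderr)` at the top of `main`, once from the shell and once under Condor, and then diffing the two outputs should show which variables are present in only one of the runs.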
Comments (1)
I don't know why this would cause a problem, but the culprit was the LANG environment variable. It was not set when running under Condor, but was set to US_EN.UTF-8 when running locally. Adding this value to the Condor execution environment fixed the problem.
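One plausible explanation, which the thread does not confirm, is that the library reads LANG with `getenv` and copies the result without checking it. `getenv` returns NULL for an unset variable, and passing NULL to `strncpy` is undefined behavior that typically crashes inside libc. A minimal sketch of the defensive pattern and of a workaround in the wrapper program; both function names are hypothetical, and the fallback value is whatever the caller chooses:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the value of LANG into dst, using `fallback`
   when LANG is unset. The result is always NUL-terminated. */
void copy_lang_or_default(char *dst, size_t dst_size, const char *fallback) {
    assert(dst != NULL && dst_size > 0 && fallback != NULL);
    const char *lang = getenv("LANG");   /* NULL when the variable is unset */
    if (lang == NULL) lang = fallback;
    strncpy(dst, lang, dst_size - 1);
    dst[dst_size - 1] = '\0';            /* strncpy may leave dst unterminated */
}

/* Hypothetical workaround for the calling program: give LANG a value
   before the third-party initialization runs. The third argument 0 means
   an existing value is left untouched. Returns 0 on success, -1 on error. */
int ensure_lang_set(const char *fallback) {
    assert(fallback != NULL);
    return setenv("LANG", fallback, 0);
}
```

Calling `ensure_lang_set` with a locale that exists on the execute machine, before the library's initialization function, would be an in-program alternative to changing the job's environment. On the Condor side, the submit description file's `environment` command is the usual place to set such a variable for a job.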