What causes sprof to complain about "Inconsistency detected by ld.so"?

Posted on 2024-11-13 00:32:31

I'm trying to use sprof to profile some software (ossim) where almost all the code is in a shared library. I've generated a profiling file, but when I run sprof, I get the following error:

> sprof /home/eca7215/usr/lib/libossim.so.1 libossim.so.1.profile -p > log
Inconsistency detected by ld.so: dl-open.c: 612: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!

The instructions I was following said that I needed libc version 2.5-34 or later; I have libc version 2.12.2 (Gentoo, kernel 2.6.36-r5).

I can't find any explanation of what the error means or (more interestingly) how to fix it. The only half-relevant Google results are for a bug in an old version of Skype.

Comments (3)

演出会有结束 2024-11-20 00:32:31

I got a bit curious, since this is still broken in openSUSE 12.x. I would have thought a bug originally reported in '09 or so would have been fixed by now. I guess nobody really uses sprof (or maybe dl-open.c is so fragile that people are scared to touch it :-).

The issue boils down to the __RTLD_SPROF flag being passed as an argument to dlopen.
Take any simple program that calls dlopen, OR that flag into the second argument, and you get the same failed assertion. I used the sample program at the bottom of http://linux.die.net/man/3/dlopen as an example:

handle = dlopen(argv[1], RTLD_LAZY | __RTLD_SPROF);

From what I can tell from a quick look at dl-open.c, this flag short-circuits some of what _dl_open does, so the r_state field checked in the assertion doesn't get set to RT_CONSISTENT.
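
For anyone who wants to try the experiment described above, here is a minimal sketch of a helper that calls dlopen with arbitrary flags. It rests on two assumptions. First, __RTLD_SPROF is a glibc-internal flag that installed headers usually do not expose, so the fallback value 0x40000000 is my reading of glibc's internal include/dlfcn.h and should be checked against your own glibc sources. Second, try_dlopen is a hypothetical helper name, not part of any library.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumption: __RTLD_SPROF is internal to glibc and normally not visible
   through <dlfcn.h>. The fallback value below is taken from glibc's
   internal include/dlfcn.h as I understand it; verify it for your version. */
#ifndef __RTLD_SPROF
#define __RTLD_SPROF 0x40000000
#endif

/* Hypothetical helper: open `path` with `flags`, report the outcome, and
   close the handle again. Returns 0 on success, -1 if dlopen failed. */
int try_dlopen(const char *path, int flags)
{
    void *handle = dlopen(path, flags);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
        return -1;
    }
    printf("dlopen(%s, 0x%x) succeeded\n", path, (unsigned int)flags);
    dlclose(handle);
    return 0;
}
```

Calling try_dlopen(argv[1], RTLD_LAZY) from a small main should load the library normally. Calling try_dlopen(argv[1], RTLD_LAZY | __RTLD_SPROF) is, going by the behaviour described above, expected to abort inside ld.so with the failed assertion on affected glibc versions instead of returning. On older glibc versions you may need to link with -ldl.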

黯然#的苍凉 2024-11-20 00:32:31

I got this error with the PyTorch DataLoader when using multiple workers. Python does multiprocessing by launching many processes, and one of the processes hit this error while reading a file in read-only mode (for the CIFAR10 dataset). Simply re-running the script solved the issue, so I believe this is some sort of sporadic, rare OS error. With PyTorch, setting num_workers=0 may also help resolve the error.

Below is the full error in case anyone is interested:

Inconsistency detected by ld.so: dl-open.c: 272: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!
Traceback (most recent call last):
  File "/miniconda/envs/petridishpytorchcuda92/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/miniconda/envs/petridishpytorchcuda92/lib/python3.6/queue.py", line 173, in get
    self.not_empty.wait(remaining)
  File "/miniconda/envs/petridishpytorchcuda92/lib/python3.6/threading.py", line 299, in wait
    gotit = waiter.acquire(True, timeout)
  File "/miniconda/envs/petridishpytorchcuda92/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 272) exited unexpectedly with exit code 127. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.

脱离于你 2024-11-20 00:32:31

If you're using Docker, there could be another explanation. In my case, the profiling data was generated by a process running inside a Docker container. I tried running sprof from within the container and received the same error as described in the question. Running sprof from the host (instead of the container) solved it.
