fork system call causes a segmentation fault

Posted on 2024-09-03 19:17:18

I have written a multithreaded program in which a thread forks a child process, and several modules are loaded through that child process.

During one test run, the process (running on Solaris) aborted with a segmentation fault. Going through the dump file, I was surprised to see that the fault occurred inside the fork() system call.

Below is the stack trace at the time fork() aborted:

(l@5) stopped in (unknown) at 0xfe524970
0xfe524970:     <bad address 0xfe524970>
(/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) where
  [1] 0xfe524970(0xfe524970, 0x0, 0xffffffff, 0x1, 0x0, 0x0), at 0xfe52496f
  [2] run_prefork(0xfecc04b8, 0xfecc04d0, 0x242f4, 0xfea5d3c8, 0x0, 0x0), at 0xfec97ce8
  [3] _ti_fork1(0x1, 0x1ab18, 0x0, 0x0, 0x0, 0x0), at 0xfea5d3c8
  [4] _ti_fork(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfea5d50c

Can anyone explain why the fork() system call on Solaris would behave this way?

Comments (3)

凉城 2024-09-10 19:17:18

Addition:

A possible scenario (taking C++ as an example):

  • thread A creates an object
  • thread B calls a method that calls fork()
  • thread A deletes the object while thread B has not yet reached fork()
  • the memory addresses are then invalid during the actual call to fork()

Depending on timing, this may be more or less likely to happen. You may be able to force the situation by introducing some sleeps (if you think this could be the cause at all).
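
One way to rule this scenario out is to make object deletion and fork() mutually exclusive. The following is only a minimal sketch in C under stated assumptions: `obj_lock`, `shared_obj`, `obj_create`, `obj_destroy` and `fork_with_object` are hypothetical names invented for illustration, not anything taken from the original program.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical shared state: one heap object guarded by one mutex. */
static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;
static int *shared_obj = NULL;

/* "Thread A" side: create the object, replacing any previous one. */
int obj_create(int value)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = value;
    pthread_mutex_lock(&obj_lock);
    free(shared_obj);
    shared_obj = p;
    pthread_mutex_unlock(&obj_lock);
    return 0;
}

/* "Thread A" side: delete the object. Blocks while a fork is in progress. */
void obj_destroy(void)
{
    pthread_mutex_lock(&obj_lock);
    free(shared_obj);
    shared_obj = NULL;
    pthread_mutex_unlock(&obj_lock);
}

/* "Thread B" side: fork while holding the lock, so the object cannot be
 * deleted between the decision to fork and the fork itself.
 * Returns the child's exit status: the low 7 bits of the object's value,
 * 0 if there was no object, or -1 on error. */
int fork_with_object(void)
{
    int status = 0;
    pid_t pid;

    pthread_mutex_lock(&obj_lock);
    pid = fork();
    if (pid == 0) {
        /* Child: the copied object is intact because the parent held
         * the lock across fork(). */
        _exit(shared_obj != NULL ? (*shared_obj & 0x7f) : 0);
    }
    pthread_mutex_unlock(&obj_lock);

    if (pid < 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the crash disappears once deletion and fork() are serialized like this, the race described above becomes a more plausible explanation.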

Another cause could be a hardware defect. I would run a RAM-checking tool and see whether it reports problems before looking any further. Let us know if this was the case.

Another possibility is a bug in the system code, but that would not explain why it sometimes works and sometimes does not. To me it sounds unlikely to be the issue.

PS: the address 0xfe52496f is odd, not a multiple of four, which is unusual for optimized programs. That also hints at defective RAM... I hope I am wrong; on the other hand, if I am right, you know what to do...

不羁少年 2024-09-10 19:17:18

Mixing fork and threads is generally ill-advised. This is because the forked process has only a single thread, the one that called fork. None of the other threads exist in the new process, which means that any shared memory resources are in an unknown state. Another thread could have been holding a mutex that will now never be released, and so on. There are mechanisms to mitigate this, such as pthread_atfork, but as a general rule, when working with multiple threads you should fork only in order to call exec as soon as possible. Are you segfaulting in the parent process or in the new child process?
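
To make the advice concrete, here is a rough sketch in C of both ideas: pthread_atfork handlers that hold a lock across the fork, and a child that does nothing but call exec. The names `state_lock`, `install_fork_handlers` and `spawn_and_wait` are hypothetical and exist only for this illustration. Note that registered handlers must stay valid for the life of the process. The `run_prefork` frame in the trace may mean that fork-preparation handlers were running when the crash happened, but that is only a guess based on the frame name.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock protecting state that must be consistent in the child. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take the lock before fork, release it on both sides afterwards. */
static void before_fork(void)       { pthread_mutex_lock(&state_lock); }
static void after_fork_parent(void) { pthread_mutex_unlock(&state_lock); }
static void after_fork_child(void)  { pthread_mutex_unlock(&state_lock); }

/* Register the handlers once, early in the program. Returns 0 on success. */
int install_fork_handlers(void)
{
    return pthread_atfork(before_fork, after_fork_parent, after_fork_child);
}

/* Fork, exec argv[0] immediately in the child, and wait for it.
 * Returns the child's exit status, or -1 on failure. */
int spawn_and_wait(char *const argv[])
{
    int status = 0;
    pid_t pid, w;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: only the forking thread exists here, so avoid anything
         * that may need a lock (malloc, stdio, ...) and exec right away. */
        execvp(argv[0], argv);
        _exit(127);   /* reached only if exec failed */
    }
    do {
        w = waitpid(pid, &status, 0);
    } while (w < 0 && errno == EINTR);
    if (w < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would run `install_fork_handlers()` once at startup and then use `spawn_and_wait()` from any thread. Loading modules would then happen in the exec'ed program rather than in the forked copy of the multithreaded process.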

盛夏已如深秋 2024-09-10 19:17:18

I think your stack or stack pointer may have been corrupted by the time you call fork. Either that, or you have actually used up your stack space and the stack pointer is just shy of the limit before the call to fork().

Calling other functions, or allocating a moderate amount of memory with alloca and using memset to set that area to 0, just before the call to fork would reveal whether this is the case, because the error would then show up earlier.

It might also be that, if you are forking in a non-main thread of a process (I'm not familiar with Solaris's threading model, so I could be spouting gibberish), you have somehow specified or allocated the stack of the thread calling fork in a way that prevents the new process from accessing it after the fork.
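
The probe described above could look roughly like this in C. It is a sketch, not a diagnosis tool: `PROBE_BYTES` and `probe_stack_then_fork` are hypothetical names, and the 16 KiB probe size is an arbitrary assumption standing in for "a moderate amount".

```c
#include <alloca.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PROBE_BYTES (16 * 1024)   /* assumed probe size */

/* Touch PROBE_BYTES of stack, then fork. If the stack is exhausted or
 * corrupted, the fault should appear in the probe rather than in fork().
 * Returns the child's exit status (0 here), or -1 on failure. */
int probe_stack_then_fork(void)
{
    int status = 0;
    pid_t pid;
    char *probe = alloca(PROBE_BYTES);
    volatile char *check = probe;

    memset(probe, 0, PROBE_BYTES);
    /* Volatile reads of both ends discourage the compiler from
     * discarding the memset as dead stores. */
    if (check[0] != 0 || check[PROBE_BYTES - 1] != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);   /* child: nothing to do in this sketch */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the process now dies inside the memset instead of inside fork(), stack exhaustion or corruption becomes the more likely culprit.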
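
A small experiment can test this idea: create a worker thread with an explicitly sized stack, fork from it, and have the child read a variable that lives on that stack. This is a sketch under assumptions: `fork_in_worker` and `fork_from_worker` are hypothetical names, and the stack size is whatever the caller passes in, not a value taken from the original program.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker body: fork, then let the child read a local that lives on this
 * thread's stack. The child exits 0 if the value survived the fork. */
static void *fork_in_worker(void *arg)
{
    volatile int marker = 0x5a;
    int status = 0;
    pid_t pid;

    (void)arg;
    pid = fork();
    if (pid < 0)
        return (void *)(intptr_t)-1;
    if (pid == 0)
        _exit(marker == 0x5a ? 0 : 1);
    if (waitpid(pid, &status, 0) < 0)
        return (void *)(intptr_t)-1;
    return (void *)(intptr_t)(WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}

/* Run fork_in_worker on a thread whose stack is stack_bytes large.
 * Returns the child's exit status, or -1 on any failure. */
int fork_from_worker(size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    void *result = NULL;
    int rc;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fork_in_worker, NULL);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return (int)(intptr_t)result;
}
```

Calling, say, `fork_from_worker(256 * 1024)` with the same stack settings the real program uses would show whether the child can still reach the forking thread's stack.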

Is this repeatable? Does it happen consistently?
