fork system call causes a segmentation fault
I have written a multithreaded program in which a thread forks a child process, and several modules are loaded through that child process.
During one of my test runs, the process (running on Solaris) aborted with a segmentation fault. Going through the dump file, I was really shocked to see that the fork() system call on Solaris caused the segmentation fault.
Below is the stack trace at the time fork() aborted:
(l@5) stopped in (unknown) at 0xfe524970
0xfe524970: <bad address 0xfe524970>
(/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) where
[1] 0xfe524970(0xfe524970, 0x0, 0xffffffff, 0x1, 0x0, 0x0), at 0xfe52496f
[2] run_prefork(0xfecc04b8, 0xfecc04d0, 0x242f4, 0xfea5d3c8, 0x0, 0x0), at 0xfec97ce8
[3] _ti_fork1(0x1, 0x1ab18, 0x0, 0x0, 0x0, 0x0), at 0xfea5d3c8
[4] _ti_fork(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfea5d50c
Can anyone explain why the fork() system call on Solaris causes this behaviour?
Addition:
A possible scenario (taking C++ as an example) is that a memory address used around the fork() call is invalid while fork() is running. Depending on timing, this may be more or less likely to happen. Maybe you can force the situation by introducing some sleeps (if you think this might be the case at all).
Another issue could be a hardware defect. I would run a RAM-checking tool and see whether it reports problems before looking any further. Let us know if this was the case.
Another possibility would be a bug in the system code, but that would not explain why it sometimes works and sometimes does not. To me it sounds unlikely that this is the issue.
PS: the address at 0xfe52496f is odd (not a multiple of four), which is unusual for optimized programs. That is also a hint in the direction of defective RAM... I hope I am wrong; on the other hand, if I am right, you know what to do...
Mixing fork and threads is generally ill-advised. This is because the forked process will have only a single thread, the one that called fork(). None of the other threads exist in the new process, which means that any shared memory resources are in an unknown state: another thread could have been holding a mutex that will now never be released, and so on. There are mechanisms to mitigate this, such as pthread_atfork(), but as a general rule, when working with multiple threads you should only fork in order to call exec as soon as possible. Are you segfaulting in the parent process or in the new child process?
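The two mitigations mentioned above can be sketched in C roughly as follows. This is a minimal sketch, not the poster's code: state_lock, install_fork_handlers() and spawn_and_wait() are hypothetical names introduced only for illustration. The run_prefork frame in the posted trace may correspond to the step where such fork handlers are run, but that is an assumption and is not confirmed anywhere in the thread.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock standing in for any mutex other threads may hold. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread just before fork(): take the lock so no other
 * thread can be holding it at the moment the address space is copied. */
static void prepare(void)   { pthread_mutex_lock(&state_lock); }

/* Run after fork() in the parent and in the child: release the lock. */
static void in_parent(void) { pthread_mutex_unlock(&state_lock); }
static void in_child(void)  { pthread_mutex_unlock(&state_lock); }

/* Register the handlers once, early in startup. Returns 0 on success. */
int install_fork_handlers(void)
{
    return pthread_atfork(prepare, in_parent, in_child);
}

/* Fork and exec right away, doing nothing else in the child.
 * Returns the child's exit status, or -1 on error. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);              /* only reached if execv() failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A handler registered this way must remain valid for as long as the process can fork, so code that registers one should stay loaded.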
I think your stack or stack pointer may have been corrupted at the point where you call fork(). Either that, or you have actually used up your stack space and the stack pointer is just shy of the limit before the call to fork(). Calling other functions, or allocating a moderate amount of memory with alloca() and setting that area to 0 with memset() just before the call to fork(), would reveal whether this is the case, because the error would show up earlier.
It might also be possible that, if you are forking in a non-main thread of a process (I'm not familiar with Solaris's threading model, so I could be spouting gibberish), you have somehow specified or allocated the stack of the thread calling fork() in a way that prevents the new process from accessing it after the fork.
Is this repeatable? Does it happen consistently?
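The stack check suggested in this answer could look something like the sketch below. probe_stack() is a hypothetical helper, not something from the original post, and it zeroes the region with a volatile loop instead of memset() so that the compiler cannot drop the writes; the effect on the stack is intended to be the same.

```c
#include <alloca.h>
#include <stddef.h>

/* Touch `bytes` of stack below the current frame. If the stack is nearly
 * exhausted or corrupted, the fault should show up here rather than inside
 * fork(). Returns the number of bytes touched. */
size_t probe_stack(size_t bytes)
{
    if (bytes == 0)
        return 0;

    /* alloca() takes the memory from the caller's stack, not the heap. */
    volatile unsigned char *p = alloca(bytes);

    /* Write every byte; volatile keeps the stores from being optimized out. */
    for (size_t i = 0; i < bytes; i++)
        p[i] = 0;

    return bytes;
}
```

Calling something like probe_stack(64 * 1024) immediately before fork() in the affected thread would then show whether the crash moves to the probe, which would point at the stack rather than at fork() itself. The size is an arbitrary choice and should stay below the thread's configured stack size.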