Possible stack corruption
With reference to my previous question about GDB not pinpointing the SIGSEGV point, my thread code is as follows:
void *runner(void *unused)
{
    do
    {
        sem_wait(&x);
        ...
        if(/*condition 1 check*/)
        {
            sem_post(&x);
            sleep(5);
            sem_wait(&x);
            if(/*repeat condition 1 check; after at least 5 seconds*/)
            {
                printf("LEAVING...\n");
                sem_post(&x);
                // putting exit(0); here resolves the dilemma
                return(NULL);
            }
        }
        sem_post(&x);
    }while(1);
}
Main code:
sem_t x;
int main(void)
{
    sem_init(&x,0,1);
    ...
    pthread_t thrId;
    pthread_create(&thrId,NULL,runner,NULL);
    ...
    pthread_join(thrId,NULL);
    return(0);
}
Edit: Putting an exit(0) in the runner thread code makes the fault vanish.
What could be the reasons behind the stack corruption?
GDB output (0xb7fe2b70 is the runner thread ID):
LEAVING...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7fe2b70 (LWP 2604)]
0x00000011 in ?? ()
Valgrind output:
==3076== Thread 2:
==3076== Jump to the invalid address stated on the next line
==3076== at 0x11: ???
==3076== by 0xA26CCD: clone (clone.S:133)
==3076== Address 0x11 is not stack'd, malloc'd or (recently) free'd
==3076==
==3076==
==3076== Process terminating with default action of signal 11 (SIGSEGV)
==3076== Bad permissions for mapped region at address 0x11
==3076== at 0x11: ???
==3076== by 0xA26CCD: clone (clone.S:133)
==3076== Address 0x11 is not stack'd, malloc'd or (recently) free'd
Write a new source file with a main function that does the same things as the main you posted here, except that rather than using pthread_create it just calls the function directly. See if you can recreate the issue independently of threads. From the way things look, your semaphores should still work fine in a single-threaded environment. If this still fails, you will have an easier time debugging it.
Since you said that calling exit rather than returning did not produce the error, it suggests that you have corrupted the return address that is on the stack when runner is started. By calling exit you don't rely on this memory area to get to an exiting function (if you had returned, pthread_exit would have been called by the pthread library code that had called runner). I think the valgrind output is not 100% accurate. This is not due to any fault in valgrind. The place where you are triggering the error, combined with the type of error you are triggering, makes it very difficult to be sure who called what.
Some gcc flags you may be interested in:
The warning option doesn't work without the -f option here.
You may also want to try:
All the important parts are missing from your code. The most common cause of stack corruption, though, is declaring something like char buffer[20] on the stack and writing outside its bounds (sprintf is a fantastic way to accomplish that).
Use valgrind or an equivalent memory checking tool to figure it out.
Stop guessing. Also stop posting incomplete code, especially if you don't know whether it has a problem or not. The bug could be outside of this function. For example, maybe the semaphore isn't initialized.
From the valgrind output, I suspect that your pthread_create() call must be passing an invalid function pointer. The new thread then jumps to that bogus address and crashes. Obviously there is no stack...