Possible stack corruption

Posted on 2024-09-28 08:06:58

This follows up on my previous question about GDB not pinpointing the SIGSEGV point.

My thread code is as follows:

void *runner(void *unused)
{
    do
    {
        sem_wait(&x);
        ...

        if (/* condition 1 check */)
        {
            sem_post(&x);
            sleep(5);
            sem_wait(&x);
            if (/* repeat condition 1 check; after at least 5 seconds */)
            {
                printf("LEAVING...\n");
                sem_post(&x);
                // putting exit(0); here resolves the dilemma
                return NULL;
            }
        }
        sem_post(&x);
    } while (1);
}

Main code:

sem_t x;

int main(void)
{
    sem_init(&x, 0, 1);
    ...
    pthread_t thrId;
    pthread_create(&thrId, NULL, runner, NULL);
    ...
    pthread_join(thrId, NULL);
    return 0;
}

Edit: Putting an exit(0) in the runner thread code makes the fault vanish.


What could be the reasons behind the stack corruption?

GDB Output: (0xb7fe2b70 is runner thread id)

LEAVING...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7fe2b70 (LWP 2604)]
0x00000011 in ?? ()

Valgrind Output:

==3076== Thread 2:
==3076== Jump to the invalid address stated on the next line
==3076==    at 0x11: ???
==3076==    by 0xA26CCD: clone (clone.S:133)
==3076==  Address 0x11 is not stack'd, malloc'd or (recently) free'd
==3076== 
==3076== 
==3076== Process terminating with default action of signal 11 (SIGSEGV)
==3076==  Bad permissions for mapped region at address 0x11
==3076==    at 0x11: ???
==3076==    by 0xA26CCD: clone (clone.S:133)
==3076==  Address 0x11 is not stack'd, malloc'd or (recently) free'd


Comments (3)

梦萦几度 2024-10-05 08:06:58

Write a new source file with a main function that does the same things as the main you posted here, except that it calls the function directly rather than using pthread_create. See whether you can reproduce the issue without threads. From the look of it, your semaphores should still work fine in a single-threaded environment.

If this still fails, you will have an easier time debugging it.

Since you said that calling exit rather than returning does not produce the error, this suggests that you have corrupted the return address that is on the stack when runner is started. By calling exit you don't rely on this memory area to reach an exiting function (had you returned, pthread_exit would have been called by the pthread library code that called runner). I think the valgrind output is not 100% accurate, not because of any fault in valgrind, but because the place where you trigger the error, coupled with the type of error, makes it very difficult to be sure who called what.

Some gcc flags you may be interested in:

-fstack-protector-all -Wstack-protector

The warning option has no effect without the -f option here.

You may also want to try:

-fno-omit-frame-pointer
濫情▎り 2024-10-05 08:06:58

All the important parts are missing from your code, but the most common reasons for stack corruption are:

  • Storing a pointer to an element on the stack and using it after the object is already out of scope.
  • Buffer overrun, like having a char buffer[20] on the stack and writing outside the bounds (sprintf is a fantastic way to accomplish that).
  • Bad cast, i.e. having a base class A on the stack, casting it to a derived class and using it.
鲸落 2024-10-05 08:06:58

Use valgrind or an equivalent memory checking tool to figure it out.
Stop guessing. Also stop posting incomplete code, especially if you don't know whether it has a problem or not. The bug could be outside of this function. For example, maybe the semaphore isn't initialized.

From the valgrind output, I would suggest that your pthread_create() line contains an invalid function pointer. So pthread jumps to that fake address and crashes. Obviously there is no stack ...
