My program crashes in fflush with a segmentation fault... but not always?

Posted 2024-10-04 03:07:14

What possible causes do you know of for the situation described in the title? Here is what my backtrace looks like:

#0  0x00a40089 in ?? ()
#1  0x09e3fac0 in ?? ()
#2  0x09e34f30 in ?? ()
#3  0xb7ef9074 in ?? ()
#4  0xb7ef9200 in ?? ()
#5  0xb7ef9028 in ?? ()
#6  0x081d45a0 in LogFile::Flush ()
#7  0x081d45a0 in LogFile::Flush ()
#8  0x081d46e0 in LogFile::Close ()
#9  0x081d4dbf in LogFile::OpenLogFile ()
#10 0x081d4eb9 in LogFile::PerformPeriodicalFlush ()
#11 0x081d4fca in LogFile::StoreRecord ()
#12 0x081d50c2 in LogFile::StoreRecord ()

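and it gives me: Program terminated with signal 11, Segmentation fault.

The wrapper around fflush() is trivial: it just calls fflush and checks for errors (whether the returned code is < 0). So my guess is that the segfault is caused by fflush. Or could it be somewhere else, given the ?? frames at the top of the stack?

OS: RHEL5; gcc version 3.4.6 20060404 (Red Hat 3.4.6-3); debugged with gdb, using the original executable with maximum debug information in it.

I know about segfaults when there is no space left on the disk, but that is not the case here (I have a watchdog for the application that restarts the program, and everything keeps working just fine).

Any ideas would be helpful. Thanks.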

EDIT


void LogFile::PerformPeriodicalFlush( const utils::dt::TimeStamp& tsNow )
throw( LibCException )
{
 m_tsLastPeriodicalCheck = tsNow;

 struct stat LogFileStat;
 int nResult = stat( m_sCurrentFullFileName.c_str(), &LogFileStat );
 if ( 0 == nResult && S_ISREG( LogFileStat.st_mode ) )
 {
  //we successfully stat'ed the file, so it exists. We can safely perform
  //a flush.
  try
  {
   Flush();
   return;
  }
  catch ( LibCException& )
  {
   OpenLogFile( tsNow );
   return;
  }
 }
 else
 {
  OpenLogFile( tsNow );
 }
}
void RotatingLogFile::Flush() throw( object::LibCException )
{
    if ( m_pFile != NULL )
    {
        if ( fflush( m_pFile ) < 0 )
        {
            throw object::LibCException();
        }
    }
}

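NOTE: I can't paste the whole code; it is part of a code base of 10+ thousand lines. Also, this has been working for years in different applications, on real-time systems. Such crashes are very, very rare, about twice a year, so I don't think this is a problem in the code. I know that no one can really help me with this kind of thing; that's why I'm just asking for any ideas why fflush might cause a segfault.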
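
For contrast with a crash, it may help to see what an ordinary fflush() failure looks like. fflush() returns 0 on success and EOF on failure, with errno set, so I/O problems normally surface as a return value rather than as a signal. The sketch below is only an illustration under that assumption: checkedFlush and flushWithClosedDescriptor are hypothetical helpers, not part of the code base in the question, and the second one relies on POSIX (fileno, close).

```cpp
#include <cassert>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

// Hypothetical helper (not from the original code base).
// Returns 0 on success, or the errno value reported by fflush() on failure.
// A NULL stream is treated as "nothing to flush", like the wrapper in the question.
int checkedFlush(FILE* stream)
{
    if (stream == NULL) {
        return 0;
    }
    errno = 0;
    if (fflush(stream) == EOF) {
        return errno != 0 ? errno : EIO;  // fall back to EIO if errno was not set
    }
    return 0;
}

// Forces the error path: the descriptor is closed behind the stream's back,
// so the buffered data cannot be written and fflush() reports an error
// (typically EBADF on Linux) instead of crashing.
int flushWithClosedDescriptor()
{
    FILE* f = tmpfile();
    assert(f != NULL);
    fputs("record\n", f);      // stays in the stdio buffer for now
    close(fileno(f));          // simulate the file becoming unusable
    int err = checkedFlush(f); // the write fails here, reported via errno
    fclose(f);                 // still releases the FILE object
    return err;
}
```

If a call like flushWithClosedDescriptor() returns a non-zero errno value on the target system rather than crashing, that would suggest a plain write failure alone is unlikely to explain a SIGSEGV inside fflush, and that the FILE object itself deserves a look.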

紫罗兰の梦幻 2024-10-11 03:07:14

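My guess: you have memory corruption somewhere, and the LogFile's "this" points to a memory region you are not allowed to access.

In any case, it is hard to tell without the code.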
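
One concrete way this kind of corruption can arise is calling fflush() on a FILE* that has already been passed to fclose(). That is undefined behaviour and may crash only occasionally. The following is a minimal sketch of a guard against it, assuming a hypothetical GuardedLog class; it is not the poster's LogFile, and whether this is what happened in the real application is unknown.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for a log file wrapper; all names are illustrative.
class GuardedLog
{
public:
    explicit GuardedLog(std::FILE* file) : m_pFile(file) {}
    ~GuardedLog() { Close(); }

    bool IsOpen() const { return m_pFile != NULL; }

    // Returns true if there was nothing to flush or the flush succeeded.
    bool Flush()
    {
        if (m_pFile == NULL) {
            return true;
        }
        return std::fflush(m_pFile) == 0;
    }

    // Closes the stream at most once and forgets the pointer first, so that
    // later Flush()/Close() calls can never touch a freed FILE object.
    bool Close()
    {
        if (m_pFile == NULL) {
            return true;
        }
        std::FILE* file = m_pFile;
        m_pFile = NULL;
        return std::fclose(file) == 0;  // fclose() flushes pending data itself
    }

private:
    // Non-copyable: exactly one owner per FILE*.
    GuardedLog(const GuardedLog&);
    GuardedLog& operator=(const GuardedLog&);

    std::FILE* m_pFile;
};
```

With this shape, calling Flush() or Close() again after Close() is a harmless no-op. It does not protect against other code overwriting the object's memory, which a tool such as valgrind is better placed to find.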


流心雨 2024-10-11 03:07:14

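It seems that, for some reason, something strange happened with the permissions (not sure exactly what), and it happened at the change of the hour, because a different file is written each hour. So somehow the file was created but without write permission, or something similar. Nobody really understood what happened, why, or how (because after the crash the application restarted and everything was fine). So flush crashed because it had no permission to do this.

It remains a mystery.. but it is solved xD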
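
If the trigger really was an hourly file that could not be opened for writing, one common defence is to check the result of fopen() and keep the previous stream until the new one is known to be good. Below is a sketch under that assumption; reopenLog and its parameters are hypothetical and do not come from the application discussed here.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>

// Hypothetical helper. Tries to open `path` for appending.
// On success the old stream (if any) is closed and replaced, and 0 is returned.
// On failure the old stream is left untouched and the errno value from
// fopen() is returned (for example EACCES or ENOENT).
int reopenLog(std::FILE*& current, const char* path)
{
    assert(path != NULL);
    errno = 0;
    std::FILE* fresh = std::fopen(path, "a");
    if (fresh == NULL) {
        return errno != 0 ? errno : EIO;
    }
    if (current != NULL) {
        std::fclose(current);
    }
    current = fresh;
    return 0;
}
```

The point of the design is that a failed switch never leaves the caller holding a NULL or stale FILE*, so a later flush still operates on a valid stream.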
