My program crashes with a segfault in fflush()... but not always?
What are the possible reasons for the situation described in the title? Here's what my backtrace (`bt`) looks like:
```
#0  0x00a40089 in ?? ()
#1  0x09e3fac0 in ?? ()
#2  0x09e34f30 in ?? ()
#3  0xb7ef9074 in ?? ()
#4  0xb7ef9200 in ?? ()
#5  0xb7ef9028 in ?? ()
#6  0x081d45a0 in LogFile::Flush ()
#7  0x081d45a0 in LogFile::Flush ()
#8  0x081d46e0 in LogFile::Close ()
#9  0x081d4dbf in LogFile::OpenLogFile ()
#10 0x081d4eb9 in LogFile::PerformPeriodicalFlush ()
#11 0x081d4fca in LogFile::StoreRecord ()
#12 0x081d50c2 in LogFile::StoreRecord ()
```
and it gives me `Program terminated with signal 11, Segmentation fault.`
The wrapper around `fflush()` is simple: it does nothing except call `fflush` and check for errors (whether the returned code is < 0). So I guess the segfault is caused by `fflush`. Or could it be somewhere else, given the `??` frames at the top of the stack?
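The `??` frames only mean that gdb could not map those addresses to a symbol. That typically happens for code in a library whose symbols were not loaded, or when the stack itself has been overwritten, so on their own they do not say where the fault started. For a crash that shows up about twice a year, one option is to let the process record its own stack at the moment of the crash. A minimal sketch, assuming glibc on Linux; `dump_backtrace`, `install_segv_handler` and `on_segv` are hypothetical names, not part of the poster's code:

```cpp
// Sketch (glibc/Linux only): print the raw call stack to stderr when SIGSEGV arrives.
#include <cstring>      // memset
#include <execinfo.h>   // backtrace, backtrace_symbols_fd (glibc extensions)
#include <signal.h>     // sigaction, sigemptyset, raise
#include <unistd.h>     // STDERR_FILENO

static const int kMaxFrames = 64;

// Writes the current call stack to `fd`; returns the number of frames captured.
static int dump_backtrace(int fd)
{
    void* frames[kMaxFrames];
    int n = backtrace(frames, kMaxFrames);
    backtrace_symbols_fd(frames, n, fd);   // documented not to call malloc()
    return n;
}

static void on_segv(int sig)
{
    dump_backtrace(STDERR_FILENO);
    // SA_RESETHAND restored the default action; re-raising makes the default
    // action (terminate, possibly with a core dump) happen once we return.
    raise(sig);
}

static void install_segv_handler()
{
    // The first backtrace() call may load libgcc and allocate memory,
    // so make it once here instead of for the first time inside the handler.
    void* warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    sigaction(SIGSEGV, &sa, NULL);
}
```

`install_segv_handler()` would be called once at startup. Function names in the output may be missing unless the program is linked with `-rdynamic`; raw addresses inside the executable can still be resolved afterwards with `addr2line -e <executable> <address>` against the same binary.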
OS: RHEL5; gcc version 3.4.6 20060404 (Red Hat 3.4.6-3); debugged with gdb, using the original executable with maximum debug information in it.
I know a segfault can happen when there is no space left on the disk, but that is not the case here (I have a watchdog for the application that restarts the program, and after that everything keeps working just fine).
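For comparison, a full disk does not normally make `fflush` crash by itself: it returns `EOF` and sets `errno` to `ENOSPC`, which a `< 0` check like the one in the wrapper catches. On Linux, `/dev/full` fails every write with `ENOSPC`, so it can stand in for a full disk when exercising that error path. A minimal sketch with a hypothetical `write_and_flush` helper:

```cpp
// Sketch: /dev/full (Linux) fails every write with ENOSPC, much like a full disk.
#include <cerrno>
#include <cstdio>

// Hypothetical helper: writes one line to `path` and flushes it.
// Returns 0 on success, otherwise the errno value from the failing call.
static int write_and_flush(const char* path)
{
    std::FILE* f = std::fopen(path, "w");
    if (f == NULL)
        return errno;

    int err = 0;
    std::fputs("log line\n", f);   // only fills the stdio buffer
    if (std::fflush(f) == EOF)     // the actual write(2) happens here
        err = errno;               // ENOSPC for /dev/full or a full disk
    std::fclose(f);                // may fail again; the first error is kept
    return err;
}
```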
Any ideas would be helpful.
Thanks.
EDIT
```cpp
#include <sys/stat.h>  // stat, S_ISREG
#include <cstdio>      // FILE, fflush

void LogFile::PerformPeriodicalFlush( const utils::dt::TimeStamp& tsNow )
    throw( LibCException )
{
    m_tsLastPeriodicalCheck = tsNow;
    struct stat LogFileStat;
    int nResult = stat( m_sCurrentFullFileName.c_str(), &LogFileStat );
    if ( 0 == nResult && S_ISREG( LogFileStat.st_mode ) )
    {
        // We successfully stat'ed the file, so it exists. We can safely perform
        // a flush.
        try
        {
            Flush();
            return;
        }
        catch ( LibCException& )
        {
            OpenLogFile( tsNow );
            return;
        }
    }
    else
    {
        OpenLogFile( tsNow );
    }
}

void RotatingLogFile::Flush() throw( object::LibCException )
{
    if ( m_pFile != NULL )
    {
        if ( fflush( m_pFile ) < 0 )
        {
            throw object::LibCException();
        }
    }
}
```
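One detail in `PerformPeriodicalFlush()` may be worth a second look: the comment says a successful `stat()` makes the flush safe, but `stat()` checks whatever file is at the *name* `m_sCurrentFullFileName`, while `fflush()` acts on the already-open `m_pFile`. After a rotation, or if something deletes and recreates the file, the two can refer to different files. A sketch, using a hypothetical `StreamMatchesPath` helper, that compares the device and inode numbers from `stat()` and `fstat()`. It assumes the `FILE*` is still valid; it cannot detect a stream that was already passed to `fclose()`.

```cpp
// Sketch: check that an open stream still refers to the file currently at `path`.
#include <cstdio>
#include <string>
#include <sys/stat.h>   // stat, fstat, S_ISREG

// Hypothetical helper, not part of the poster's LogFile.
static bool StreamMatchesPath(std::FILE* f, const std::string& path)
{
    if (f == NULL)
        return false;

    struct stat byName;
    if (stat(path.c_str(), &byName) != 0 || !S_ISREG(byName.st_mode))
        return false;                          // nothing (regular) at that name

    struct stat byStream;
    if (fstat(fileno(f), &byStream) != 0)
        return false;                          // the stream's descriptor is unusable

    // Same device and inode: the name and the stream are the same file.
    return byName.st_dev == byStream.st_dev && byName.st_ino == byStream.st_ino;
}
```

In `PerformPeriodicalFlush()`, a check like this could stand in for the bare `stat()` test, so that a mismatch takes the `OpenLogFile()` branch instead of flushing a stream that no longer corresponds to the current file name.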
**NOTE:** I can't paste the whole code; it's part of 10+ thousand lines of code. Also, this has been working for years in different applications, on real-time systems. Such crashes are very, very rare, roughly twice a year, so I don't think the problem is in the code. I know that no one can really help me with this kind of thing; that's why I'm just asking for any ideas about why `fflush` might cause a segfault.
My guess: you have memory corruption somewhere and LogFile's "this" points to a memory area that you can't access.
Anyway, it's difficult to tell without code.
It turned out that, for some reason, something was wrong with the permissions (not sure what exactly). It happened on an hour change, since a different file is written for each hour. So somehow the file was created, but there was no permission to write to it, or something like that. No one actually understood what happened, why, or how (because after the crash the application was restarted and everything was perfectly fine). So `flush` crashed because it had no permission to do that. It's still a mystery... but solved xD
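If the new hourly file really was not writable, that would normally show up first in `fopen()`: it returns `NULL` and sets `errno` (for example `EACCES` for a permission problem). Logging that reason at rotation time would leave a trace the next time this happens. A minimal sketch with a hypothetical `OpenHourlyLog` helper; the file naming and the append mode are assumptions:

```cpp
// Sketch: open the log file for the current hour and keep the reason if that fails.
#include <cerrno>
#include <cstdio>
#include <cstring>   // strerror
#include <string>

// Hypothetical helper; "a" (append) mode is an assumption about how the log is opened.
static std::FILE* OpenHourlyLog(const std::string& path, std::string& why)
{
    std::FILE* f = std::fopen(path.c_str(), "a");
    if (f == NULL)
        why = path + ": " + std::strerror(errno);   // e.g. "Permission denied" for EACCES
    else
        why.clear();
    return f;                                       // NULL means: do not use this stream
}
```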
You don't provide the code for `Flush()`, but it sounds strange to me that it is called twice. In fact, it seems that it calls itself. This may cause a resource leak, depending on the implementation of `Flush()`.
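If `Flush()` really is entered twice on the same object, as frames #6 to #8 seem to suggest (`Close()` calling `Flush()`, which appears twice), one way to make that harmless is a re-entrancy flag. A minimal sketch with a hypothetical `GuardedLog` class, not the poster's `LogFile`; closing the stream on a failed flush is an assumed policy, chosen to show how `Flush()` and `Close()` can end up calling each other:

```cpp
// Sketch: a re-entrancy flag so a nested Flush() cannot recurse on a failing stream.
#include <cstdio>

// Hypothetical class, loosely modeled on the LogFile::Close() -> Flush() frames.
class GuardedLog {
public:
    GuardedLog() : m_pFile(NULL), m_bInFlush(false) {}
    ~GuardedLog() { Close(); }

    bool Open(const char* path)
    {
        Close();
        m_pFile = std::fopen(path, "w");
        return m_pFile != NULL;
    }

    bool Write(const char* s) { return m_pFile != NULL && std::fputs(s, m_pFile) != EOF; }
    bool IsOpen() const { return m_pFile != NULL; }

    // On failure the stream is closed. Close() flushes first, so without m_bInFlush
    // a persistent error (e.g. ENOSPC) would bounce between Flush() and Close()
    // until the stack overflowed.
    bool Flush()
    {
        if (m_pFile == NULL || m_bInFlush)
            return true;                  // nothing open, or a flush is already running
        m_bInFlush = true;
        bool ok = std::fflush(m_pFile) != EOF;
        if (!ok)
            Close();                      // its nested Flush() returns immediately
        m_bInFlush = false;
        return ok;
    }

    void Close()
    {
        if (m_pFile == NULL)
            return;
        Flush();                          // may already have closed the stream on error
        if (m_pFile == NULL)
            return;
        std::FILE* f = m_pFile;
        m_pFile = NULL;                   // clear before fclose: no dangling FILE* left
        std::fclose(f);
    }

private:
    GuardedLog(const GuardedLog&);            // non-copyable: copies would fclose twice
    GuardedLog& operator=(const GuardedLog&);

    std::FILE* m_pFile;
    bool m_bInFlush;
};
```

Clearing `m_pFile` before calling `fclose()` also means any later `Flush()` on the same object becomes a no-op instead of touching a closed stream.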
Run your program under valgrind; it will help you find where your application's memory is being corrupted.