Why does this code fail under valgrind (helgrind)?
**SOLVED: Inside my class's constructor, the construction of a Semaphore member was racing with the construction of a Thread member that uses it; I needed the Semaphore to be created first and the Thread second. The solution that worked for me was to construct the Semaphore in a base class, so that my derived class can depend on it already existing.**
I have a fairly small pthreads C++ program which works fine under normal conditions. However, when using valgrind's thread error checking tools on the program, it appears to uncover a race condition. What makes this race condition particularly difficult to avoid is that it is happening inside a "Semaphore" class (which really just encapsulates sem_init, sem_wait, and sem_post), so I can't fix this with another Semaphore (and shouldn't have to). I don't think valgrind is giving a false positive since my program shows different behavior when running under valgrind.
Here's Semaphore.cpp:

```cpp
#include "Semaphore.h"
#include <stdexcept>
#include <errno.h>
#include <iostream>

Semaphore::Semaphore(bool pshared,int initial)
    : m_Sem(new sem_t())
{
    if(m_Sem==0)
        throw std::runtime_error("Semaphore constructor error: m_Sem == 0");
    if(sem_init(m_Sem,(pshared?1:0),initial)==-1)
        throw std::runtime_error("sem_init failed");
}

Semaphore::~Semaphore()
{
    sem_destroy(m_Sem);
    delete m_Sem;
}

void Semaphore::lock()
{
    if(m_Sem==0)
        throw std::runtime_error("Semaphore::lock error: m_Sem == 0");

    int rc;
    for(;;){
        rc = sem_wait(m_Sem);
        if(rc==0) break;
        if(errno==EINTR) continue;
        throw std::runtime_error("sem_wait failed");
    }
}

void Semaphore::unlock()
{
    if(sem_post(m_Sem)!=0)
        throw std::runtime_error("sem_post failed");
}
```
- Notice how Semaphore's constructor allocates a new sem_t, stores it in m_Sem, and throws an exception in the extremely unlikely scenario that m_Sem still equals 0. This just means it should be impossible for this constructor to leave m_Sem equal to 0. Now look at Semaphore::lock: regardless of which thread calls this function (or the constructor), it should, theoretically, still be impossible for m_Sem to be 0, right? Yet when I run my program under helgrind, Semaphore::lock does end up throwing the exception "Semaphore::lock error: m_Sem == 0", which I really thought was impossible.
I have used this Semaphore class in other programs which pass helgrind with no problems, and I'm really not sure what I'm doing differently here that causes the issue. According to helgrind, the race is between a write in Semaphore's constructor in one thread and a read in Semaphore::lock in another thread. Honestly, I don't even see how that's possible: how can a method of an object race with the constructor of that same object? Doesn't C++ guarantee that the constructor has run before any method can be invoked on an object? How can this ever be violated, even in a multithreaded environment?
Anyway, now for the valgrind output. I'm using valgrind version "Valgrind-3.6.0.SVN-Debian". Memcheck reports no errors. Here's the helgrind output:
```
$ valgrind --tool=helgrind --read-var-info=yes ./try
==7776== Helgrind, a thread error detector
==7776== Copyright (C) 2007-2009, and GNU GPL'd, by OpenWorks LLP et al.
==7776== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==7776== Command: ./try
==7776==
terminate called after throwing an instance of '==7776== Thread #1 is the program's root thread
==7776==
==7776== Thread #2 was created
==7776==    at 0x425FA38: clone (clone.S:111)
==7776==    by 0x40430EA: pthread_create@@GLIBC_2.1 (createthread.c:249)
==7776==    by 0x402950C: pthread_create_WRK (hg_intercepts.c:230)
==7776==    by 0x40295A0: pthread_create@* (hg_intercepts.c:257)
==7776==    by 0x804CD91: Thread::Thread(void* (*)(void*), void*) (Thread.cpp:10)
==7776==    by 0x804B2D5: ActionQueue::ActionQueue() (ActionQueue.h:40)
==7776==    by 0x80497CA: main (try.cpp:9)
==7776==
==7776== Possible data race during write of size 4 at 0x42ee04c by thread #1
==7776==    at 0x804D9C5: Semaphore::Semaphore(bool, int) (Semaphore.cpp:8)
==7776==    by 0x804B333: ActionQueue::ActionQueue() (ActionQueue.h:40)
==7776==    by 0x80497CA: main (try.cpp:9)
==7776==  This conflicts with a previous read of size 4 by thread #2
==7776==    at 0x804D75B: Semaphore::lock() (Semaphore.cpp:26)
==7776==    by 0x804B3BE: Lock::Lock(Semaphore&) (Lock.h:17)
==7776==    by 0x804B497: ActionQueue::ActionQueueLoop() (ActionQueue.h:56)
==7776==    by 0x8049ED5: void* CallMemFun, &(ActionQueue::ActionQueueLoop())>(void*) (CallMemFun.h:7)
==7776==    by 0x402961F: mythread_wrapper (hg_intercepts.c:202)
==7776==    by 0x404296D: start_thread (pthread_create.c:300)
==7776==    by 0x425FA4D: clone (clone.S:130)
==7776==
std::runtime_error'
  what():  Semaphore::lock error: m_Sem == 0
==7776==
==7776== For counts of detected and suppressed errors, rerun with: -v
==7776== Use --history-level=approx or =none to gain increased speed, at
==7776== the cost of reduced accuracy of conflicting-access information
==7776== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 5 from 5)
```
Anyone with git and valgrind can reproduce this by checking out the code from my git repo branch (which, for the record, is currently on commit 262369c2d25eb17a0147) as follows:
```sh
$ git clone git://github.com/notfed/concqueue -b semaphores
$ cd concqueue
$ make
$ valgrind --tool=helgrind --read-var-info=yes ./try
```
Comments (2)
It looks like thread 2 is trying to use the Semaphore before thread 1 has finished running its constructor. In that case m_Sem has not been initialized yet, so it can be NULL (0) or any other value.
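Okay, I found the problem. My ActionQueue class was creating (among other things) two member objects upon construction: a Semaphore and a Thread. The problem was that this Thread was using that Semaphore. I incorrectly assumed that, because the Semaphore is a member object, it would already be constructed by the time the Thread started. In fact, members are constructed in the order in which they are declared in the class, regardless of the order of the constructor's initializer list; since the Thread was constructed before the Semaphore, its thread could call lock() on a Semaphore whose constructor had not run yet. My solution was to derive ActionQueue from a base class in which my Semaphore is constructed; that way, by the time ActionQueue's own members are constructed, I can count on the base class's members already being constructed.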