我正在尝试加速一个程序,该程序的核心是一个看起来微不足道的循环:
double sum=0.;
#pragma omp parallel for reduction(+:sum) // fails
for( size_t i=0; i<_S.size(); ++i ){
sum += _S[i].first* R(atoms,_S[i].second) ;
}
虽然循环本身很微不足道,但其中的对象不是 POD:这里 _S 实际上是一个
std::向量< std::pair; > >
,并且 R(...)
是某个对象的重载 operator(...) const
。它的两个参数都限定为 const,因此调用不会产生任何副作用。
由于大约 90% 的运行时间都花在这个调用上,因此如上所示,添加 OpenMP 编译指示似乎是一件简单的事情,并享受两到三倍的加速;
当然,该代码在单个线程中工作正常,但在多个线程中给出明显错误的结果:-)。
没有数据依赖性,_S
和 R(...)
似乎可以安全地在线程之间共享,但仍然会产生无意义的结果。
我真的很感激任何关于如何找出问题所在的指示。
UPD2:
想通了。与所有错误一样,这都是微不足道的。 R(...)
正在调用此类的 operator()
:
class objR{
public:
objR(const size_t N){
_buffer.reserve(N);
};
double operator(...) const{
// do something, using the _buffer to store intermediaries
}
private:
std::vector<double> _buffer;
};
显然,不同的线程同时使用 _buffer
时间并把它搞砸了。到目前为止,我的解决方案是分配更多空间(内存不是问题,代码受 CPU 限制):
class objR{
public:
objR(const size_t N){
int nth=1;
#ifdef _OPENMP
nth=omp_get_max_threads();
#endif
_buffer.resize(N);
}
double operator(...) const{
int thread_id=0;
#ifdef _OPENMP
thread_id = omp_get_thread_num();
#endif
// do something, using the _buffer[thread_id] to store intermediaries
}
private:
std::vector< std::vector<double> > _buffer;
};
这似乎工作正常。尽管如此,由于这是我第一次涉足多线程,如果有知识渊博的人可以评论是否有更好的方法,我将不胜感激。
I'm trying to speed up a program, in the heart of which is a trivially-looking loop:
double sum=0.;
#pragma omp parallel for reduction(+:sum) // fails
for( size_t i=0; i<_S.size(); ++i ){
sum += _S[i].first* R(atoms,_S[i].second) ;
}
While the looping itself is trivial, the objects inside it are not PODs: Here _S is in fact an
std::vector< std::pair<double, std::vector<size_t> > >
, and R(...)
is an overloaded operator(...) const
of some object. Both of its arguments qualified as const
, so that the call does not have any side effects.
Since some 90% of the runtime is spent in this call, it seemed a simple thing to throw in an OpenMP pragma as shown above, and enjoy a speedup by a factor of two or three;
but of course --- the code works OK with a single thread, but gives plain wrong results for more than one thread :-).
There is no data dependency, both _S
and R(...)
seem to be safe to share between threads, but still it produces nonsense.
I'd really appreciate any pointers into how to find what goes wrong.
UPD2:
Figured it. As all bugs, it's trivial. The R(...)
was calling the operator()
of something of this sort:
class objR{
public:
objR(const size_t N){
_buffer.reserve(N);
};
double operator(...) const{
// do something, using the _buffer to store intermediaries
}
private:
std::vector<double> _buffer;
};
Clearly, different threads use the _buffer
at the same time and mess it up. My solution so far is to allocate more space (memory is not a problem, the code is CPU-bound):
class objR{
public:
objR(const size_t N){
int nth=1;
#ifdef _OPENMP
nth=omp_get_max_threads();
#endif
_buffer.resize(N);
}
double operator(...) const{
int thread_id=0;
#ifdef _OPENMP
thread_id = omp_get_thread_num();
#endif
// do something, using the _buffer[thread_id] to store intermediaries
}
private:
std::vector< std::vector<double> > _buffer;
};
This seems to work correctly. Still, since it's my first foray into things mutithreaded, I'd appreciate if somebody knowledgeable could comment on if there is a better approach.
发布评论
评论(1)
对
_S[i].first
和_S[i].second
的访问是完全安全的(不能保证有关atoms
的任何信息)。这意味着您对 R 的函数调用一定是导致问题的原因。您需要找出R
是什么并发布它的用途。另一方面,以下划线开头并以大写字符开头的名称是为实现保留的,您可以通过使用它们来调用未定义的行为。
The access to
_S[i].first
and_S[i].second
is perfectly safe (can't guarantee anything aboutatoms
). This means that your function call toR
must be what's causing the problem. You need to find out whatR
is and post what it's doing.In another point, names which begin with an underscore and begin with an uppercase character are reserved for the implementation and you invoke undefined behaviour by using them.