openmp 与非 POD 并行

发布于 2024-11-14 13:31:14 字数 1667 浏览 4 评论 0 原文

我正在尝试加速一个程序,该程序的核心是一个看起来微不足道的循环:

double sum=0.;
#pragma omp parallel for reduction(+:sum) // fails
for( size_t i=0; i<_S.size(); ++i ){
  sum += _S[i].first* R(atoms,_S[i].second) ;
}

虽然循环本身很微不足道,但其中的对象不是 POD:这里 _S 实际上是一个 std::向量< std::pair; > >,并且 R(...) 是某个对象的重载 operator(...) const。它的两个参数都限定为 const,因此调用不会产生任何副作用。

由于大约 90% 的运行时间都花在这个调用上,因此如上所示,添加 OpenMP 编译指示似乎是一件简单的事情,并享受两到三倍的加速; 当然,该代码在单个线程中工作正常,但在多个线程中给出明显错误的结果:-)。

没有数据依赖性,_SR(...) 似乎可以安全地在线程之间共享,但仍然会产生无意义的结果。

我真的很感激任何关于如何找出问题所在的指示。

UPD2:

想通了。与所有错误一样,这都是微不足道的。 R(...) 正在调用此类的 operator()

class objR{
 public:
   objR(const size_t N){
     _buffer.reserve(N);
   };
   double operator(...) const{
     // do something, using the _buffer to store intermediaries
   }

 private:
   std::vector<double> _buffer;
};

显然,不同的线程同时使用 _buffer时间并把它搞砸了。到目前为止,我的解决方案是分配更多空间(内存不是问题,代码受 CPU 限制):

class objR{
 public:
   objR(const size_t N){
     int nth=1;
     #ifdef _OPENMP
       nth=omp_get_max_threads();
     #endif
     _buffer.resize(N);
   }

   double operator(...) const{
     int thread_id=0;
     #ifdef _OPENMP
       thread_id = omp_get_thread_num();
     #endif
     // do something, using the _buffer[thread_id] to store intermediaries
   }

 private:
   std::vector< std::vector<double> > _buffer;
};

这似乎工作正常。尽管如此,由于这是我第一次涉足多线程,如果有知识渊博的人可以评论是否有更好的方法,我将不胜感激。

I'm trying to speed up a program, in the heart of which is a trivially-looking loop:

double sum=0.;
#pragma omp parallel for reduction(+:sum) // fails
for( size_t i=0; i<_S.size(); ++i ){
  sum += _S[i].first* R(atoms,_S[i].second) ;
}

While the looping itself is trivial, the objects inside it are not PODs: Here _S is in fact an
std::vector< std::pair<double, std::vector<size_t> > >, and R(...) is an overloaded operator(...) const of some object. Both of its arguments qualified as const, so that the call does not have any side effects.

Since some 90% of the runtime is spent in this call, it seemed a simple thing to throw in an OpenMP pragma as shown above, and enjoy a speedup by a factor of two or three;
but of course --- the code works OK with a single thread, but gives plain wrong results for more than one thread :-).

There is no data dependency, both _S and R(...) seem to be safe to share between threads, but still it produces nonsense.

I'd really appreciate any pointers into how to find what goes wrong.

UPD2:

Figured it. As all bugs, it's trivial. The R(...) was calling the operator() of something of this sort:

class objR{
 public:
   objR(const size_t N){
     _buffer.reserve(N);
   };
   double operator(...) const{
     // do something, using the _buffer to store intermediaries
   }

 private:
   std::vector<double> _buffer;
};

Clearly, different threads use the _buffer at the same time and mess it up. My solution so far is to allocate more space (memory is not a problem, the code is CPU-bound):

class objR{
 public:
   objR(const size_t N){
     int nth=1;
     #ifdef _OPENMP
       nth=omp_get_max_threads();
     #endif
     _buffer.resize(N);
   }

   double operator(...) const{
     int thread_id=0;
     #ifdef _OPENMP
       thread_id = omp_get_thread_num();
     #endif
     // do something, using the _buffer[thread_id] to store intermediaries
   }

 private:
   std::vector< std::vector<double> > _buffer;
};

This seems to work correctly. Still, since it's my first foray into things mutithreaded, I'd appreciate if somebody knowledgeable could comment on if there is a better approach.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

你如我软肋 2024-11-21 13:31:14

_S[i].first_S[i].second 的访问是完全安全的(不能保证有关 atoms 的任何信息)。这意味着您对 R 的函数调用一定是导致问题的原因。您需要找出 R 是什么并发布它的用途。

另一方面,以下划线开头并以大写字符开头的名称是为实现保留的,您可以通过使用它们来调用未定义的行为。

The access to _S[i].first and _S[i].second is perfectly safe (can't guarantee anything about atoms). This means that your function call to R must be what's causing the problem. You need to find out what R is and post what it's doing.

In another point, names which begin with an underscore and begin with an uppercase character are reserved for the implementation and you invoke undefined behaviour by using them.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文