How can I parallelize an array shift with OpenMP?
I've tried a few things but didn't get any accurate results for the following example (which rotates the elements of an array of Carteira objects, for a permutation algorithm):
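void rotaciona(int i)
{
    // Save the element at position i; it ends up in the last position.
    Carteira aux = this->carteira[i];
    // Shift every element after position i one slot to the left.
    for(int c = i; c < this->size - 1; c++)
    {
        this->carteira[c] = this->carteira[c+1];
    }
    this->carteira[this->size-1] = aux;
}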
Thank you very much!
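For reference, rotaciona(i) is a left rotation by one position of the suffix that starts at index i. The sketch below restates it as a free function over `std::vector<int>` (a stand-in for the `Carteira` array; the function names `rotate_suffix_serial` and `rotate_suffix_std` are hypothetical) and shows that, run serially, it does the same thing as `std::rotate(v.begin() + i, v.begin() + i + 1, v.end())`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical free-function version of rotaciona(): std::vector<int> stands in
// for the Carteira array, and v.size() plays the role of this->size.
void rotate_suffix_serial(std::vector<int>& v, std::size_t i)
{
    assert(i < v.size());
    int aux = v[i];                      // element that moves to the end
    for (std::size_t c = i; c + 1 < v.size(); ++c)
        v[c] = v[c + 1];                 // iteration c must read v[c+1] before iteration c+1 overwrites it
    v[v.size() - 1] = aux;
}

// Same operation via the standard library: v[i+1] becomes the new v[i],
// and the old v[i] moves to the end.
void rotate_suffix_std(std::vector<int>& v, std::size_t i)
{
    assert(i < v.size());
    std::rotate(v.begin() + i, v.begin() + i + 1, v.end());
}
```

For example, starting from {0, 1, 2, 3, 4} with i = 1, both functions produce {0, 2, 3, 4, 1}.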
This is an example of a loop with loop-carried dependencies, so it can't easily be parallelized as written: the tasks (the individual iterations of the loop) aren't independent. Here, iteration c reads carteira[c+1], which iteration c+1 overwrites, so the order of the iterations matters. Breaking the dependency can range from a trivial modification to completely impossible (e.g., an iteration loop).
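As an illustration of the "trivial modification" end of that range (a technique swapped in here, not one this answer proposes): if an extra buffer is acceptable, reading from a copy of the suffix removes the dependency entirely, because no iteration then reads an element that another iteration writes. A minimal sketch, assuming `std::vector<int>` in place of the `Carteira` array and a hypothetical function name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Out-of-place variant (hypothetical name): copy the suffix first, so no iteration
// reads an element that another iteration writes.
void rotate_suffix_copy(std::vector<int>& v, std::size_t i)
{
    assert(i < v.size());
    const std::vector<int> src(v.begin() + i, v.end());  // snapshot of v[i..n-1]
    const std::size_t m = src.size();
    // Iterations are now independent: slot k of the suffix gets src[(k + 1) % m].
    #pragma omp parallel for
    for (std::size_t k = 0; k < m; ++k)
        v[i + k] = src[(k + 1) % m];
}
```

The price is a copy of the suffix and O(n - i) extra memory, which may or may not pay off for a memory-bound operation like this one.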
Here, the case is somewhere in between. The problem with doing this in parallel is that each thread needs to find out what its rightmost value is going to be before its neighbour changes that value. The omp for construct doesn't tell you which loop iterations will be "yours", so I don't think you can use the OpenMP for worksharing construct to break up the loop. You can do the splitting yourself, however; but it requires a lot more code, and it no longer reduces nicely to the serial case.
Still, it can be done. You have to break the loop into chunks yourself, and each thread then reads and caches its new rightmost value (the first element of the next chunk). An OpenMP barrier ensures that no thread starts modifying values until all the threads have cached their new rightmost values.
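A minimal sketch of this manual-chunking approach, assuming `std::vector<int>` as a stand-in for the `Carteira` array (the function name and the chunk-splitting formula are illustrative choices, not taken from the original answer). Each thread reads the value its last slot will receive, waits at `#pragma omp barrier`, and only then shifts its own chunk:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Manual chunking (hypothetical name): each thread owns a contiguous chunk of the
// suffix, caches the value that will end up in its last slot, waits at a barrier,
// then shifts its chunk in place.
void rotate_suffix_chunked(std::vector<int>& v, std::size_t i)
{
    assert(i < v.size());
    const std::size_t n = v.size();
    const int first = v[i];              // wraps around to the last slot

    #pragma omp parallel
    {
#ifdef _OPENMP
        const std::size_t tid = omp_get_thread_num();
        const std::size_t nthreads = omp_get_num_threads();
#else
        const std::size_t tid = 0, nthreads = 1;
#endif
        // Split [i, n) into nthreads contiguous chunks [lo, hi); some may be empty.
        const std::size_t len = n - i;
        const std::size_t lo = i + len * tid / nthreads;
        const std::size_t hi = i + len * (tid + 1) / nthreads;

        // My last slot (hi - 1) receives v[hi], the first element of the next
        // non-empty chunk, or the wrapped-around v[i] for the final chunk.
        // It is read here, before any thread writes.
        int rightmost = 0;
        if (lo < hi)
            rightmost = (hi < n) ? v[hi] : first;

        #pragma omp barrier              // every cached value is read before writes start

        if (lo < hi) {
            for (std::size_t c = lo; c + 1 < hi; ++c)
                v[c] = v[c + 1];         // reads and writes stay inside [lo, hi)
            v[hi - 1] = rightmost;
        }
    }
}
```

Compiled without OpenMP, the pragmas are ignored and the #else branch treats the whole suffix as a single chunk, which is just the original serial loop.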
Although your sample doesn't show any explicit OpenMP pragmas, I don't think it can be parallelized easily:
you are doing an in-place operation with overlapping regions.
If you split the loop in chunks, you'll have race conditions at the boundaries (because el[n] gets copied from el[n+1], which might already have been updated in another thread).
I suggest that you do manual chunking (which can be done). I suspect that omp parallel for is not flexible enough for this (I haven't tried), so you could just use a parallel region that does the work in chunks, and fix up the boundary elements after a thread barrier or after the end of the parallel block. For the fix-up to work, each chunk's original first element has to be saved before it is overwritten.
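The shift-then-fix-up variant could look roughly like the sketch below. It is an assumption about one way to implement the idea: `std::vector<int>` stands in for the `Carteira` array, the function name is hypothetical, and the team size is capped at the number of elements so that every chunk is non-empty. Each thread saves its chunk's original first element, shifts the rest of its chunk without touching its last slot, and after the barrier fills that slot in:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Shift-then-fix-up (hypothetical name): each thread shifts its own chunk except
// its last slot, remembering its original first element; after a barrier, each
// thread fills its last slot from the next chunk's remembered first element.
void rotate_suffix_fixup(std::vector<int>& v, std::size_t i)
{
    assert(i < v.size());
    const std::size_t n = v.size();
    const std::size_t len = n - i;
#ifdef _OPENMP
    const std::size_t max_threads = static_cast<std::size_t>(omp_get_max_threads());
#else
    const std::size_t max_threads = 1;
#endif
    // Never request more threads than elements, so every chunk is non-empty.
    const std::size_t want = std::min(len, max_threads);
    std::vector<int> saved(want);        // saved[t] = original first element of chunk t

    #pragma omp parallel num_threads(static_cast<int>(want))
    {
#ifdef _OPENMP
        const std::size_t tid = omp_get_thread_num();
        const std::size_t nt = omp_get_num_threads();  // may be smaller than want
#else
        const std::size_t tid = 0, nt = 1;
#endif
        const std::size_t lo = i + len * tid / nt;
        const std::size_t hi = i + len * (tid + 1) / nt;

        saved[tid] = v[lo];              // only this thread ever writes v[lo]
        for (std::size_t c = lo; c + 1 < hi; ++c)
            v[c] = v[c + 1];             // reads and writes stay inside [lo, hi)

        #pragma omp barrier              // all boundary values are saved

        // Fix up the boundary: the last slot gets the next chunk's original first
        // element; the final chunk wraps around to the original v[i] (saved[0]).
        v[hi - 1] = (tid + 1 < nt) ? saved[tid + 1] : saved[0];
    }
}
```

Compared with caching the rightmost value up front, this version needs only one barrier per call, at the cost of a small shared array of boundary values.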
Other thoughts: