How can I parallelize an array shift with OpenMP?

Posted on 2024-11-07 22:13:10

How can I parallelize an array shift with OpenMP?

I've tried a few things but didn't get correct results for the following example (which rotates the elements of an array of Carteira objects, for a permutation algorithm):

void rotaciona(int i)
{
    Carteira aux = this->carteira[i];
    for(int c = i; c < this->size - 1; c++)
    {
        this->carteira[c] = this->carteira[c+1];
    }
    this->carteira[this->size-1] = aux;
}

Thank you very much!

Comments (2)

云归处 2024-11-14 22:13:10

This is an example of a loop with loop-carried dependencies, so it can't easily be parallelized as written: the tasks (the individual iterations of the loop) aren't independent. Breaking such a dependency can range from a trivial modification to something completely impossible (e.g., an iterative loop in which each step needs the result of the previous one).

Here, the case is somewhere in between. The difficulty with doing this in parallel is that each thread needs to read the value just to the right of its block before its neighbour overwrites it. The OpenMP for construct doesn't expose which loop iterations will be "yours", so I don't think you can use the for worksharing construct to break up this loop. You can do the split yourself, but it requires a lot more code, and it no longer reduces nicely to the serial case.

Still, an example of how to do this is shown below. You have to break the loop up yourself, and each thread then fetches its rightmost value. An OpenMP barrier ensures that no thread starts modifying values until all threads have cached their new rightmost value.

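#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    int i;
    char *array;
    const int n = 27;

    array = malloc(n * sizeof(char));
    if (array == NULL) return 1;

    /* fill with 'A'..'Z' and terminate so it prints as a string */
    for (i = 0; i < n-1; i++)
        array[i] = 'A' + i;

    array[n-1] = '\0';

    printf("Array pre-shift  = <%s>\n", array);

    /* n is const: firstprivate(n) keeps default(none) valid with both
       older and newer OpenMP compilers */
    #pragma omp parallel default(none) shared(array) firstprivate(n) private(i)
    {
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();

        int blocksize = (n-2)/nthreads;
        int start = tid*blocksize;
        int end = start + blocksize - 1;
        if (tid == nthreads-1) end = n-2;

        /* we are responsible for values start...end;
           with more threads than elements a block can be empty (end < start) */

        /* cache the value to the right of our block before anyone writes;
           for the last thread this is the terminating '\0' */
        char rightval = (end >= start) ? array[end+1] : '\0';
        #pragma omp barrier

        if (end >= start) {
            for (i = start; i < end; i++)
                array[i] = array[i+1];

            array[end] = rightval;
        }
    }
    printf("Array post-shift = <%s>\n", array);

    free(array);
    return 0;
}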
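
The example above shifts a char array; the question asks for a rotation starting at index i. The same cache-then-barrier idea might be adapted as in the following C++ sketch. The helper name rotate_left_from, the use of std::vector, and the assumption that the element type is default-constructible and copy-assignable are all mine, not the original poster's. The _OPENMP guards are only there so the code also builds (and runs serially) when OpenMP is disabled.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not from the original post): rotates v[first..size-1]
// left by one position, like rotaciona(first) in the question.
// Assumes T is default-constructible and copy-assignable.
template <typename T>
void rotate_left_from(std::vector<T>& v, std::size_t first) {
    const std::size_t n = v.size();
    if (first + 1 >= n) return;               // zero or one element: nothing to do
    const T head = v[first];                  // element that wraps around to the end
    const std::size_t count = n - 1 - first;  // slots first..n-2 receive their right neighbour

    #pragma omp parallel
    {
#ifdef _OPENMP
        const std::size_t nthreads = omp_get_num_threads();
        const std::size_t tid = omp_get_thread_num();
#else
        const std::size_t nthreads = 1, tid = 0;
#endif
        // Split [first, first + count) into contiguous, nearly equal chunks.
        const std::size_t lo = first + count * tid / nthreads;
        const std::size_t hi = first + count * (tid + 1) / nthreads;  // exclusive

        // Cache the value just past our chunk before any thread writes.
        T right = T();
        if (lo < hi) right = v[hi];
        #pragma omp barrier

        if (lo < hi) {
            // Interior of the chunk only reads elements this thread owns.
            for (std::size_t c = lo; c + 1 < hi; ++c)
                v[c] = v[c + 1];
            v[hi - 1] = right;                // boundary slot uses the cached value
        }
    }
    v[n - 1] = head;                          // serial step after the parallel region
}
```

Calling rotate_left_from(v, 3) on a vector holding 0..9 should leave 0 1 2 4 5 6 7 8 9 3. A shift like this is dominated by memory traffic, so it is worth measuring whether the threaded version actually beats the serial loop for your element type and array size.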
后eg是否自 2024-11-14 22:13:10

Though your sample doesn't show any explicit OpenMP pragmas, I don't think it could work easily:

You are doing an in-place operation on overlapping regions.
If you split the loop into chunks, you'll have race conditions at the chunk boundaries (because el[n] gets copied from el[n+1], which might already have been updated by another thread).

I suggest manual chunking (which can be done), but I suspect that OpenMP parallel for is not flexible enough for this (I haven't tried). Instead, you could have a parallel region that does the work in chunks, and fix up the boundary elements after a thread barrier or after the end of the parallel block.
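
If a second buffer is acceptable, the overlap described above disappears and a plain parallel for becomes safe. Note that this swaps in a different technique from the manual chunking suggested here: it is an out-of-place copy, and the function name rotated_copy is hypothetical. A minimal sketch, assuming the element type is copyable:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a copy of src in which src[first..size-1]
// has been rotated left by one position. Reads come only from src and
// writes go only to dst, so the loop iterations are independent.
template <typename T>
std::vector<T> rotated_copy(const std::vector<T>& src, std::size_t first) {
    std::vector<T> dst(src);  // elements before 'first' stay as they are
    const long long n = static_cast<long long>(src.size());
    const long long f = static_cast<long long>(first);
    if (f + 1 >= n) return dst;

    // Signed loop index keeps the loop acceptable to older OpenMP versions.
    #pragma omp parallel for
    for (long long c = f; c < n - 1; ++c)
        dst[c] = src[c + 1];

    dst[n - 1] = src[f];      // the element at 'first' wraps around to the end
    return dst;
}
```

The trade-off is the extra memory and the initial copy, which may well cost more than the serial in-place loop for small arrays.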


Other thoughts:

  1. If your values are POD, you can use memmove instead.
  2. If you can, simply switch to a list:

std::list<Carteira> items(3000);

// rotation is now simply:
items.push_back(items.front());
items.erase(items.begin());
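
For the first suggestion (memmove for POD values), a minimal sketch could look like the following. The function name rotate_left_memmove and the raw pointer plus size interface are assumptions for illustration; the static_assert makes the POD requirement explicit by rejecting types that are not trivially copyable.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: rotates data[first..size-1] left by one position
// using a single memmove, which is defined for overlapping ranges.
template <typename T>
void rotate_left_memmove(T* data, std::size_t size, std::size_t first) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memmove is only valid for trivially copyable types");
    if (first + 1 >= size) return;   // zero or one element: nothing to do
    const T head = data[first];      // saved before it is overwritten
    std::memmove(data + first, data + first + 1,
                 (size - 1 - first) * sizeof(T));
    data[size - 1] = head;
}
```

For types that are not trivially copyable, std::rotate(first_it, first_it + 1, last_it) from <algorithm> performs the same rotation on any range of forward iterators.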