OpenMP and shared structures and pointers

Posted on 2024-09-29 13:55:17

I have a function that is passed two structures by reference. These structures contain dynamically allocated arrays. When I try to parallelize it with OpenMP, I get a slowdown rather than a speedup. I suspect this is due to sharing issues. Here is some of the code (C):

void    leap(MHD *mhd,GRID *grid,short int gchk)
{
  /*-- V A R I A B L E S --*/
  // Indexes
  int i,j,k,tid;
  double rhoinv[grid->nx][grid->ny][grid->nz];
  double rhoiinv[grid->nx][grid->ny][grid->nz];
  double rhoeinv[grid->nx][grid->ny][grid->nz];
  double rhoninv[grid->nx][grid->ny][grid->nz]; // Rho Inversion
  #pragma omp parallel shared(mhd->rho,mhd->rhoi,mhd->rhoe,mhd->rhon,grid,rhoinv,rhoiinv,rhoeinv,rhoninv) \
                       private(i,j,k,tid,stime)
  {
    tid=omp_get_thread_num();
    printf("-----  Thread %d Checking in!\n",tid);
    #pragma omp barrier
    if (tid == 0)
    {
      stime=clock();
      printf("-----1) Calculating leap helpers");
    }
    #pragma omp for
    for(i=0;i<grid->nx;i++)
    {
      for(j=0;j<grid->ny;j++)
      {
        for(k=0;k<grid->nz;k++)
        {
          //      rho's
          rhoinv[i][j][k]=1./mhd->rho[i][j][k];
          rhoiinv[i][j][k]=1./mhd->rhoi[i][j][k];
          rhoeinv[i][j][k]=1./mhd->rhoe[i][j][k];
          rhoninv[i][j][k]=1./mhd->rhon[i][j][k];
        }
      }
    }
    if (tid == 0)
    {
      printf("........%04.2f [s] -----\n",(clock()-stime)/CLOCKS_PER_SEC);
      stime=clock();
    }
    #pragma omp barrier
  }/*-- End Parallel Region --*/
}

I've tried default(shared) and shared(mhd), but neither shows any sign of improvement. Could it be that, since the arrays are allocated like this:

mhd->rho=(double ***)newarray(nx,ny,nz,sizeof(double));

by declaring the structure, or the pointer to the structure's element, as shared, I'm not actually sharing the memory but only the pointers to it? Also, nx=389, ny=7 and nz=739 in this example. Execution time for this section is 0.23 [s] in serial and 0.79 [s] with 8 threads.

一生独一 2024-10-06 13:55:17

My issue boiled down to a really simple mistake: clock(). While I did protect my timing code by having only one thread measure the time, I forgot one important thing about clock(): it returns processor time, not wall-clock time, and in my multithreaded runs that is the total over all active threads. What I needed to call was omp_get_wtime(). With that change I suddenly see a speedup in many sections of my code. For the record, I've modified my code to include

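#ifdef _OPENMP
    #include <omp.h>
    #define TIMESCALE 1               /* omp_get_wtime() already returns seconds */
#else
    /* Serial fallbacks so the same source builds without OpenMP */
    #include <time.h>
    #define omp_get_thread_num() 0
    #define omp_get_num_procs() 0
    #define omp_get_num_threads() 1
    #define omp_set_num_threads(bob) 0
    #define omp_get_wtime() ((double)clock())
    #define TIMESCALE CLOCKS_PER_SEC  /* clock() ticks per second */
#endif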

And my timing algorithm is now

    #pragma omp barrier
    if (tid == 0)
    {
        stime=omp_get_wtime();
        printf("-----1) Calculating leap helpers");
    }
    #pragma omp for
    for(i=0;i<grid->nx;i++)
    {
        for(j=0;j<grid->ny;j++)
        {
            for(k=0;k<grid->nz;k++)
            {
                //      rho's
                rhoinv[i][j][k]=1./mhd->rho[i][j][k];
                rhoiinv[i][j][k]=1./mhd->rhoi[i][j][k];
                rhoeinv[i][j][k]=1./mhd->rhoe[i][j][k];
                rhoninv[i][j][k]=1./mhd->rhon[i][j][k];
                //  1./(gamma-1.)
                gaminv[i][j][k]=1./(mhd->gamma[i][j][k]-1.);
                gamiinv[i][j][k]=1./(mhd->gammai[i][j][k]-1.);
                gameinv[i][j][k]=1./(mhd->gammae[i][j][k]-1.);
                gamninv[i][j][k]=1./(mhd->gamman[i][j][k]-1.);
            }
        }
    }
    if (tid == 0)
    {
        printf("........%04.2f [s] -----\n",(omp_get_wtime()-stime)/TIMESCALE);
        stime=omp_get_wtime();
        printf("-----2) Calculating leap helpers");
    }
陌若浮生 2024-10-06 13:55:17

One important point could be the upper bounds of your loops. Since you use grid->nz etc., OpenMP can't know whether they change from one iteration to the next. Load these values into local variables and use those in the loop conditions.

遥远的绿洲 2024-10-06 13:55:17

Well, you are also using doubles and division. Can you make the division into multiplication?

On some processors the floating-point unit is shared among cores, and divisions, unlike multiplications, do not complete in a fixed number of cycles. In that case the threads end up serialized waiting for the FP unit.

I'm sure that if you use integral types or multiplication, you'll see a speedup.
