OpenMP C 并行化算法

发布于 2024-10-19 08:33:06 字数 814 浏览 6 评论 0原文

在《使用 OpenMP》一书中，有一个 C 语言内存访问错误的示例，我认为这是我尝试并行高斯算法的主要问题。

这个例子看起来像这样：

k= 0 ;    
for( int j=0; j<n ; j++)
  for(int i = 0; i<n; i++)
       a[i][j] = a[i][j] - a[i][k]*a[k][j] ;

所以，我确实理解为什么这会导致错误的内存访问。在 C 中，二维数组按行存储，在每 i 步中，都会将新行从内存复制到缓存。

我正在尝试为此找到解决方案，但我没有得到很好的加速。我的尝试的效果很小。

有人可以给我提示我能做什么吗？

最简单的方法是交换 for 循环，但我想按列进行。

第二次尝试：

for( int j=0; j<n-1 ; j+=2)
  for(int i = 0; i<n; i++)
  {
     a[i][j] = a[i][j] - a[i][k]*a[k][j] ;
     a[i][j+1] = a[i][j+1] - a[i][k]*a[k][j+1] ;
  }

根本没有改变。

第三次尝试：

for( int j=0; j<n ; j++)
{  
  d= a[k][j] ;
  for(int i = 0; i<n; i++)
  {
    e = a[i][k] ;
    a[i][j] = a[i][j] - e*d ;
  }
}

非常

感谢Stepp

原文

in the book "Using OpenMP" is an example for bad memory access in C and I think this is the main problem in my attempt to parallelism the gaussian algorithm.

The example looks something like this:

k= 0 ;    
for( int j=0; j<n ; j++)
  for(int i = 0; i<n; i++)
       a[i][j] = a[i][j] - a[i][k]*a[k][j] ;

So, I do understand why this causes a bad memory access. In C a 2d array is stored by rows and here in every i step a new row will be copied from memory to cache.

I am trying to find a solution for this, but im not getting a good speed up. The effects of my attempts are minor.

Can someone give me a hint what I can do?

The easiest way would be to swap the for loops, but I want to do it columnwise.

The second attempt:

for( int j=0; j<n-1 ; j+=2)
  for(int i = 0; i<n; i++)
  {
     a[i][j] = a[i][j] - a[i][k]*a[k][j] ;
     a[i][j+1] = a[i][j+1] - a[i][k]*a[k][j+1] ;
  }

didn't make a difference at all.

The third attempt:

for( int j=0; j<n ; j++)
{  
  d= a[k][j] ;
  for(int i = 0; i<n; i++)
  {
    e = a[i][k] ;
    a[i][j] = a[i][j] - e*d ;
  }
}

Thx alot

Greets Stepp

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

半世晨晓 2024-10-26 08:33:06

使用平面数组代替，例如：

#define A(i,j) A[i+j*ldA]

for( int j=0; j<n ; j++)
{  
  d= A(k,j) ;
  ...
}

use flat array instead, eg:

#define A(i,j) A[i+j*ldA]

for( int j=0; j<n ; j++)
{  
  d= A(k,j) ;
  ...
}

回复收藏 0 原文

谎言月老 2024-10-26 08:33:06

正如您所指出的，您的循环顺序将导致每次迭代出现缓存缺失。因此，只需交换循环语句的顺序：

for (int i = 0; i < n; i++)       // now "i" is first
  for (int j = 0; j < n; j++)
       a[i][j] = a[i][j] - a[i][k]*a[k][j];

这将修复 a 中的行并仅改变列，这意味着您的内存访问将是连续的。

Your loop order will cause a cache miss on every iteration, as you point out. So just swap the order of the loop statements:

for (int i = 0; i < n; i++)       // now "i" is first
  for (int j = 0; j < n; j++)
       a[i][j] = a[i][j] - a[i][k]*a[k][j];

This will fix the row in a and vary just the columns, which means your memory accesses will be contiguous.

回复收藏 0 原文

尴尬癌患者 2024-10-26 08:33:06

此内存访问问题仅与 CACHE 使用有关，与 Openmp 无关。
一般来说，为了充分利用缓存，您应该访问连续的内存位置。还要记住，如果两个或多个线程访问同一内存区域，那么可能会出现“错误剪切”问题，迫使缓存不必要地重新加载。
参见示例：
http://software.intel。 com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads/

回复收藏 0 原文

~没有更多了~