OpenMP C 并行化算法
在《使用 OpenMP》一书中,有一个 C 语言内存访问错误的示例,我认为这是我尝试并行高斯算法的主要问题。
这个例子看起来像这样:
k= 0 ;
for( int j=0; j<n ; j++)
for(int i = 0; i<n; i++)
a[i][j] = a[i][j] - a[i][k]*a[k][j] ;
所以,我确实理解为什么这会导致错误的内存访问。在 C 中,二维数组按行存储,在每 i 步中,都会将新行从内存复制到缓存。
我正在尝试为此找到解决方案,但我没有得到很好的加速。我的尝试的效果很小。
有人可以给我提示我能做什么吗?
最简单的方法是交换 for 循环,但我想按列进行。
第二次尝试:
for( int j=0; j<n-1 ; j+=2)
for(int i = 0; i<n; i++)
{
a[i][j] = a[i][j] - a[i][k]*a[k][j] ;
a[i][j+1] = a[i][j+1] - a[i][k]*a[k][j+1] ;
}
根本没有改变。
第三次尝试:
for( int j=0; j<n ; j++)
{
d= a[k][j] ;
for(int i = 0; i<n; i++)
{
e = a[i][k] ;
a[i][j] = a[i][j] - e*d ;
}
}
非常
感谢Stepp
in the book "Using OpenMP" is an example for bad memory access in C and I think this is the main problem in my attempt to parallelism the gaussian algorithm.
The example looks something like this:
k= 0 ;
for( int j=0; j<n ; j++)
for(int i = 0; i<n; i++)
a[i][j] = a[i][j] - a[i][k]*a[k][j] ;
So, I do understand why this causes a bad memory access. In C a 2d array is stored by rows and here in every i step a new row will be copied from memory to cache.
I am trying to find a solution for this, but im not getting a good speed up. The effects of my attempts are minor.
Can someone give me a hint what I can do?
The easiest way would be to swap the for loops, but I want to do it columnwise.
The second attempt:
for( int j=0; j<n-1 ; j+=2)
for(int i = 0; i<n; i++)
{
a[i][j] = a[i][j] - a[i][k]*a[k][j] ;
a[i][j+1] = a[i][j+1] - a[i][k]*a[k][j+1] ;
}
didn't make a difference at all.
The third attempt:
for( int j=0; j<n ; j++)
{
d= a[k][j] ;
for(int i = 0; i<n; i++)
{
e = a[i][k] ;
a[i][j] = a[i][j] - e*d ;
}
}
Thx alot
Greets Stepp
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
使用平面数组代替,例如:
use flat array instead, eg:
正如您所指出的,您的循环顺序将导致每次迭代出现缓存缺失。因此,只需交换循环语句的顺序:
这将修复
a
中的行并仅改变列,这意味着您的内存访问将是连续的。Your loop order will cause a cache miss on every iteration, as you point out. So just swap the order of the loop statements:
This will fix the row in
a
and vary just the columns, which means your memory accesses will be contiguous.此内存访问问题仅与 CACHE 使用有关,与 Openmp 无关。
一般来说,为了充分利用缓存,您应该访问连续的内存位置。还要记住,如果两个或多个线程访问同一内存区域,那么可能会出现“错误剪切”问题,迫使缓存不必要地重新加载。
参见示例:
http://software.intel。 com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads/
This memory access problem is just related to CACHE usage not to Openmp.
To make a good use of cache in general you should access contiguous memory locations. Remember also that if two or more threads are accessing the same memory area then you can have a "false shearing" problem forcing cache to be reloaded unnecessarily.
See for example:
http://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads/