Can we offload double pointers to the GPU using OpenMP?
I am new to OpenMP and I am trying to parallelize a simple code with a double loop like this one:
for (int i=0; i<n; i++){
    for (int j=0; j<n; j++){
        c[i][j] = a[i][j] + b[i][j];
    }
}
The data types of a, b and c are and must stay double**.
I tried to convert my code to this:
#pragma omp target teams distribute parallel for collapse(2) \
    map(to: a[0:n][0:n],b[0:n][0:n]) map(from: c[0:n][0:n])
for (int i=0; i<n; i++){
    for (int j=0; j<n; j++){
        c[i][j] = a[i][j] + b[i][j];
    }
}
But I get "Aborted (core dumped)". Could somebody help me please?
Comments (1)
First of all, note that compilers like GCC explicitly report the problem by printing the message "array section is not contiguous in 'map' clause". As said in the comments, you need to transfer every contiguous block to the target device manually, since OpenMP only supports contiguous arrays/structures.
The following code should work but it is very inefficient, so do not use it in an application unless the goal is to benchmark it:
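The code from the answer did not survive on this page. What follows is a minimal sketch of the per-row idea described in the surrounding text, not the answer's original listing. The function name add_rows_offload and the a_row/b_row/c_row aliases are hypothetical. It assumes every row holds n doubles; without an offloading-capable compiler the target region simply runs on the host.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: c = a + b for double** matrices whose rows are
   allocated separately. Each row is a contiguous block of n doubles, so
   each map clause below names a valid (contiguous) array section. */
void add_rows_offload(double **a, double **b, double **c, int n)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (int i = 0; i < n; i++) {
        /* Plain double* aliases keep the array sections one-dimensional. */
        double *a_row = a[i];
        double *b_row = b[i];
        double *c_row = c[i];

        /* One target region per row: two host-to-device transfers,
           one kernel, one device-to-host transfer. */
        #pragma omp target teams distribute parallel for \
            map(to: a_row[0:n], b_row[0:n]) map(from: c_row[0:n])
        for (int j = 0; j < n; j++) {
            c_row[j] = a_row[j] + b_row[j];
        }
    }
}
```

The outer loop over i stays on the host, which is why collapse(2) is gone: only the inner loop over j is offloaded.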
Note that this should create a kernel for each row, which is awful, but the code would still be very inefficient without that anyway (see the above comments for more information). Note that OpenMP mappers may be used if the number of blocks is known at compile time (and is relatively small).
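If the per-row kernel launches are the main concern, one possible alternative (not part of the answer above, and only an assumption about what fits your constraints) is to keep double** in the interface and stage the rows through contiguous temporary buffers, so that a single target region suffices. The name add_staged_offload and the *_flat buffers are hypothetical. The extra host copies and the transfers will likely still dominate for an operation this simple.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: c = a + b using contiguous staging buffers.
   The caller still passes double**; only the temporaries are flat. */
void add_staged_offload(double **a, double **b, double **c, int n)
{
    assert(n > 0);
    long total = (long)n * n;
    size_t row_bytes = (size_t)n * sizeof(double);

    double *a_flat = malloc((size_t)total * sizeof(double));
    double *b_flat = malloc((size_t)total * sizeof(double));
    double *c_flat = malloc((size_t)total * sizeof(double));
    assert(a_flat != NULL && b_flat != NULL && c_flat != NULL);

    /* Pack the separately allocated rows into contiguous memory. */
    for (int i = 0; i < n; i++) {
        memcpy(a_flat + (long)i * n, a[i], row_bytes);
        memcpy(b_flat + (long)i * n, b[i], row_bytes);
    }

    /* A single target region covers all n*n elements. */
    #pragma omp target teams distribute parallel for \
        map(to: a_flat[0:total], b_flat[0:total]) map(from: c_flat[0:total])
    for (long k = 0; k < total; k++) {
        c_flat[k] = a_flat[k] + b_flat[k];
    }

    /* Unpack the result back into the caller's rows. */
    for (int i = 0; i < n; i++) {
        memcpy(c[i], c_flat + (long)i * n, row_bytes);
    }

    free(a_flat);
    free(b_flat);
    free(c_flat);
}
```

The trade-off is three temporary n*n buffers and two extra passes over the data on the host, in exchange for one kernel launch and three large transfers instead of n launches and 3n small ones.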