如何惯用地打破嵌套并行 (OpenMP) Fortran 循环？

发布于 2024-09-04 22:12:02 字数 913 浏览 12 评论 0原文

这是顺序代码：

do i = 1, n
   do j = i+1, n
      if ("some_condition(i,j)") then
         result = "here's result"
         return
      end if
   end do
end do

是否有一种更清晰的方法来同时执行外循环的迭代，除了：

  !$OMP PARALLEL private(i,j)
  !$OMP DO 
  do i = 1, n     
     !$OMP FLUSH(found)
     if (found) goto 10
     do j = i+1, n        
        if ("some_condition(i,j)") then
           !$OMP CRITICAL
           !$OMP FLUSH(found)
           if (.not.found) then           
              found = .true.
              result = "here's result"
           end if
           !$OMP FLUSH(found)
           !$OMP END CRITICAL
           goto 10
        end if
     end do
10   continue
  end do
  !$OMP END DO NOWAIT
  !$OMP END PARALLEL

i-loop 上的迭代顺序可以是任意的，只要 some < code>result 已找到（只要满足 "some_condition"，运行之间的变化并不重要）。

原文

Here's sequential code:

do i = 1, n
   do j = i+1, n
      if ("some_condition(i,j)") then
         result = "here's result"
         return
      end if
   end do
end do

Is there a cleaner way to execute iterations of the outer loop concurrently other than:

  !$OMP PARALLEL private(i,j)
  !$OMP DO 
  do i = 1, n     
     !$OMP FLUSH(found)
     if (found) goto 10
     do j = i+1, n        
        if ("some_condition(i,j)") then
           !$OMP CRITICAL
           !$OMP FLUSH(found)
           if (.not.found) then           
              found = .true.
              result = "here's result"
           end if
           !$OMP FLUSH(found)
           !$OMP END CRITICAL
           goto 10
        end if
     end do
10   continue
  end do
  !$OMP END DO NOWAIT
  !$OMP END PARALLEL

The order of iterations over i-loop may be arbitrary as long as some result is found (it doesn't matter if it changes from run to run as long as it satisfies "some_condition").

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

嘴硬脾气大 2024-09-11 22:12:02

看来您的顺序代码具有依赖性，使其不适合并行化。假设 i & 有多个值。 j 使“某些条件”为真 - 那么 i & 的执行顺序是j do 循环确定首先找到这些条件中的哪一个并设置 result 的值，之后 return 语句结束对“某些条件”为 true 的其他情况 i,j 的搜索。在顺序代码中，do 循环始终以相同的顺序执行，因此程序的操作是确定性的，并且 i 和 i 的值相同。总能找到使“某些条件”为真的 j 。在并发版本中，各种循环 i 以非确定性顺序执行，因此从运行到运行不同的 i 值可能是找到真正的“某些条件”的第一个 i 值。

也许作为一名程序员，您知道 i & 只有一个值。 j 导致真正的“某些条件”？在这种情况下，短路执行似乎没问题。但 OpenMP 规范表示“除了 DO 语句之外，关联循环中的任何语句都不会导致分支
循环之外”，因此不允许内循环中的某些内容中止输出循环。如果总是只有一个真实的“某些条件”，则可以删除“返回”并浪费 CPU在找到一种情况后，让线程查找“某种条件”是否成立，这可能仍然比使用缩放器“结果”变量更快，但它仍然可能不合规，因为它依赖于如果您需要找到“某些条件”为 true 的 i 的最小值，您可以将其更改为“减少”，对结果求和，或将结果返回为维度 (n) 的一维数组。，您可以使用 Fortran 内在函数 minloc 从数组结果中获取该值。

具有许多“flush”和“ritic”指令的解决方案可能不会比顺序版本更快。

更新：基于澄清。多个结果是可能的，并且任何结果都可以，一种并行方法是返回多个结果并让顺序代码选择一个——将“结果”放入一维数组而不是缩放器中。您可以短路内部 j 循环，因为它与“omp do”指令不“关联”，因此“结果”只需为 1D，根据 i 的范围确定尺寸。所以像这样：

program test1

integer :: i, j
integer, parameter :: n = 10
integer, dimension (n) :: result

result = -999

!omp parallel default (shared) private (i, j)
!omp do
do i = 1, n
   inner: do j = i+1, n
      if ( mod (i+j,14) == 0 ) then
         result (i) = i
         exit inner
      end if
   end do inner
end do
!omp end do
!omp end parallel

write (*, *) 'All results'
write (*, *) result

write (*, *)
write (*, *) 'One result'
write (*, *) result ( maxloc (result, 1) )

end program test1

It seems that your sequential code has a dependency that makes it unsuitable to being made parallel. Suppose that there are multiple values of i & j that make "some condition" true -- then the order of execution of the i & j do loops determines which of these conditions is found first and sets the value of result, after which the return statement ends the search for additional cases i,j that "some condition" is true. In the sequential code, the do loops always execute in the same order, so the operation of the program is deterministic and identical values of i & j that make "some condition" true will always be found. In a concurrent version, various loops i execute in non-deterministic order, so that from run to run different values of i might be the first i-value that finds a true "some condition".

Perhaps you as a programmer know that there is only one value of i & j that results in a true "some condition"? In that case short-circuiting the execution would seem OK. But the OpenMP spec says that "No statement in the associated loops other than the DO statements can cause a branch
out of the loops" so having the something in the inner loop abort the output loop isn't allowed. If it is the case that there is always only one true "some condition", you could just remove the "return" and waste CPU time by having threads look for "some condition" is true after the one case has been found. That might still be faster than a sequential program. With a scaler "result" variable, it still probably non-compliant, having an dependency on the order of execution. You could change it in to a "reduction", summing the result, or return result as 1-D array of dimension (n). If you need to find the smallest value of i that has "some condition" true, you could obtain that from an array result using the Fortran instrinsic function minloc.

A solution with many "flush" and "critical" directives may not be faster than the sequential version.

UPDATE: Based on the clarification that multiple results are possible and that any will do, one parallel method would be to return mutiple results and let sequential code pick one out -- make "result" into a 1D array rather than a scaler. You are allowed to short-circuit the inner j-loop because it is not "associated" with the "omp do" directive, so "result" need only be 1D, dimensioned according to the range of i. So something like this:

program test1

integer :: i, j
integer, parameter :: n = 10
integer, dimension (n) :: result

result = -999

!omp parallel default (shared) private (i, j)
!omp do
do i = 1, n
   inner: do j = i+1, n
      if ( mod (i+j,14) == 0 ) then
         result (i) = i
         exit inner
      end if
   end do inner
end do
!omp end do
!omp end parallel

write (*, *) 'All results'
write (*, *) result

write (*, *)
write (*, *) 'One result'
write (*, *) result ( maxloc (result, 1) )

end program test1

回复收藏 0 原文

久随 2024-09-11 22:12:02

另一种方法完全是使用 TASK 构造，它是 OpenMP 3.0 的一部分。您似乎想要做的是将循环划分为多个线程，计算直到任何线程找到答案，然后让所有线程停止。问题是，让所有线程检查共享标志的必要性是（a）降低你的性能，（b）导致你进入带有 BREAKS 和 CYCLES 的丑陋循环。

我认为 @MSB 的回答就如何调整现有方法提供了非常好的建议。但是，解决该问题的一种更自然的方法可能是让程序创建多个任务（可能为最内层循环的每次迭代创建一个任务）并将这些任务分派给工作线程。一旦任何线程报告成功，就可以向所有线程发送终结任务，并且您的程序可以继续。

当然，这需要对程序进行更多的重写，并且可能会使顺序执行变得更糟。它肯定会要求您的 OpenMP 实现支持该标准的 v3.0。

您在这方面可能需要比我所能提供的更多帮助，我自己才刚刚开始使用 OpenMP TASKS。

回复收藏 0 原文

眉目亦如画i 2024-09-11 22:12:02

看来 $OMP DO 不允许提前跳出循环。另一种选择可能是手动实现它。

为每个线程提供固定的连续索引范围以处理

以下OpenMP 指南：轻松进行多线程编程C++：

  results = "invalid_value"

  !$OMP PARALLEL private(i,j,thread_num,num_threads,start,end)

  thread_num = OMP_GET_THREAD_NUM()
  num_threads = OMP_GET_NUM_THREADS()
  start = thread_num * n / num_threads + 1
  end = (thread_num + 1) * n / num_threads

  outer: do i = start, end
     !$OMP FLUSH(found)             
     if (found) exit outer
     do j = i+1, n
        if ("some_condition") then
           found = .true.
           !$OMP FLUSH(found)
           results(thread_num+1) = "here's result"
           exit outer
        end if
     end do
  end do outer

  !$OMP END PARALLEL

  ! extract `result` from `results` if any
  do i = 1, size(results)
     if (results(i).ne."invalid_value") result = results(i)
  end do

更新：用exit替换goto，引入基于results数组“https://stackoverflow.com/questions/2979760/how-to-break-out-of-a-nested-parallel-openmp-fortran-loop-idiomatically/2981267#2981267">@MSB 的回答。

如果存在解决方案，则由于更早退出，此方法比 $OMP DO 更快。

一次给每个线程一次迭代来处理

使用任务指令（由 @High Performance Mark)：

  !$OMP PARALLEL
  !$OMP SINGLE
  !$OMP TASK UNTIED
          ! "untied" allows other threads to generate tasks
  do i = 1, n ! i is private
     !$OMP TASK ! implied "flush"
     task:     do j = i+1, n ! i is firstprivate, j is private       
        if (found) exit task
        if ("some_condition(i,j)") then
           !$OMP CRITICAL
           result = "here's result" ! result is shared              
           found = .true.           ! found is shared
           !$OMP END CRITICAL ! implied "flush"
           exit task
        end if
     end do task
     !$OMP END TASK 
  end do 
  !$OMP END TASK
  !$OMP END SINGLE
  !$OMP END PARALLEL

在我的测试中，此变体比带有外部循环的版本快 2 倍。

It seems $OMP DO doesn't allow break out of the loop earlier. An alternative might be to implement it by hand.

Give each thread fixed continuous range of indices to process

Following Guide into OpenMP: Easy multithreading programming for C++:

  results = "invalid_value"

  !$OMP PARALLEL private(i,j,thread_num,num_threads,start,end)

  thread_num = OMP_GET_THREAD_NUM()
  num_threads = OMP_GET_NUM_THREADS()
  start = thread_num * n / num_threads + 1
  end = (thread_num + 1) * n / num_threads

  outer: do i = start, end
     !$OMP FLUSH(found)             
     if (found) exit outer
     do j = i+1, n
        if ("some_condition") then
           found = .true.
           !$OMP FLUSH(found)
           results(thread_num+1) = "here's result"
           exit outer
        end if
     end do
  end do outer

  !$OMP END PARALLEL

  ! extract `result` from `results` if any
  do i = 1, size(results)
     if (results(i).ne."invalid_value") result = results(i)
  end do

UPDATE: replaced goto by exit, introduced results array based on @M. S. B.'s answer.

If solution exists this approach is faster then $OMP DO due to earlier exit.

Give each thread one iteration at a time to process

Using task directive (suggested by @High Performance Mark):

  !$OMP PARALLEL
  !$OMP SINGLE
  !$OMP TASK UNTIED
          ! "untied" allows other threads to generate tasks
  do i = 1, n ! i is private
     !$OMP TASK ! implied "flush"
     task:     do j = i+1, n ! i is firstprivate, j is private       
        if (found) exit task
        if ("some_condition(i,j)") then
           !$OMP CRITICAL
           result = "here's result" ! result is shared              
           found = .true.           ! found is shared
           !$OMP END CRITICAL ! implied "flush"
           exit task
        end if
     end do task
     !$OMP END TASK 
  end do 
  !$OMP END TASK
  !$OMP END SINGLE
  !$OMP END PARALLEL

This variant is 2 times faster on my tests than the version with the outer-loop.

回复收藏 0 原文

~没有更多了~