Fortran and OpenMP: thread teams for independent tasks

Posted on 2025-01-20 23:47:46

I need to run two independent tasks using OpenMP. One of them is way more involved than the other, so it would be ideal to split the available threads such that the more complicated task uses more of them. After these two tasks are finished, I need to use both of their outputs. I am not entirely sure if this can be done with OpenMP, so any suggestion would be very useful.

This is an attempt to illustrate what I need. There are two independent subroutines with separate inputs and outputs. Subroutine mysub2 is more complex than mysub1: it has multiple nested loops, so it would benefit more from having additional threads. Out of 6 threads, I would like to assign 2 to mysub1 and 4 to mysub2, with both subroutines running simultaneously. Once both subroutines have produced their outputs, z1 and z2, the two results are used to compute z3.

In this attempt I was trying to assign threads 0 and 1 to task 1, and the other 4 to task 2. Obviously, this doesn't work as intended because it runs mysub1 twice and mysub2 four times, but I have no idea how to achieve what I need.

module mymod
implicit none
contains
    subroutine mysub1(x1,y1,z1)
        ! Element-wise product of vectors
        real,intent(in)    :: x1(:),y1(:)
        real,intent(out)   :: z1(size(x1))
        integer            :: i
        !$omp parallel do private(i)
        do i = 1,size(x1)
            z1(i) = x1(i) * y1(i)
        end do
        !$omp end parallel do
        print *, 'Done with mysub1'
    end subroutine mysub1
    
    subroutine mysub2(x2,y2,z2)
        ! Matrix multiplication
        real,intent(in)    :: x2(:,:),y2(:,:)
        real,intent(out)   :: z2(size(x2,1),size(y2,2))
        integer            :: i,j
        !$omp parallel do private(i,j)
        do i = 1,size(x2,1)
            do j = 1,size(y2,2)
                z2(i,j) = dot_product(x2(i,:), y2(:,j))
            end do
        end do
        !$omp end parallel do
        print *, 'Done with mysub2'
    end subroutine mysub2 
end module mymod


program main
    use omp_lib
    use mymod
    implicit none
    integer           :: tid
    integer,parameter :: m = 2
    integer,parameter :: n = 3
    integer,parameter :: p = 4
    real              :: x1(m),y1(m),z1(m)
    real              :: x2(m,n),y2(n,p),z2(m,p),z3
    
    ! Setting total number of threads to 6
    call omp_set_num_threads(6)
    
    ! Assigning arbitrary values for illustration purposes
    x1 = 1.0
    y1 = 2.0
    x2 = 3.0
    y2 = 4.0    
    
    !$omp parallel private(tid)
        ! Getting thread number
        tid = omp_get_thread_num()
    
        if ((tid == 0) .or. (tid == 1)) then
            ! Task 1 to be executed in two threads, tid = 0,1
            call mysub1(x1,y1,z1)
        else
            ! Task 2 to be executed in four threads, tid = 2,3,4,5
            call mysub2(x2,y2,z2)
        end if    
    !$omp end parallel
        
    ! Using z1 and z2 (serially, no need to parallelize)
    z3 = sum(z1) + sum(z2)
    print *, 'Final output', z3
        
end program main
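
One way to see why the attempt above runs mysub1 twice and mysub2 four times is a small diagnostic. This is a minimal sketch, not part of the original code: the program name show_teams and the variable outer_tid are hypothetical, and the call to omp_set_max_active_levels(1) is an assumption that pins the runtime to a single active level of parallelism (nesting off). Under that assumption, each outer thread enters the inner parallel region on its own and gets a team of one, which mirrors what happens when every thread calls a subroutine that contains its own parallel do.

```fortran
program show_teams
    use omp_lib
    implicit none
    integer :: outer_tid   ! hypothetical name: thread number in the outer team

    call omp_set_num_threads(6)
    ! Assumption for this demo: only one active level, i.e. nesting is off
    call omp_set_max_active_levels(1)

    !$omp parallel private(outer_tid)
        outer_tid = omp_get_thread_num()
        ! Every outer thread reaches this inner region, just as every
        ! thread calls mysub1 or mysub2 in the attempt above
        !$omp parallel
            !$omp critical
            print '(a,i0,a,i0)', 'outer thread ', outer_tid, &
                  ': inner team size ', omp_get_num_threads()
            !$omp end critical
        !$omp end parallel
    !$omp end parallel
end program show_teams
```

It can be built with, for example, gfortran -fopenmp. If the runtime grants all 6 outer threads, six lines are printed, each reporting an inner team of size 1: the work inside the inner region is repeated by every outer thread instead of being shared among them.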

Of course, this is just an example. I know I don't need to use mysub2 to do matrix multiplication. I'm just trying to illustrate that mysub2 is more complex and hence, it would be ideal to use more threads for it, without having to paste several hundred lines of the actual code I have.
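
A possible direction, sketched under assumptions rather than offered as a definitive answer, is nested parallelism: an outer region with two threads runs the two subroutines concurrently, and each subroutine opens its own inner team through a num_threads clause. The nthreads argument and the names mymod_nested and main_nested are hypothetical additions for illustration, and omp_set_max_active_levels(2) is used to allow two active levels. Whether the runtime actually grants 2 and 4 threads depends on the implementation and its thread limits.

```fortran
module mymod_nested
    implicit none
contains
    subroutine mysub1(x1, y1, z1, nthreads)
        ! Element-wise product of vectors, using an inner team of nthreads
        real,    intent(in)  :: x1(:), y1(:)
        real,    intent(out) :: z1(size(x1))
        integer, intent(in)  :: nthreads   ! hypothetical extra argument
        integer :: i
        !$omp parallel do private(i) num_threads(nthreads)
        do i = 1, size(x1)
            z1(i) = x1(i) * y1(i)
        end do
        !$omp end parallel do
    end subroutine mysub1

    subroutine mysub2(x2, y2, z2, nthreads)
        ! Matrix multiplication, using an inner team of nthreads
        real,    intent(in)  :: x2(:,:), y2(:,:)
        real,    intent(out) :: z2(size(x2,1), size(y2,2))
        integer, intent(in)  :: nthreads   ! hypothetical extra argument
        integer :: i, j
        !$omp parallel do private(i,j) num_threads(nthreads)
        do i = 1, size(x2,1)
            do j = 1, size(y2,2)
                z2(i,j) = dot_product(x2(i,:), y2(:,j))
            end do
        end do
        !$omp end parallel do
    end subroutine mysub2
end module mymod_nested

program main_nested
    use omp_lib
    use mymod_nested
    implicit none
    integer, parameter :: m = 2, n = 3, p = 4
    real :: x1(m), y1(m), z1(m)
    real :: x2(m,n), y2(n,p), z2(m,p), z3

    x1 = 1.0
    y1 = 2.0
    x2 = 3.0
    y2 = 4.0

    ! Allow two nested levels of active parallel regions
    call omp_set_max_active_levels(2)

    ! Outer team of 2 threads: one section per independent task
    !$omp parallel sections num_threads(2)
    !$omp section
        call mysub1(x1, y1, z1, 2)   ! request an inner team of 2
    !$omp section
        call mysub2(x2, y2, z2, 4)   ! request an inner team of 4
    !$omp end parallel sections

    ! Both outputs are complete here (implicit barrier above)
    z3 = sum(z1) + sum(z2)
    print *, 'Final output', z3
end program main_nested
```

The implicit barrier at the end of the parallel sections construct ensures that z1 and z2 are both complete before z3 is computed. With the input values above, z3 evaluates to 292. Each subroutine runs exactly once, because only one outer thread executes each section.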
