Reshape and matmul inside OpenMP loops

Posted 2025-01-09 06:09:31


I was debugging some parallel code and found that a reshape operation was interfering with OpenMP. Below is a demo that reproduces the issue. I am not very familiar with OpenMP yet, so I'd like to know what I am doing wrong here, and whether there is a better way to do this (i.e. how best to nest reshape and matmul inside do loops). I have looked at OpenBLAS as a potential solution, but would first like to understand the cause. Thanks in advance.

program unittest

    use omp_lib
    implicit none
    complex*16, save, dimension(10,10) :: testmat
    integer :: i
    real :: t0, t1, t2

    !$ call omp_set_num_threads(12)
    !$ call omp_set_dynamic(.false.)
    testmat = 0.d0

    call cpu_time(t0)
    !$OMP parallel
    !$OMP DO
    do i = 1, 1000000
        testmat = reshape(reshape(testmat, (/100,1/)), (/10,10/))
    end do
    !$OMP END DO
    !$OMP end parallel
    call cpu_time(t1)
    do i = 1, 1000000
        testmat = reshape(reshape(testmat, (/100,1/)), (/10,10/))
    end do
    call cpu_time(t2)
    print *, 'parallel time, ', t1-t0, ' s, single thread time, ', t2-t1, ' s'

end program unittest

Compiled with gfortran on MinGW. Output on my machine is

(with parallel) 10.01 s
(single thread) 0.328 s

CPU usage stays below 20% overall in the parallel case, which probably means something is holding up OpenMP?
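One thing to check when reading these timings: `cpu_time` returns CPU time, and inside a parallel region that is (roughly) summed over all threads, so it can inflate the parallel figure. `omp_get_wtime` measures wall-clock time instead. A small sketch of the measurement pattern (the summation loop is just illustrative work, not from the original program):

```fortran
program walltime
    use omp_lib
    implicit none
    real(8) :: t0, t1, s
    integer :: i

    s = 0.d0
    t0 = omp_get_wtime()          ! wall-clock seconds, not summed CPU time
    !$OMP parallel do reduction(+:s)
    do i = 1, 1000000
        s = s + real(i, 8)        ! each thread accumulates a private partial sum
    end do
    !$OMP end parallel do
    t1 = omp_get_wtime()

    print *, 'elapsed: ', t1 - t0, ' s, sum = ', s
end program walltime
```

With this kind of timing, the parallel and serial sections of the demo can be compared on equal footing.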

====================

Edit:

Thank you. Some clarification: the following is okay-ish, in that the parallel version does not run slower (both complete in about the same amount of time),

    !$OMP parallel private(testmat2)
    !$OMP DO
    do i=1,1000000
        testmat2 = testmat * 10.d0;
    end do
    !$OMP END DO
    !$OMP end parallel

but this runs much slower in parallel than on a single thread (it takes about 50x longer in parallel than single-threaded):

    !$OMP parallel private(testmat2)
    !$OMP DO
    do i=1,1000000
        testmat2 = reshape(reshape(testmat,(/100,1/)),(/10,10/));
    end do
    !$OMP END DO
    !$OMP end parallel

So... what is special about reshape that causes this?

Answer by 鸩远一方, 2025-01-16 06:09:31:


First of all, a reshape is not really an operation. It's a matter of internal bookkeeping: you're telling Fortran that a 100x1 array is now to be interpreted as 10x10, or so. This basically takes zero time.

Next, you have a parallel do, which means the loop iterations get divided over the available threads. Meaning: the same assignment to testmat gets done many times, from many threads at once. So now you have something that is just internal bookkeeping, and you do it redundantly and concurrently. I'm only a little surprised that that gives strange results.

If you want to use a parallel do, you need a loop where each iteration does something involving the loop index. For instance v(i) = .....
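A minimal sketch of that pattern (the array name `v` and the per-element work are illustrative, not from the original post): each iteration writes only its own element, so threads never touch the same storage and no race arises:

```fortran
program raceless
    use omp_lib
    implicit none
    integer, parameter :: n = 1000000
    real(8) :: v(n)
    integer :: i

    !$OMP parallel do
    do i = 1, n
        v(i) = sqrt(real(i, 8))   ! iteration i writes element i and nothing else
    end do
    !$OMP end parallel do

    print *, v(4)                 ! v(4) is sqrt(4.0) = 2.0
end program raceless
```

This is the shape of loop that `parallel do` is designed for: independent iterations, each identified by the loop index.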
