OpenMP C++: parallel performance better on a dual-core laptop than on an eight-core cluster
First of all, OpenMP obviously only runs on one of the motherboards in the cluster; in this case each motherboard has two quad-core Xeon E5405s at 2 GHz and runs Scientific Linux 5.3 (released in 2009, Red Hat-based). My laptop, on the other hand, has a Core 2 Duo T7300 at 2 GHz running Windows 7. Neither machine has hyperthreading.
The main problem is that I have OOP code that runs for around 2 minutes in serial on both systems, but when I add OpenMP to a nested loop it shows the expected reduction in time on my laptop (when 2 threads are used) and a significant increase in time on the server (around 5 minutes with two threads, for example).
There are two classes, "cube" and "space". Space contains a three-dimensional array (20x20x20) of cubes, and the code I am trying to parallelise is a three-way nested loop that calls a member function of cube for each of the cubes. This member function takes three arguments (doubles) and does some calculations based on the private variables of each cube.
inline void space::cubes_refresh(const double vsx, const double vsy, const double vsz) {
    int loopx, loopy, loopz;
    #pragma omp parallel private(loopx, loopy, loopz)
    {
        #pragma omp for schedule(guided,1) nowait
        for (loopx = 0; loopx < cubes_w; loopx++) {
            for (loopy = 0; loopy < cubes_h; loopy++) {
                for (loopz = 0; loopz < cubes_d; loopz++) {
                    // Refreshing the values in source
                    if ((loopx == source_x) && (loopy == source_y) && (loopz == source_z))
                        cube_array[loopx][loopy][loopz].refresh(0.0, 0.0, vsz);
                    // refresh everything else
                    else
                        cube_array[loopx][loopy][loopz].refresh(0.0, 0.0, 0.0);
                }
            }
        } // End of loop
    }
}
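(Since the two machines disagree so sharply, it helps to time the parallel region itself rather than the whole program. A minimal sketch using `omp_get_wtime()`, with a fallback so it also compiles without `-fopenmp`; the fallback measures CPU time, not wall time, and is only a rough stand-in:)

```cpp
#include <ctime>
#ifdef _OPENMP
#include <omp.h>
#endif

// Wall-clock timer: omp_get_wtime() when OpenMP is enabled,
// std::clock() (CPU time) as a crude fallback otherwise.
double wall_time() {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return static_cast<double>(std::clock()) / CLOCKS_PER_SEC;
#endif
}
```

Calling `wall_time()` immediately before and after the `cubes_refresh` call and printing the difference shows whether the slowdown is inside the parallel loop or elsewhere.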
I don't know where the problem could be. As I said before, on my laptop I see the expected improvement in performance, but exactly the same code does significantly worse on the server.
These are the flags I use on my laptop (I have tried using exactly the same flags on both, but nothing changed):
g++ -std=c++98 -fopenmp -O3 -Wl,--enable-auto-import -pedantic main.cpp -o parallel_openmp
And on the server:
g++ -std=c++98 -fopenmp -O3 -W -pedantic main.cpp -o parallel_openmp
I'm running gcc version 4.5.0 and the server is running 4.1.2. I don't know the OpenMP version on the server, as I don't know how to check it; I think it is a version before 3.0, since the `collapse` clause on loops does not work. Could this be the problem?
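(For reference, the supported OpenMP version can be read from the `_OPENMP` macro, which the compiler defines to the spec's date in yyyymm form when `-fopenmp` is given. A minimal check:)

```cpp
// Returns the OpenMP spec date reported by the compiler (yyyymm),
// or 0 if OpenMP support is not enabled (i.e. -fopenmp was not passed).
// Known values: 200505 = OpenMP 2.5, 200805 = OpenMP 3.0, 201107 = OpenMP 3.1.
int openmp_spec_date() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```

Printing this value on the server would confirm whether the compiler there predates OpenMP 3.0.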
gcc did not support OpenMP until 4.2; OpenMP 3.0 was supported starting in gcc 4.4 (see http://gcc.gnu.org/wiki/openmp). Your operating system vendor may have backported the changes to 4.1.2.
The only thing I can think of that might be causing the problem is that, for some reason, on the server all the threads accessing the cube member array cause a lot of cache misses, but wouldn't this also happen in the program running on my laptop?
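(One workaround worth trying when `collapse` is unavailable, as on a pre-3.0 compiler: collapse the three nested loops by hand into one flat index, which gives the scheduler 8000 iterations to divide among 8 threads instead of 20. A sketch under the assumption of the same 20x20x20 dimensions; the `refresh` call is replaced by a hypothetical stand-in that just records the visit:)

```cpp
#include <vector>

// Hypothetical stand-in dimensions matching the question's 20x20x20 cube array.
const int cubes_w = 20, cubes_h = 20, cubes_d = 20;

// Manually collapsed loop: one flat index i, decoded back into (x, y, z).
// In the real code the body would call cube_array[x][y][z].refresh(...).
void cubes_refresh_collapsed(std::vector<int>& visits) {
    const int total = cubes_w * cubes_h * cubes_d;
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < total; i++) {
        int x = i / (cubes_h * cubes_d);
        int y = (i / cubes_d) % cubes_h;
        int z = i % cubes_d;
        visits[(x * cubes_h + y) * cubes_d + z]++;  // stand-in for refresh()
    }
}
```

Each (x, y, z) triple is produced exactly once, so the iteration order differs from the nested version only in how work is distributed across threads.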