Memory leak when using OpenMP


The test case below runs out of memory (throwing std::bad_alloc) on 32-bit machines in the loop that follows the "post MT section" message when OpenMP is used. If the OpenMP #pragmas are commented out, the code runs through to completion, so it appears that when memory is allocated in parallel threads it is not freed correctly and we therefore run out of memory.

Is there something wrong with the memory allocation and deletion code below, or is this a bug in gcc v4.2.2 or OpenMP? I also tried gcc v4.3 and got the same failure.

#include <iostream>
#include <vector>

int main(int argc, char** argv)
{
    std::cout << "start " << std::endl;

    {
            std::vector<std::vector<int*> > nts(100);
            #pragma omp parallel
            {
                    #pragma omp for
                    for(int begin = 0; begin < int(nts.size()); ++begin) {
                            for(int i = 0; i < 1000000; ++i) {
                                    nts[begin].push_back(new int(5));
                            }
                    }
            }

    std::cout << "  pre delete " << std::endl;
            for(int begin = 0; begin < int(nts.size()); ++begin) {
                    for(int j = 0; j < nts[begin].size(); ++j) {
                            delete nts[begin][j];
                    }
            }
    }
    std::cout << "post MT section" << std::endl;
    {
            std::vector<std::vector<int*> > nts(100);
            int begin, i;
            try {
              for(begin = 0; begin < int(nts.size()); ++begin) {
                    for(i = 0; i < 2000000; ++i) {
                            nts[begin].push_back(new int(5));
                    }
              }
            } catch (std::bad_alloc &e) {
                    std::cout << e.what() << std::endl;
                    std::cout << "begin: " << begin << " i: " << i << std::endl;
                    throw;
            }
            std::cout << "pre delete 1" << std::endl;

            for(int begin = 0; begin < int(nts.size()); ++begin) {
                    for(int j = 0; j < nts[begin].size(); ++j) {
                            delete nts[begin][j];
                    }
            }
    }

    std::cout << "end of prog" << std::endl;

    char c;
    std::cin >> c;

    return 0;
}



清音悠歌 2024-10-12 20:19:29


Changing the first OpenMP loop from 1000000 to 2000000 iterations causes the same error. This indicates that the out-of-memory problem is related to the OpenMP stack limit.

Try setting the stack limit to unlimited in bash with

ulimit -s unlimited

You can also set the OpenMP environment variable OMP_STACKSIZE to 100 MB or more (for example, export OMP_STACKSIZE=100M) before running the program.

UPDATE 1: I changed the first loop to

{
    std::vector<std::vector<int*> > nts(100);
    #pragma omp for schedule(static) ordered
    for(int begin = 0; begin < int(nts.size()); ++begin) {
        for(int i = 0; i < 2000000; ++i) {
            nts[begin].push_back(new int(5));
        }
    }

    std::cout << "  pre delete " << std::endl;
    for(int begin = 0; begin < int(nts.size()); ++begin) {
        for(int j = 0; j < nts[begin].size(); ++j) {
            delete nts[begin][j];
        }
    }
}

Then I get a memory error at i=1574803 on the main thread.

UPDATE 2: If you are using the Intel compiler, you can add the following to the top of your code and it will solve the problem (provided you have enough memory for the extra overhead).

std::cout << "Previous stack size " << kmp_get_stacksize_s() << std::endl;
kmp_set_stacksize_s(1000000000);
std::cout << "Now stack size " << kmp_get_stacksize_s() << std::endl;

UPDATE 3: For completeness, as mentioned by another member, if you are performing numerical computation it is best to preallocate everything in a single new float[1000000] instead of doing 1000000 separate allocations from the OpenMP threads. This applies to allocating objects as well; a sketch of that approach follows below.
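A minimal sketch of that suggestion applied to the question's data; the row and column counts mirror the question, everything else is illustrative and not from the original post:

#include <iostream>
#include <vector>

int main()
{
    const int rows = 100;        // mirrors nts(100) in the question
    const int cols = 1000000;    // mirrors the inner loop count

    std::vector<int*> blocks(rows);

    // One new[] of 1000000 ints per row instead of 1000000 calls to new int(5).
    #pragma omp parallel for
    for (int r = 0; r < rows; ++r) {
        int* block = new int[cols];
        for (int c = 0; c < cols; ++c)
            block[c] = 5;
        blocks[r] = block;       // each thread writes a distinct slot
    }

    std::cout << "allocated " << rows << " blocks" << std::endl;

    for (int r = 0; r < rows; ++r)
        delete [] blocks[r];

    return 0;
}

This keeps the total amount of data the same but cuts the number of heap allocations from 100 million to 100, removing most of the per-allocation bookkeeping overhead.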

浪漫之都 2024-10-12 20:19:29


I have seen this issue elsewhere without OpenMP, just using pthreads. The extra memory consumption when multi-threaded appears to be typical behavior for the standard memory allocator; switching to the Hoard allocator made the extra memory consumption go away.
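If you want to try that, Hoard can usually be dropped in on Linux without recompiling by preloading the shared library; the path and program name below are placeholders, not from the original post:

LD_PRELOAD=/usr/local/lib/libhoard.so ./your_program   # path and program name are illustrative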

九歌凝 2024-10-12 20:19:29


Why are you using int* as the inner vector member? That is very wasteful: each vector entry carries 4 bytes (strictly, sizeof(int)) of data plus two to three times that again in heap control structures. Try it with vector<int> instead and see if it runs better.

I'm not an OpenMP expert, but this usage seems odd in its asymmetry: you fill the vectors in a parallel section and clear them in non-parallel code. I can't tell you whether that is wrong, but it 'feels' wrong.
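A minimal sketch of the vector<int> suggestion, rewriting the question's first loop to store values directly; the loop counts come from the question, the rest is illustrative:

#include <iostream>
#include <vector>

int main()
{
    std::vector<std::vector<int> > nts(100);

    // Same fill pattern as the question, but elements are stored by value,
    // so there is no per-element heap allocation and nothing to delete.
    #pragma omp parallel for
    for (int begin = 0; begin < int(nts.size()); ++begin) {
        nts[begin].reserve(1000000);   // one buffer allocation per inner vector
        for (int i = 0; i < 1000000; ++i) {
            nts[begin].push_back(5);
        }
    }

    std::cout << "filled " << nts.size() << " vectors" << std::endl;
    return 0;
}

Each inner vector then holds one contiguous 4 MB buffer instead of a million separately allocated ints plus a million pointers.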
