Memory leak when using OpenMP
The test case below runs out of memory on 32-bit machines (throwing std::bad_alloc) in the loop following the "post MT section" message when OpenMP is used. However, if the OpenMP #pragmas are commented out, the code runs through to completion fine, so it appears that when memory is allocated in parallel threads, it is not freed correctly and we therefore run out of memory.
The question is whether there is something wrong with the memory allocation and deletion code below, or whether this is a bug in gcc v4.2.2 or in OpenMP. I also tried gcc v4.3 and got the same failure.
#include <iostream>
#include <vector>

int main(int argc, char** argv)
{
    std::cout << "start " << std::endl;
    {
        // Allocate 100 x 1000000 ints from the OpenMP worker threads.
        std::vector<std::vector<int*> > nts(100);
        #pragma omp parallel
        {
            #pragma omp for
            for(int begin = 0; begin < int(nts.size()); ++begin) {
                for(int i = 0; i < 1000000; ++i) {
                    nts[begin].push_back(new int(5));
                }
            }
        }

        std::cout << " pre delete " << std::endl;
        // Free everything from the main thread.
        for(int begin = 0; begin < int(nts.size()); ++begin) {
            for(int j = 0; j < int(nts[begin].size()); ++j) {
                delete nts[begin][j];
            }
        }
    }
    std::cout << "post MT section" << std::endl;
    {
        // Same pattern single-threaded, with twice the allocations per
        // vector; this is where bad_alloc is thrown when the OpenMP
        // section above was active.
        std::vector<std::vector<int*> > nts(100);
        int begin, i;
        try {
            for(begin = 0; begin < int(nts.size()); ++begin) {
                for(i = 0; i < 2000000; ++i) {
                    nts[begin].push_back(new int(5));
                }
            }
        } catch (std::bad_alloc &e) {
            std::cout << e.what() << std::endl;
            std::cout << "begin: " << begin << " i: " << i << std::endl;
            throw;
        }

        std::cout << "pre delete 1" << std::endl;
        for(int begin = 0; begin < int(nts.size()); ++begin) {
            for(int j = 0; j < int(nts[begin].size()); ++j) {
                delete nts[begin][j];
            }
        }
    }
    std::cout << "end of prog" << std::endl;
    char c;
    std::cin >> c;  // pause so memory usage can be inspected
    return 0;
}
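For reference, building this test case with OpenMP enabled on gcc would look something like the following (the question does not show the actual build line, so the flags are assumed):

    g++ -fopenmp test.cpp -o test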
Comments (3)
Changing the first OpenMP loop from 1000000 to 2000000 will cause the same error. This indicates that the out-of-memory problem is related to the OpenMP stack limit.
Try setting the OpenMP stack limit to unlimited in bash, for example:
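    # bash: lift the per-thread stack size limit (the answer's exact
    # command is not shown; this is the standard way to do it)
    ulimit -s unlimited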
You can also change the OpenMP environment variable OMP_STACKSIZE, setting it to 100MB or more, for example:
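    # bash: request 100 MB stacks for OpenMP worker threads
    export OMP_STACKSIZE=100M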
UPDATE 1: I changed the first loop as follows.
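    // presumed change, matching the 1000000 -> 2000000 note at the
    // top of this answer:
    for(int i = 0; i < 2000000; ++i) {
        nts[begin].push_back(new int(5));
    }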
Then, I get a memory error at i=1574803 on the Main thread.
UPDATE 2: If you are using the Intel compiler, you can add the following to the top of your code and it will solve the problem (provided you have enough memory for the extra overhead).
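One plausible version of that snippet, assuming the Intel OpenMP runtime's kmp_set_stacksize_s() extension (an assumption on my part, not something this answer confirms):

    #include <omp.h>  // Intel's omp.h declares the kmp_* extensions

    int main()
    {
        // Intel-specific: give each OpenMP worker a 100 MB stack; must
        // be called before the first parallel region is created.
        kmp_set_stacksize_s(100 * 1024 * 1024);

        #pragma omp parallel
        {
            // allocation-heavy work as in the question
        }
        return 0;
    }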
UPDATE 3: For completeness, as another member mentioned, if you are performing numerical computation, it is best to preallocate everything in a single new float[1000000] instead of doing 1000000 individual allocations under OpenMP. This applies to allocating objects as well.
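A minimal sketch of the preallocation idea (sizes taken from the question, names illustrative):

    #include <cstddef>

    int main()
    {
        const std::size_t n = 1000000;
        // One large allocation instead of n individual new float(...)
        // calls: no per-allocation heap overhead and no allocator
        // contention between OpenMP threads.
        float* block = new float[n];
        for(std::size_t i = 0; i < n; ++i) {
            block[i] = 5.0f;
        }
        delete[] block;  // one matching deallocation
        return 0;
    }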
I have seen this issue elsewhere without OpenMP, just using pthreads. The extra memory consumption when multi-threaded appears to be typical behavior for the standard memory allocator. Switching to the Hoard allocator makes the extra memory consumption go away.
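Hoard can typically be dropped in without any code changes, e.g. by preloading it at launch (the library path here is illustrative):

    LD_PRELOAD=/usr/local/lib/libhoard.so ./test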
Why are you using int* as the inner vector member? That's very wasteful - you have 4 bytes (sizeof(int), strictly) of data and 2-3 times more again of heap control structure for every vector entry. Try this just using vector<int> and see if it runs better.
I'm not an OpenMP expert, but this usage seems weird in its asymmetry - you fill the vectors in a parallel section and clear them in non-parallel code. Cannot tell you whether that's wrong, but it 'feels' wrong.
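A minimal sketch of that suggestion applied to the question's first section (assuming the same 100 x 1000000 workload):

    #include <iostream>
    #include <vector>

    int main()
    {
        // ints stored by value: no per-element new, no per-element
        // heap control overhead
        std::vector<std::vector<int> > nts(100);
        #pragma omp parallel for
        for(int begin = 0; begin < int(nts.size()); ++begin) {
            for(int i = 0; i < 1000000; ++i) {
                nts[begin].push_back(5);
            }
        }
        std::cout << "filled " << nts.size() << " vectors" << std::endl;
        return 0;  // each vector frees its single block on scope exit
    }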