Valgrind/massif does not detect memory deallocation in pthread TLS destructors
Symptoms:
I allocate a TLS key with a destructor, create a batch of threads, and pass the TLS key to each thread. Each thread allocates memory and stores the pointer in TLS; the TLS destructor deallocates that memory. I wait for all threads to finish before the app exits.
The app runs under valgrind/massif, which reports this memory as not deallocated.
int main(int argc, char **argv)
{
    pthread_key_t* key = new pthread_key_t();
    pthread_key_create(key, my_destructor);
    pthread_t threads[32000];
    for(int i=0; i<32000; ++i)
        pthread_create(&threads[i], NULL, my_thread, key);
    for(int i=0; i<32000; ++i)
        pthread_join(threads[i], NULL);
    return 0;
}
In the thread function I allocate the memory and store the pointer in TLS:
extern "C" void* my_thread(void* p)
{
    pthread_setspecific(*(pthread_key_t*)p, malloc(100));
    return NULL;
}
In the TLS destructor, I release the memory:
extern "C" void my_destructor(void *p)
{
    free(p);
}
I run this under valgrind/massif 3.19 with the following options:
--tool=massif
--heap=yes
--pages-as-heap=yes
--log-file=/tmp/my.log
--massif-out-file=/tmp/my.massif.log
Then I run ms_print /tmp/my.massif.log and get leaks reported like the following:
| ->01.75% (67,108,864B) 0x76F92D0: new_heap (in /usr/lib64/libc-2.17.so)
| | ->01.75% (67,108,864B) 0x76F98D3: arena_get2.isra.3 (in /usr/lib64/libc-2.17.so)
| | ->01.75% (67,108,864B) 0x76FF77D: malloc (in /usr/lib64/libc-2.17.so)
| | ->01.75% (67,108,864B) 0x410300: my_thread (threadsT.cpp:136)
| | ...
| | <skipped by author>
| | ...
| |
| ->00.00% (73,728B) in 1+ places, all below ms_print's threshold (01.00%)
...while I would not expect anything to be reported as leaked at all.
I added instrumentation to my_destructor and manually verified that:
- it is indeed invoked
- it deallocates the memory, as it is supposed to
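The kind of instrumentation described above could look roughly like the following. This is only a minimal sketch, not the asker's actual code: the counters `g_allocs` and `g_frees`, the global `g_key`, and the helper `outstanding_blocks` are hypothetical names introduced for illustration, and the key is held in a global instead of being passed to each thread.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical counters and key, introduced only for this sketch.
static std::atomic<long> g_allocs{0};
static std::atomic<long> g_frees{0};
static pthread_key_t g_key;

// TLS destructor: frees the block and records that it ran.
extern "C" void counting_destructor(void* p)
{
    free(p);
    g_frees.fetch_add(1);
}

// Thread function: allocates a block and hands ownership to TLS.
extern "C" void* counting_thread(void*)
{
    void* block = malloc(100);
    if (block != NULL && pthread_setspecific(g_key, block) == 0)
        g_allocs.fetch_add(1);
    else
        free(block);  // TLS did not take ownership, so release it here
    return NULL;
}

// Runs thread_count threads, joins them, and returns the number of blocks
// that were stored in TLS but never freed by the destructor (-1 on error).
long outstanding_blocks(int thread_count)
{
    assert(thread_count >= 0);
    g_allocs = 0;
    g_frees = 0;
    if (pthread_key_create(&g_key, counting_destructor) != 0)
        return -1;

    std::vector<pthread_t> threads;
    for (int i = 0; i < thread_count; ++i) {
        pthread_t t;
        if (pthread_create(&t, NULL, counting_thread, NULL) == 0)
            threads.push_back(t);  // only join threads that really started
    }
    for (pthread_t t : threads)
        pthread_join(t, NULL);

    pthread_key_delete(g_key);
    return g_allocs.load() - g_frees.load();
}
```

Calling `outstanding_blocks(8)` and getting 0 back would confirm that every block stored in TLS was freed by the destructor, independently of what massif reports about pages held by the allocator.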
Is there something apparent I am doing wrong here that makes valgrind/massif report these?
Is it a valgrind/massif limitation that it cannot detect the memory deallocation when invoked from TLS destructors?
Building and running with gcc 4.9.4 on Red Hat Enterprise Linux Server release 7.9 (Maipo).
Comments (2)
A second answer, this time concentrating on the 'leak' aspect.
Massif isn't really a leak detector. It's for profiling heap use.
If I compile the example (with 320 threads), then at the end I get about 89 million bytes still allocated. That is made up of:
- 75% the arena used by malloc called from start_thread
- 9% pthread_create
- 15% loading shared libraries
None of that looks like much of a concern to me. I assume that the start_thread memory is the pthread stack cache.
If I use massif for profiling malloc/new, then the last sample is
You should check the return status for your thread creation. It's unlikely that you are succeeding in creating 32000 threads.
A bit of Valgrind source:
Assuming that this is amd64 Linux, I believe that the default pthread stack size is 8 Mbytes. That means you need 256 Gbytes for stack memory. Does your machine have that much?
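The estimate above can be checked on a given machine instead of being assumed. The following is a minimal sketch: `default_thread_stack_size` and `total_stack_bytes` are hypothetical helper names, and the code relies on `pthread_attr_getstacksize` reporting the system default for a freshly initialised attribute object, which is how glibc behaves.

```cpp
#include <pthread.h>
#include <cassert>
#include <climits>
#include <cstddef>

// Returns the stack size a default-attribute thread would get,
// or 0 if the attribute object could not be queried.
std::size_t default_thread_stack_size()
{
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0)
        return 0;
    std::size_t size = 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

// Total stack reservation for thread_count threads of stack_size bytes each.
unsigned long long total_stack_bytes(unsigned long long thread_count,
                                     std::size_t stack_size)
{
    // Guard against overflow of the multiplication below.
    assert(thread_count == 0 || stack_size <= ULLONG_MAX / thread_count);
    return thread_count * stack_size;
}
```

With 32000 threads and an 8 MiB stack, `total_stack_bytes` gives 268,435,456,000 bytes, in line with the rough 256 Gbytes figure. That number is address space reserved for stacks; how much of it the kernel actually backs with memory depends on how far each stack is used.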
Please try using pthread_attr_setstacksize to set the stack sizes to PTHREAD_STACK_MIN (16k).
Even with the above you may still hit some Valgrind limits such as VG_N_SEGMENTS. If you see a message like
"Valgrind: FATAL: VG_N_SEGMENTS is too low.
Increase it and rebuild.
Exiting now."
then you will need to rebuild Valgrind with an increased limit.
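The two suggestions in this answer, checking the return status of thread creation and shrinking the stacks, can be sketched together as follows. This is an illustration under assumptions rather than a drop-in fix: `small_stack_worker` and `run_small_stack_threads` are hypothetical names, the worker does nothing, and a real thread function may well need more stack than `PTHREAD_STACK_MIN`.

```cpp
#include <pthread.h>
#include <limits.h>   // PTHREAD_STACK_MIN
#include <cassert>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical worker that does nothing, so the minimum stack is enough.
extern "C" void* small_stack_worker(void*)
{
    return NULL;
}

// Tries to start thread_count threads with the minimum stack size, joins
// the ones that started, and returns how many were actually created.
int run_small_stack_threads(int thread_count)
{
    assert(thread_count >= 0);

    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) {
        fprintf(stderr, "pthread_attr_init failed: %s\n", strerror(rc));
        return 0;
    }
    rc = pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN);
    if (rc != 0) {
        fprintf(stderr, "pthread_attr_setstacksize failed: %s\n", strerror(rc));
        pthread_attr_destroy(&attr);
        return 0;
    }

    std::vector<pthread_t> started;
    for (int i = 0; i < thread_count; ++i) {
        pthread_t t;
        rc = pthread_create(&t, &attr, small_stack_worker, NULL);
        if (rc != 0) {
            // pthread_create returns an error number instead of setting errno.
            fprintf(stderr, "pthread_create #%d failed: %s\n", i, strerror(rc));
            break;
        }
        started.push_back(t);
    }
    pthread_attr_destroy(&attr);

    for (pthread_t t : started)
        pthread_join(t, NULL);
    return static_cast<int>(started.size());
}
```

Comparing the return value with the requested count shows at once whether all threads were created, which the original `main` cannot tell because it ignores the result of `pthread_create`.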