What is a safe maximum stack size, or how can I measure stack usage?
I have an app with a number of worker threads, one for each core. On a modern 8-core machine, I have 8 of these threads. My app loads many plugins, which also have their own worker threads. Because the app uses huge blocks of memory (photos, e.g. 200 MB), I have a memory fragmentation problem (it is a 32-bit app). The problem is that every thread reserves {$MAXSTACKSIZE ...} of address space. It does not use physical memory, but it does use address space.
I reduced MAXSTACKSIZE from 1 MB to 128 KB, and it seems to work, but I don't know whether I'm close to the limit. Is there any way to measure how much stack is really used?
Use this to compute the amount of memory committed for the current thread's stack:
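A minimal sketch of such a function for Win32 only, not necessarily the listing originally posted here. It assumes the standard 32-bit Thread Information Block layout, with the stack base at FS:[4] and the lowest committed stack address at FS:[8]. The function name is the one used elsewhere in this thread, and the surrounding console program is only there for illustration.

```delphi
program ShowCommittedStack;
{$APPTYPE CONSOLE}

// Win32 only: number of stack bytes currently committed for the calling thread.
function CommittedStackSize: Cardinal;
asm
  mov eax, [fs:$4]  // TIB: base of the stack (highest address)
  mov edx, [fs:$8]  // TIB: lowest committed stack address;
                    // this gets lower as the thread uses more stack
  sub eax, edx      // the difference is returned in EAX
end;

begin
  // Call it from inside the thread you want to inspect,
  // ideally near the end of its work.
  Writeln('Committed stack: ', CommittedStackSize, ' bytes');
end.
```

The value changes in whole pages, so it is a coarse figure rather than an exact byte count.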
I don't have any other ideas.
For the sake of completeness, I am adding a version of the CommittedStackSize function provided in opc0de's answer for determining the amount of used stack that works on both 32-bit and 64-bit x86 versions of Windows (opc0de's function is for Win32 only).
opc0de's function queries the address of the base of the stack and the lowest committed stack address from Windows' Thread Information Block (TIB). There are two differences between x86 and x64. First, the TIB is pointed to by the FS segment register on Win32, but by the GS segment register on Win64. Second, the offsets of the fields within the TIB differ, because pointer-sized fields are 4 bytes wide on Win32 and 8 bytes wide on Win64.
Additionally, note that there is a small difference in the BASM code, because on x64, abs is required to make the assembler use an absolute offset from the segment register.
Therefore, a version that works on both Win32 and Win64 looks like this:
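A sketch reconstructed from the description above, not necessarily the exact listing that was posted. It assumes a Delphi version with a 64-bit Windows compiler and the documented TIB offsets (FS:[04h]/FS:[08h] on Win32, GS:[08h]/GS:[10h] on Win64). The exact placement of abs in the operand is an assumption based on the remark above, so check it against your compiler.

```delphi
program ShowCommittedStack64;
{$APPTYPE CONSOLE}

// Win32 uses FS, Win64 uses GS as the base of the Thread Information Block.
function CommittedStackSize: NativeUInt;
asm
{$IFDEF WIN32}
  mov eax, [fs:04h]       // TIB: base of the stack
  mov edx, [fs:08h]       // TIB: lowest committed stack address
  sub eax, edx            // difference is returned in EAX
{$ENDIF}
{$IFDEF WIN64}
  mov rax, abs [gs:08h]   // TIB: base of the stack
  mov rdx, abs [gs:10h]   // TIB: lowest committed stack address
  sub rax, rdx            // difference is returned in RAX
{$ENDIF}
end;

begin
  // Reports the calling thread only; call it from inside each worker thread.
  Writeln('Committed stack: ', CommittedStackSize, ' bytes');
end.
```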
I remember that years ago I filled all available stack space with zeroes (using FillChar) at init and counted the contiguous zeroes at deinit, starting from the end. This yields a good 'high-water mark', provided you put your app through its paces during the probe runs.
I'll dig out the code when I am back at a desktop machine.
Update: OK, the principle is demonstrated in this (ancient) code:
(From http://webtweakers.com/swag/MEMORY/0018.PAS.html)
I faintly remember having worked with Kim Kokkonen at that time, and I think the original code is from him.
The good thing about this approach is that it has zero performance penalty and needs no profiling during the program run. Only at shutdown does the loop that scans for the first changed value consume CPU cycles. (We later coded that loop in assembly.)
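The fill-and-scan idea can be sketched for a current Delphi compiler roughly as follows. This is not the code from the linked page. The names PaintStack, StackHighWaterMark, Work, ProbeSize and FillByte are made up for illustration, and ProbeSize is an assumed value that must be smaller than the stack still free when PaintStack is called. The sketch fills with a non-zero byte rather than zero, so that zero-initialised locals at the deep end are not mistaken for untouched stack.

```delphi
program StackHighWater;
{$APPTYPE CONSOLE}

const
  ProbeSize = 64 * 1024;  // assumption: less than the remaining free stack
  FillByte  = $A5;        // assumption: any value unlikely to appear in long runs

type
  TProbe = array[0..ProbeSize - 1] of Byte;
  PProbe = ^TProbe;

// Paints ProbeSize bytes of stack just below the caller's frame and returns
// the lowest painted address. The local is dead after the return, but the
// pattern stays in memory until deeper calls overwrite it.
function PaintStack: PProbe;
var
  Buf: TProbe;
begin
  FillChar(Buf, SizeOf(Buf), FillByte);
  Result := PProbe(@Buf);
end;

// Scans upwards from the lowest painted address. The first byte that no
// longer holds the pattern marks the deepest write since PaintStack.
function StackHighWaterMark(Probe: PProbe): Integer;
var
  I: Integer;
begin
  I := 0;
  while (I < ProbeSize) and (Probe^[I] = FillByte) do
    Inc(I);
  Result := ProbeSize - I;  // approximate bytes used below the painting frame
end;

// Stand-in workload that consumes some stack through recursion.
procedure Work(Depth: Integer);
var
  Local: array[0..255] of Byte;
begin
  FillChar(Local, SizeOf(Local), 0);
  if Depth > 0 then
    Work(Depth - 1);
end;

var
  Probe: PProbe;
begin
  Probe := PaintStack;  // in a worker thread: first statement of Execute
  Work(20);
  Writeln('Approximate stack bytes used: ', StackHighWaterMark(Probe));
end.
```

The result only covers stack used below the frame that called PaintStack, and it reads ProbeSize when the workload went deeper than the painted region, so a result equal to ProbeSize means the probe was too small.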
Even if all 8 threads were to come close to using their 1 MB of stack, that's only 8 MB of virtual memory. IIRC, the default initial stack size for threads is 64 KB, and it grows on page faults until the process's thread-stack limit is reached, at which point I assume your process will be stopped with a 'Stack overflow' message box :((
I fear that reducing the process stack limit $MAXSTACKSIZE will not alleviate your fragmentation/paging issue much, if at all. You need more RAM, so that the resident page set of your mega-photo-app is bigger and thrashing is reduced.
How many threads are there overall, on average, in your process? Task Manager can show this.
Rgds,
Martin
Whilst I am sure that you can reduce the thread stack size in your app, I don't think it will address the root cause of the problem. You are using an 8-core machine now, but what happens on a 16-core or a 32-core machine?
With 32-bit Delphi you have a maximum address space of 4 GB, and so this does limit you to some degree. You may well need to use smaller stacks for some or all of your threads, but you will still face problems on a big enough machine.
To help your app scale better to larger machines, you may need to take one or other of the following steps:
Reducing $MAXSTACKSIZE won't work because Windows will always align thread stacks to 1 MB (?).
One (possible?) way to prevent fragmentation is to reserve (not allocate!) virtual memory with VirtualAlloc before creating the threads, and to release it once the threads are running. That way Windows cannot use the reserved space for the thread stacks, so you will have some contiguous memory left.
Or you could write your own memory manager for large photos: reserve a lot of virtual memory and allocate memory from this pool by hand. (You need to maintain a list of used and free memory yourself.)
At least, that's the theory; I don't know if it really works...
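As a rough illustration of the reserve-first idea, here is a minimal sketch that has not been tested against the fragmentation problem itself. PoolSize and StartWorkerThreads are hypothetical, and the unit names may need the Winapi. and System. prefixes in newer Delphi versions. It reserves a contiguous range before any worker thread exists, then commits one photo-sized piece of it on demand, which is the pool variant. For the first variant you would instead release the whole range once the threads are running and allocate the photo normally.

```delphi
program ReservePool;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  PoolSize  = 512 * 1024 * 1024;  // hypothetical: address space kept for photos
  PhotoSize = 200 * 1024 * 1024;  // the 200 MB example from the question

// Hypothetical placeholder for creating the app's and the plugins' threads.
procedure StartWorkerThreads;
begin
end;

var
  Pool, Photo: Pointer;
begin
  // MEM_RESERVE claims address space only; no physical memory is used yet.
  Pool := VirtualAlloc(nil, PoolSize, MEM_RESERVE, PAGE_NOACCESS);
  if Pool = nil then
    RaiseLastOSError;
  try
    // Thread stacks created now cannot land inside the reserved range.
    StartWorkerThreads;

    // Commit a piece of the pool when a large buffer is needed.
    Photo := VirtualAlloc(Pool, PhotoSize, MEM_COMMIT, PAGE_READWRITE);
    if Photo = nil then
      RaiseLastOSError;

    // ... work with the buffer at Photo ...

    // Give the physical memory back but keep the address range reserved.
    VirtualFree(Photo, PhotoSize, MEM_DECOMMIT);
  finally
    // Release the whole reservation; the size must be 0 with MEM_RELEASE.
    VirtualFree(Pool, 0, MEM_RELEASE);
  end;
end.
```

A real pool would also need the bookkeeping of used and free ranges mentioned above; this sketch always commits from the start of the pool.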