What is a safe maximum stack size, or how can I measure stack usage?

Posted 2024-11-10 05:02:39

I have an app with a number of worker threads, one for each core. On a modern 8-core machine, I have 8 of these threads. My app loads many plugins, which also have their own worker threads. Because the app uses huge blocks of memory (photos, e.g. 200 MB), I have a memory fragmentation problem (32-bit app). The problem is that every thread reserves {$MAXSTACKSIZE ...} bytes of address space. It's not using physical memory, but it is using address space.
I reduced MAXSTACKSIZE from 1 MB to 128 KB, and it seems to work, but I don't know if I'm near the limit. Is there any way to measure how much stack is really used?

Comments (6)

空心空情空意 2024-11-17 05:02:39

Use this to compute the amount of memory committed for the current thread's stack:

function CommittedStackSize: Cardinal;
asm
  mov eax,[fs:$4] // base of the stack, from the Thread Environment Block (TEB)
  mov edx,[fs:$8] // address of lowest committed stack page
                  // this gets lower as you use more stack
  sub eax,edx
end;

I don't have any other ideas.
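A minimal usage sketch, assuming a Win32 Delphi console build (the routine reads `fs`, so it does not compile for Win64). It repeats opc0de's function so the program stands on its own, runs a hypothetical stand-in workload (`Burn`, a recursion using about 1 KB per level) in a worker thread, and prints `CommittedStackSize` just before the thread exits. Because Windows commits stack pages on demand and does not decommit them while the thread is alive, the value read at the end approximates the thread's peak usage, rounded up to whole pages and presumably including the initial commit set by `{$MINSTACKSIZE}`.

```pascal
program CommittedStackDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// opc0de's routine (Win32 only): stack base minus lowest committed page.
function CommittedStackSize: Cardinal;
asm
  mov eax,[fs:$4]
  mov edx,[fs:$8]
  sub eax,edx
end;

// Hypothetical stand-in for real work: about 1 KB of stack per level.
procedure Burn(Depth: Integer);
var
  Local: array[0..1023] of Byte;
begin
  FillChar(Local, SizeOf(Local), 1);
  if Depth > 0 then
    Burn(Depth - 1);
end;

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
begin
  Burn(50);
  // Pages committed while Burn ran are still committed here, so this is
  // a page-granular high-water mark for this thread.
  Writeln(Format('Thread %u: %u bytes of stack committed',
    [ThreadID, CommittedStackSize]));
end;

var
  Worker: TWorker;
begin
  Worker := TWorker.Create(False);
  try
    Worker.WaitFor;
  finally
    Worker.Free;
  end;
end.
```

Logging this value from every worker (including the plugins' threads, if you can hook them) after a representative run shows how close each one came to the reduced limit.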

雾里花 2024-11-17 05:02:39

For the sake of completeness, I am adding a version of the CommittedStackSize function from opc0de's answer that determines the amount of stack in use and works on both 32-bit and 64-bit x86 Windows (opc0de's function is for Win32 only).

opc0de's function reads the stack base and the lowest committed stack page from Windows' Thread Information Block (TIB). There are two differences between x86 and x64:

  • The TIB is pointed to by the FS segment register on Win32, but by GS on Win64 (see here)
  • The absolute offsets of items in the structure differ (mostly because some items are pointers, i.e. 4 bytes on Win32 and 8 bytes on Win64)

Note also that there is a small difference in the BASM code: on x64, abs is required to make the assembler use an absolute offset from the segment register.

Therefore, a version that works on both Win32 and Win64 builds looks like this:

{$IFDEF MSWINDOWS}
function CommittedStackSize: NativeUInt;
//NB: Win32 uses FS, Win64 uses GS as base for Thread Information Block.
asm
 {$IFDEF WIN32}
  mov eax, [fs:04h] // TIB: base of the stack
  mov edx, [fs:08h] // TIB: lowest committed stack page
  sub eax, edx      // compute difference in EAX (=Result)
 {$ENDIF}
 {$IFDEF WIN64}
  mov rax, abs [gs:08h] // TIB: base of the stack
  mov rdx, abs [gs:10h] // TIB: lowest committed stack page
  sub rax, rdx          // compute difference in RAX (=Result)
 {$ENDIF}
end;
{$ENDIF} // MSWINDOWS: closes after "end;" so non-Windows builds see no stray "end;"

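The committed size shows how much stack has been touched; to judge how close that is to the limit you also need the size of the reservation. A minimal sketch, assuming Windows 8 / Server 2012 or later: the documented kernel32 function `GetCurrentThreadStackLimits` returns the lowest and highest addresses of the calling thread's stack reservation, so their difference is the reserved size. The helper names `StackReserveSize` and `CurrentStackDepth` are hypothetical, and the import is declared by hand in case your Delphi version's Windows unit does not declare it.

```pascal
program StackLimitsDemo;
{$APPTYPE CONSOLE}

uses
  Windows;

// Documented kernel32 export (Windows 8 / Server 2012 and later). Declared
// here by hand in case the Windows unit of your Delphi version lacks it.
procedure GetCurrentThreadStackLimits(out LowLimit, HighLimit: NativeUInt);
  stdcall; external kernel32 name 'GetCurrentThreadStackLimits';

// Hypothetical helper: total address space reserved for this thread's stack.
function StackReserveSize: NativeUInt;
var
  LowLimit, HighLimit: NativeUInt;
begin
  GetCurrentThreadStackLimits(LowLimit, HighLimit);
  Result := HighLimit - LowLimit;
end;

// Hypothetical helper: distance from the top of the stack down to a local
// in this frame, i.e. roughly how deep the stack is at the call site.
function CurrentStackDepth: NativeUInt;
var
  LowLimit, HighLimit: NativeUInt;
begin
  GetCurrentThreadStackLimits(LowLimit, HighLimit);
  Result := HighLimit - NativeUInt(@LowLimit);
end;

begin
  Writeln('Reserved stack: ', StackReserveSize, ' bytes');
  Writeln('Current depth : ', CurrentStackDepth, ' bytes');
end.
```

Comparing `CommittedStackSize` with `StackReserveSize` at the end of a thread gives a rough headroom figure; keep in mind that the stack overflow is raised somewhat before the reservation is completely used, because the last pages serve as the guard region.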
如梦亦如幻 2024-11-17 05:02:39

I remember that years ago I FillChar'd all available stack space with zeroes upon init and counted the contiguous zeroes upon deinit, starting from the end. This yielded a good 'high-water mark', provided you put your app through its paces in test runs.

I'll dig out the code when I'm back at a proper computer.

Update: OK, the principle is demonstrated in this (ancient) code:

(***********************************************************
  StackUse - A unit to report stack usage information

  by Richard S. Sadowsky
  version 1.0 7/18/88
  released to the public domain

  Inspired by an idea by Kim Kokkonen.

  This unit, when used in a Turbo Pascal 4.0 program, will
  automatically report information about stack usage.  This is very
  useful during program development.  The following information is
  reported about the stack:

  total stack space
  Unused stack space
  Stack space used by your program

  The unit's initialization code handles three things, it figures out
  the total stack space, it initializes the unused stack space to a
  known value, and it sets up an ExitProc to automatically report the
  stack usage at termination.  The total stack space is calculated by
  adding 4 to the current stack pointer on entry into the unit.  This
  works because on entry into a unit the only thing on the stack is the
  2 word (4 bytes) far return value.  This is obviously version and
  compiler specific.

  The ExitProc StackReport handles the math of calculating the used and
  unused amount of stack space, and displays this information.  Note
  that the original ExitProc (Sav_ExitProc) is restored immediately on
  entry to StackReport.  This is a good idea in ExitProc in case a
  runtime (or I/O) error occurs in your ExitProc!

  I hope you find this unit as useful as I have!

************************************************************)

{$R-,S-} { we don't need no stinkin range or stack checking! }
unit StackUse;

interface

var
  Sav_ExitProc     : Pointer; { to save the previous ExitProc }
  StartSPtr        : Word;    { holds the total stack size    }

implementation

{$F+} { this is an ExitProc so it must be compiled as far }
procedure StackReport;

{ This procedure may take a second or two to execute, especially }
{ if you have a large stack. The time is spent examining the     }
{ stack looking for our init value ($AA). }

var
  I                : Word;

begin
  ExitProc := Sav_ExitProc; { restore original exitProc first }

  I := 0;
  { step up through the stack segment from the bottom; stop at the first }
  { byte that no longer holds the $AA fill value                          }
  while I < SPtr do
    if Mem[SSeg:I] <> $AA then begin
      { found the first overwritten byte, so report the stack usage info }
      WriteLn('total stack space : ',StartSPtr);
      WriteLn('unused stack space: ', I);
      WriteLn('stack space used  : ',StartSPtr - I);
      I := SPtr; { end the loop }
    end
    else
      inc(I); { look in next byte }
end;
{$F-}


begin
  StartSPtr := SPtr + 4; { on entry into a unit, only the FAR return }
                         { address has been pushed on the stack.     }
                         { therefore adding 4 to SP gives us the     }
                         { total stack size. }
  FillChar(Mem[SSeg:0], SPtr - 20, $AA); { init the stack   }
  Sav_ExitProc := ExitProc;              { save exitproc    }
  ExitProc     := @StackReport;          { set our exitproc }
end.

(From http://webtweakers.com/swag/MEMORY/0018.PAS.html)

I faintly remember having worked with Kim Kokkonen at that time, and I think the original code is from him.

The nice thing about this approach is that there is essentially no performance penalty and no profiling while the program runs. Only at shutdown does the scanning loop, which runs until it finds a changed value, consume CPU cycles. (We later rewrote that loop in assembly.)
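The Turbo Pascal unit relies on a single 64 KB stack segment, so it does not carry over directly to Win32/Win64 threads. A minimal adaptation of the same painting idea, under these assumptions: `PaintStack`, `StackBytesUsedBelow`, `PaintMargin` and `PaintSize` are hypothetical names; the painted region must fit well inside the thread's remaining stack reservation (painting commits those pages, which costs commit charge but no extra address space); and it measures only the stack used below the point where `PaintStack` was called, less a small margin. A stored byte that happens to equal the fill value can make the result slightly low.

```pascal
program StackPaintDemo;
{$APPTYPE CONSOLE}

const
  PaintByte = $AA;
  PaintMargin = 256;      // skip this frame's own locals before painting
  PaintSize = 64 * 1024;  // must fit well inside the thread's remaining stack reserve

// Hypothetical helper: fills PaintSize bytes below the current frame with
// PaintByte, walking downwards so each guard page is touched in order and
// committed normally. Returns the lowest painted address.
function PaintStack: PByte;
var
  Marker: Byte;
  P: PByte;
  I: Integer;
begin
  P := PByte(@Marker) - PaintMargin;
  for I := 1 to PaintSize do
  begin
    Dec(P);
    P^ := PaintByte;
  end;
  Result := P;
end;

// Hypothetical helper: counts painted bytes that have since been overwritten,
// scanning upwards from the lowest painted address to the first changed byte.
function StackBytesUsedBelow(PaintLow: PByte): NativeInt;
var
  I: NativeInt;
begin
  I := 0;
  while (I < PaintSize) and (PaintLow[I] = PaintByte) do
    Inc(I);
  Result := PaintSize - I;
end;

// Hypothetical workload: about 1 KB of stack per recursion level.
procedure Burn(Depth: Integer);
var
  Local: array[0..1023] of Byte;
begin
  FillChar(Local, SizeOf(Local), 0);
  if Depth > 0 then
    Burn(Depth - 1);
end;

var
  PaintLow: PByte;
  Used: NativeInt;
begin
  PaintLow := PaintStack;
  Burn(20);
  // Measure before calling Writeln so its own stack use is not counted.
  Used := StackBytesUsedBelow(PaintLow);
  Writeln('Stack used below the paint point: about ', Used, ' bytes');
end.
```

In a worker thread, the natural place for `PaintStack` is the first line of `Execute`, with `StackBytesUsedBelow` as the last.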

深居我梦 2024-11-17 05:02:39

Even if all 8 threads were to come close to using their 1MB of stack, that's only 8MB of virtual memory. IIRC, the default initial stack size for threads is 64K, increasing upon page-faults unless the process thread-stack limit is reached, at which point I assume your process will be stopped with a 'Stack overflow' messageBox :((

I fear that reducing the process stack limit $MAXSTACKSIZE will not alleviate your fragmentation/paging issue much, if at all. You need more RAM so that the resident page set of your mega-photo-app is bigger and thrashing is reduced.

How many threads are there, overall, on average, in your process? Task manager can show this.

Rgds,
Martin
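The thread count can also be read from inside the process. A minimal sketch using the ToolHelp snapshot API (`CreateToolhelp32Snapshot` with `TH32CS_SNAPTHREAD`, then `Thread32First`/`Thread32Next`), filtering on the owning process ID; `CountProcessThreads` is a hypothetical helper name.

```pascal
program ThreadCountDemo;
{$APPTYPE CONSOLE}

uses
  Windows, TlHelp32;

// Hypothetical helper: counts the threads currently owned by ProcessId,
// or returns -1 if the snapshot cannot be taken.
function CountProcessThreads(ProcessId: DWORD): Integer;
var
  Snapshot: THandle;
  Entry: TThreadEntry32;
begin
  // TH32CS_SNAPTHREAD captures every thread in the system (the process ID
  // argument is ignored for this flag), so filter on the owning process.
  Snapshot := CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
  if Snapshot = INVALID_HANDLE_VALUE then
    Exit(-1);
  try
    Result := 0;
    Entry.dwSize := SizeOf(Entry);
    if Thread32First(Snapshot, Entry) then
      repeat
        if Entry.th32OwnerProcessID = ProcessId then
          Inc(Result);
      until not Thread32Next(Snapshot, Entry);
  finally
    CloseHandle(Snapshot);
  end;
end;

begin
  Writeln('Threads in this process: ', CountProcessThreads(GetCurrentProcessId));
end.
```

Multiplying the count by the per-thread reservation gives the address space tied up in stacks: for example, 40 threads reserving 1 MB each take about 40 MB, and at 128 KB each about 5 MB.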

落在眉间の轻吻 2024-11-17 05:02:39

Whilst I am sure that you can reduce the thread stack size in your app, I don't think it will address the root cause of the problem. You are using an 8-core machine now, but what happens on a 16-core or 32-core machine?

With 32-bit Delphi you have a maximum address space of 4 GB (2 GB by default; 4 GB requires a large-address-aware executable running on 64-bit Windows), so this does limit you to some degree. You may well need to use smaller stacks for some or all of your threads, but you will still face problems on a big enough machine.

To help your app scale better to larger machines, you may need to take one or both of the following steps:

  1. Avoid creating significantly more threads than cores. Use a thread pool architecture that is available to your plug-ins. Without the .NET environment to make this easy, you are probably best off coding against the Windows thread pool API, although there is likely a good Delphi wrapper available.
  2. Deal with the memory allocation patterns. If your threads are allocating contiguous blocks in the region of 200MB then this is going to cause undue stress on your allocator. I have found that it is often best to allocate such large amounts of memory in smaller, fixed size blocks. This approach works around the fragmentation problems you are encountering.
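A minimal sketch of step 2 for image data, assuming pixel rows of a fixed byte size: `TChunkedImage` and its members are hypothetical names. Each block holds a whole number of rows, so code that processes one row at a time through `ScanLine` never needs more than one block, and no single huge free address range is required.

```pascal
unit ChunkedImage;

interface

uses
  SysUtils;

type
  // Hypothetical helper: keeps an image in many fixed-size blocks instead of one
  // contiguous allocation, so no single huge free address range is required.
  TChunkedImage = class
  private
    FRowBytes: NativeInt;
    FHeight: Integer;
    FRowsPerChunk: Integer;
    FChunks: array of Pointer;
  public
    constructor Create(ARowBytes: NativeInt; AHeight: Integer;
      AChunkSize: NativeInt = 4 * 1024 * 1024);
    destructor Destroy; override;
    // Pointer to the first byte of a row; a row never spans two blocks.
    function ScanLine(Row: Integer): Pointer;
    property Height: Integer read FHeight;
    property RowBytes: NativeInt read FRowBytes;
  end;

implementation

constructor TChunkedImage.Create(ARowBytes: NativeInt; AHeight: Integer;
  AChunkSize: NativeInt);
var
  I: Integer;
begin
  inherited Create;
  if (ARowBytes <= 0) or (AHeight <= 0) then
    raise EArgumentException.Create('Row size and height must be positive');
  FRowBytes := ARowBytes;
  FHeight := AHeight;
  // Whole rows per block; a row larger than AChunkSize gets a block of its own.
  FRowsPerChunk := AChunkSize div ARowBytes;
  if FRowsPerChunk < 1 then
    FRowsPerChunk := 1;
  // SetLength zero-fills the new elements, so Destroy can tell which blocks exist.
  SetLength(FChunks, (AHeight + FRowsPerChunk - 1) div FRowsPerChunk);
  for I := 0 to High(FChunks) do
    GetMem(FChunks[I], FRowsPerChunk * FRowBytes);
end;

destructor TChunkedImage.Destroy;
var
  I: Integer;
begin
  // Also runs if GetMem raised EOutOfMemory part-way through the constructor.
  for I := 0 to High(FChunks) do
    if FChunks[I] <> nil then
      FreeMem(FChunks[I]);
  inherited;
end;

function TChunkedImage.ScanLine(Row: Integer): Pointer;
begin
  Assert((Row >= 0) and (Row < FHeight), 'Row out of range');
  Result := PByte(FChunks[Row div FRowsPerChunk]) +
    (Row mod FRowsPerChunk) * FRowBytes;
end;

end.
```

With the default 4 MB block size, a 10000 × 5000 photo at 4 bytes per pixel (200,000,000 bytes) becomes 49 blocks of 4,160,000 bytes (104 rows each) instead of one 200 MB block.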
静水深流 2024-11-17 05:02:39

Reducing $MAXSTACKSIZE won't work because Windows will always align thread stacks to 1 MB (?).

One (possible?) way to prevent fragmentation is to reserve (not allocate!) virtual memory with VirtualAlloc before creating the threads, and release it after the threads are running. This way Windows cannot use the reserved space for the thread stacks, so you will keep some contiguous memory.
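A minimal sketch of this first idea; `ReserveWhile` is a hypothetical helper, and `StartThreads` stands for whatever code creates your worker and plugin threads. One assumption is built in: `VirtualAlloc(nil, ...)` lets Windows choose where the placeholder goes, so it keeps some free range of that size away from the new stacks, not necessarily the range the photos will use later.

```pascal
unit ReserveWhileStarting;

interface

uses
  Windows, SysUtils;

// Hypothetical helper: holds PlaceholderBytes of address space in reserve while
// StartThreads runs, then releases it. Returns False if the reservation failed
// (StartThreads still runs in that case).
function ReserveWhile(PlaceholderBytes: NativeUInt; StartThreads: TProc): Boolean;

implementation

function ReserveWhile(PlaceholderBytes: NativeUInt; StartThreads: TProc): Boolean;
var
  Placeholder: Pointer;
begin
  // MEM_RESERVE claims addresses only; nothing is committed.
  Placeholder := VirtualAlloc(nil, PlaceholderBytes, MEM_RESERVE, PAGE_NOACCESS);
  Result := Placeholder <> nil;
  try
    StartThreads; // stacks reserved here cannot land inside the placeholder
  finally
    if Placeholder <> nil then
      VirtualFree(Placeholder, 0, MEM_RELEASE); // size must be 0 for MEM_RELEASE
  end;
end;

end.
```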

Or you could write your own memory manager for large photos: reserve a lot of virtual memory and allocate from this pool by hand (you need to maintain a list of used and free memory yourself).
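A minimal sketch of the second idea, assuming one photo is processed at a time; `TPhotoArena`, `Acquire` and `ReleasePhoto` are hypothetical names. A real pool would keep the list of used and free ranges described above; this one manages a single slot so that the reserve/commit/decommit calls stay visible.

```pascal
unit PhotoArena;

interface

uses
  Windows, SysUtils;

type
  // Hypothetical single-photo arena: reserves one contiguous address range up
  // front and commits/decommits pages inside it on demand.
  TPhotoArena = class
  private
    FBase: Pointer;
    FReserved: NativeUInt;
    FCommitted: NativeUInt;
  public
    constructor Create(AReserveBytes: NativeUInt);
    destructor Destroy; override;
    function Acquire(ABytes: NativeUInt): Pointer;
    procedure ReleasePhoto;
  end;

implementation

constructor TPhotoArena.Create(AReserveBytes: NativeUInt);
begin
  inherited Create;
  // MEM_RESERVE claims address space only; nothing is committed yet.
  FBase := VirtualAlloc(nil, AReserveBytes, MEM_RESERVE, PAGE_NOACCESS);
  if FBase = nil then
    RaiseLastOSError;
  FReserved := AReserveBytes;
end;

destructor TPhotoArena.Destroy;
begin
  // MEM_RELEASE frees the whole reservation, committed pages included;
  // the size argument must be 0.
  if FBase <> nil then
    VirtualFree(FBase, 0, MEM_RELEASE);
  inherited;
end;

function TPhotoArena.Acquire(ABytes: NativeUInt): Pointer;
begin
  if FCommitted <> 0 then
    raise Exception.Create('The arena already holds a photo');
  if (ABytes = 0) or (ABytes > FReserved) then
    raise EArgumentException.Create('Size must be between 1 and the reserved size');
  // Commit only the pages this photo needs, at the start of the range.
  if VirtualAlloc(FBase, ABytes, MEM_COMMIT, PAGE_READWRITE) = nil then
    RaiseLastOSError;
  FCommitted := ABytes;
  Result := FBase;
end;

procedure TPhotoArena.ReleasePhoto;
begin
  if FCommitted <> 0 then
  begin
    // Decommit returns the memory but keeps the address range reserved.
    VirtualFree(FBase, FCommitted, MEM_DECOMMIT);
    FCommitted := 0;
  end;
end;

end.
```

Creating the arena early in start-up, before plugins and worker threads have scattered their allocations, is presumably when a large contiguous range is most likely to be free.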

At least, that's the theory; I don't know whether it really works...
