Windows memory allocation questions

Posted on 2024-09-09 05:12:31

I am currently looking into the malloc() implementation under Windows. But in my research I have stumbled upon a few things that puzzled me:

First, I know that at the API level, Windows mostly uses the HeapAlloc() and VirtualAlloc() calls to allocate memory. I gather from here that the Microsoft implementation of malloc() (the one included in the CRT, the C runtime) basically calls HeapAlloc() for blocks > 480 bytes and otherwise manages a special area allocated with VirtualAlloc() for small allocations, in order to prevent fragmentation.

Well, that is all well and good. But then there are other implementations of malloc(), for instance nedmalloc, which claim to be up to 125% faster than Microsoft's malloc.

All this makes me wonder a few things:

  1. Why can't we just call HeapAlloc() for small blocks? Does it perform poorly with regard to fragmentation (for example by doing "first-fit" instead of "best-fit")?

    • Actually, is there any way to know what is going on under the hood of the various API allocation calls? That would be quite helpful.
  2. What makes nedmalloc so much faster than Microsoft's malloc?

  3. From the above, I got the impression that HeapAlloc()/VirtualAlloc() are so slow that it is much faster for malloc() to call them only once in a while and then to manage the allocated memory itself. Is that assumption true? Or is the malloc() "wrapper" just needed because of fragmentation? One would think that system calls like these would be quick, or at least that some thought would have been put into making them efficient.

    • If it is true, why is it so?
  4. On average, how many (to an order of magnitude) memory reads/writes are performed by a typical malloc call (probably a function of the number of already allocated segments)? I would intuitively say it's in the tens for an average program, am I right?

Comments (2)

穿透光 2024-09-16 05:12:31
  1. Calling HeapAlloc doesn't sound cross-platform. MS is free to change their implementation whenever they wish; I'd advise staying away. :)
  2. It is probably using memory pools more effectively, much like the Loki library does with its "small object allocator".
  3. Heap allocations, which are general purpose by nature, are always slow via any implementation. The more "specialized" the allocator, the faster it will be. This returns us to point #2, which deals with memory pools (and the allocation sizes used that are specific to your application).
  4. Don't know.
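
To make the "memory pool" idea from points 2 and 3 concrete, here is a minimal sketch of a fixed-size pool in C. It is not how nedmalloc or Loki actually implement their allocators; every name in it (`pool`, `pool_init`, `pool_alloc`, `pool_free`, `pool_destroy`) is hypothetical, and plain `malloc()` stands in for whatever upstream allocator supplies the slab. The point is only that a specialized allocator can serve a request by popping one pointer off a free list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size pool; all names are illustrative. */
typedef struct pool_node {
    struct pool_node *next;
} pool_node;

typedef struct {
    void      *slab;       /* one large block taken from the general heap */
    pool_node *free_list;  /* singly linked list of free slots */
    size_t     slot_size;  /* rounded-up size of each slot in bytes */
} pool;

/* Round up so each slot can hold a free-list link and stays
   pointer-aligned (types needing stricter alignment are not handled). */
static size_t pool_round_up(size_t n) {
    size_t a = sizeof(pool_node);
    if (n < a) n = a;
    return (n + a - 1) / a * a;
}

/* Returns 0 on success, -1 if the slab could not be allocated.
   Overflow of slot_size * slot_count is not checked in this sketch. */
int pool_init(pool *p, size_t slot_size, size_t slot_count) {
    assert(p != NULL && slot_count > 0);
    p->slot_size = pool_round_up(slot_size);
    p->free_list = NULL;
    p->slab = malloc(p->slot_size * slot_count);
    if (p->slab == NULL) return -1;
    /* Thread every slot onto the free list, last slot first,
       so that the first allocation returns slot 0. */
    for (size_t i = slot_count; i > 0; --i) {
        pool_node *n = (pool_node *)((char *)p->slab + (i - 1) * p->slot_size);
        n->next = p->free_list;
        p->free_list = n;
    }
    return 0;
}

/* O(1): pop the head of the free list, or NULL when the pool is exhausted. */
void *pool_alloc(pool *p) {
    pool_node *n = p->free_list;
    if (n != NULL) p->free_list = n->next;
    return n;
}

/* O(1): push the slot back onto the free list. */
void pool_free(pool *p, void *ptr) {
    pool_node *n = ptr;
    n->next = p->free_list;
    p->free_list = n;
}

void pool_destroy(pool *p) {
    free(p->slab);
    p->slab = NULL;
    p->free_list = NULL;
}
```

Allocation and deallocation here each touch only a couple of words of memory, which is the kind of saving a general-purpose heap cannot make, since it has to search for or split blocks of arbitrary size.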
千纸鹤 2024-09-16 05:12:31

From the above, I got the impression that HeapAlloc()/VirtualAlloc() are so slow that it is much faster for malloc() to call them only once in a while and then to manage the allocated memory itself. Is that assumption true?

The OS-level system calls are designed and optimized for managing the entire memory space of processes. Using them to allocate 4 bytes for an integer is indeed suboptimal - you get overall better performance and memory usage by managing tiny allocations in library code, and letting the OS optimize for larger allocations. At least as far as I understand it.
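
As a rough illustration of "managing tiny allocations in library code", here is a sketch of a bump-style arena in C that asks its upstream allocator for a large chunk only occasionally and carves small requests out of it. This is an assumption-laden toy, not the CRT's actual scheme: `arena`, `arena_alloc`, `arena_release` and `CHUNK_SIZE` are hypothetical names, the 64 KiB chunk size is arbitrary, and `malloc()` stands in for the upstream call (on Windows that role would be played by something like VirtualAlloc()). Individual blocks cannot be freed in this sketch; everything is released at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative chunk size, not a Windows constant. */
#define CHUNK_SIZE ((size_t)64 * 1024)

typedef struct chunk {
    struct chunk *next;   /* previously filled chunks */
    size_t        used;   /* bytes already handed out from data[] */
    max_align_t   data[]; /* payload, aligned for any object type */
} chunk;

typedef struct {
    chunk *head;            /* chunk currently being carved up */
    size_t upstream_calls;  /* how often the upstream allocator was asked */
} arena;

/* Round a request up to a multiple of the strictest alignment. */
static size_t align_up(size_t n) {
    size_t a = _Alignof(max_align_t);
    return (n + a - 1) / a * a;
}

void *arena_alloc(arena *a, size_t n) {
    assert(a != NULL);
    if (n == 0) n = 1;
    /* A real allocator would forward large requests upstream instead. */
    if (n > CHUNK_SIZE) return NULL;
    n = align_up(n);
    if (a->head == NULL || a->head->used + n > CHUNK_SIZE) {
        /* The expensive path: one upstream call buys room for many
           small allocations. */
        chunk *c = malloc(offsetof(chunk, data) + CHUNK_SIZE);
        if (c == NULL) return NULL;
        c->next = a->head;
        c->used = 0;
        a->head = c;
        a->upstream_calls++;
    }
    /* The cheap path: bump an offset inside the current chunk. */
    void *p = (unsigned char *)a->head->data + a->head->used;
    a->head->used += n;
    return p;
}

/* Return every chunk to the upstream allocator in one go. */
void arena_release(arena *a) {
    chunk *c = a->head;
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    a->head = NULL;
    a->upstream_calls = 0;
}
```

With a zero-initialized `arena`, a thousand `arena_alloc(&a, sizeof(int))` calls are all served from a single upstream chunk, so the cost of the upstream call is amortized across them.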
