How to avoid heap fragmentation?

Posted on 2024-07-07 11:27:16

I'm currently working on a medical image processing project that needs a huge amount of memory. Is there anything I can do to avoid heap fragmentation and to speed up access to image data that has already been loaded into memory?

The application has been written in C++ and runs on Windows XP.

EDIT: The application does some preprocessing on the image data, such as reformatting, calculating look-up tables and extracting sub-images of interest. It needs about 2 GB of RAM during processing, of which about 1.5 GB may be used for the image data.

Comments (9)

悟红尘 2024-07-14 11:27:17

There are answers, but it's difficult to be general without knowing the details of the problem.

I'm assuming 32-bit Windows XP.

Try to avoid needing hundreds of MB of contiguous memory. If you are unlucky, a few random DLLs will load themselves at inconvenient points throughout your available address space, rapidly cutting down the very large areas of contiguous memory. Depending on which APIs you need, this can be quite hard to prevent. It can be quite surprising how allocating just a couple of 400 MB blocks in addition to some 'normal' memory usage can leave you with nowhere to allocate a final 'little' 40 MB block.

On the other hand, do preallocate reasonably sized chunks at a time. Something on the order of 10 MB is a good compromise block size. If you can partition your data into chunks of this size, you'll be able to fill the address space reasonably efficiently.
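As a rough illustration of that partitioning, here is a minimal sketch of a buffer that presents one flat byte range but stores it as independent fixed-size chunks. The class name `ChunkedBuffer` and its interface are made up for this example, and the 10 MB default simply mirrors the rule of thumb above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical helper: stores a large byte range as fixed-size chunks so that
// no single allocation needs a huge contiguous region of address space.
class ChunkedBuffer {
public:
    // The default chunk size of 10 MB follows the rule of thumb above.
    explicit ChunkedBuffer(std::size_t totalBytes,
                           std::size_t chunkBytes = 10u * 1024u * 1024u)
        : total_(totalBytes), chunk_(chunkBytes) {
        assert(chunkBytes > 0);
        const std::size_t count = (totalBytes + chunkBytes - 1) / chunkBytes;
        chunks_.reserve(count);
        for (std::size_t i = 0; i < count; ++i) {
            // Each chunk is an independent, zero-initialised allocation.
            chunks_.push_back(std::make_unique<std::uint8_t[]>(chunkBytes));
        }
    }

    std::size_t size() const { return total_; }
    std::size_t chunkCount() const { return chunks_.size(); }

    // Translate a flat byte offset into (chunk, offset within chunk).
    std::uint8_t& at(std::size_t offset) {
        assert(offset < total_);
        return chunks_[offset / chunk_][offset % chunk_];
    }

private:
    std::size_t total_;
    std::size_t chunk_;
    std::vector<std::unique_ptr<std::uint8_t[]>> chunks_;
};
```

The price is one division and one modulo per access, so inner loops would normally walk a chunk at a time rather than call `at()` for every byte.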

If you're still going to run out of address space, you're going to need to be able to page blocks in and out based on some sort of caching algorithm. Choosing the right blocks to page out is going to depend very much on your processing algorithm and will need careful analysis.
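One common starting point for such a caching algorithm is least-recently-used (LRU) eviction. The sketch below is only the bookkeeping, under the assumption that blocks are identified by integer ids; `LruBlockTracker` is a hypothetical name, and whether LRU suits a given processing algorithm is exactly the analysis mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Hypothetical LRU bookkeeping for resident blocks. It only decides *which*
// block to page out; reading and writing the block data is left to the caller.
class LruBlockTracker {
public:
    explicit LruBlockTracker(std::size_t maxResident) : max_(maxResident) {
        assert(maxResident > 0);
    }

    // Mark `blockId` as just used. Returns true and sets `evicted` when the
    // caller must page out another block to stay within the budget.
    bool touch(int blockId, int& evicted) {
        auto it = pos_.find(blockId);
        if (it != pos_.end()) {
            // Already resident: move it to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return false;
        }
        order_.push_front(blockId);
        pos_[blockId] = order_.begin();
        if (order_.size() <= max_) return false;
        // Over budget: the back of the list is the least recently used block.
        evicted = order_.back();
        pos_.erase(evicted);
        order_.pop_back();
        return true;
    }

    bool isResident(int blockId) const { return pos_.count(blockId) != 0; }

private:
    std::size_t max_;
    std::list<int> order_;  // front = most recent, back = least recent
    std::unordered_map<int, std::list<int>::iterator> pos_;
};
```

When `touch` reports an eviction, the caller writes that block out and drops every pointer into it before reusing the memory.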

Choosing where to page things out to is another decision. You might decide to just write them to temporary files. You could also investigate Microsoft's Address Windowing Extensions API. In either case, your application design needs to clean up any pointers to something that is about to be paged out, otherwise really bad things(tm) will happen.

Good Luck!

最初的梦 2024-07-14 11:27:17

If you are going to be performing operations on a large image matrix, you might want to consider a technique called "tiling". The idea is to lay the image out in memory so that a contiguous block of bytes holds the pixels of a square region in 2D space rather than those of a single scan line. The rationale is that most operations touch pixels that are close to each other in 2D, not just along one scan line.

This is not going to reduce your memory use, but may have a huge impact on page swapping and performance.
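To make the layout concrete, here is a small sketch of the index arithmetic for a tiled image. It assumes square tiles stored one after another in row-major tile order, with edge tiles padded to full size; `TiledLayout` and its members are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tiled layout: the image is split into tile x tile squares and
// each square is stored contiguously, tiles ordered row by row.
struct TiledLayout {
    std::size_t width;   // image width in pixels
    std::size_t height;  // image height in pixels
    std::size_t tile;    // tile edge length in pixels (for example 64)

    std::size_t tilesPerRow() const { return (width + tile - 1) / tile; }
    std::size_t tilesPerCol() const { return (height + tile - 1) / tile; }

    // Number of pixel slots to allocate (edge tiles are padded to full size).
    std::size_t storageSize() const {
        return tilesPerRow() * tilesPerCol() * tile * tile;
    }

    // Index of pixel (x, y) in the tiled buffer.
    std::size_t index(std::size_t x, std::size_t y) const {
        assert(x < width && y < height);
        const std::size_t tileIndex = (y / tile) * tilesPerRow() + (x / tile);
        const std::size_t inTile = (y % tile) * tile + (x % tile);
        return tileIndex * tile * tile + inTile;
    }
};
```

With an 8x8 image and 4x4 tiles, the pixels (0, 0) and (0, 1) end up 4 slots apart instead of 8, which is the locality gain the technique is after.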

锦欢 2024-07-14 11:27:17

Without much more information about the problem (for example, the language), one thing you can do is avoid allocation churn by reusing allocations rather than repeatedly allocating, operating and freeing. Allocators such as dlmalloc handle fragmentation better than the Win32 heaps.

七秒鱼° 2024-07-14 11:27:17

What you will be hitting here is the virtual address range limit, which on 32-bit Windows gives a process at most 2 GB of user address space by default. You should also be aware that using a graphics API like DirectX or OpenGL will use extensive portions of those 2 GB for frame buffers, textures and similar data.

Using 1.5-2 GB in a 32-bit application is quite hard to achieve. The most elegant solution is a 64-bit OS and a 64-bit application. Even a 32-bit application on a 64-bit OS may be somewhat viable, as long as it is marked LARGE_ADDRESS_AWARE.

However, as you need to store image data, you may also be able to work around this by using File Mapping as a memory store. This can be done in such a way that the memory stays committed and can be made accessible on demand, but consumes virtual addresses only while a view of it is mapped.

可是我不能没有你 2024-07-14 11:27:17

Guessing here that you meant avoid fragmentation and not avoid defragmentation. Also guessing that you are working with an unmanaged language (C or C++ probably). I would suggest that you allocate large chunks of memory and then serve heap allocations from those blocks. Because this pool consists of large blocks, it is less prone to fragmentation. To sum up, you should implement a custom memory allocator.

See some general ideas on this here.

只是一片海 2024-07-14 11:27:17

I guess you're using something unmanaged, because on managed platforms the system (garbage collector) takes care of fragmentation.

For C/C++ you can use an allocator other than the default one (there are already some threads about allocators on Stack Overflow).

Also, you can create your own data storage. For example, in the project I'm currently working on, we have a custom storage (pool) for bitmaps (we store them in a large contiguous chunk of memory), because we have a lot of them, and we keep track of heap fragmentation and defragment the pool when the fragmentation gets too big.
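The answer does not say how that bitmap pool is built, so the following is only one possible sketch of a pool that can be defragmented. The key assumption is that callers hold integer handles instead of raw pointers, which lets the pool move live blocks; `CompactingPool` and all of its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical compacting pool: callers hold integer handles, not raw
// pointers, so the pool is free to move live blocks when it defragments.
class CompactingPool {
public:
    explicit CompactingPool(std::size_t bytes) : storage_(bytes), top_(0) {}

    // Returns a handle, or -1 if there is no room at the end of the pool.
    int allocate(std::size_t bytes) {
        if (bytes > storage_.size() - top_) return -1;
        blocks_.push_back(Block{top_, bytes, true});
        top_ += bytes;
        return static_cast<int>(blocks_.size()) - 1;
    }

    void release(int handle) { blocks_[handle].live = false; }

    // Resolve a handle to its current address. The pointer is only valid
    // until the next call to compact().
    std::uint8_t* data(int handle) {
        assert(blocks_[handle].live);
        return storage_.data() + blocks_[handle].offset;
    }

    // Slide every live block towards the start, closing the holes left by
    // released blocks. Blocks are kept in increasing offset order, so a
    // single forward pass with memmove is enough.
    void compact() {
        std::size_t writePos = 0;
        for (Block& b : blocks_) {
            if (!b.live) continue;
            if (b.offset != writePos) {
                std::memmove(storage_.data() + writePos,
                             storage_.data() + b.offset, b.size);
                b.offset = writePos;
            }
            writePos += b.size;
        }
        top_ = writePos;
    }

    std::size_t freeAtEnd() const { return storage_.size() - top_; }

private:
    struct Block { std::size_t offset; std::size_t size; bool live; };
    std::vector<std::uint8_t> storage_;
    std::vector<Block> blocks_;
    std::size_t top_;
};
```

A real pool would also recycle handles and decide when compaction is worth its copying cost; this sketch leaves both out.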

叶落知秋 2024-07-14 11:27:17

You might need to implement manual memory management. Is the image data long lived? If not, then you can use the pattern used by the Apache web server: allocate large amounts of memory and wrap them into memory pools. Pass those pools as the last argument to functions, so they can use the pool to satisfy their need for temporary memory. Once the call chain is finished, none of the memory in the pool should still be in use, so you can scrub the memory area and use it again. Allocations are fast, since they only involve adding a value to a pointer. Deallocation is really fast, since you free very large blocks of memory at once.

If your application is multithreaded, you might need to store the pool in thread local storage, to avoid cross-thread communication overhead.
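A minimal sketch of such a pool is shown below. It is written in the spirit of the pattern described here and is not the Apache (APR) API; `Arena` and `sumRow` are hypothetical names, and the sketch assumes a single fixed-size backing block with no growth.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bump-pointer pool. Allocation only advances an offset and
// reset() frees everything in one step.
class Arena {
public:
    explicit Arena(std::size_t bytes) : storage_(bytes), used_(0) {}

    // Allocate `bytes`, rounded up to a multiple of the strictest fundamental
    // alignment. Returns nullptr when the arena is exhausted.
    void* allocate(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (bytes + align - 1) / align * align;
        if (rounded > storage_.size() - used_) return nullptr;
        void* p = storage_.data() + used_;
        used_ += rounded;
        return p;
    }

    // Release everything at once; previously returned pointers become invalid.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t used_;
};

// Example of passing the pool as the last argument: scratch space for the
// computation comes from the arena instead of the general heap.
inline long sumRow(const std::uint16_t* row, std::size_t n, Arena& pool) {
    long* scratch = static_cast<long*>(pool.allocate(n * sizeof(long)));
    if (scratch == nullptr) return -1;
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = row[i];
        total += scratch[i];
    }
    return total;
}
```

For the multithreaded case, one arena per thread (for example a `thread_local Arena`) avoids any locking, at the cost of reserving the backing block once per thread.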

轮廓§ 2024-07-14 11:27:17

If you can isolate exactly those places where you're likely to allocate large blocks, you can (on Windows) directly call VirtualAlloc instead of going through the memory manager. This will avoid fragmentation within the normal memory manager.

This is an easy solution and it doesn't require you to use a custom memory manager.
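A sketch of what that could look like is below. The `VirtualAlloc` and `VirtualFree` calls are the real Win32 functions; the wrapper names `allocLargeBlock`, `freeLargeBlock` and `LargeBlock` are invented here, and the `calloc` fallback for non-Windows builds is purely an assumption of this example so that it compiles elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helpers for large pixel buffers. On Windows they reserve and
// commit pages directly with VirtualAlloc, bypassing the heap manager.
inline void* allocLargeBlock(std::size_t bytes) {
#ifdef _WIN32
    return VirtualAlloc(nullptr, bytes, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
#else
    return std::calloc(1, bytes);  // assumption: portable stand-in only
#endif
}

inline void freeLargeBlock(void* p) {
    if (p == nullptr) return;
#ifdef _WIN32
    VirtualFree(p, 0, MEM_RELEASE);  // size must be 0 with MEM_RELEASE
#else
    std::free(p);
#endif
}

// Small RAII wrapper so the block is released when it goes out of scope.
class LargeBlock {
public:
    explicit LargeBlock(std::size_t bytes)
        : p_(allocLargeBlock(bytes)), size_(bytes) {}
    ~LargeBlock() { freeLargeBlock(p_); }
    LargeBlock(const LargeBlock&) = delete;
    LargeBlock& operator=(const LargeBlock&) = delete;

    bool ok() const { return p_ != nullptr; }
    std::size_t size() const { return size_; }
    unsigned char* data() {
        assert(p_ != nullptr);
        return static_cast<unsigned char*>(p_);
    }

private:
    void* p_;
    std::size_t size_;
};
```

`VirtualAlloc` works in units of the system allocation granularity (typically 64 KB of address space per reservation), so it only makes sense for the genuinely large blocks, not for small objects.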

美胚控场 2024-07-14 11:27:16

If you are doing medical image processing it is likely that you are allocating big blocks at a time (512x512, 2-byte per pixel images). Fragmentation will bite you if you allocate smaller objects between the allocations of image buffers.

Writing a custom allocator is not necessarily hard for this particular use-case. You can use the standard C++ allocator for your Image object, but for the pixel buffer you can use custom allocation that is all managed within your Image object. Here's a quick and dirty outline:

  • Use a static array of structs, each struct has:
    • A solid chunk of memory that can hold N images -- the chunking will help control fragmentation -- try an initial N of 5 or so
    • A parallel array of bools indicating whether the corresponding image is in use
  • To allocate, search the array for an empty buffer and set its flag
    • If none found, append a new struct to the end of the array
  • To deallocate, find the corresponding buffer in the array(s) and clear the boolean flag

This is just one simple idea with lots of room for variation. The main trick is to avoid freeing and reallocating the image pixel buffers.
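The outline above could be sketched roughly as follows, assuming all images share one buffer size. `PixelBufferPool` and `Slab` are hypothetical names, and a growable `std::vector` of slabs stands in for the static array of structs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical pool following the outline above: slabs of N equally sized
// pixel buffers plus a parallel in-use flag per buffer.
class PixelBufferPool {
public:
    explicit PixelBufferPool(std::size_t bytesPerImage,
                             std::size_t imagesPerSlab = 5)
        : bytes_(bytesPerImage), perSlab_(imagesPerSlab) {
        assert(bytesPerImage > 0 && imagesPerSlab > 0);
    }

    // Find a free buffer and mark it used; add a new slab if none is free.
    std::uint8_t* acquire() {
        for (Slab& s : slabs_) {
            for (std::size_t i = 0; i < perSlab_; ++i) {
                if (!s.inUse[i]) {
                    s.inUse[i] = true;
                    return s.memory.get() + i * bytes_;
                }
            }
        }
        Slab s;
        s.memory = std::make_unique<std::uint8_t[]>(bytes_ * perSlab_);
        s.inUse.assign(perSlab_, false);
        s.inUse[0] = true;
        slabs_.push_back(std::move(s));
        return slabs_.back().memory.get();
    }

    // Clear the flag for the buffer; the memory itself is kept for reuse.
    void release(const std::uint8_t* p) {
        std::less<const std::uint8_t*> before;  // safe across different slabs
        for (Slab& s : slabs_) {
            const std::uint8_t* base = s.memory.get();
            if (!before(p, base) && before(p, base + bytes_ * perSlab_)) {
                s.inUse[static_cast<std::size_t>(p - base) / bytes_] = false;
                return;
            }
        }
        assert(false && "pointer does not belong to this pool");
    }

    std::size_t slabCount() const { return slabs_.size(); }

private:
    struct Slab {
        std::unique_ptr<std::uint8_t[]> memory;  // N contiguous pixel buffers
        std::vector<bool> inUse;                 // one flag per buffer
    };
    std::size_t bytes_;
    std::size_t perSlab_;
    std::vector<Slab> slabs_;
};
```

An `Image` class could call `acquire()` in its constructor and `release()` in its destructor, so pixel memory is recycled rather than returned to the heap.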
