Compelling examples of custom C++ allocators?

Posted on 2024-07-19 17:23:34

What are some really good reasons to ditch std::allocator in favor of a custom solution? Have you run across any situations where it was absolutely necessary for correctness, performance, scalability, etc? Any really clever examples?

Custom allocators have always been a feature of the Standard Library that I haven't had much need for. I was just wondering if anyone here on SO could provide some compelling examples to justify their existence.

Comments (18)

小姐丶请自重 2024-07-26 17:23:35

As I mention here, I've seen Intel TBB's custom STL allocator significantly improve performance of a multithreaded app simply by changing a single

std::vector<T>

to

std::vector<T,tbb::scalable_allocator<T> >

(this is a quick and convenient way of switching the allocator to use TBB's nifty thread-private heaps; see page 59 in this document)
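The one-line change above can also be centralized so the whole codebase flips allocators with a single build flag. This is only a sketch; the `USE_TBB` macro, the header path, and the alias names are my assumptions, not part of the original answer:

```cpp
#include <cstddef>
#include <vector>

// Route every container through a project-wide allocator alias; defining
// USE_TBB in the build then switches all of them at once. Without TBB the
// alias falls back to std::allocator, so the indirection costs nothing.
#ifdef USE_TBB
#include <tbb/scalable_allocator.h>
template <typename T> using app_allocator = tbb::scalable_allocator<T>;
#else
template <typename T> using app_allocator = std::allocator<T>;
#endif

template <typename T>
using app_vector = std::vector<T, app_allocator<T>>;
```

Compiled without `USE_TBB` this is plain `std::allocator`, so you only pay for the thread-private heaps once you opt in.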

请你别敷衍 2024-07-26 17:23:35

One area where custom allocators can be useful is game development, especially on game consoles, as they have only a small amount of memory and no swap. On such systems you want to make sure that you have tight control over each subsystem, so that one uncritical system can't steal the memory from a critical one. Other things like pool allocators can help to reduce memory fragmentation. You can find a long, detailed paper on the topic at:

EASTL -- Electronic Arts Standard Template Library

听不够的曲调 2024-07-26 17:23:35

I am working on an mmap allocator that allows vectors to use memory from
a memory-mapped file. The goal is to have vectors whose storage lives
directly in the virtual memory mapped by mmap. Our problem is to
improve the reading of really large files (>10 GB) into memory with no copy
overhead, therefore I need this custom allocator.

So far I have the skeleton of a custom allocator
(derived from std::allocator), which I think is a good starting
point for writing your own allocators. Feel free to use this piece of code
in whatever way you want:

#include <memory>
#include <stdio.h>

namespace mmap_allocator_namespace
{
        // See StackOverflow replies to this answer for important commentary about inheriting from std::allocator before replicating this code.
        template <typename T>
        class mmap_allocator: public std::allocator<T>
        {
public:
                typedef size_t size_type;
                typedef T* pointer;
                typedef const T* const_pointer;

                template<typename _Tp1>
                struct rebind
                {
                        typedef mmap_allocator<_Tp1> other;
                };

                pointer allocate(size_type n, const void *hint=0)
                {
                        fprintf(stderr, "Alloc %zu bytes.\n", n*sizeof(T));
                        return std::allocator<T>::allocate(n, hint);
                }

                void deallocate(pointer p, size_type n)
                {
                        fprintf(stderr, "Dealloc %zu bytes (%p).\n", n*sizeof(T), (void *)p);
                        std::allocator<T>::deallocate(p, n);
                }

                mmap_allocator() throw(): std::allocator<T>() { fprintf(stderr, "Hello allocator!\n"); }
                mmap_allocator(const mmap_allocator &a) throw(): std::allocator<T>(a) { }
                template <class U>                    
                mmap_allocator(const mmap_allocator<U> &a) throw(): std::allocator<T>(a) { }
                ~mmap_allocator() throw() { }
        };
}

To use this, declare an STL container as follows:

using namespace std;
using namespace mmap_allocator_namespace;

vector<int, mmap_allocator<int> > int_vec(1024, 0, mmap_allocator<int>());

It can be used, for example, to log whenever memory is allocated. What is necessary
is the rebind struct; otherwise the vector container uses the superclass's allocate/deallocate
methods.

Update: The memory mapping allocator is now available at https://github.com/johannesthoma/mmap_allocator and is LGPL. Feel free to use it for your projects.
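The skeleton above only logs; as a hedged sketch of where this was heading, a minimal C++11-style allocator (no std::allocator inheritance, hence no rebind member needed) that actually hands out mmap'd pages could look like the following. MAP_ANONYMOUS stands in for the real file-backed mapping, which would carry an fd and offset; the name mmap_page_allocator is mine:

```cpp
#include <sys/mman.h>   // mmap, munmap (POSIX)
#include <cstddef>
#include <new>
#include <vector>

// Minimal-allocator sketch: storage comes straight from mmap'd pages.
// A file-backed version would replace MAP_ANONYMOUS/-1/0 with a real
// mapping of the large input file.
template <typename T>
struct mmap_page_allocator {
    using value_type = T;

    mmap_page_allocator() noexcept = default;
    template <typename U>
    mmap_page_allocator(const mmap_page_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = ::mmap(nullptr, n * sizeof(T),
                         PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, /*fd=*/-1, /*offset=*/0);
        if (p == MAP_FAILED)
            throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t n) noexcept {
        ::munmap(p, n * sizeof(T));   // kernel rounds the length to pages
    }
};

template <typename T, typename U>
bool operator==(const mmap_page_allocator<T>&, const mmap_page_allocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const mmap_page_allocator<T>&, const mmap_page_allocator<U>&) noexcept { return false; }
```

Usage is the same as in the answer: `std::vector<int, mmap_page_allocator<int>> v(1024, 0);`.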

世界和平 2024-07-26 17:23:35

When working with GPUs or other co-processors it is sometimes beneficial to allocate data structures in main memory in a special way. This special way of allocating memory can be implemented in a custom allocator in a convenient fashion.

The reason why custom allocation through the accelerator runtime can be beneficial when using accelerators is the following:

  1. through custom allocation the accelerator runtime or driver is notified of the memory block
  2. in addition the operating system can make sure that the allocated block of memory is page-locked (some call this pinned memory), that is, the virtual memory subsystem of the operating system may not move or remove the page within or from memory
  3. if 1. and 2. hold and a data transfer between a page-locked memory block and an accelerator is requested, the runtime can directly access the data in main memory since it knows where it is and it can be sure the operating system did not move/remove it
  4. this saves one memory copy that would occur with memory allocated in a non-page-locked way: the data would have to be copied in main memory to a page-locked staging area from which the accelerator could initiate the data transfer (through DMA)
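Step 2 can be approximated in portable POSIX terms with mlock. Note this is only a sketch of the page-locking half: a real accelerator runtime would use its own pinned-allocation API (e.g. CUDA's cudaHostAlloc), which also performs the driver notification of step 1.

```cpp
#include <sys/mman.h>   // mlock, munlock (POSIX)
#include <cstddef>
#include <new>
#include <vector>

// Allocator sketch that page-locks every block so the OS keeps it resident.
// Pinning is treated as best-effort here because mlock can fail under
// RLIMIT_MEMLOCK; a real runtime would surface that error.
template <typename T>
struct pinned_allocator {
    using value_type = T;

    pinned_allocator() noexcept = default;
    template <typename U>
    pinned_allocator(const pinned_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = ::operator new(n * sizeof(T));
        (void)::mlock(p, n * sizeof(T));   // keep pages in physical memory
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t n) noexcept {
        (void)::munlock(p, n * sizeof(T));
        ::operator delete(p);
    }
};

template <typename T, typename U>
bool operator==(const pinned_allocator<T>&, const pinned_allocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const pinned_allocator<T>&, const pinned_allocator<U>&) noexcept { return false; }
```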
⒈起吃苦の倖褔 2024-07-26 17:23:35

I'm working with a MySQL storage engine that uses C++ for its code. We're using a custom allocator to use the MySQL memory system rather than competing with MySQL for memory. It allows us to make sure we're using memory the way the user configured MySQL to use it, and not "extra".

最后的乘客 2024-07-26 17:23:35

It can be useful to use custom allocators to use a memory pool instead of the heap. That's one example among many others.

For most cases, this is certainly a premature optimization. But it can be very useful in certain contexts (embedded devices, games, etc).

波浪屿的海角声 2024-07-26 17:23:35

I haven't written C++ code with a custom STL allocator, but I can imagine a webserver written in C++ that uses a custom allocator for automatic deletion of the temporary data needed to respond to an HTTP request. The custom allocator can free all temporary data at once, as soon as the response has been generated.

Another possible use case for a custom allocator (which I have used) is writing a unit test to prove that a function's behavior doesn't depend on some part of its input. The custom allocator can fill up the memory region with any pattern.
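A sketch of that second use case might look like the following; the pattern_allocator name and the 0xAA default are mine, not from the answer:

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Every fresh allocation is filled with a chosen byte pattern, so a test
// can run the function under test twice with different patterns and check
// the results match, proving it never reads uninitialized allocator memory.
template <typename T>
struct pattern_allocator {
    using value_type = T;
    unsigned char pattern = 0xAA;   // arbitrary default poison byte

    pattern_allocator() noexcept = default;
    explicit pattern_allocator(unsigned char pat) noexcept : pattern(pat) {}
    template <typename U>
    pattern_allocator(const pattern_allocator<U>& o) noexcept : pattern(o.pattern) {}

    T* allocate(std::size_t n) {
        T* p = std::allocator<T>{}.allocate(n);
        std::memset(static_cast<void*>(p), pattern, n * sizeof(T));
        return p;
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>{}.deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const pattern_allocator<T>& a, const pattern_allocator<U>& b) noexcept {
    return a.pattern == b.pattern;
}
template <typename T, typename U>
bool operator!=(const pattern_allocator<T>& a, const pattern_allocator<U>& b) noexcept {
    return !(a == b);
}
```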

撧情箌佬 2024-07-26 17:23:35

I'm using custom allocators here; you might even say it was to work around other custom dynamic memory management.

Background: we have overloads for malloc, calloc, free, and the various variants of operator new and delete, and the linker happily makes STL use these for us. This lets us do things like automatic small object pooling, leak detection, alloc fill, free fill, padding allocation with sentries, cache-line alignment for certain allocs, and delayed free.

The problem is, we're running in an embedded environment -- there isn't enough memory around to actually do leak detection accounting properly over an extended period. At least, not in the standard RAM -- there's another heap of RAM available elsewhere, through custom allocation functions.

Solution: write a custom allocator that uses the extended heap, and use it only in the internals of the memory leak tracking architecture... Everything else defaults to the normal new/delete overloads that do leak tracking. This avoids the tracker tracking itself (and provides a bit of extra packing functionality too, since we know the size of tracker nodes).

We also use this to keep function cost profiling data, for the same reason; writing an entry for each function call and return, as well as thread switches, can get expensive fast. The custom allocator again gives us smaller allocs in a larger debug memory area.

老娘不死你永远是小三 2024-07-26 17:23:35

A custom allocator is a reasonable way to securely erase memory before it is deallocated.

#include <cstddef>
#include <new>
#include <openssl/crypto.h>  // OPENSSL_cleanse

template <class T>
class allocator
{
public:
    using value_type    = T;

    allocator() noexcept {}
    template <class U> allocator(allocator<U> const&) noexcept {}

    value_type*  // Use pointer if pointer is not a value_type*
    allocate(std::size_t n)
    {
        return static_cast<value_type*>(::operator new (n*sizeof(value_type)));
    }

    void
    deallocate(value_type* p, std::size_t n) noexcept  // Use pointer if pointer is not a value_type*
    {
        OPENSSL_cleanse(p, n*sizeof(value_type));  // scrub contents before freeing
        ::operator delete(p);
    }
};
template <class T, class U>
bool
operator==(allocator<T> const&, allocator<U> const&) noexcept
{
    return true;
}
template <class T, class U>
bool
operator!=(allocator<T> const& x, allocator<U> const& y) noexcept
{
    return !(x == y);
}

I recommend using Howard Hinnant's allocator boilerplate:
https://howardhinnant.github.io/allocator_boilerplate.html

踏雪无痕 2024-07-26 17:23:35

I am using a custom allocator for counting the number of allocations/deallocations in one part of my program and measuring how long it takes. There are other ways this could be achieved but this method is very convenient for me. It is especially useful that I can use the custom allocator for only a subset of my containers.
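One possible shape for such a counting allocator (the names are mine, not the poster's code): it forwards to std::allocator and records into an external stats struct, so only the containers you hand the struct to are instrumented.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Shared tally; one instance can be passed to just the containers of
// interest, leaving the rest of the program unmeasured.
struct alloc_stats {
    std::size_t allocations   = 0;
    std::size_t deallocations = 0;
    std::size_t bytes         = 0;
};

template <typename T>
struct counting_allocator {
    using value_type = T;
    alloc_stats* stats;

    explicit counting_allocator(alloc_stats* s) noexcept : stats(s) {}
    template <typename U>
    counting_allocator(const counting_allocator<U>& o) noexcept : stats(o.stats) {}

    T* allocate(std::size_t n) {
        ++stats->allocations;
        stats->bytes += n * sizeof(T);
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        ++stats->deallocations;
        std::allocator<T>{}.deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const counting_allocator<T>& a, const counting_allocator<U>& b) noexcept {
    return a.stats == b.stats;
}
template <typename T, typename U>
bool operator!=(const counting_allocator<T>& a, const counting_allocator<U>& b) noexcept {
    return !(a == b);
}
```

Timing could be added the same way, wrapping the forwarded calls with a clock and accumulating into the stats struct.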

我们的影子 2024-07-26 17:23:35

One essential situation: when writing code that must work across module (EXE/DLL) boundaries, you must keep your allocations and deletions happening in only one module.

Where I ran into this was a plugin architecture on Windows. It is essential, for example, that if you pass a std::string across a DLL boundary, any reallocations of the string occur from the heap where it originated, not from the heap in the DLL, which may be different*.

*It's actually more complicated than this: if you are dynamically linking to the CRT, this might work anyway. But if each DLL has a static link to the CRT, you are headed for a world of pain, where phantom allocation errors continually occur.

冷情 2024-07-26 17:23:35

Obligatory link to Andrei Alexandrescu's CppCon 2015 talk on allocators:

https://www.youtube.com/watch?v=LIb3L4vKZ7U

The nice thing is that just devising them makes you think of ideas for how you would use them :-)

人事已非 2024-07-26 17:23:35

Some time ago I found this solution very useful to me: Fast C++11 allocator for STL containers. It speeds up STL containers noticeably on VS2017 (~5x) as well as on GCC (~7x). It is a special-purpose allocator based on a memory pool, and it can be used with STL containers only thanks to the mechanism you are asking about.

我的鱼塘能养鲲 2024-07-26 17:23:35

I personally use Loki::Allocator / SmallObject to optimize memory usage for small objects. It shows good efficiency and satisfying performance if you have to work with moderate amounts of really small objects (1 to 256 bytes). It can be up to ~30 times more efficient than standard C++ new/delete allocation if we talk about allocating moderate amounts of small objects of many different sizes. There is also a VC-specific solution called "QuickHeap" which brings the best possible performance: the allocate and deallocate operations just read and write the address of the block being allocated/returned to the heap, in up to 99.(9)% of cases, depending on settings and initialization. The cost is a notable overhead: it needs two pointers per extent and one extra pointer for each new memory block. It is the fastest possible solution for working with huge (10,000++) numbers of objects being created and deleted, if you don't need a big variety of object sizes (it creates an individual pool for each object size from 1 to 1023 bytes in the current implementation). Since the initialization costs may diminish the overall performance boost, one can allocate/deallocate some dummy objects before the application enters its performance-critical phase(s).

The issue with the standard C++ new/delete implementation is that it's usually just a wrapper around C malloc/free allocation, and it works well for larger blocks of memory, like 1024+ bytes. For small blocks it has a notable overhead in terms of performance and, sometimes, extra memory used for mapping too. So, in most cases custom allocators are implemented in a way that maximizes performance and/or minimizes the amount of extra memory needed for allocating small (≤1024-byte) objects.

柳若烟 2024-07-26 17:23:35

For shared memory it is vital that not only the container head, but also the data it contains, are stored in shared memory.

The allocator of Boost::Interprocess is a good example. However, as you can read here, this alone does not suffice to make all STL containers shared-memory compatible (due to different mapping offsets in different processes, pointers might "break").

酷炫老祖宗 2024-07-26 17:23:35

In a graphics simulation, I've seen custom allocators used for

  1. Alignment constraints that std::allocator didn't directly support.
  2. Minimizing fragmentation by using separate pools for short-lived (just this frame) and long-lived allocations.
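A sketch of case 1, assuming C++17's aligned operator new is available; the 64-byte figure used below is just a cache-line example, and the allocator name is mine:

```cpp
#include <cstddef>
#include <cstdint>
#include <new>      // std::align_val_t (C++17)
#include <vector>

// Over-aligns every block to Align bytes, e.g. a cache line or SIMD width
// that std::allocator of the day did not guarantee.
template <typename T, std::size_t Align>
struct aligned_allocator {
    static_assert(Align >= alignof(T), "Align must not weaken T's alignment");
    using value_type = T;

    // Member rebind is needed because the default in allocator_traits
    // cannot see past the non-type Align parameter.
    template <typename U>
    struct rebind { using other = aligned_allocator<U, Align>; };

    aligned_allocator() noexcept = default;
    template <typename U>
    aligned_allocator(const aligned_allocator<U, Align>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Align}));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Align});
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) noexcept { return false; }
```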
柒七 2024-07-26 17:23:35

The OP asked for a good reason to use a custom allocator. Many good individual reasons were already given in the various answers. A generic argument can be made though.

Any general-purpose allocator must deal with various different usage patterns or allocation trends, which may span a wide array of possibilities. It is rather challenging to build a generic solution that behaves well on average while also not performing extremely badly in some very specific situations. The diversity of usage patterns, and the extreme disparity between them, tends to impose a kind of borderline limit on the performance a generic allocator solution can achieve.

However, if we know the specific usage trend and can ensure the allocator is only used under some limited and well defined circumstances, it is possible to yield better results or even use a container in a situation where using the generic solution would be prohibitive (e.g. embedded and limited system).

To illustrate this with one example (where I used a custom allocation scheme with great success): assuming that...

  • a slate of objects is created based on some discovery / evaluation / translation and is cross-wired immediately,
  • these objects are then used for some time in a tight loop, and
  • based on external circumstances we know that all these objects cannot be used any more after a given point in time.

A custom allocator can then claim some large blocks, ideally allocated close together, place all objects into that area, implement all clean-up as a no-op and thus omit having any internal management infrastructure; rather, the allocated blocks will just be abandoned.
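For what it's worth, C++17 packaged essentially this scheme as std::pmr::monotonic_buffer_resource: bump allocation out of large blocks, deallocation as a no-op, and the whole arena abandoned at once. A minimal sketch:

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Builds the slate of objects inside one stack buffer, uses them in a tight
// loop, and abandons the whole region when the arena goes out of scope.
// null_memory_resource() as upstream guarantees nothing escapes the buffer.
int frame_sum() {
    std::byte buffer[4096];
    std::pmr::monotonic_buffer_resource arena{buffer, sizeof(buffer),
                                              std::pmr::null_memory_resource()};

    std::pmr::vector<int> objs{&arena};
    for (int i = 0; i < 100; ++i)
        objs.push_back(i);

    int sum = 0;
    for (int v : objs)
        sum += v;
    return sum;
    // Destroying objs "frees" into the arena: a no-op. The memory is
    // reclaimed wholesale when arena is destroyed.
}
```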

风尘浪孓 2024-07-26 17:23:35

One example of a time I have used these was working with very resource-constrained embedded systems. Let's say you have 2K of RAM free and your program has to use some of that memory. You need to store, say, 4-5 sequences somewhere that's not on the stack, and additionally you need very precise control over where these things get stored; this is a situation where you might want to write your own allocator. The default implementations can fragment the memory, which might be unacceptable if you don't have enough memory and cannot restart your program.

One project I was working on was using AVR-GCC on some low-powered chips. We had to store 8 sequences of variable length but with a known maximum. The standard library implementation of memory management is a thin wrapper around malloc/free which keeps track of where to place items by prepending every allocated block of memory with a pointer to just past the end of that allocated piece of memory. When allocating a new piece of memory the standard allocator has to walk over each of the pieces of memory to find the next available block where the requested size of memory will fit. On a desktop platform this would be very fast for so few items, but you have to keep in mind that some of these microcontrollers are very slow and primitive in comparison. Additionally, the memory fragmentation issue was a massive problem, which meant we really had no choice but to take a different approach.

So what we did was implement our own memory pool. Each block of memory was big enough to fit the largest sequence we would need in it. This allocated fixed-size blocks of memory ahead of time and marked which blocks of memory were currently in use. We did this by keeping one 8-bit integer where each bit represented whether a certain block was used. We traded off memory usage here in an attempt to make the whole process faster, which in our case was justified as we were pushing this microcontroller chip close to its maximum processing capacity.

There are a number of other times I can see writing your own custom allocator in the context of embedded systems, for example if the memory for the sequence isn't in main RAM, as may frequently be the case on these platforms.
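The fixed-block pool described above might be sketched like this; the 8-block bitmap follows the description, while the block size and the names are my assumptions:

```cpp
#include <cstddef>
#include <cstdint>

// NumBlocks fixed-size blocks tracked by a single 8-bit bitmap. BlockSize
// would be the known maximum sequence length. Because every block has the
// same size, fragmentation is impossible by construction.
template <std::size_t BlockSize, std::size_t NumBlocks = 8>
class fixed_block_pool {
    static_assert(NumBlocks <= 8, "the bitmap below is a single uint8_t");
    alignas(std::max_align_t) unsigned char storage_[BlockSize * NumBlocks];
    std::uint8_t used_ = 0;  // bit i set => block i is in use
public:
    void* acquire() {
        for (std::size_t i = 0; i < NumBlocks; ++i) {
            if (!(used_ & (1u << i))) {
                used_ |= static_cast<std::uint8_t>(1u << i);
                return storage_ + i * BlockSize;
            }
        }
        return nullptr;  // pool exhausted
    }
    void release(void* p) {
        std::size_t i =
            static_cast<std::size_t>(static_cast<unsigned char*>(p) - storage_) / BlockSize;
        used_ &= static_cast<std::uint8_t>(~(1u << i));
    }
};
```

Both acquire and release are a handful of instructions with no heap walk, which is the point on a slow microcontroller.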
