Multithreaded memory allocators for C/C++
I currently have a heavily multi-threaded server application, and I'm shopping around for a good multi-threaded memory allocator.
So far I'm torn between:
- Sun's umem
- Google's tcmalloc
- Intel's threading building blocks allocator
- Emery Berger's hoard
From what I've found, hoard might be the fastest, but I hadn't heard of it before today, so I'm skeptical whether it's really as good as it seems. Does anyone have personal experience trying out these allocators?
The only way to really tell which memory allocator is right for your application is to try a few out. All of the allocators mentioned were written by smart folks and will beat the others on one particular microbenchmark or another. If all your application does all day long is malloc one 8 byte chunk in thread A and free it in thread B, and doesn't need to handle anything else at all, you could probably write a memory allocator that beats the pants off any of those listed so far. It just won't be very useful for much else. :)
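For what it's worth, the "malloc in thread A, free in thread B" pattern is easy to turn into a tiny test harness. The following is only a rough sketch: `HandoffResult` and `run_handoff_benchmark` are names I made up for illustration, and the hand-off queue uses a plain mutex, so its locking cost is part of whatever you measure. Treat the timing as a way to compare allocators against each other on the same machine, not as an absolute number.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical result type: how many blocks were freed and how long it took.
struct HandoffResult {
    std::size_t freed;
    double seconds;
};

// Thread A mallocs `count` blocks of `size` bytes; thread B frees them.
HandoffResult run_handoff_benchmark(std::size_t count, std::size_t size = 8) {
    std::queue<void*> pending;   // blocks waiting to be freed by the consumer
    std::mutex m;
    std::condition_variable cv;
    bool done = false;           // set once the producer has finished
    std::size_t freed = 0;       // written only by the consumer thread

    auto start = std::chrono::steady_clock::now();

    std::thread consumer([&] {
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return done || !pending.empty(); });
            if (pending.empty()) return;  // producer finished and queue drained
            void* p = pending.front();
            pending.pop();
            lock.unlock();
            std::free(p);                 // free on a different thread than malloc
            ++freed;
        }
    });

    std::thread producer([&] {
        for (std::size_t i = 0; i < count; ++i) {
            void* p = std::malloc(size);
            assert(p != nullptr);
            {
                std::lock_guard<std::mutex> lock(m);
                pending.push(p);
            }
            cv.notify_one();
        }
        {
            std::lock_guard<std::mutex> lock(m);
            done = true;
        }
        cv.notify_one();
    });

    producer.join();
    consumer.join();

    auto stop = std::chrono::steady_clock::now();
    return {freed, std::chrono::duration<double>(stop - start).count()};
}
```

Call `run_handoff_benchmark(1000000)` from your own `main`, then run the same binary once with the default malloc and once with each candidate allocator preloaded (the path of each allocator's shared library depends on your installation). A real evaluation should replay your application's actual allocation pattern rather than this toy one.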
I have some experience using Hoard where I work (enough so that one of the more obscure bugs addressed in the recent 3.8 release was found as a result of that experience). It's a very good allocator - but how good, for you, depends on your workload. And you do have to pay for Hoard (though it's not too expensive) in order to use it in a commercial project without GPL'ing your code.
A very slightly adapted ptmalloc2 has been the allocator behind glibc's malloc for quite a while now, and so it's incredibly widely used and tested. If stability is important above all things, it might be a good choice, but you didn't mention it in your list, so I'll assume it's out. For certain workloads, it's terrible - but the same is true of any general purpose malloc.
If you're willing to pay for it (and the price is reasonable, in my experience), SmartHeap SMP is also a good choice. Most of the other allocators mentioned are designed as drop-in malloc/free new/delete replacements that can be LD_PRELOAD'd. SmartHeap can be used that way as well, but it also includes an entire allocation-related API that lets you fine-tune your allocators to your heart's content. In tests that we've done (again, very specific to a particular application), SmartHeap was about the same as Hoard for performance when acting as a drop-in malloc replacement; the real difference between the two is the degree of customization. You can get better performance the less general-purpose you need your allocator to be.
And depending on your use case, a general-purpose multithreaded allocator might not be what you want to use at all; if you're constantly malloc & free'ing objects that are all the same size, you might want to just write a simple slab allocator. Slab allocation is used in several places in the Linux kernel that fit that description. (I would give you a couple more useful links, but I'm a "new user" and Stack Overflow has decided that new users are not allowed to be too helpful all in one answer. Google can help out well enough, though.)
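To give an idea of how little code the same-size case needs, here is a minimal fixed-size pool in the spirit of a slab allocator. It is a sketch under my own assumptions, not what the Linux kernel does: `FixedPool` is a hypothetical name, the pool has a fixed capacity and never grows, and it is not thread-safe, so you would use one pool per thread or put a lock around it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical fixed-size pool: every slot has the same size, and free slots
// are chained through an intrusive singly linked free list.
class FixedPool {
public:
    // `capacity` is assumed to be greater than zero.
    FixedPool(std::size_t object_size, std::size_t capacity)
        : slot_size_(round_up(object_size < sizeof(Node) ? sizeof(Node) : object_size)),
          storage_(static_cast<unsigned char*>(std::malloc(slot_size_ * capacity))) {
        if (storage_ == nullptr) throw std::bad_alloc();
        // Push slots in reverse so the first allocate() returns slot 0.
        for (std::size_t i = capacity; i > 0; --i) {
            deallocate_raw(storage_ + (i - 1) * slot_size_);
        }
    }
    ~FixedPool() { std::free(storage_); }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Returns a slot, or nullptr when the pool is exhausted.
    void* allocate() {
        if (free_list_ == nullptr) return nullptr;
        Node* n = free_list_;
        free_list_ = n->next;
        ++in_use_;
        return n;
    }

    // `p` must have come from allocate() on this pool.
    void deallocate(void* p) {
        deallocate_raw(p);
        --in_use_;
    }

    std::size_t in_use() const { return in_use_; }

private:
    struct Node { Node* next; };

    // Round up so every slot stays aligned for any fundamental type.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    void deallocate_raw(void* p) {
        free_list_ = new (p) Node{free_list_};  // reuse the slot as a list node
    }

    std::size_t slot_size_;
    unsigned char* storage_;
    Node* free_list_ = nullptr;
    std::size_t in_use_ = 0;
};
```

Both `allocate()` and `deallocate()` are a couple of pointer operations, which is why a pool like this can beat a general-purpose malloc when all objects really are the same size. The trade-off is that you give up everything a general allocator does for you: growth, mixed sizes, and cross-thread frees.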
I've used tcmalloc and read about Hoard. Both have similar implementations and both achieve roughly linear performance scaling with respect to the number of threads/CPUs (according to the graphs on their respective sites).
So: if performance is really that crucial, then do performance/load testing. Otherwise, just roll the dice and pick one of those listed (weighted by ease of use on your target platform).
And from trshiv's link, it looks like Hoard, tcmalloc, and ptmalloc are all roughly comparable for speed. Overall, it looks like ptmalloc is optimized for taking as little room as possible, Hoard is optimized for a trade-off of speed + memory usage, and tcmalloc is optimized for pure speed.
I personally prefer and recommend ptmalloc as a multithreaded allocator. Hoard is good, but in the evaluation my team did between Hoard and ptmalloc a few years ago, ptmalloc was better. From what I know, ptmalloc has been around for a number of years and is quite widely used as a multithreaded allocator.
You might find this comparison useful.
Maybe this is the wrong way to approach what you are asking, but maybe a different tactic could be employed altogether. If you are looking for a really fast memory allocator maybe you should ask why you need to be spending all that time allocating memory when you could perhaps just get away with stack allocation of variables. Stack allocation, while way more annoying, done right could save you lots in the way of mutex contention, as well as keeping strange memory corruption issues out of your code. Also, you potentially have less fragmentation which could help.
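As a small illustration of the idea, here is the same function written twice, once with heap-allocated scratch space and once with scratch space on the stack. The function names are made up for this example, and it only works because the scratch size (256 entries) is known at compile time and is small; large or variable-sized buffers are a different story.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Heap version: the vector allocates and frees its buffer on every call.
int count_distinct_bytes_heap(const unsigned char* data, std::size_t n) {
    std::vector<unsigned char> seen(256, 0);
    int distinct = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (!seen[data[i]]) {
            seen[data[i]] = 1;
            ++distinct;
        }
    }
    return distinct;
}

// Stack version: the scratch array lives in the call frame, so the
// allocator (and any lock inside it) is never touched.
int count_distinct_bytes_stack(const unsigned char* data, std::size_t n) {
    std::array<unsigned char, 256> seen{};  // zero-initialized
    int distinct = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (!seen[data[i]]) {
            seen[data[i]] = 1;
            ++distinct;
        }
    }
    return distinct;
}
```

If a hot path in a server calls something like the heap version from many threads, switching to the stack version removes those allocations entirely, which no allocator swap can match.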
We used hoard on a project where I worked a few years ago. It seemed to work great. I have no experience with the other allocators. It should be pretty easy to try different ones and do load testing, no?
You can try ltalloc (general purpose global memory allocator with speed of fast pool allocator).
Probably a late response to your question, but why do mallocs at all if you have performance hiccups? A better way would be to malloc one big memory window at initialization and then come up with a lightweight memory manager that leases out memory chunks at run time. This avoids any possibility of system calls for heap expansion.
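One very simple way to build such a manager is a bump (arena) allocator. This is a sketch under my own assumptions rather than a specific design: `Arena`, `lease`, and `reset` are hypothetical names, chunks cannot be returned individually (you reset the whole window at once), and it is not thread-safe as written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical arena: one malloc up front, then chunks are handed out by
// advancing an offset into that window.
class Arena {
public:
    // `bytes` is assumed to be greater than zero.
    explicit Arena(std::size_t bytes)
        : base_(static_cast<unsigned char*>(std::malloc(bytes))), size_(bytes) {
        if (base_ == nullptr) throw std::bad_alloc();
    }
    ~Arena() { std::free(base_); }
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Leases `bytes` from the window, or returns nullptr if it does not fit.
    void* lease(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (bytes + align - 1) / align * align;
        // `rounded < bytes` catches arithmetic wrap-around for huge requests.
        if (rounded < bytes || rounded > size_ - used_) return nullptr;
        void* p = base_ + used_;
        used_ += rounded;
        return p;
    }

    // Makes the whole window available again; all leased chunks become invalid.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_ = 0;
};
```

A typical use is one arena per request or per worker thread: lease everything the request needs, then call `reset()` when the request is done. If you need to return individual chunks, you end up writing a free list on top of this, at which point you are partway to writing your own malloc.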
The locklessinc allocator is very good, and the developer is responsive if you have questions. He wrote an article about some of the optimization tricks used; it's an interesting read: http://locklessinc.com/articles/allocator_tricks/. I've used it in the past with excellent results.