What does the tbb::scalable_allocator in Intel Threading Building Blocks actually do under the hood?
It can certainly be effective. I've just used it to take 25% off an app's execution time (and see an increase in CPU utilization from ~200% to 350% on a 4-core system) by changing a single std::vector<T> to std::vector<T,tbb::scalable_allocator<T> >. On the other hand, in another app I've seen it double an already large memory consumption and send things to swap city.
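The change described above touches only the allocator parameter of std::vector, so every buffer the vector requests while it grows comes from that allocator instead of std::allocator. With TBB available, the declaration would be std::vector<int, tbb::scalable_allocator<int> >. To keep this sketch buildable without TBB, CountingAllocator is a hypothetical stand-in: it forwards to the global operator new/delete and counts how often the vector asks for memory.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Counts every buffer request made through CountingAllocator.
std::size_t g_vector_allocations = 0;

// Hypothetical stand-in for tbb::scalable_allocator<T>: it forwards to the global
// operator new/delete and records how often the container asks for memory.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() noexcept = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_vector_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

// Before:           using IntVector = std::vector<int>;
// After (with TBB): using IntVector = std::vector<int, tbb::scalable_allocator<int> >;
using IntVector = std::vector<int, CountingAllocator<int> >;

IntVector fill(int n) {
    IntVector v;
    for (int i = 0; i < n; ++i) v.push_back(i);  // each capacity increase calls allocate()
    return v;
}
```

Code that uses the vector stays the same; only the type alias changes.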
Intel's own documentation doesn't give a lot away (e.g. a short section at the end of this FAQ). Can anyone tell me what tricks it uses before I go and dig into its code myself?
UPDATE: I'm using TBB 3.0 for the first time and have seen my best speedup from scalable_allocator yet. Changing a single vector<int> to a vector<int,scalable_allocator<int> > reduced the runtime of something from 85s to 35s (Debian Lenny, Core2, with TBB 3.0 from testing).
There is a good paper on the allocator: The Foundations for Scalable Multi-core Software in Intel Threading Building Blocks
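Roughly, as we read that paper, the allocator gives each thread its own private heap, segregates small objects into size classes (bins), carves 16 KB blocks into objects of a single size, and handles large requests separately, so the common allocate/free path needs no lock. The following is a minimal illustration of the per-thread, size-segregated free-list idea under stated simplifications, not TBB's code: sketch::allocate and sketch::deallocate are hypothetical names, the 16-byte granularity and 1024-byte small-object limit are arbitrary, an object freed by another thread joins the freeing thread's list, and chunks are released only at program exit. It assumes C++17 (std::byte) and that new[] returns 16-byte-aligned memory, as on common 64-bit platforms.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <mutex>
#include <new>
#include <vector>

namespace sketch {

constexpr std::size_t kGranule = 16;            // size classes are multiples of 16 bytes
constexpr std::size_t kMaxSmall = 1024;         // larger requests bypass the bins
constexpr std::size_t kNumBins = kMaxSmall / kGranule;
constexpr std::size_t kChunkBytes = 16 * 1024;  // each chunk is carved into equal-size objects

struct FreeNode { FreeNode* next; };

// Chunks live in one global registry, so memory stays valid whichever thread frees it.
// The mutex is taken only on the slow path, when a thread's bin runs empty.
std::mutex g_chunk_mutex;
std::vector<std::unique_ptr<std::byte[]>> g_chunks;

std::byte* new_chunk() {
    std::lock_guard<std::mutex> lock(g_chunk_mutex);
    g_chunks.push_back(std::make_unique<std::byte[]>(kChunkBytes));
    return g_chunks.back().get();
}

// One free list per size class, private to each thread: the fast path takes no lock.
struct ThreadHeap {
    std::array<FreeNode*, kNumBins> bins{};
};
thread_local ThreadHeap t_heap;

std::size_t bin_index(std::size_t n) { return (n + kGranule - 1) / kGranule - 1; }

// Slow path: grab a fresh chunk and thread all of its slots onto this thread's bin.
void refill(std::size_t bin) {
    const std::size_t object_size = (bin + 1) * kGranule;
    std::byte* chunk = new_chunk();
    for (std::size_t off = 0; off + object_size <= kChunkBytes; off += object_size)
        t_heap.bins[bin] = ::new (chunk + off) FreeNode{t_heap.bins[bin]};
}

void* allocate(std::size_t n) {
    if (n == 0) n = 1;
    if (n > kMaxSmall) {  // large object: go straight to malloc
        if (void* p = std::malloc(n)) return p;
        throw std::bad_alloc();
    }
    const std::size_t bin = bin_index(n);
    if (t_heap.bins[bin] == nullptr) refill(bin);
    FreeNode* node = t_heap.bins[bin];  // pop the head of the free list
    t_heap.bins[bin] = node->next;
    return node;
}

// Like std::allocator::deallocate, the caller passes the size it allocated.
// Simplification: the object joins the freeing thread's list (the design in the
// paper sends it back to the owning thread's block instead).
void deallocate(void* p, std::size_t n) noexcept {
    if (p == nullptr) return;
    if (n == 0) n = 1;
    if (n > kMaxSmall) { std::free(p); return; }
    const std::size_t bin = bin_index(n);
    t_heap.bins[bin] = ::new (p) FreeNode{t_heap.bins[bin]};  // push onto the free list
}

}  // namespace sketch
```

In the sketch, freed objects stay cached in per-thread lists and chunks are never returned while the program runs, which shows how a caching allocator can end up holding more memory than the program is actively using.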
My limited experience: I replaced the global new/delete with tbb::scalable_allocator for my AI application, but there was little change in the time profile. I didn't compare the memory usage, though.
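Replacing the global new/delete means defining the replaceable operator new and operator delete once in the program. In this minimal sketch, backend_malloc and backend_free are hypothetical names standing in for scalable_malloc and scalable_free from <tbb/scalable_allocator.h>; here they forward to std::malloc and std::free so the sketch builds without TBB. The call counter exists only to make the replacement observable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical backend hooks. With TBB these would call scalable_malloc / scalable_free;
// std::malloc / std::free keep the sketch buildable without TBB.
inline void* backend_malloc(std::size_t n) { return std::malloc(n); }
inline void backend_free(void* p) noexcept { std::free(p); }

// Counts calls to the replaced operator new so the replacement is observable.
std::atomic<std::size_t> g_new_calls{0};

// operator new must never return null, so failure is reported with std::bad_alloc.
void* operator new(std::size_t n) {
    g_new_calls.fetch_add(1, std::memory_order_relaxed);
    if (void* p = backend_malloc(n == 0 ? 1 : n)) return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { backend_free(p); }
void operator delete(void* p, std::size_t) noexcept { backend_free(p); }
```

Only the scalar forms are replaced here; the default array and nothrow forms forward to them, while the C++17 aligned (std::align_val_t) overloads are left untouched.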
The solution you mentioned is optimized for Intel CPUs. It incorporates specific CPU mechanisms to improve performance.
Some time ago I found another very useful solution: Fast C++11 allocator for STL containers. It speeds up STL containers noticeably on VS2017 (~5x) as well as on GCC (~7x). It uses a memory pool for element allocation, which makes it very effective on all platforms.
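The memory-pool idea can be sketched as a small C++11-style allocator for node-based containers such as std::list. This is not the code of the Fast C++11 allocator project; PoolAllocator is a hypothetical name, the 256-slot block size is arbitrary, and the pool is a single static free list per element type, so the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Minimal pool-backed allocator sketch: single-object requests are served from a
// free list of fixed-size slots; anything else falls back to ::operator new.
template <class T>
struct PoolAllocator {
    using value_type = T;

    PoolAllocator() noexcept = default;
    template <class U>
    PoolAllocator(const PoolAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n != 1) return static_cast<T*>(::operator new(n * sizeof(T)));
        Pool& pool = get_pool();
        if (pool.free_list == nullptr) pool.grow();
        Slot* s = pool.free_list;  // pop a free slot
        pool.free_list = s->next;
        return reinterpret_cast<T*>(s);
    }

    void deallocate(T* p, std::size_t n) noexcept {
        if (n != 1) { ::operator delete(p); return; }
        Pool& pool = get_pool();
        Slot* s = reinterpret_cast<Slot*>(p);  // push the slot back for reuse
        s->next = pool.free_list;
        pool.free_list = s;
    }

private:
    // A slot is either a link in the free list or raw storage for one T.
    union Slot {
        Slot* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };

    struct Pool {
        static constexpr std::size_t kSlotsPerBlock = 256;
        std::vector<std::unique_ptr<Slot[]>> blocks;  // released when the pool is destroyed
        Slot* free_list = nullptr;

        void grow() {
            std::unique_ptr<Slot[]> block(new Slot[kSlotsPerBlock]);
            for (std::size_t i = 0; i < kSlotsPerBlock; ++i) {
                block[i].next = free_list;
                free_list = &block[i];
            }
            blocks.push_back(std::move(block));
        }
    };

    // One pool per element type, shared by every PoolAllocator<T> instance.
    static Pool& get_pool() {
        static Pool pool;
        return pool;
    }
};

template <class T, class U>
bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&) noexcept { return false; }
```

Usage is a one-line type change, for example std::list<int, PoolAllocator<int> >. Because every list node has the same size, a freed node can be handed straight back out of the free list without searching, which is where a pool wins over a general-purpose heap.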