Small string optimization for vectors?

Posted on 2024-08-20 13:34:08

I know that several (all?) STL implementations implement a "small string" optimization: instead of storing the usual three pointers for begin, end and capacity, a string stores the actual character data in the memory otherwise used for those pointers if sizeof(characters) <= sizeof(pointers). I am in a situation where I have lots of small vectors with an element size <= sizeof(pointer). I cannot use fixed-size arrays, since the vectors need to be able to resize dynamically and may grow quite large. However, the median (not mean) size of the vectors will only be 4-12 bytes. So a "small string" optimization adapted to vectors would be quite useful to me. Does such a thing exist?

I'm thinking about rolling my own by simply brute-force converting a vector to a string, i.e. providing a vector interface to a string. Is that a good idea?
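To make the optimization described above concrete, here is a rough sketch of a probe that checks whether a std::string keeps its characters inside the string object itself. The names stored_inline and sso_demo are made up for this illustration, and the result for short strings is implementation-specific: it is typically true on current libstdc++, libc++ and MSVC, but the standard does not guarantee it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true when the character buffer lies inside
// the bytes of the string object itself (i.e. no separate heap block).
inline bool stored_inline(const std::string& s) {
    const auto obj = reinterpret_cast<std::uintptr_t>(&s);
    const auto buf = reinterpret_cast<std::uintptr_t>(s.data());
    return buf >= obj && buf < obj + sizeof(std::string);
}

inline void sso_demo() {
    // 1000 characters cannot fit inside the string object, so this
    // buffer has to live somewhere else.
    const std::string big(1000, 'x');
    assert(!stored_inline(big));

    // For a short string the answer depends on the library. It is usually
    // true with a small string optimization, but that is not guaranteed,
    // so the result is only computed here and not asserted.
    const std::string tiny = "abc";
    const bool tiny_inline = stored_inline(tiny);
    (void)tiny_inline;
}
```

A small vector optimization would apply the same idea to arbitrary element types: a few elements live in the object, and anything larger moves to the heap.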

Comments (4)

转瞬即逝 2024-08-27 13:34:08

Boost 1.58 was just released and its Container library has a small_vector class based on the LLVM SmallVector.

There is also a static_vector which cannot grow beyond the initially given size. Both containers are header-only.

Facebook's folly library also has some awesome containers.

It has a small_vector which can be configured with a template parameter to act like Boost's static or small vectors. It can also be configured to use small integer types for its internal size bookkeeping, which, given that they are Facebook, is no surprise :)

There is work in progress to make the library cross-platform, so Windows/MSVC support should land some day...

欢烬 2024-08-27 13:34:08

You can borrow the SmallVector implementation from LLVM (header-only, located in LLVM\include\llvm\ADT).

不…忘初心 2024-08-27 13:34:08

It was discussed years ago (and a few of the names in that thread may look a bit familiar :-) ), but I don't know of an existing implementation. I don't think I'd try to adapt std::string to the task. The exact requirements on the type over which std::basic_string is instantiated aren't well stated, but the standard is pretty clear that it's only intended for something that acts a lot like char. For types that are substantially different, it might still work, but it's hard to say what would happen: it was never intended for them, and probably hasn't been tested with many types other than small integers.

A fully conforming implementation of std::vector is a lot of work. But implementing a usable subset of std::vector from scratch (even including a small vector optimization) won't usually be terribly difficult. If you include a small vector optimization, I'm reasonably certain you can't meet all the requirements on std::vector though.

In particular, swapping or moving a vector where you've stored actual data in the vector object means you'll need to swap/move the actual data items. The requirements on std::vector, by contrast, are predicated on its storing only a pointer to the data, so it can normally [1] swap or move the contents just by manipulating the pointers, without touching the data items themselves at all. It is therefore required to be able to do these things without throwing, even if manipulating the data items themselves could throw. A small vector optimization will preclude meeting those requirements.

On the other hand, as noted above, one of the requirements on std::string is that it can only store items that can be manipulated without throwing. As such, if std::string is a viable option at all, implementing your own vector-like container probably won't need to worry about those details a lot either.
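As a rough illustration of the swap problem and of the "usable subset" idea, here is a minimal sketch of a vector-like container with inline storage. It is not taken from any of the libraries mentioned in this thread; tiny_vector, is_inline and tiny_vector_demo are names invented for this example. It assumes the element type is trivially copyable and trivially default constructible, so copying elements cannot throw, and it leaves out iterators, erase, allocators and move operations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical sketch: up to N elements live inside the object,
// larger contents move to a heap block.
template <typename T, std::size_t N>
class tiny_vector {
    static_assert(N > 0, "inline capacity must be nonzero");
    static_assert(std::is_trivially_copyable<T>::value &&
                  std::is_trivially_default_constructible<T>::value,
                  "sketch only supports trivial element types");

    T inline_[N] = {};        // in-object storage, zero-initialized
    T* data_ = inline_;       // points at inline_ or at a heap block
    std::size_t size_ = 0;
    std::size_t cap_ = N;

    void grow(std::size_t new_cap) {
        T* p = new T[new_cap];
        std::copy(data_, data_ + size_, p);
        if (!is_inline()) delete[] data_;
        data_ = p;
        cap_ = new_cap;
    }

public:
    tiny_vector() = default;
    tiny_vector(const tiny_vector& o) {
        if (o.size_ > N) { data_ = new T[o.size_]; cap_ = o.size_; }
        std::copy(o.data_, o.data_ + o.size_, data_);
        size_ = o.size_;
    }
    tiny_vector& operator=(tiny_vector o) { swap(o); return *this; }
    ~tiny_vector() { if (!is_inline()) delete[] data_; }

    bool is_inline() const { return data_ == inline_; }
    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }

    void push_back(const T& v) {
        const T copy = v;     // v may refer to an element of this container
        if (size_ == cap_) grow(cap_ * 2);
        data_[size_++] = copy;
    }

    // Unlike std::vector, this has to copy elements whenever either side
    // is inline. It can only be noexcept because T is trivially copyable.
    void swap(tiny_vector& o) noexcept {
        const bool a = is_inline(), b = o.is_inline();
        if (!a && !b) {
            std::swap(data_, o.data_);            // pointer swap only
        } else if (a && b) {
            std::swap_ranges(inline_, inline_ + N, o.inline_);
        } else {
            tiny_vector& in = a ? *this : o;      // the inline one
            tiny_vector& hp = a ? o : *this;      // the heap one
            T* heap = hp.data_;
            std::copy(in.inline_, in.inline_ + in.size_, hp.inline_);
            hp.data_ = hp.inline_;
            in.data_ = heap;
        }
        std::swap(size_, o.size_);
        std::swap(cap_, o.cap_);
    }
};

inline void tiny_vector_demo() {
    tiny_vector<int, 4> a, b;
    for (int i = 0; i < 3; ++i) a.push_back(i);         // stays inline
    for (int i = 0; i < 10; ++i) b.push_back(100 + i);  // spills to heap
    assert(a.is_inline() && !b.is_inline());
    a.swap(b);
    assert(!a.is_inline() && a.size() == 10 && a[9] == 109);
    assert(b.is_inline() && b.size() == 3 && b[2] == 2);
}
```

Only the heap/heap branch of swap behaves like std::vector. The other two branches copy elements, which is exactly why a container like this cannot promise what std::vector promises for element types whose copies may throw.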


  1. There is one case where you end up having to swap/move actual data items, even in an actual std::vector: if the two vectors use different allocators, then you have to allocate space for the objects in the destination via that vector's allocator.

泼猴你往哪里跑 2024-08-27 13:34:08

If T is a POD type, why not use basic_string instead of vector?
