std::string 及其自动内存调整大小

发布于 2024-09-15 23:16:46 字数 302 浏览 1 评论 0原文

我对 C++ 还很陌生,但我知道你不能像 std::string 类似乎让你那样随意使用内存。例如:

std::string f = "asdf";
f += "fdsa";

字符串类如何处理变大和变小?我假设它分配默认数量的内存,如果需要更多内存,它会新建一个更大的内存块并将其自身复制到该内存块上。但是每次需要调整大小时都必须复制整个字符串,这不是效率很低吗?我真的想不出另一种方法可以做到(但显然有人这样做了)。

就此而言,所有 stdlib 类(如向量、队列、堆栈等)如何如此透明地处理增长和收缩?

I'm pretty new to C++, but I know you can't just use memory willy nilly like the std::string class seems to let you do. For instance:

std::string f = "asdf";
f += "fdsa";

How does the string class handle getting larger and smaller? I assume it allocates a default amount of memory and if it needs more, it news a larger block of memory and copies itself over to that. But wouldn't that be pretty inefficient to have to copy the whole string every time it needed to resize? I can't really think of another way it could be done (but obviously somebody did).

And for that matter, how do all the stdlib classes like vector, queue, stack, etc handle growing and shrinking so transparently?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

凝望流年 2024-09-22 23:16:46

您的分析是正确的 - 每次需要调整大小时都复制字符串效率很低。这就是为什么常见的建议不鼓励这种使用模式。使用字符串的 reserve 函数来询问它为您想要存储的内容分配足够的内存。然后进一步的操作将填充该内存。 (但是如果你的提示太小,字符串仍然会自动增长。)

容器通常还会尝试通过分配比所需更多的内存来减轻频繁重新分配的影响。一种常见的算法是,当字符串发现空间不足时,它会将其缓冲区大小加倍,而不是仅分配保存新值所需的最小值。如果字符串一次增长一个字符,则这种加倍算法会将时间复杂度降低到摊销线性时间(而不是二次时间)。它还降低了程序对内存碎片的敏感性。

Your analysis is correct — it is inefficient to copy the string every time it needs to resize. That's why common advice discourages that use pattern. Use the string's reserve function to ask it to allocate enough memory for what you intend to store in it. Then further operations will fill that memory. (But if your hint turns out to be too small, the string will still grow automatically, too.)

Containers will also usually try to mitigate the effects of frequent re-allocation by allocating more memory than they need. A common algorithm is that when a string finds that it's out of space, it doubles its buffer size instead of just allocating the minimum required to hold the new value. If the string is being grown one character at a time, this doubling algorithm reduces the time complexity to amortized linear time (instead of quadratic time). It also reduces the program's susceptibility to memory fragmentation.

寒冷纷飞旳雪 2024-09-22 23:16:46

通常,有一个加倍算法。换句话说,当它填满当前缓冲区时,它会分配一个两倍大的新缓冲区,然后复制当前数据。与通过单个分配块增长的替代方案相比,这会导致更少的分配/复制操作。

Usually, there's a doubling algorithm. In other words, when it fills the current buffer, it allocates a new buffer that's twice as big, and then copies the current data over. This results in fewer allocate/copy operations than the alternative of growing by a single allocation block.

陈独秀 2024-09-22 23:16:46

虽然我不知道 std::string 的确切实现,但大多数需要处理动态内存增长的数据结构都是通过完全按照您所说的操作来实现的 - 分配默认内存量,如果需要更多内存,则创建更大的块并复制自己。

解决明显的低效率问题的方法是分配比您需要的更多的内存。向量/字符串/列表等的已用内存与总内存之比通常称为负载因子(也用于哈希表,但含义略有不同)。通常是 1:2 的比例 - 也就是说,您分配所需内存的两倍。当空间不足时,您可以分配当前内存量两倍的新内存量并使用它。这意味着随着时间的推移,如果您继续向向量/字符串/等添加内容,则需要复制的项目越来越少(因为内存创建是指数级的,并且新项目的插入当然是线性的),因此这种内存处理方法所花费的时间并不像您想象的那么长。根据摊销分析的原则,您可以看到插入m 使用此方法将项目放入向量/字符串/列表中只是 m 的 Big-Theta,而不是 m2

Although I do not know the exact implementation of std::string, most data structures that need to handle dynamic memory growth do so by doing exactly what you say - allocate a default amount of memory, and if more is needed then create a bigger block and copy yourself over.

The way you get around the obvious inefficiency problem is to allocate more memory than you need. The ratio of used memory:total memory of a vector/string/list/etc is often called the load factor (also used for hash tables in a slightly different meaning). Usually it's a 1:2 ratio - that is, you assign twice the memory you need. When you run out of space, you assign a new amount of memory twice your current amount and use that. This means that over time, if you continue to add things to the vector/string/etc, you need to copy over the item less and less (as the memory creation is exponential, and your inserting of new items is of course linear), and so the time taken for this method of memory handling is not as large as you might think. By the principles of Amortized Analysis, you can then see that inserting m items into a vector/string/list using this method is only Big-Theta of m, not m2.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文