Understanding the efficiency of std::string
I'm trying to learn a little more about C++ strings.
Consider
const char* cstring = "hello";
std::string string(cstring);
and
std::string string("hello");
Am I correct in assuming that both store "hello" in the .data section of an application and the bytes are then copied to another area on the heap where the pointer managed by the std::string can access them?
How could I efficiently store a really, really long string? I'm thinking of an application that reads data from a socket stream. I fear concatenating many times. I could imagine using a linked list and traversing it.
Strings have intimidated me for far too long!
Any links, tips, explanations, further details, would be extremely helpful.
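A minimal sketch of the copy the question asks about. Two things are worth hedging: on common toolchains the literal itself usually lives in a read-only section (often .rodata rather than .data), and many standard library implementations keep short strings such as "hello" inside the std::string object itself rather than on the heap. Either way, the string holds its own copy of the bytes. The function name holds_own_copy is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Builds a std::string from a C string and reports whether the string
// keeps the same characters at a different address than the source.
bool holds_own_copy(const char* source) {
    assert(source != nullptr);
    std::string s(source);  // copies the bytes; the literal is left untouched
    bool same_text = std::strcmp(s.c_str(), source) == 0;
    bool different_storage = s.data() != source;
    return same_text && different_storage;
}
```

Calling holds_own_copy("hello") and holds_own_copy(cstring) exercises both forms from the question; they go through the same const char* constructor.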
I have stored strings in the tens or hundreds of megabytes without issue. Naturally, the size is limited primarily by your available (contiguous) memory / address space.
If you are going to be appending / concatenating, a few things may help efficiency-wise. If possible, use the reserve() member function to pre-allocate space. Even a rough idea of the final size helps, because it saves unnecessary reallocations as the string grows.
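A small sketch of the reserve() advice, assuming the final size can be estimated up front. The helper name append_with_reserve and its parameters are hypothetical. It counts how often the capacity changes while appending, which should be zero once enough space has been reserved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends `chunk` to `out` `count` times after reserving the expected
// final size. Returns how many times the capacity changed while appending.
std::size_t append_with_reserve(std::string& out,
                                const std::string& chunk,
                                std::size_t count) {
    out.reserve(out.size() + chunk.size() * count);
    std::size_t capacity_changes = 0;
    std::size_t last_capacity = out.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        out += chunk;
        if (out.capacity() != last_capacity) {
            ++capacity_changes;
            last_capacity = out.capacity();
        }
    }
    assert(out.capacity() >= out.size());
    return capacity_changes;
}
```

If the estimate turns out too small, the string still grows on its own; reserve() only removes the reallocations below the reserved size.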
Additionally, many string implementations use "exponential growth", meaning that they grow by some percentage rather than by a fixed number of bytes. For example, an implementation might simply double the capacity any time additional space is needed. Growing exponentially makes it much cheaper to perform lots of concatenations. (The exact details depend on your standard library implementation.)
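One way to observe the growth policy of a particular implementation is to append one character at a time and record every capacity the string passes through. This is only a sketch; capacity_steps is a hypothetical helper, and the actual sequence of capacities varies between standard libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Appends `total_chars` characters one by one and returns the distinct
// capacities observed along the way, starting with the empty string's.
std::vector<std::size_t> capacity_steps(std::size_t total_chars) {
    std::string s;
    std::vector<std::size_t> steps{s.capacity()};
    for (std::size_t i = 0; i < total_chars; ++i) {
        s.push_back('x');
        if (s.capacity() != steps.back()) {
            steps.push_back(s.capacity());
        }
    }
    assert(steps.back() >= total_chars);
    return steps;
}
```

With geometric growth, the number of steps rises roughly with the logarithm of the final size, so 100,000 appends typically trigger only a few dozen reallocations rather than thousands.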
Finally, another option (if your library supports it) is the rope<> template. Ropes are similar to strings, except that they are much more efficient when performing operations on very large strings. In particular, "ropes are allocated in small chunks, significantly reducing memory fragmentation problems introduced by large blocks". SGI's STL guide has some additional details.
Since you're reading the string from a socket, you can reuse the same packet buffers and chain them together to represent the huge string. This will avoid any needless copying and is probably the most efficient solution possible. I seem to remember that the ACE library provides such a mechanism. I'll try to find it.
EDIT: ACE has ACE_Message_Block, which allows you to store large messages in a linked-list fashion. You almost need to read the C++ Network Programming books to make sense of this colossal library. The free tutorials on the ACE website really suck.
I bet Boost.Asio must be capable of doing the same thing as ACE's message blocks. Boost.Asio now seems to have a larger mindshare than ACE, so I suggest looking for a solution within Boost.Asio first. If anyone can enlighten us about a Boost.Asio solution, that would be great!
It's about time I try writing a simple client-server app using Boost.Asio to see what all the fuss is about.
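The buffer-chaining idea above can be sketched with the standard library alone. This is not ACE_Message_Block and not a Boost.Asio API; ChunkedBuffer and its members are hypothetical names for illustration. Each received packet buffer is moved into a linked list, so no bytes are copied until a contiguous string is actually requested.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: a chain of packet buffers treated as one
// logical string.
class ChunkedBuffer {
public:
    // Takes ownership of a filled packet buffer; the bytes are moved, not copied.
    void append(std::vector<char>&& packet) {
        total_ += packet.size();
        chunks_.push_back(std::move(packet));
    }

    std::size_t size() const { return total_; }
    std::size_t chunk_count() const { return chunks_.size(); }

    // Copies all chunks into one contiguous string using a single allocation.
    std::string flatten() const {
        std::string out;
        out.reserve(total_);
        for (const auto& chunk : chunks_) {
            if (!chunk.empty()) {
                out.append(chunk.data(), chunk.size());
            }
        }
        assert(out.size() == total_);
        return out;
    }

private:
    std::list<std::vector<char>> chunks_;
    std::size_t total_ = 0;
};
```

The trade-off is that random access and searching across chunk boundaries become more work than with a single std::string, so flattening once at the end may still be the simplest choice when the whole message has to be parsed.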
I don't think efficiency should be the issue. Both will perform well enough.
The deciding factor here is encapsulation. std::string is a far better abstraction than char * could ever be. Encapsulating pointer arithmetic is a good thing.
A lot of people thought long and hard to come up with std::string. I think failing to use it for unfounded efficiency reasons is foolish. Stick to the better abstraction and encapsulation.
As you probably know, std::string is really just another name for basic_string<char>.
That said, it is a sequence container, and its memory is allocated contiguously. You can get an exception from a std::string if you try to make one bigger than the largest contiguous block of memory you can allocate. This threshold is typically considerably less than the total available memory because of memory fragmentation.
I've seen problems allocating contiguous memory when trying to allocate, for instance, large contiguous 3D buffers for images. But in my experience these issues don't start until allocations reach the order of 100 MB or so, at least on Windows XP Pro.
Are your strings this big?
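A hedged sketch of how such a failure could be handled. std::string::reserve throws std::length_error when the request exceeds max_size(), and the allocator may throw std::bad_alloc when it cannot supply the memory. The wrapper name try_reserve is hypothetical, and whether a large but valid request actually fails depends on the platform and its memory policy.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>

// Tries to reserve room for `wanted` characters. Returns false instead of
// letting the exception propagate when the request cannot be satisfied.
bool try_reserve(std::string& s, std::size_t wanted) {
    try {
        s.reserve(wanted);
    } catch (const std::length_error&) {
        return false;  // wanted exceeds s.max_size()
    } catch (const std::bad_alloc&) {
        return false;  // the allocator could not supply the memory
    }
    assert(s.capacity() >= wanted);
    return true;
}
```

A caller reading from a socket could use a check like this before committing to one huge contiguous string, and fall back to a chunked representation when it returns false.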