Performance of string initialization in C++
I have the following questions regarding strings in C++:
Question 1: Which is the better option (considering performance), and why?
Option 1:
    string a;
    a = "hello!";
OR
Option 2:
    string *a;
    a = new string("hello!");
    ...
    delete(a);
Question 2:
    string a;
    a = "less";
    a = "moreeeeeee";
How exactly is memory management handled in C++ when a bigger string is copied into a smaller string? Are C++ strings mutable?
Comments (9)
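It is almost never necessary or desirable to allocate a string with new, as option 2 does.
After all, you would (almost) never allocate a simple built-in value that way either.
You should instead declare the string as an ordinary object and initialize it directly, either with constructor syntax or with = in the declaration.
And yes, C++ strings are mutable.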
All of the following is what a naive compiler would do. Of course, as long as it doesn't change the behavior of the program, the compiler is free to make any optimization.
In option 1, you first initialize a to contain the empty string (set the length to 0, plus one or two other operations). Then you assign a new value, overwriting the length that was already set. The assignment may also have to check how big the current buffer is and whether more memory should be allocated.
In option 2, calling new requires the memory allocator, and sometimes the OS, to find a free chunk of memory. That's slow. You do initialize the string immediately, so nothing is assigned twice and the buffer doesn't need to be resized as it may in the first version.
Then something bad happens, you forget to call delete, and you have a memory leak in addition to a string that is slow to allocate. So this is bad.
In your second question, as in the first case, you first initialize a to contain the empty string. Then you assign a new string, and then another. Each of these assignments may require a call to new to allocate more memory. Each also has to update the length and possibly other internal variables.
Normally, you'd give the string its value in the declaration itself: one line that performs initialization once, rather than first default-initializing and then assigning the value you want.
It also minimizes errors, because you don't have a nonsensical empty string anywhere in your program. If the string exists, it contains the value you want.
About memory management, google RAII.
In short, string calls new/delete internally to resize its buffer. That means you never need to allocate a string with new. The string object has a fixed size and is designed to be allocated on the stack, so that the destructor is called automatically when it goes out of scope. The destructor then guarantees that any allocated memory is freed. That way, you don't have to use new/delete in your user code, which means you won't leak memory.
Is there a specific reason why you constantly use assignment instead of initialization? That is, why don't you give each string its value in the declaration itself? This avoids a default construction and just makes more sense semantically. Creating a pointer to a string just for the sake of allocating it on the heap is never meaningful, i.e. your case 2 doesn't make sense and is slightly less efficient.
As to your last question, yes, strings in C++ are mutable unless declared const.
Option 1 takes two operations: it calls the default constructor std::string() and then the assignment operator, operator=.
Option 2 takes only one operation: it calls the constructor std::string(const char*), but you must not forget to delete the pointer.
What about
    string a("hello");
In case 1.1, your string's members (which include the pointer to the data) are held on the stack, and the memory occupied by the class instance is freed when a goes out of scope.
In case 1.2, the memory for the members is allocated dynamically from the heap too.
When you assign a char* constant to a string, the memory that holds the data is reallocated if necessary to fit the new data.
You can see how much memory is allocated by calling string::capacity().
When you call string a("hello"), memory gets allocated in the constructor.
Both the constructor and the assignment operator call the same methods internally to allocate memory and copy the new data there.
If you look at the docs for the STL string class (I believe the SGI docs are compliant with the spec), many of the methods list complexity guarantees. I believe many of these guarantees are intentionally left vague to allow different implementations. I think some implementations actually use a copy-on-modify approach, so that assigning one string to another is a constant-time operation but you may incur an unexpected cost when you try to modify one of those instances. I'm not sure whether that's still true in modern STL implementations, though.
You should also check out the capacity() function, which tells you the maximum length of string you can put into a given string instance before it is forced to reallocate memory. You can also use reserve() to cause a reallocation to a specific amount if you know you're going to store a large string in the variable later.
As others have said, as far as your examples go, you should really favor initialization over other approaches to avoid the creation of temporary objects.
Most likely
is faster than anything else.
You're coming from Java, right? In C++, objects are treated the same (in most ways) as the basic value types. Objects can live on the stack or in static storage, and be passed by value. When you declare a string in a function, that allocates on the stack however many bytes the string object takes. The string object itself does use dynamic memory to store the actual characters, but that's transparent to you. The other thing to remember is that when the function exits and the string you declared is no longer in scope, all of the memory it used is freed. No need for garbage collection (RAII is your best friend).
In your second example, the declaration string a; puts a block of memory on the stack and names it a; then the constructor is called and a is initialized to an empty string. The compiler stores the bytes for "less" and "moreeeeeee" in (I think) the .rdata section of your exe. String a will have a few fields, like a length field and a char* (I'm simplifying greatly). When you assign "less" to a, the operator=() method is called. It dynamically allocates memory to store the input value, then copies it in. When you later assign "moreeeeeee" to a, the operator=() method is called again; it reallocates enough memory to hold the new value if necessary, then copies it into the internal buffer.
When string a's scope exits, the string destructor is called and the memory that was dynamically allocated to hold the actual characters is freed. Then the stack frame is released and the memory that held a is no longer "on" the stack.
Creating a string directly on the heap is usually not a good idea, just as it isn't for base types. It's not worth it, since the object can easily stay on the stack and it has all the copy constructors and assignment operators needed for an efficient copy.
The std::string itself has a buffer on the heap that may be shared by several strings, depending on the implementation.
For instance, with Microsoft's STL implementation you could copy one string into another, and both strings would share the same buffer until you changed one of them.
That's why it was very bad to store the result of c_str() for later use; c_str() is only guaranteed to stay valid until the next non-const call on that string object.
This led to very nasty concurrency bugs, and the sharing had to be turned off with a define if you used such strings in a multithreaded application.