Does "&s[0]" point to contiguous characters in a std::string?

Posted 2024-08-16 10:47:19 · 269 words · 18 views

I'm doing some maintenance work and ran across something like the following:

std::string s;
s.resize( strLength );  
// strLength is a size_t with the length of a C string in it. 

memcpy( &s[0], str, strLength );

I know using &s[0] would be safe if it was a std::vector, but is this a safe use of std::string?

Comments (6)

爱殇璃 2024-08-23 10:47:19

Under the C++98/03 standard, a std::string's storage is not guaranteed to be contiguous, but C++11 requires it to be. In practice, neither I nor Herb Sutter knows of an implementation that does not use contiguous storage.

Note that &s[0] is always guaranteed to work under the C++11 standard, even for a zero-length string. That would not be guaranteed for s.begin() or &*s.begin(), but for &s[0] the standard defines operator[] as:

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified

Continuing on, data() is defined as:

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

(notice the square brackets at both ends of the range)
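
The two quoted clauses can be checked directly. The following is a minimal sketch, assuming a C++11 or later compiler; the helper names data_matches_subscript and empty_string_subscript_is_nul are made up for illustration and are not part of any library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if &s[i] and s.data() + i agree for
// every i in the closed range [0, s.size()], i.e. including the position
// one past the last character, which holds the terminator.
bool data_matches_subscript(const std::string& s) {
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (&s[i] != s.data() + i) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: takes &s[0] of an empty string. Under C++11 this is
// well-defined because pos == size() yields a reference to charT().
bool empty_string_subscript_is_nul() {
    const std::string empty;
    const char* p = &empty[0];
    return *p == '\0';
}
```

Both functions only read through the pointer; the quoted wording still forbids modifying the element at position size().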


Note: pre-standardization C++0x drafts did not guarantee that &s[0] works with zero-length strings (it was in fact explicitly undefined behavior), and an older revision of this answer explained this. Later drafts of the standard fixed it, so the answer has been updated accordingly.

野心澎湃 2024-08-23 10:47:19

It is safe to use. I think most answers were correct once, but the standard changed. The C++11 standard, in the basic_string general requirements [string.require], 21.4.1, paragraph 5, says:

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

A bit before that, it says that all iterators are random access iterators. Both points support the usage in your question. (Additionally, Stroustrup apparently uses it in his newest book ;) )
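
As a rough illustration of what that guarantee permits, here is a sketch that assumes a C++11 or later compiler. copy_with_memcpy mirrors the pattern from the question and is_contiguous tests the quoted identity; both names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper mirroring the question: size the string first, then
// copy raw bytes into its storage through &s[0].
std::string copy_with_memcpy(const char* str, std::size_t strLength) {
    std::string s;
    s.resize(strLength);
    if (strLength != 0) {
        std::memcpy(&s[0], str, strLength);
    }
    return s;
}

// Hypothetical helper: checks &*(s.begin() + n) == &*s.begin() + n for
// every n with 0 <= n < s.size().
bool is_contiguous(const std::string& s) {
    typedef std::string::difference_type diff_t;
    for (std::size_t n = 0; n < s.size(); ++n) {
        if (&*(s.begin() + static_cast<diff_t>(n)) != &*s.begin() + n) {
            return false;
        }
    }
    return true;
}
```

The guard on strLength keeps the sketch from calling memcpy with a length of zero, which is a defensive choice rather than something the quoted text requires.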

It's not unlikely that this change was made in C++11. I seem to remember that the same guarantee was added then for vector, which also got the very useful data() pointer with that release.

Hope that helps.

最后的乘客 2024-08-23 10:47:19

Technically, no, since std::string is not required to store its contents contiguously in memory.

However, in almost all implementations (every implementation of which I am aware), the contents are stored contiguously and this would "work."

掩饰不了的爱 2024-08-23 10:47:19

Readers should note that this question was asked in 2009, when the C++03 Standard was the current publication. This answer is based on that version of the Standard, in which std::strings are not guaranteed to use contiguous storage. Since this question was not asked in the context of a particular platform (like gcc), I make no assumptions about OP's platform, in particular whether or not it uses contiguous storage for the string.

Legal? Maybe, maybe not. Safe? Probably, but maybe not. Good code? Well, let's not go there...

Why not just do:

std::string s = str;

...or:

std::string s(str);

...or:

std::string s;
std::copy( &str[0], &str[strLen], std::back_inserter(s));

...or:

std::string s;
s.assign( str, strLen );

?
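
The alternatives listed above can be compared side by side. This is only a sketch: the wrapper names via_constructor, via_copy and via_assign are invented for illustration, and str is assumed to be a valid NUL-terminated C string of length strLen.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical wrapper: construct directly from a NUL-terminated C string.
std::string via_constructor(const char* str) {
    return std::string(str);
}

// Hypothetical wrapper: append the characters one at a time via std::copy.
std::string via_copy(const char* str, std::size_t strLen) {
    std::string s;
    std::copy(str, str + strLen, std::back_inserter(s));
    return s;
}

// Hypothetical wrapper: let assign() copy exactly strLen characters.
std::string via_assign(const char* str, std::size_t strLen) {
    std::string s;
    s.assign(str, strLen);
    return s;
}
```

None of these touches the string's internal buffer directly, so they do not depend on how the implementation lays out its storage.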

指尖微凉心微凉 2024-08-23 10:47:19

This is generally not safe, regardless of whether the internal string sequence is stored contiguously in memory or not. Besides contiguity, there may be many other implementation details in how a std::string object stores the controlled sequence.

A real practical problem with that might be the following. The controlled sequence of std::string is not required to be stored as a zero-terminated string. However, in practice, many (most?) implementations choose to oversize the internal buffer by 1 and store the sequence as a zero-terminated string anyway, because it simplifies the implementation of the c_str() method: just return a pointer to the internal buffer and you are done.

The code you quoted in your question makes no effort to zero-terminate the data it copies into the internal buffer. Quite possibly it simply doesn't know whether zero-termination is necessary for this implementation of std::string. Quite possibly it relies on the internal buffer being filled with zeros after the call to resize, so that the extra character the implementation allocates for the zero terminator is conveniently pre-set to zero. All of this is implementation detail, meaning that the technique depends on some rather fragile assumptions.

In other words, in some implementations you would probably have to use strcpy rather than memcpy to force the data into the controlled sequence like that, while in other implementations you would have to use memcpy and not strcpy.
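
For comparison, here is a sketch of how the termination concern plays out on a C++11 or later compiler, where c_str() and data() are both specified in terms of operator[] over [0, size()]. The helper name terminator_survives_memcpy is hypothetical, and the check says nothing about pre-C++11 implementations, which are the ones this answer has in mind.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: fills a resized string through &s[0] with memcpy,
// then reports whether c_str() still ends with '\0' right after the copied
// bytes and whether c_str() and data() refer to the same buffer.
bool terminator_survives_memcpy(const char* str, std::size_t strLength) {
    std::string s;
    s.resize(strLength);
    if (strLength != 0) {
        // Only positions [0, strLength) are written; s[s.size()] is untouched.
        std::memcpy(&s[0], str, strLength);
    }
    return s.c_str()[s.size()] == '\0' && s.c_str() == s.data();
}
```

Because memcpy writes exactly strLength bytes, it never reaches the terminator position, which is why the check holds under the C++11 rules.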

旧城空念 2024-08-23 10:47:19

The code might work, but more by luck than judgement: it makes assumptions about the implementation that are not guaranteed. I suggest that determining the validity of the code is irrelevant, since it is a pointless overcomplication that is easily reduced to just:

std::string s( str ) ;

or if assigning to an existing std::string object, just:

s = str ;

and then let std::string itself determine how to achieve the result. If you are going to resort to this sort of nonsense, then you may as well not be using std::string at all, since you are reintroducing all the dangers associated with C strings.
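
One caveat when swapping the resize-and-memcpy pattern for a plain constructor: the constructor stops at the first NUL character, whereas a length-based copy does not. The sketch below illustrates the difference; the helper names are hypothetical, and it assumes the buffer really holds strLength readable bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length of a string built from a NUL-terminated
// C string. Construction stops at the first '\0'.
std::size_t length_via_constructor(const char* str) {
    return std::string(str).size();
}

// Hypothetical helper: length of a string built from an explicit byte
// count. Embedded '\0' characters are copied like any other character.
std::size_t length_via_assign(const char* str, std::size_t strLength) {
    std::string s;
    s.assign(str, strLength);
    return s.size();
}
```

If the source data can contain embedded NULs, s.assign(str, strLength) keeps the behavior of the original code, while std::string s(str) would truncate it.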
