What is the type of string literals in C and C++?

Posted on 2024-08-21 08:08:21

What is the type of string literal in C? Is it char * or const char * or const char * const?

What about C++?

Comments (4)

雅心素梦 2024-08-28 08:08:21

In C the type of a string literal is char[]. It is not const according to the type, but modifying the contents is undefined behavior. Also, two different string literals that have the same content (or enough of the same content) might or might not share the same array elements.

From the C99 standard 6.4.5/5 "String Literals - Semantics":

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...

It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

In C++, "An ordinary string literal has type 'array of n const char'" (from 2.13.4/1 "String literals"). But there's a special case in the C++ standard that lets string literals convert easily to non-const-qualified pointers (4.2/2 "Array-to-pointer conversion"):

A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”.

As a side note - because arrays in C/C++ convert so readily to pointers, a string literal can often be used in a pointer context, much as any array in C/C++.


Additional editorializing: what follows is really mostly speculation on my part about the rationale for the choices the C and C++ standards made regarding string literal types. So take it with a grain of salt (but please comment if you have corrections or additional details):

I think that the C standard chose to make string literals non-const types because there was (and is) so much code that expects to be able to use non-const-qualified char pointers that point to literals. The const qualifier was added (if I'm not mistaken) around ANSI standardization time, long after K&R C had been around to accumulate a ton of existing code. If pointers to string literals could only be assigned to char const* types without a cast, nearly every program in existence would have required changing. Not a good way to get a standard accepted...

I believe the change to C++ that string literals are const qualified was done mainly to support allowing a literal string to more appropriately match an overload that takes a "char const*" argument. I think that there was also a desire to close a perceived hole in the type system, but the hole was largely opened back up by the special case in array-to-pointer conversions.

Annex D of the standard indicates that the "implicit conversion from const to non-const qualification for string literals (4.2) is deprecated", but I think so much code would still break that it'll be a long time before compiler implementers or the standards committee are willing to actually pull the plug (unless some other clever technique can be devised - but then the hole would be back, wouldn't it?).

め可乐爱微笑 2024-08-28 08:08:21

A C string literal has type char [n] where n equals number of characters + 1 to account for the implicit zero at the end of the string.

The array will be statically allocated; it is not const, but modifying it is undefined behaviour.

If it had pointer type char * or incomplete type char [], sizeof could not work as expected.

Making string literals const is a C++ idiom and not part of any C standard.

皓月长歌 2024-08-28 08:08:21

For various historical reasons, string literals were always of type char[] in C.

Early on (in C90), it was stated that modifying a string literal invokes undefined behavior.

They didn't ban such modifications though, nor did they make string literals const char[], which would have made more sense. This was for backwards compatibility with old code. Some old operating systems (most notably DOS) didn't protest if you modified string literals, so there was plenty of such code around.

C still has this defect today, even in the most recent C standard.

C++ inherited the very same defect from C, but later C++ standards finally fixed it: string literals are const, and the implicit conversion of a string literal to char* was flagged as deprecated in C++03 and finally removed in C++11.

月棠 2024-08-28 08:08:21

They used to be of type char[]. Now they are of type const char[].
