Why do multicharacter literals exist in C and C++?

Posted 2024-09-27 19:18:48 · 847 words · 5 views · 0 comments

I didn't know that C and C++ allow multicharacter literals: not 'c' (of type int in C and char in C++), but 'tralivali' (of type int!).

enum
{
    ActionLeft = 'left',
    ActionRight = 'right',
    ActionForward = 'forward',
    ActionBackward = 'backward'
};

Standard says:

C99 6.4.4.4p10: "The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."

I found that they are widely used in the C4 Engine, but I suppose they are not safe when it comes to platform-independent serialization. They can also be confusing because they look like strings. So what is the scope of use for multicharacter literals: are they useful for anything? Are they in C++ only for compatibility with C code? Are they considered a bad feature, like the goto statement?

Comments (5)

病毒体 2024-10-04 19:18:48

It makes it easier to pick out values in a memory dump.

Example:

enum state { waiting, running, stopped };

vs.

enum state { waiting = 'wait', running = 'run.', stopped = 'stop' };

A memory dump after the following statement:

s = stopped;

might look like:

00 00 00 02 . . . .

in the first case, vs:

73 74 6F 70 s t o p

using multicharacter literals. (Of course, whether it reads 'stop' or 'pots' depends on byte ordering.)
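
The byte-order caveat can be sketched as follows. This is only an illustration, not code from the answer: `make_tag` and `bytes_in_memory` are hypothetical helper names, and the sketch assumes an ASCII execution character set and 8-bit bytes. It builds the tag value explicitly instead of relying on the implementation-defined value of 'stop', then copies out the bytes in the order a memory dump would show them.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not from the thread): packs four characters into a
// 32-bit value with the first character in the most significant byte.
constexpr std::uint32_t make_tag(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Returns the four bytes of `value` in the order they sit in memory,
// which is the order a hex dump would print them.
std::string bytes_in_memory(std::uint32_t value) {
    char raw[sizeof value];
    std::memcpy(raw, &value, sizeof value);
    return std::string(raw, sizeof value);
}

// Usage example: a big-endian machine shows "stop", a little-endian
// machine shows "pots".
void demo_dump() {
    const std::string dump = bytes_in_memory(make_tag('s', 't', 'o', 'p'));
    assert(dump == "stop" || dump == "pots");
}
```

With ASCII, `make_tag('s', 't', 'o', 'p')` is 0x73746F70, the same bytes as in the dump shown above.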

抹茶夏天i‖ 2024-10-04 19:18:48

I don't know how extensively this is used, but "implementation-defined" is a big red flag to me. As far as I know, it could mean that the implementation is free to ignore your character designations and just assign normal incrementing values if it wanted to. It may do something "nicer", but you can't rely on that behavior across compilers (or even compiler versions). At least "goto" has predictable (if undesirable) behavior...

That's my 2c, anyway.

Edit: on "implementation-defined":

From Bjarne Stroustrup's C++ Glossary:

implementation defined - an aspect of C++'s semantics that is defined for each implementation rather than specified in the standard for every implementation. An example is the size of an int (which must be at least 16 bits but can be longer). Avoid implementation defined behavior whenever possible. See also: undefined. TC++PL C.2.

also...

undefined - an aspect of C++'s semantics for which no reasonable behavior is required. An example is dereferencing a pointer with the value zero. Avoid undefined behavior. See also: implementation defined. TC++PL C.2.

I believe this means the comment is correct: it should at least compile, although anything beyond that is not specified. Note the advice in the definition, also.

坏尐絯 2024-10-04 19:18:48

Four-character literals I have seen and used. They map to 4 bytes, i.e. one 32-bit word. As said above, they are very useful for debugging. They can also be used in a switch/case statement with ints, which is nice.

This (4 chars) is pretty standard (i.e. supported by GCC and VC++ at least), although the results (the actual compiled values) may vary from one implementation to another.

But more than 4 chars? I wouldn't use them.

UPDATE: From the C4 page: "For our simple actions, we'll just provide an enumeration of some values, which is done in C4 by specifying four-character constants". So they are using four-character literals, as in my case.
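
As a rough sketch of the switch/case use, here is one way to get four-character tags whose values do not vary between implementations. Everything named here (`make_tag`, the `Action` enumerators, `describe`) is made up for illustration and is not C4's API; the sketch assumes an ASCII character set and computes the value explicitly rather than writing a literal such as 'left'.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: first character ends up in the most significant byte.
constexpr std::uint32_t make_tag(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Illustrative enumeration; every name is exactly four characters long.
enum Action : std::uint32_t {
    ActionLeft = make_tag('l', 'e', 'f', 't'),
    ActionRght = make_tag('r', 'g', 'h', 't'),
    ActionFwrd = make_tag('f', 'w', 'r', 'd'),
    ActionBack = make_tag('b', 'a', 'c', 'k')
};

// The tags are ordinary integer constants, so they work as case labels.
std::string describe(std::uint32_t tag) {
    switch (tag) {
        case ActionLeft: return "turn left";
        case ActionRght: return "turn right";
        case ActionFwrd: return "go forward";
        case ActionBack: return "go backward";
        default:         return "unknown";
    }
}

// Usage example.
void demo_switch() {
    assert(describe(ActionBack) == "go backward");
    assert(describe(0) == "unknown");
}
```

The cost compared with a real multicharacter literal is readability at the definition site; the benefit is that the numeric value is the same on every compiler.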

分開簡單 2024-10-04 19:18:48

Multicharacter literals let you specify an int value via its equivalent representation in characters. They are useful for enums, FourCC codes and tags, and non-type template parameters. With a multicharacter literal, a FourCC code can be typed directly into the source, which is handy.

The implementation in gcc is described at https://gcc.gnu.org/onlinedocs/cpp/Implementation-defined-behavior.html . Note that the value is truncated to the size of type int, so 'efgh' == 'abcdefgh' if your ints are 4 chars wide, although gcc will issue a warning for the literal that overflows.

Unfortunately, gcc warns about every multi-character literal (-Wmultichar, which current releases enable even without -pedantic), because the value is implementation-defined. As the truncation above shows, whether two multi-character literals compare equal can change if you switch implementations.
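
The truncation can be reproduced without using any multicharacter literal. The sketch below is an emulation, under the assumptions of 8-bit characters and a 32-bit int, of the rule the gcc page describes: shift the value accumulated so far left by one character width and OR in the next character. The function name `gcc_style_value` is invented for this example, and it returns the unsigned bit pattern rather than gcc's signed int.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Emulates the documented gcc rule on a 32-bit accumulator: each new
// character is shifted in at the bottom, so with more than four
// characters the earliest ones fall off the top.
std::uint32_t gcc_style_value(const std::string& chars) {
    std::uint32_t value = 0;
    for (unsigned char c : chars) {
        value = (value << 8) | c;
    }
    return value;
}

// Usage example: only the last four characters survive.
void demo_truncation() {
    assert(gcc_style_value("abcdefgh") == gcc_style_value("efgh"));
    assert(gcc_style_value("ab") == 0x6162u);  // assumes ASCII
}
```

Another implementation is free to use a different rule, for example keeping the first four characters instead of the last four, which is exactly why equality between two such literals is not portable.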

泪冰清 2024-10-04 19:18:48

In C++14 specification draft N4527 section 2.13.3, entry 2:

... An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal, or an ordinary character literal containing a single c-char not representable in the execution character set, is conditionally-supported, has type int, and has an implementation-defined value.

Previous answers to your question pertain mostly to real machines that do support multicharacter literals. Specifically, on platforms where int is 4 bytes, a four-byte multicharacter literal is fine and can be used for convenience, as in Ferrucio's memory dump example. But since there is no guarantee that this will work at all, or work the same way, on other platforms, multicharacter literals should be avoided in portable programs.
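
For the platform-independent serialization concern raised in the question, one possible approach is to treat the tag as a plain 32-bit number and write its bytes in a fixed order, so that neither the literal's implementation-defined value nor the machine's byte order matters. The following is a minimal sketch under that assumption; `encode_tag` and `decode_tag` are hypothetical names, not functions from any library mentioned in the thread.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Writes the tag most significant byte first, regardless of host byte order.
std::array<unsigned char, 4> encode_tag(std::uint32_t tag) {
    return {{
        static_cast<unsigned char>((tag >> 24) & 0xFF),
        static_cast<unsigned char>((tag >> 16) & 0xFF),
        static_cast<unsigned char>((tag >> 8) & 0xFF),
        static_cast<unsigned char>(tag & 0xFF)
    }};
}

// Reads the tag back from the same fixed byte order.
std::uint32_t decode_tag(const std::array<unsigned char, 4>& bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
            static_cast<std::uint32_t>(bytes[3]);
}

// Usage example: a round trip gives back the original value.
void demo_round_trip() {
    const std::uint32_t tag = 0x73746F70u;  // "stop" in ASCII
    assert(decode_tag(encode_tag(tag)) == tag);
}
```

The encoded bytes for 0x73746F70 are 's', 't', 'o', 'p' in that order on every host, so the tag stays readable in a file or network dump.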
