C language construct

Posted on 2024-12-07 10:54:12

Why does this work

printf("Hello"
"World");

Whereas

printf("Hello
""World");

does not?
ANSI C concatenates adjacent string literals, and that's fine, but this is a different thing.
Does this have something to do with the C language parser?
Thanks


无声无音无过去 2024-12-14 10:54:12

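A string literal must be terminated before the end of the line. This is a good thing. Otherwise, a forgotten closing quote could swallow the following lines of code into the string, so they would never be compiled or executed.

That could cost a lot of debugging time. These days syntax coloring would give you a clue, but in the early years there were only monochrome displays.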


白云不回头 2024-12-14 10:54:12

You can't put a raw line break inside a string literal. This was a choice made by the designers of C. IMO it's a good feature though.
You can however do this:

printf("Hello\
""World");

Which gives the same results.


心意如水 2024-12-14 10:54:12

The C language is defined in terms of tokens, and one kind of token is the string literal (in standardese: a string-literal, which is an optional s-char-sequence enclosed in double quotes). A string literal starts and ends with an unescaped double quote, and the s-char-sequence between the quotes must not contain an unescaped new-line.

Relevant standard (C99) quote:

> Syntax
>   string-literal:
>     " s-char-sequence(opt) "
>     L" s-char-sequence(opt) "
>   s-char-sequence:
>     s-char
>     s-char-sequence s-char
>   s-char:
>     any member of the source character set
>           except the double-quote ", backslash \,
>           or new-line character
>     escape-sequence

A backslash immediately followed by a new-line, however, is removed in an early translation phase (phase 2, line splicing), before the source is split into tokens in phase 3, so the tokenizer never sees it. Here's the relevant standard (C99) quote:

> The precedence among the syntax rules of translation is specified by the following phases.
>
> 1. Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences are replaced by corresponding single-character internal representations.
> 2. Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character before any such splicing takes place.
> 3. The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
> 4. Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
> 5. Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.
> 6. Adjacent string literal tokens are concatenated.
> 7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.
> 8. All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.