Why must C/C++ string literal declarations be single-line?
Is there any particular reason that multi-line string literals such as the following are not permitted in C++?
string script =
"
Some
Formatted
String Literal
";
I know that multi-line string literals may be created by putting a backslash before each newline.
I am writing a programming language (similar to C) and would like to allow the easy creation of multi-line strings (as in the above example).
Is there any technical reason for avoiding this kind of string literal? Otherwise I would have to use a Python-like string literal with triple quotes (which I don't want to do):
string script =
"""
Some
Formatted
String Literal
""";
The terse answer is "because the grammar prohibits multiline string literals." I don't know whether there is a good reason for this other than historical reasons.
There are, of course, ways around this. You can use line splicing: if a \ appears as the last character on a line, both it and the newline are removed early in translation, before the line is split into tokens.
Or you can use string literal concatenation: adjacent string literals are concatenated in a later translation phase, so they end up as a single string literal at compile time.
Using either technique, the string literal ends up as if it had been written on a single line.
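As a minimal sketch of the two workarounds (the variable names and the text are made up for illustration, not taken from the original answer), all three definitions below should produce the same string:

```cpp
#include <string>

// Line splicing: the backslash and the newline after it are deleted,
// so the literal simply continues on the next source line.
const std::string spliced = "This is a long string, \
split across two source lines";

// Concatenation: adjacent string literals are merged into one.
const std::string concatenated = "This is a long string, "
                                 "split across two source lines";

// What both definitions above are equivalent to.
const std::string single_line = "This is a long string, split across two source lines";
```

Note that with splicing, any indentation on the continuation line becomes part of the string, which is why that line starts at column 0 here.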
One has to consider that C was not written to be an "Applications" programming language but a systems programming language. It would not be inaccurate to say it was designed expressly to rewrite Unix. With that in mind, there was no EMACS or VIM and your user interfaces were serial terminals. Multiline string declarations would seem a bit pointless on a system that did not have a multiline text editor. Furthermore, string manipulation would not be a primary concern for someone looking to write an OS at that particular point in time. The traditional set of UNIX scripting tools such as AWK and SED (amongst MANY others) are a testament to the fact they weren't using C to do significant string manipulation.
Additional considerations: it was not uncommon in the early 70s (when C was written) to submit your programs on punch cards and come back the next day to get them. Would it have eaten up extra processing time to compile a program with multiline string literals? Not really; it can actually be less work for the compiler. In most cases you were going to come back for it the next day anyhow. But nobody who was filling out a punch card was going to put large amounts of unneeded text in their programs.
In a modern environment, there is probably no reason not to include multiline string literals other than designer's preference. Grammatically speaking, it's probably simpler because you don't have to take linefeeds into consideration when parsing the string literal.
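To illustrate the point about parsing, here is a rough sketch of how a lexer for a C-like language might scan a string literal that is allowed to span lines. The function name scan_string_literal and its signature are hypothetical, and escape handling is reduced to the bare minimum. The only difference from a C-style scanner is that a newline is not treated as an error:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for a C-like language that permits multi-line literals.
// Scans a double-quoted literal starting at src[pos]. On success it stores the
// contents in out, moves pos past the closing quote, and returns true.
bool scan_string_literal(const std::string& src, std::size_t& pos, std::string& out) {
    if (pos >= src.size() || src[pos] != '"') return false;
    std::string value;
    std::size_t i = pos + 1;
    while (i < src.size()) {
        const char c = src[i];
        if (c == '"') {  // closing quote: the literal is complete
            out = value;
            pos = i + 1;
            return true;
        }
        if (c == '\\' && i + 1 < src.size()) {  // minimal escapes: \n, \", \\ ...
            const char next = src[i + 1];
            value += (next == 'n') ? '\n' : next;
            i += 2;
            continue;
        }
        // A C-style lexer would report an error on '\n' here.
        // This sketch just keeps it as part of the literal.
        value += c;
        ++i;
    }
    return false;  // reached end of input without a closing quote
}
```

The trade-off is in error reporting: with this rule a forgotten closing quote swallows everything up to the next quote character, instead of being caught at the end of the line.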
In addition to the existing answers, you can work around this using C++11's raw string literals.
Though non-normative, this note and the example that follows it in [n3290: 2.14.5/5] serve to complement the indication in the grammar that the production r-char-sequence may contain newlines (whereas the production s-char-sequence, used for normal string literals, may not).
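For instance, the string from the question could be written roughly like this (a sketch assuming a C++11 compiler; the variable name raw_script is just for illustration). The newlines between R"( and )" are kept in the resulting string, including the one right after the opening delimiter:

```cpp
#include <string>

// Raw string literal: everything between R"( and )" is taken verbatim,
// so the source-file newlines become '\n' characters in the string.
const std::string raw_script = R"(
Some
Formatted
String Literal
)";
```

The resulting string therefore starts and ends with a newline, exactly as laid out in the source.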
Others have mentioned some excellent workarounds; I just wanted to address the reason.
The reason is simply that C was created at a time when processing was at a premium and compilers had to be simple and as fast as possible. These days, if C were to be updated (I'm looking at you, C1X), it would be quite possible to do exactly what you want. It's unlikely, however, mostly for historical reasons: such a change could require extensive rewrites of compilers, and so would likely be rejected.
The C preprocessor works on a line-by-line basis, but with lexical tokens. That means that the preprocessor understands that "foo" is a token. If C were to allow multi-line literals, however, the preprocessor would be in trouble. Consider a string literal that spans several lines: the preprocessor isn't able to mess with the inside of a token, but it's operating line-by-line. So how is it supposed to handle this case? The easy solution is to simply forbid multiline strings entirely.
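One way to picture the problem is a literal whose middle lines look like preprocessor directives. The sketch below is an illustration, not the example from the original answer, and it uses a C++11 raw string literal, which is lexed as a single token, so the directive-looking lines should simply end up as text. With a hypothetical multi-line "..." literal in classic C, a line-by-line preprocessor would have had to decide whether #ifdef BAR is a directive or part of the string:

```cpp
#include <string>

// The lines starting with '#' sit inside one raw string token,
// so they are not treated as preprocessor directives.
// The name with_directive_lines is made up for this example.
const std::string with_directive_lines = R"(foo
#ifdef BAR
bar
#endif
baz)";
```

Here the string keeps all five lines verbatim, whether or not BAR is defined.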
Actually, you can break it up into several adjacent string literals, one per line.
Adjacent string literals are concatenated by the compiler.
Strings can lie on multiple lines, but each line has to be quoted individually.
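A minimal sketch using the string from the question (the variable name quoted_script is an assumption). Note that the line breaks have to be spelled out as \n, since concatenation itself adds nothing between the pieces:

```cpp
#include <string>

// Each source line carries its own quoted literal; the compiler joins them.
const std::string quoted_script =
    "Some\n"
    "Formatted\n"
    "String Literal\n";
```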
There is no reason why you couldn't create a programming language that allows multi-line strings.
For example, Vedit Macro Language (a C-like scripting language for the VEDIT text editor) allows multi-line strings.
It is up to you how you define your language syntax.
You can also place one literal after another without any special characters between them.
Literal declarations don't have to be single-line.
GPUImage inlines multiline shader code. Check out its SHADER_STRING macro.
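The general idea behind such a macro is the preprocessor's stringizing operator #. The sketch below is a simplified C++ illustration with a hypothetical MULTILINE_STRING macro, not GPUImage's actual definition. One caveat: stringizing collapses each run of whitespace between tokens, newlines included, into a single space, so the result is one line, and a comma outside parentheses would be read as a macro argument separator:

```cpp
#include <string>

// Hypothetical macro (not GPUImage's definition): turns its argument
// into a string literal using the stringizing operator.
#define MULTILINE_STRING(text) #text

// The argument may span several source lines; the line breaks and the
// indentation are reduced to single spaces in the resulting string.
const std::string shader = MULTILINE_STRING(
    void main()
    {
        gl_FragColor = vec4(1.0);
    }
);
```

That is usually fine for shader source, where whitespace is not significant, but it does not preserve the layout the way a raw string literal does.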