Why must C/C++ string literal declarations be single-line?

Posted on 2024-09-06 14:02:26

Is there any particular reason that multi-line string literals such as the following are not permitted in C++?

string script =
"
      Some
   Formatted
 String Literal
";

I know that multi-line string literals may be created by putting a backslash before each newline.
I am writing a programming language (similar to C) and would like to allow the easy creation of multi-line strings (as in the above example).

Is there any technical reason for avoiding this kind of string literal? Otherwise I would have to use a Python-like string literal with triple quotes (which I don't want to do):

string script =
"""
      Some
   Formatted
 String Literal
""";

Why must C/C++ string literal declarations be single-line?


Comments (10)

冷默言语 2024-09-13 14:02:26

The terse answer is "because the grammar prohibits multiline string literals." I don't know whether there is a good reason for this other than historical reasons.

There are, of course, ways around this. You can use line splicing:

const char* script = "\
      Some\n\
   Formatted\n\
 String Literal\n\
";

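If a \ appears as the last character on a line, both the backslash and the newline that follows it are removed early in translation (line splicing), so the next line continues the literal. Any leading whitespace on the continued line therefore becomes part of the string.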

Or, you can use string literal concatenation:

const char* script = 
"      Some\n"
"   Formatted\n"
" String Literal\n";

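Adjacent string literals are concatenated by the implementation in a later translation phase (after preprocessing directives have been handled, before the code is parsed), so these end up as a single string literal at compile time.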

Using either technique, the string literal ends up as if it were written:

const char* script = "      Some\n   Formatted\n String Literal\n";
风渺 2024-09-13 14:02:26

One has to consider that C was not written to be an "Applications" programming language but a systems programming language. It would not be inaccurate to say it was designed expressly to rewrite Unix. With that in mind, there was no EMACS or VIM and your user interfaces were serial terminals. Multiline string declarations would seem a bit pointless on a system that did not have a multiline text editor. Furthermore, string manipulation would not be a primary concern for someone looking to write an OS at that particular point in time. The traditional set of UNIX scripting tools such as AWK and SED (amongst MANY others) are a testament to the fact they weren't using C to do significant string manipulation.

Additional considerations: it was not uncommon in the early 70s (when C was written) to submit your programs on PUNCH CARDS and come back the next day to get them. Would it have eaten up extra processing time to compile a program with multiline string literals? Not really. It can actually be less work for the compiler, and in most cases you were going to come back for the result the next day anyhow. But nobody who was filling out punch cards was going to put large amounts of unneeded text into their programs.

In a modern environment, there is probably no reason not to include multiline string literals other than designer's preference. Grammatically speaking, it's probably simpler because you don't have to take linefeeds into consideration when parsing the string literal.

执着的年纪 2024-09-13 14:02:26

In addition to the existing answers, you can work around this using C++11's raw string literals, e.g.:

#include <iostream>
#include <string>

int main() {
   std::string str = R"(a
b)";
   std::cout << str;
}

/* Output:
a
b
*/


[n3290: 2.14.5/4]: [ Note: A source-file new-line in a raw string
literal results in a new-line in the resulting execution
string-literal. Assuming no whitespace at the beginning of lines in
the following example, the assert will succeed:

const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);

—end note ]

Though non-normative, this note and the example that follows it in [n3290: 2.14.5/5] serve to complement the indication in the grammar that the production r-char-sequence may contain newlines (whereas the production s-char-sequence, used for normal string literals, may not).

○愚か者の日 2024-09-13 14:02:26

Others have mentioned some excellent workarounds; I just wanted to address the reason.

The reason is simply that C was created at a time when processing was at a premium and compilers had to be simple and as fast as possible. These days, if C were to be updated (I'm looking at you, C1X), it would be quite possible to do exactly what you want. It's unlikely, however, mostly for historical reasons: such a change could require extensive rewrites of compilers, and so would likely be rejected.

话少情深 2024-09-13 14:02:26

The C preprocessor works on a line-by-line basis, but with lexical tokens. That means that the preprocessor understands that "foo" is a token. If C were to allow multi-line literals, however, the preprocessor would be in trouble. Consider:

"foo
#ifdef BAR
bar
#endif
baz"

The preprocessor isn't able to mess with the inside of a token - but it's operating line-by-line. So how is it supposed to handle this case? The easy solution is to simply forbid multiline strings entirely.

物价感观 2024-09-13 14:02:26

Actually, you can break it up thus:

string script =
"\n"
"      Some\n"
"   Formatted\n"
" String Literal\n";

Adjacent string literals are concatenated by the compiler.

淡墨 2024-09-13 14:02:26

Strings can span multiple lines, but each line has to be quoted individually:

string script =
    "                \n"
    "       Some     \n"
    "    Formatted   \n"
    " String Literal ";
星星的轨迹 2024-09-13 14:02:26

"I am writing a programming language (similar to C) and would like to allow writing multi-line strings easily (as in the above example)."

There is no reason why you couldn't create a programming language that allows multi-line strings.
For example, Vedit Macro Language (a C-like scripting language for the VEDIT text editor) allows multi-line strings:

Reg_Set(1,"
      Some
   Formatted
 String Literal
")

It is up to you how you define your language syntax.

转瞬即逝 2024-09-13 14:02:26

You can also do:

string useMultiple =  "this "
                      "is "
                      "a string in C.";

Place one literal after another without any special chars.

一绘本一梦想 2024-09-13 14:02:26

Literal declarations don't have to be single-line.

GPUImage inlines multiline shader code. Check out its SHADER_STRING macro.
