What is the purpose of trigraph sequences in C++?

Posted 2024-07-30 10:02:52

According to C++'03 Standard 2.3/1:

Before any other processing takes place, each occurrence of one of the following sequences of three characters (“trigraph sequences”) is replaced by the single character indicated in Table 1.

----------------------------------------------------------------------------
| trigraph | replacement | trigraph | replacement | trigraph | replacement |
----------------------------------------------------------------------------
| ??=      | #           | ??(      | [           | ??<      | {           |
| ??/      | \           | ??)      | ]           | ??>      | }           |
| ??'      | ^           | ??!      | |           | ??-      | ~           |
----------------------------------------------------------------------------

In real life that means that code printf( "What??!\n" ); will result in printing What| because ??! is a trigraph sequence that is replaced with the | character.
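To make the replacement rule concrete, here is a minimal sketch that simulates the substitution on an ordinary string. It is only an illustration of the table above, not how any real compiler implements it; the names `trigraph_replacement`, `replace_trigraphs` and `check_examples` are made up for this example. The test strings are deliberately assembled from pieces so that the source itself never contains two adjacent question marks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: given the third character of a candidate trigraph,
// return its replacement, or '\0' if it does not complete a trigraph.
char trigraph_replacement(char third) {
    switch (third) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Scan left to right; whenever two question marks are followed by one of the
// nine characters from the table, emit the single replacement character.
std::string replace_trigraphs(const std::string& text) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (i + 2 < text.size() && text[i] == '?' && text[i + 1] == '?') {
            const char r = trigraph_replacement(text[i + 2]);
            if (r != '\0') {
                out += r;
                i += 3;
                continue;
            }
        }
        out += text[i];
        ++i;
    }
    return out;
}

// Small self-check using the example from the question.
void check_examples() {
    assert(replace_trigraphs(std::string("What?") + "?!") == "What|");
}
```

Calling `check_examples()` exercises the `What|` case described above.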

My question is: what is the purpose of trigraphs? Is there any practical advantage to using them?

UPD: The answers mention that some European keyboards don't have all the punctuation characters. So do non-US programmers have to use trigraphs in everyday life?

UPD2: Visual Studio 2010 has trigraph support turned off by default.

Comments (9)

骑趴 2024-08-06 10:02:52

This question (about the closely related digraphs) has the answer.

It boils down to the fact that the ISO 646 character set doesn't have all the characters of the C syntax, so there are some systems with keyboards and displays that can't deal with the characters (though I imagine that these are quite rare nowadays).

In general, you don't need to use them, but you need to know about them for exactly the problem you ran into. Trigraphs are the reason the '?' character has an escape sequence:

'\?'

So a couple ways you can avoid your example problem are:

 printf( "What?\?!\n" ); 

 printf( "What?" "?!\n" ); 

But you have to remember when you're typing the two '?' characters that you might be starting a trigraph (and it's certainly never something I'm thinking about).
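As a quick sanity check of the two workarounds, the sketch below compares each spelling against a reference string built character by character. The function names are invented for this example, and the reference is assembled from character literals so that the result does not depend on whether the compiler has trigraph processing switched on.

```cpp
#include <cassert>
#include <string>

// Workaround 1: escape the second question mark.
std::string escaped_form() { return "What?\?!\n"; }

// Workaround 2: split the literal; adjacent literals are concatenated
// only after trigraph replacement would already have happened.
std::string split_form() { return "What?" "?!\n"; }

// Reference text built one character at a time.
std::string reference_form() {
    const char chars[] = {'W', 'h', 'a', 't', '?', '?', '!', '\n'};
    return std::string(chars, sizeof chars);
}

// Both spellings should yield the same eight characters.
void check_forms() {
    assert(escaped_form() == reference_form());
    assert(split_form() == reference_form());
}
```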

In practice, trigraphs and digraphs are something I don't worry about at all on a day-to-day basis. But you should be aware of them because once every couple of years you'll run into a bug related to them (and you'll spend the rest of the day cursing their existence). It would be nice if compilers could be configured to warn (or error) when they come across a trigraph or digraph, so I would know I've got something I should knowingly deal with.

And just for completeness, digraphs are much less dangerous since they get processed as tokens, so a digraph inside a string literal won't get interpreted as a digraph.
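A small sketch of that difference, assuming a conforming C++ compiler (digraphs are ordinary alternative tokens in standard C++). The function names are made up for this example. The first function is written entirely with the digraphs `<%`, `%>`, `<:` and `:>`; the second shows that the same two characters inside a string literal are left alone.

```cpp
#include <cassert>
#include <string>

// Written with digraphs: <% %> stand for braces, <: :> for brackets.
int sum_with_digraphs() <%
    int values<:3:> = <% 1, 2, 3 %>;
    return values<:0:> + values<:1:> + values<:2:>;
%>

// Inside a string literal the characters are not tokens, so "<:" stays
// a two-character string and is not turned into "[".
std::string literal_with_digraph_text() {
    return "<:";
}

void check_digraphs() {
    assert(sum_with_digraphs() == 6);
    assert(literal_with_digraph_text().size() == 2);
}
```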

For a nice education on various fun with punctuation in C/C++ programs (including a trigraph bug that would definitely have me pulling my hair out), take a look at Herb Sutter's GOTW #86 article.


Addendum:

It looks like GCC will not process (and will warn about) trigraphs by default. Some other compilers have options to turn off trigraph support (IBM's for example). Microsoft started supporting a warning (C4837) in VS2008 that must be explicitly enabled (using -Wall or something).

暗恋未遂 2024-08-06 10:02:52

Kids today! :-)

Yes, foreign equipment, such as an IBM 3270 terminal. The 3270 has, if I remember, no curly braces! If you wanted to write C on an IBM mini / mainframe, you had to use the wretched trigraphs for every block boundary. Fortunately, I only had to write software in C to emulate some IBM minicomputer facilities, not actually write C software on the System/36.

Look next to the "P" key:

[Image: keyboard]

Hmmm. Hard to tell. There is an extra button next to "carriage return", and I might have it backwards: maybe it was the "[" / "]" pair that was missing. At any rate, this keyboard would cause you grief if you had to write C.

Also, these terminals display EBCDIC, IBM's "native" mainframe character set, not ASCII (thanks, Pavel Minaev, for the reminder).

On the other hand, like the GNU C guide says: "You don't need this brain damage." The gcc compiler leaves this "feature" disabled by default.

蓝海 2024-08-06 10:02:52

From The C++ Programming Language Special Edition, page 829

The ASCII special characters [, ], {, }, |, and \ occupy character set positions designated as alphabetic by ISO. In most European national ISO-646 character sets, these positions are occupied by letters not found in the English alphabet.

A set of trigraphs is provided to allow national characters to be expressed in a portable way using a truly standard minimal character set. This can be useful for interchange of programs, but it doesn't make it easier for people to read programs. Naturally, the long-term solution to this problem is for C++ programmers to get equipment that supports both their native language and C++ well. Unfortunately, this appears to be infeasible for some, and the introduction of new equipment can be a frustratingly slow process.

潦草背影 2024-08-06 10:02:52

They are for use on systems that lack some of the characters in C++'s basic character set. Needless to say, such systems are exceedingly rare.

-黛色若梦 2024-08-06 10:02:52

Trigraphs have been proposed for removal in C++0x. That said, there still seems to be a strong argument in support of them: see C++ committee paper N2910, which discusses this. Apparently, EBCDIC is one major stronghold where they are needed.

半岛未凉 2024-08-06 10:02:52

I've seen trigraphs used in the early '90s to help convert PL/I programs from a mainframe to be run/compiled/debugged on a PC.

They were dabbling with editing PL/I on the PC using a PL/I to C compiler and they wanted the code to work when moved back to the mainframe which did not support curly braces. I suggested that they could use macros like

#define BEGIN {
#define END }

or as a friendlier PL/I alternative

#define BEGIN ??<
#define END ??>

and if they really wanted to get fancy they could try

#ifdef MAINFRAME
    #define BEGIN ??<
    #define END ??>
#else
    #define BEGIN {
    #define END }
#endif

and then the program would look like it was written in Pascal. They just looked at me funny and wouldn't speak to me for the rest of the day. I don't think I blame them. :)
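For illustration only, this is roughly what code written against such macros might look like. The sketch uses the plain-brace definitions so that it does not depend on trigraph support being enabled; `sum_to` and `check_sum` are hypothetical functions invented for this example.

```cpp
#include <cassert>

// Plain-brace variant of the macros; a mainframe build would swap in
// the trigraph spellings instead.
#define BEGIN {
#define END }

// Adds the integers 1..n, with every block delimited Pascal-style.
int sum_to(int n)
BEGIN
    int total = 0;
    for (int i = 1; i <= n; ++i)
    BEGIN
        total += i;
    END
    return total;
END

void check_sum()
BEGIN
    assert(sum_to(4) == 10);
END
```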

What killed the effort was not the trigraphs; it was the I/O system differences between the platforms. Opening files on the PC was so different from doing so on the mainframe that keeping the same code running on both would have introduced far too many kludges.

若水微香 2024-08-06 10:02:52

Primarily because the C standard introduced them back in 1989, when there were issues with the presence of the characters that trigraphs map to on some machines. By the time the C++ standard was published in 1998, the need for trigraphs was not great. They are a wart on C; they are just as much a wart on C++. There was a need for them - especially outside the English-speaking world - which is why they were added to C.

网白 2024-08-06 10:02:52

Some European keyboards don't (didn't?) have all the punctuation characters that US keyboards had, because they needed the keys for their unusual alphabetic characters. So for example (making this up), the Swedish keyboard would have A-ring where the curly brace was.

To accommodate those users, trigraphs are a way to enter punctuation using only the most common ASCII characters.

柳絮泡泡 2024-08-06 10:02:52

They are there mostly for historical reasons. Nowadays, most modern keyboards for most languages allow access to all those characters, but this used to be a problem once with some European keyboards. This is why trigraphs were invented.

If you don't know what they're for, you shouldn't use them.

It's still good to be aware of them, though, since you might accidentally use one in your code.
