What is the magic behind the escape (\) character?
How does a C/C++ compiler handle the escape character ("\") in source code? How is the compiler's grammar written to process that character? What does the compiler do after encountering it?
Most compilers are divided into several parts. The first stage of the front end is the lexical analyzer, or scanner: it reads the actual characters and groups them into tokens. It has a state machine which, upon seeing an escape character, decides whether the backslash should be taken literally or whether it modifies the next character (as it does, for example, inside a string literal). The resulting token is passed on to the next part of the compiler (the parser), either as the escape character itself or as some other token such as a tab or a newline. In this way the state machine can group several source characters into a single token.
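To make that state machine concrete, here is a minimal sketch in C, assuming a hypothetical scanner that has already consumed the opening quote of a string literal. The names scan_string_body and hex_value and the two states are illustrative, not taken from any real compiler, and only a handful of escapes are handled. Note that the backslash is never emitted: it only switches the state, and the character after it selects the value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner states for the body of a string literal. */
enum scan_state { IN_STRING, AFTER_BACKSLASH };

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode the characters between the quotes of a string literal.
 * src points just past the opening quote. Decoded bytes go to out
 * (at most cap - 1 of them, followed by a terminating '\0').
 * Returns the number of decoded bytes, or -1 on an unknown escape,
 * a missing closing quote, or a full output buffer.
 */
static long scan_string_body(const char *src, char *out, size_t cap)
{
    enum scan_state state = IN_STRING;
    size_t n = 0;

    assert(src != NULL && out != NULL && cap > 0);
    for (; *src != '\0'; src++) {
        char c = *src;
        char decoded;
        if (state == IN_STRING) {
            if (c == '"') {            /* closing quote: token complete */
                out[n] = '\0';
                return (long)n;
            }
            if (c == '\\') {           /* remember the backslash, emit nothing yet */
                state = AFTER_BACKSLASH;
                continue;
            }
            decoded = c;               /* ordinary character: copied as-is */
        } else {
            /* The character after the backslash selects the value. */
            switch (c) {
            case 'n':  decoded = '\n'; break;
            case 't':  decoded = '\t'; break;
            case 'a':  decoded = '\a'; break;
            case '\\': decoded = '\\'; break;
            case '"':  decoded = '"';  break;
            case 'x': {                /* \x followed by one or more hex digits */
                int value = 0, digits = 0, d;
                while ((d = hex_value(src[1])) >= 0) {
                    value = value * 16 + d;
                    if (value > 0xFF)
                        return -1;     /* does not fit in an 8-bit char */
                    src++;
                    digits++;
                }
                if (digits == 0)
                    return -1;         /* \x with no hex digits */
                decoded = (char)value;
                break;
            }
            default:
                return -1;             /* escape not handled in this sketch */
            }
            state = IN_STRING;
        }
        if (n + 1 >= cap)
            return -1;                 /* no room for this byte plus '\0' */
        out[n++] = decoded;
    }
    return -1;                         /* input ended without a closing quote */
}
```

For example, the source characters a\tb\x41 followed by a closing quote decode to just four bytes: 'a', a tab, 'b' and 'A'. The parser only ever sees the finished token, never the backslashes.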
An interesting read on this subject is Ken Thompson's "Reflections on Trusting Trust" [PDF link].
The paper describes exactly one way a compiler could handle this problem. It shows how a C compiler written in C can contain no explicit translation of escape codes into ASCII values, and how to bootstrap a new escape code into the compiler so that its ASCII value is likewise understood only implicitly.
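The bootstrapping argument can be sketched in C, loosely modeled on the escape-handling fragment discussed in the paper. The names next, read_char_stage1, read_char_stage2 and cursor, and the small set of escapes, are illustrative assumptions rather than code from the paper. Stage 1 is the source written before the compiler knows \v; stage 2 is the source written after a stage-1 binary has been installed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over the source text of a character literal. */
static const char *cursor;

static int next(void)
{
    assert(cursor != NULL);
    return (unsigned char)*cursor++;
}

/*
 * Stage 1: the compiler is being taught \v. Its own source cannot
 * spell '\v' yet, because the compiler that builds it does not know
 * that escape, so the value is written out as the ASCII code 11.
 */
static int read_char_stage1(void)
{
    int c = next();
    if (c != '\\')
        return c;              /* ordinary character */
    c = next();
    if (c == '\\')
        return '\\';
    if (c == 'n')
        return '\n';
    if (c == 'v')
        return 11;             /* explicit ASCII vertical tab */
    return -1;                 /* escape not handled in this sketch */
}

/*
 * Stage 2: compiled by the stage-1 binary, the source can now say
 * '\v'. The number 11 appears nowhere in this source; the value is
 * inherited from the binary that compiles it.
 */
static int read_char_stage2(void)
{
    int c = next();
    if (c != '\\')
        return c;
    c = next();
    if (c == '\\')
        return '\\';
    if (c == 'n')
        return '\n';
    if (c == 'v')
        return '\v';           /* value now known only implicitly */
    return -1;
}
```

Compiling stage 2 with an ordinary modern compiler works simply because that compiler already knows \v. The point of the paper is that in a self-hosting compiler, once stage 2 replaces stage 1, the value 11 survives only inside the compiled binary.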
A backslash generally escapes the character that follows it. For example:
\a means 'alert' (flashing the terminal, beeping or whatever),
\n means 'linefeed',
\xNUM means the character whose code is the hexadecimal number NUM.
An escape character with a following character (like \n) is a single character for the C compiler: the scanner presents it to the parser as a character token, so the parser needs no special syntax rules for escape characters.
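A small C sketch illustrates both of the last two answers: each escape sequence denotes one character value, and inside a string it occupies exactly one element. The numeric values in the comments assume an ASCII execution character set; show_escape_values and two_escapes are illustrative names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Two escape sequences, so two characters plus the terminating '\0'. */
static const char two_escapes[] = "\a\n";

static void show_escape_values(void)
{
    /* A string containing only \n holds one character plus '\0'. */
    assert(sizeof "\n" == 2);

    printf("'\\a'   = %d\n", '\a');     /* alert (BEL): 7 in ASCII */
    printf("'\\n'   = %d\n", '\n');     /* linefeed: 10 in ASCII */
    printf("'\\x41' = %d\n", '\x41');   /* hex 41 = 65, which is 'A' in ASCII */
    printf("sizeof two_escapes = %zu\n", sizeof two_escapes);   /* 3 */
    printf("strlen(\"a\\tb\") = %zu\n", strlen("a\tb"));        /* 3 */
}
```

Because the scanner has already collapsed each escape sequence into one character, strlen("a\tb") counts three characters, not four.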