How do lexers handle comments and escape sequences?

Posted 2024-10-20 06:42:05

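Comments and escape sequences (as in string literals) are quite unlike ordinary tokens.

I find it hard to understand how a regular lexical analyzer tokenizes them. How do lexers such as lex and flex handle this kind of input? Is there a general method, or is it handled case by case for each language?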

憧憬巴黎街头的黎明 2024-10-27 06:42:05

I think the answer is that it is handled case by case for each language.
If a comment opener appears inside a string literal, the lexer must not treat it as the start of a comment.
Similarly, in C, an escaped double quote \" inside a string literal must not be treated as the end of the string.
For this purpose, flex provides start conditions, which make a rule active only in a given context.
For instance, the flex texinfo manual has an example that scans C comments (the text between /* and */):

%x IN_COMMENT
%%
<INITIAL>"/*"   BEGIN(IN_COMMENT);
<IN_COMMENT>{
"*/"            BEGIN(INITIAL);
[^*\n]+         /* eat comment in chunks */
"*"             /* eat the lone star */
\n              yylineno++;
}
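
To see what the start conditions are doing, here is a rough Python sketch that emulates the snippet by hand rather than running flex. The IN_COMMENT rules follow the snippet; the INITIAL rules for ordinary text, the action names, and the function strip_comments are assumptions made for illustration, and string literals are deliberately not handled.

```python
import re

# Rules per start condition, as (pattern, action) pairs. The IN_COMMENT rules
# mirror the flex snippet; the INITIAL rules for ordinary text are assumed.
RULES = {
    "INITIAL": [
        (re.compile(r"/\*"), "enter_comment"),
        (re.compile(r"\n"), "newline"),
        (re.compile(r"[^/\n]+|/"), "text"),
    ],
    "IN_COMMENT": [
        (re.compile(r"\*/"), "leave_comment"),
        (re.compile(r"[^*\n]+"), "skip"),  # eat comment in chunks
        (re.compile(r"\*"), "skip"),       # eat the lone star
        (re.compile(r"\n"), "newline"),
    ],
}


def strip_comments(src):
    """Return (source without /* */ comments, 1 + number of newlines seen)."""
    state, pos, line, out = "INITIAL", 0, 1, []
    while pos < len(src):
        # Only the rules of the current start condition are tried. The longest
        # match wins and ties go to the earlier rule, as in flex.
        best = None
        for pattern, action in RULES[state]:
            m = pattern.match(src, pos)
            if m and (best is None or m.end() > best[0].end()):
                best = (m, action)
        m, action = best
        if action == "enter_comment":
            state = "IN_COMMENT"
        elif action == "leave_comment":
            state = "INITIAL"
        elif action == "newline":
            line += 1
            if state == "INITIAL":
                out.append("\n")  # newlines inside comments are only counted
        elif action == "text":
            out.append(m.group())
        pos = m.end()
    return "".join(out), line
```

With this sketch, strip_comments("a /* x * y\n z */ b\n") returns ("a  b\n", 3): the lone star inside the comment is skipped, and the newline inside the comment is still counted.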

Start conditions also make it possible to scan string literals.
The flex texinfo manual has an example of matching C-style quoted strings with start conditions in its Start Conditions section, and a FAQ item titled
How do I expand backslash-escape sequences in C-style quoted strings?
These will probably answer your question about string literals directly.
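
The same idea can be sketched in Python for string literals. This is not the manual's example, only a hand-written approximation: the function scan_string and the ESCAPES table are hypothetical names, and only a handful of escapes are covered (a real C lexer also handles octal, hex and others).

```python
# Escapes handled in this sketch; anything else after a backslash is kept
# literally. A real C lexer supports more (octal, hex, ...).
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", '"': '"'}


def scan_string(src, pos):
    """Scan a C-style string literal whose opening quote is at src[pos].

    Returns (decoded value, position just past the closing quote).
    """
    if src[pos] != '"':
        raise ValueError("expected opening quote")
    pos += 1  # entering the string context, like BEGIN(...) in flex
    buf = []
    while pos < len(src):
        ch = src[pos]
        if ch == '"':  # unescaped quote: the literal ends here
            return "".join(buf), pos + 1
        if ch == "\n":
            raise ValueError("unterminated string literal")
        if ch == "\\":  # escape: the next character is consumed with it
            if pos + 1 >= len(src):
                break
            nxt = src[pos + 1]
            buf.append(ESCAPES.get(nxt, nxt))
            pos += 2
            continue
        buf.append(ch)  # ordinary character, including / and *
        pos += 1
    raise ValueError("unterminated string literal")
```

Because the scanner is in the string context, a \" does not end the literal and a /* inside it is plain text, which is the contextual behaviour described above.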

ゝ杯具 2024-10-27 06:42:05

"Comments and escape sequences (as in string literals) are quite unlike ordinary tokens."

I'm not sure what you mean, but as stated this is wrong. Both comments (unless they can be nested) and strings with escape sequences have a simple description as a regular language.

For example, an escape sequence allowing \\, \", \n and \r can be described by the following regular grammar (with start symbol E):

E -> \ S
S -> \
S -> "
S -> n
S -> r
…

The body of a string is then just zero or more unescaped symbols or escape sequences, i.e. the Kleene closure of the union of two regular languages, which is itself regular.
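
As a small illustration of that claim, the grammar can be written as ordinary regular expressions. The sketch below uses Python's re module; the names ESCAPE, PLAIN, STRING and COMMENT are made up here, the escape set is limited to the four escapes listed above, and the assumption that an unescaped symbol excludes newlines is mine.

```python
import re

# One escape sequence: a backslash followed by \, ", n or r (the grammar E).
ESCAPE = r'\\[\\"nr]'
# One unescaped symbol: anything except a quote, a backslash or a newline.
PLAIN = r'[^"\\\n]'
# A string literal: quote, Kleene star over (PLAIN | ESCAPE), quote.
STRING = re.compile(r'"(?:' + PLAIN + "|" + ESCAPE + r')*"')
# A non-nested C comment is regular too: it ends at the first "*/".
COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")
```

For instance, STRING.fullmatch(r'"a\"b\n"') succeeds while STRING.fullmatch(r'"a\qb"') does not, because \q is not among the listed escapes. COMMENT.match("/* a */ b */") matches only "/* a */", which is why nesting is out of reach for a plain regular expression.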