Question about lexical analysis

Posted 2024-09-09 06:24:02

I am reading the Dragon Book. Quoting the text from the book (Section 3.1.4, Lexical Errors, p. 114):

It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error. For instance, if the string fi is encountered for the first time in a C program in the context:

fi ( a == f(x) ) ...

a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler - probably the parser in this case - handle an error due to transposition of the letters.

I am a bit confused after reading this. My understanding was that the lexical analyser processes the text from left to right and returns a token whenever a pattern matches. So for a language where "if" is the keyword to match, how can "fi" match?

Any thoughts?

Comments (4)

深海里的那抹蓝 2024-09-16 06:24:02

It doesn't match the if token but the id token, which stands for "identifier". The id token is the catch-all when no keyword matches. The lexical analyser doesn't know what to "expect" at a given position; it just returns tokens, and the parser knows what it expects. For example, a C parser has to accept the following statement, which is a function call:

fi ( a  == f(x) );

假扮的天使 2024-09-16 06:24:02

You must make a distinction between syntax analysis and lexical analysis.

  • The task of lexical analysis is to convert a sequence of characters into a string of tokens. There can be various types of tokens, e.g. IDENTIFIER, ADDITION OPERATOR, END OF STATEMENT OPERATOR, etc. Lexical analysis can only fail with an error if it encounters a string of text that doesn't correspond to any token. In your case, fi ( a == f(x) ) ... would translate to <IDENTIFIER> <LEFT BRACKET> <IDENTIFIER> <EQUALITY> <IDENTIFIER> <LEFT BRACKET> <IDENTIFIER> <RIGHT BRACKET> <RIGHT BRACKET> ...

  • Once a string of tokens has been generated, syntax analysis is performed. This typically involves constructing some sort of syntax tree from the tokens. The parser is aware of all the forms of valid statements that are allowed in the language. If the parser cannot find a syntax rule allowing the above sequence of tokens, it will fail.

謸气贵蔟 2024-09-16 06:24:02

How would you tell whether "if" was the only expected input at a given point?

int a = 42;
if (a == 42)
    puts("ok");

vs.

int a = 42;
fi (a == 42)
    puts("ok");

fi could be a function call. For example, the above could be a misspelling of:

int a = 42;
fi(a == 42);
puts("ok");

where fi is a function taking int and returning void.

前事休说 2024-09-16 06:24:02

This is a poor choice of example for explaining lexical analysis errors. What the text is trying to tell you is that the compiler cannot recognize that you misspelled the "if" keyword (wrote it backwards). It just sees "fi", which is, for example, a valid variable name, and so returns the token id (named, say, "VARIABLE") to the parser. The parser then later detects the syntax error.

It has nothing to do with going left to right or right to left. The compiler of course reads the source code from left to right. As I said, it is a poor choice of keyword for this explanation.
