Question about lexical analysis
I am reading the Dragon Book. Quoting the text from the book (section 3.1.4, Lexical Errors, p. 114):
It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error. For instance, if the string fi is encountered for the first time in a C program in the context:

fi ( a == f(x) ) ...

a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler - probably the parser in this case - handle an error due to transposition of the letters.
I am a bit confused after reading this. My understanding was that the lexical analyser processes the text from left to right and returns a token whenever a pattern matches. So for a language where if is the keyword to match, how can fi match?
Any thoughts?
Answers
It doesn't match the if token, but the id token, which stands for "identifier". That is the catch-all when no keyword matches. The lexical analyser doesn't know what to "expect" at certain positions. It just returns tokens, and the parser knows what it expects. A C parser has to accept the following statement, for example, which is a function call:

fi ( a == f(x) );
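To make the "catch-all" idea concrete, here is a minimal Python sketch, assuming a lexer that scans the longest run of identifier characters and then looks the word up in a keyword table (lex-style generators get the same effect by listing keyword rules before the identifier rule, since the longest match wins and ties go to the earlier rule). KEYWORDS and scan_word are hypothetical names, and the keyword table is deliberately incomplete.

```python
import re

# Deliberately incomplete keyword table (assumption for brevity).
KEYWORDS = {"if": "IF", "else": "ELSE", "while": "WHILE", "return": "RETURN"}

# C-style identifier shape: a letter or underscore, then letters, digits, underscores.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def scan_word(text: str, pos: int) -> tuple[str, str, int]:
    """Scan the longest identifier-shaped lexeme starting at pos.

    Returns (token_kind, lexeme, next_pos). Keywords are recognised by a
    table lookup; every other word falls through to the catch-all ID token.
    """
    match = IDENT_RE.match(text, pos)
    if match is None:
        raise ValueError(f"no identifier-shaped lexeme at position {pos}")
    lexeme = match.group()
    # No knowledge of what the parser expects next is used here.
    return KEYWORDS.get(lexeme, "ID"), lexeme, match.end()


for word in ["if", "fi", "iff", "whiel"]:
    print(word, "->", scan_word(word, 0)[0])
```

With this approach only the exact lexeme "if" becomes the keyword token; "fi", "iff" and "whiel" all come back as ID, so the lexer has no reason to complain about any of them.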
You must make a distinction between syntax analysis and lexical analysis.
The task of lexical analysis is to convert a sequence of characters into a string of tokens. There can be various types of tokens, e.g. IDENTIFIER, ADDITION OPERATOR, END OF STATEMENT OPERATOR, etc. Lexical analysis can only fail with an error if it encounters a string of text that doesn't correspond to any token. In your case,
fi ( a == f(x) ) ...
would translate to
<IDENTIFIER> <LEFT BRACKET> <IDENTIFIER> <EQUALITY> <IDENTIFIER> <LEFT BRACKET> <IDENTIFIER> <RIGHT BRACKET> <RIGHT BRACKET> ...
Once a string of tokens has been generated, syntax analysis is performed. This typically involves constructing some sort of syntax tree from the tokens. The parser is aware of all the forms of valid statements allowed in the language. If the parser cannot find a syntax rule allowing the above sequence of tokens, it will fail.
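The lexical half of this pipeline can be sketched in Python for the example input. The token names mirror the ones in this answer; TOKEN_SPEC, tokenize and LexError are illustrative assumptions that cover only the characters in this example, not C's full token set. Rules are tried in order and the first match wins, which is enough here because none of these patterns compete for the same text.

```python
import re


class LexError(Exception):
    """Raised when the input contains text that starts no token."""


# Hypothetical token table covering only the example's characters.
# Rules are tried in order at each position; the first match wins.
TOKEN_SPEC = [
    ("WHITESPACE", re.compile(r"\s+")),
    ("IDENTIFIER", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("EQUALITY", re.compile(r"==")),
    ("LEFT BRACKET", re.compile(r"\(")),
    ("RIGHT BRACKET", re.compile(r"\)")),
]


def tokenize(source: str) -> list[str]:
    """Convert a character sequence into a list of token names."""
    tokens = []
    pos = 0
    while pos < len(source):
        for kind, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                if kind != "WHITESPACE":  # whitespace separates tokens but is not one
                    tokens.append(kind)
                pos = match.end()
                break
        else:
            # The lexer's only failure mode: no rule matches at this position.
            raise LexError(f"unexpected {source[pos]!r} at position {pos}")
    return tokens


print(" ".join(f"<{t}>" for t in tokenize("fi ( a == f(x) )")))

try:
    tokenize("fi @ a")
except LexError as err:
    print("lexical error:", err)
```

The first print reproduces the token string above. The second call fails inside the lexer only because "@" starts no token in this table; "fi" itself never causes a lexical error.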
How would you tell whether if was the only input expected at a given point, as opposed to fi, which could be a function call? For example, the statement could equally be a misspelling of a call to fi, where fi is a function taking an int and returning void; in C, a == f(x) has type int, so it is a valid argument to such a function.
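To see the parser making the decision this answer asks about, here is a toy recursive-descent parser in Python. The grammar (if-statements, calls and assignments, with expressions built from identifiers, calls and ==) and the space-separated lexer are simplifying assumptions, not C's real grammar; ParseError, lex, Parser and parse_statement are hypothetical names. The lexer turns "if" into IF and every other word into ID, so "fi" is just an identifier; only the parser knows that an ID followed by "(" must be a call ending in ";".

```python
class ParseError(Exception):
    """Raised when the token sequence fits no statement form."""


KEYWORDS = {"if"}
PUNCTUATION = {"(", ")", "==", "=", ";"}


def lex(source: str) -> list[str]:
    """Naive lexer: tokens must be separated by spaces.

    Keywords become upper-case kinds, punctuation stands for itself,
    and every other word (including "fi" and numbers) becomes ID.
    """
    kinds = []
    for word in source.split():
        if word in KEYWORDS:
            kinds.append(word.upper())
        elif word in PUNCTUATION:
            kinds.append(word)
        else:
            kinds.append("ID")
    return kinds


class Parser:
    """Toy recursive-descent parser for:

    statement  := IF "(" expression ")" statement
                | ID "=" expression ";"
                | ID "(" expression ")" ";"
    expression := primary [ "==" primary ]
    primary    := ID [ "(" expression ")" ]
    """

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind: str) -> None:
        if self.peek() != kind:
            raise ParseError(
                f"expected {kind!r} but found {self.peek()!r} at token {self.pos}"
            )
        self.pos += 1

    def statement(self) -> str:
        # The parser, not the lexer, decides what the next token must be.
        if self.peek() == "IF":
            self.expect("IF")
            self.expect("(")
            self.expression()
            self.expect(")")
            self.statement()
            return "if-statement"
        self.expect("ID")
        if self.peek() == "=":
            self.expect("=")
            self.expression()
            self.expect(";")
            return "assignment"
        self.expect("(")
        self.expression()
        self.expect(")")
        self.expect(";")  # a call statement must end here
        return "function call"

    def expression(self) -> None:
        self.primary()
        if self.peek() == "==":
            self.expect("==")
            self.primary()

    def primary(self) -> None:
        self.expect("ID")
        if self.peek() == "(":
            self.expect("(")
            self.expression()
            self.expect(")")


def parse_statement(source: str) -> str:
    parser = Parser(lex(source))
    kind = parser.statement()
    parser.expect("EOF")  # nothing may follow a complete statement
    return kind


print(parse_statement("if ( a == f ( x ) ) x = 1 ;"))
print(parse_statement("fi ( a == f ( x ) ) ;"))
try:
    parse_statement("fi ( a == f ( x ) ) x = 1 ;")
except ParseError as err:
    print("syntax error:", err)
```

The misspelled version fails at token 9, the x after the closing parenthesis, not at fi itself: the error only surfaces where the token sequence stops fitting any statement form, which is why the book hands this case to the parser.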
This is a poor choice of example for explaining lexical analysis errors. What the text is trying to tell you is that the compiler cannot recognize that you misspelled the "if" keyword (wrote it backwards). It just sees "fi", which is, for example, a valid variable name, and so returns an id (say, "VARIABLE") to the parser. The parser then detects the syntax error later.
It has nothing to do with going left-to-right or right-to-left. The compiler of course reads the source code from left to right. As I said, this is a poor choice of keyword for this explanation.
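The book leaves the transposition error to "some other phase". Purely as a hypothetical illustration (not how any particular compiler is known to do it), a diagnostic step that runs after the parser has reported an error near an identifier could test whether swapping two adjacent characters turns that identifier into a keyword. C_KEYWORDS and transposition_suggestion are made-up names, and the keyword set is a small subset of C's.

```python
from typing import Optional

# Hypothetical subset of C keywords; a real diagnostic would use the full list.
C_KEYWORDS = {"if", "else", "for", "while", "return"}


def transposition_suggestion(identifier: str) -> Optional[str]:
    """Return the keyword obtained by swapping two adjacent characters of
    `identifier`, or None if no single swap produces a keyword."""
    for i in range(len(identifier) - 1):
        # Swap the characters at positions i and i + 1.
        candidate = (
            identifier[:i] + identifier[i + 1] + identifier[i] + identifier[i + 2:]
        )
        if candidate in C_KEYWORDS:
            return candidate
    return None


for word in ["fi", "whiel", "retrun", "fo"]:
    suggestion = transposition_suggestion(word)
    if suggestion:
        print(f"{word!r}: did you mean {suggestion!r}?")
    else:
        print(f"{word!r}: no keyword suggestion")
```

Because fi is a perfectly legal identifier, a check like this can only offer a hint once something else has already gone wrong; on its own, the lexer has no reason to doubt fi.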