该语法需要什么类型的解析器?
我有一个语法,除了我不相信该语法是 LL(1) 之外,我不知道需要什么类型的解析器来解析它。我想我需要一个带有回溯或 LL(*) 之类的解析器。我想出的语法(可能需要一些重写)是:
S: Rules
Rules: Rule | Rule Rules
Rule: id '=' Ids
Ids: id | Ids id
我试图生成的语言看起来像这样:
abc = def g hi jk lm
xy = aaa bbb ccc ddd eee fff jjj kkk
foo = bar ha ha
零个或多个规则,其中包含左标识符,后跟等号,后跟一个或多个标识符。我认为编写解析器时会遇到问题的部分是,语法允许规则中存在任意数量的 id,并且判断新规则何时开始的唯一方法是它何时找到 id =,这需要回溯。
有谁知道这个语法的分类以及手写解析器的最佳解析方法?
I have a grammar that I do not know what type of parser I need in order to parse it other than I do not believe the grammar is LL(1). I am thinking I need a parser with backtracking or LL(*) of some sort. The grammar I have came up with (which may need some rewriting) is:
S: Rules
Rules: Rule | Rule Rules
Rule: id '=' Ids
Ids: id | Ids id
The language I am trying to generate looks something like this:
abc = def g hi jk lm
xy = aaa bbb ccc ddd eee fff jjj kkk
foo = bar ha ha
Zero or more Rule that contain a left identifier followed by an equal sign followed by one or more identifers. The part that I think I will have a problem writing a parser for is that the grammar allows any amount of id in a Rule and that the only way to tell when a new Rule starts is when it finds id =, which would require backtracking.
Does anyone know the classification of this grammar and the best method of parsing, for a hand written parser?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
生成标识符后跟等号后跟有限标识符序列的语法是常规。这意味着可以使用 DFA 或正则表达式来解析该语言中的字符串。不需要花哨的非确定性或 LL(*) 解析器。
要查看该语言是否是正则语言,请设 Id = U {a : a ∈ Г},其中 Г ⊂ Σ 是符号集这可能出现在标识符中。您尝试生成的语言由正则表达式
设置 Γ = {a, b, ..., z},正则表达式语言中的字符串示例是:
不需要使用强大的解析技术来解析您的语言。这是使用正则表达式或 DFA 进行解析既合适又最佳的一种情况。
编辑:
调用上面的正则表达式R。要解析R*,请生成识别R*语言的DFA。为此,请使用从 Kleene 定理获得的算法生成一个识别 R* 语言的 NFA。然后使用子集构造将 NFA 转换为 DFA。生成的 DFA 将识别 R* 中的所有字符串。给定实现语言中构造的 DFA 的表示,所需的操作 - 例如,
可以将其编码到DFA的状态中。实际上,对于这样一种简单的语言来说,使用克莱恩定理和子集构造可能是不必要的。也就是说,您可能只需编写一个具有上述两个操作的解析器,而无需实现自动机。给定更复杂的常规语言(例如,编程语言的词法结构),转换将是最佳选择。
The grammar that generates an identifier followed by an equals sign followed by a finite sequence of identifiers is regular. This means that strings in the language can be parsed using a DFA or regular expression. No need for fancy nondeterministic or LL(*) parsers.
To see that the language is regular, let Id = U {a : a ∈ Γ}, where Γ ⊂ Σ is the set of symbols that can occur in identifiers. The language you are trying to generate is denoted by the regular expression
Setting Γ = {a, b, ..., z}, examples of strings in the language of the regular expression are:
There is no need to parse your language using powerful parsing techniques. This is one case where parsing using regular expressions or DFA is both appropriate and optimal.
edit:
Call the above regular expression R. To parse R*, generate a DFA recognizing the language of R*. To do this, generate an NFA recognizing the language of R* using the algorithm obtainable from Kleene's theorem. Then convert the NFA into a DFA using the subset construction. The resultant DFA will recognize all strings in R*. Given a representation of the constructed DFA in your implementation language, the required actions - for instance,
can be encoded into the states of the DFA. In reality, using Kleene's theorem and the subset construction is probably unnecessary for such a simple language. That is, you can probably just write a parser with the above two actions without implementing an automaton. Given a more complicated regular langauge (for instance, the lexical structure of a programming langauge), the conversion would be the best option.