该语法需要什么类型的解析器?

发布于 2024-12-09 23:23:49 字数 481 浏览 0 评论 0原文

我有一个语法,除了我不相信该语法是 LL(1) 之外,我不知道需要什么类型的解析器来解析它。我想我需要一个带有回溯或 LL(*) 之类的解析器。我想出的语法(可能需要一些重写)是:

S: Rules
Rules: Rule | Rule Rules
Rule: id '=' Ids
Ids: id | Ids id

我试图生成的语言看起来像这样:

abc = def g hi jk lm
xy = aaa bbb ccc ddd eee fff jjj kkk
foo = bar ha ha 

零个或多个规则,其中包含左标识符,后跟等号,后跟一个或多个标识符。我认为编写解析器时会遇到问题的部分是,语法允许规则中存在任意数量的 id,并且判断新规则何时开始的唯一方法是它何时找到 id =,这需要回溯。

有谁知道这个语法的分类以及手写解析器的最佳解析方法?

I have a grammar that I do not know what type of parser I need in order to parse it other than I do not believe the grammar is LL(1). I am thinking I need a parser with backtracking or LL(*) of some sort. The grammar I have came up with (which may need some rewriting) is:

S: Rules
Rules: Rule | Rule Rules
Rule: id '=' Ids
Ids: id | Ids id

The language I am trying to generate looks something like this:

abc = def g hi jk lm
xy = aaa bbb ccc ddd eee fff jjj kkk
foo = bar ha ha 

Zero or more Rule that contain a left identifier followed by an equal sign followed by one or more identifers. The part that I think I will have a problem writing a parser for is that the grammar allows any amount of id in a Rule and that the only way to tell when a new Rule starts is when it finds id =, which would require backtracking.

Does anyone know the classification of this grammar and the best method of parsing, for a hand written parser?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

·深蓝 2024-12-16 23:23:49

生成标识符后跟等号后跟有限标识符序列的语法是常规。这意味着可以使用 DFA 或正则表达式来解析该语言中的字符串。不需要花哨的非确定性或 LL(*) 解析器。

要查看该语言是否是正则语言,请设 Id = U {a : a ∈ Г},其中 Г ⊂ Σ 是符号集这可能出现在标识符中。您尝试生成的语言由正则表达式

  • Id+ =( Id+)* Id+

设置 Γ = {a, b, ..., z},正则表达式语言中的字符串示例是:

  • 看 = 我是常规语言
  • 嘿 = 这意味着我可以被 dfa 识别
  • 很酷 = 甚至是正则表达式

不需要使用强大的解析技术来解析您的语言。这是使用正则表达式或 DFA 进行解析既合适又最佳的一种情况。

编辑:

调用上面的正则表达式R。要解析R*,请生成识别R*语言的DFA。为此,请使用从 Kleene 定理获得的算法生成一个识别 R* 语言的 NFA。然后使用子集构造将 NFA 转换为 DFA。生成的 DFA 将识别 R* 中的所有字符串。给定实现语言中构造的 DFA 的表示,所需的操作 - 例如,

  • 将解析的最后一个标识符添加到正在解析的当前声明语句的右侧将解析的
  • 最后一个声明语句添加到已解析声明列表中,并使用最后解析的标识符开始解析新的声明语句,

可以将其编码到DFA的状态中。实际上,对于这样一种简单的语言来说,使用克莱恩定理和子集构造可能是不必要的。也就是说,您可能只需编写一个具有上述两个操作的解析器,而无需实现自动机。给定更复杂的常规语言(例如,编程语言的词法结构),转换将是最佳选择。

The grammar that generates an identifier followed by an equals sign followed by a finite sequence of identifiers is regular. This means that strings in the language can be parsed using a DFA or regular expression. No need for fancy nondeterministic or LL(*) parsers.

To see that the language is regular, let Id = U {a : a ∈ Γ}, where Γ ⊂ Σ is the set of symbols that can occur in identifiers. The language you are trying to generate is denoted by the regular expression

  • Id+ =( Id+)* Id+

Setting Γ = {a, b, ..., z}, examples of strings in the language of the regular expression are:

  • look = i am in a regular language
  • hey = that means i can be recognized by a dfa
  • cool = or even a regular expression

There is no need to parse your language using powerful parsing techniques. This is one case where parsing using regular expressions or DFA is both appropriate and optimal.

edit:

Call the above regular expression R. To parse R*, generate a DFA recognizing the language of R*. To do this, generate an NFA recognizing the language of R* using the algorithm obtainable from Kleene's theorem. Then convert the NFA into a DFA using the subset construction. The resultant DFA will recognize all strings in R*. Given a representation of the constructed DFA in your implementation language, the required actions - for instance,

  • Add the last identifier parsed to the right-hand side of the current declaration statement being parsed
  • Add the last declaration statement parsed to a list of parsed declarations, and use the last identifier parsed to begin parsing a new declaration statement

can be encoded into the states of the DFA. In reality, using Kleene's theorem and the subset construction is probably unnecessary for such a simple language. That is, you can probably just write a parser with the above two actions without implementing an automaton. Given a more complicated regular langauge (for instance, the lexical structure of a programming langauge), the conversion would be the best option.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文