Parsing a list with minimal use of separators

Posted on 2024-10-08 03:16:30


I have a language with statements of 4 kinds: s00, s01, s10, s11 where a leading 1 means initial keyword, a trailing 1 means terminated, and I have a separator ";". I can terminate any statement with ";". I would like to parse a language allowing a list of statements which allows minimal use of ";". The parser is Dypgen which is GLR+.

Example:

{ x=1 fun f(){} x=1; x=1 var x=1 var x=1; x=1 }

Is it possible to do this at all? If so, how? If not, why?

I believe it can't be done, mainly because I can't think of how to do it :)
However it does seem context sensitive: the rule is you have to insert a ";" between A and B if A is not terminated and B is not initiated, ditto for B and C which means B is used twice.
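The rule above only ever looks at two adjacent statements, so it can be sketched as a pairwise check. This is a minimal illustration, assuming each statement has already been classified as one of the four kinds (the hard part in practice, since finding statement boundaries is what the parser has to do). The function names `needs_separator` and `separator_positions` are hypothetical, and the classification of the example statements in the usage note is my own guess.

```python
def needs_separator(a, b):
    """Return True if ';' is required between adjacent statements a and b.

    Kinds are strings "sXY": X == "1" means the statement starts with a
    keyword (initiated), Y == "1" means it is self-terminating.
    """
    a_terminated = a[2] == "1"
    b_initiated = b[1] == "1"
    return not a_terminated and not b_initiated


def separator_positions(kinds):
    """Return every index i where ';' must go between kinds[i] and kinds[i + 1]."""
    return [
        i
        for i in range(len(kinds) - 1)
        if needs_separator(kinds[i], kinds[i + 1])
    ]
```

If `x=1` is read as s00, `fun f(){}` as s11 and `var x=1` as s10, the example list becomes `["s00", "s11", "s00", "s00", "s10", "s10", "s00"]`, and `separator_positions` reports the two places where the example has a ";". Because B is only compared with its immediate neighbours, a single bit of state (was the previous statement terminated?) seems to be enough at the level of statement kinds.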

However because the parser is GLR+ it is tempting to just use

(s00|s01|s10|s11)*

as the rule, and if it misparses throw in a ";" (which is an s11 no-op) to resolve the ambiguity. It would be nicer if the parser would report a syntax error though. Perhaps this could be done when merging alternate s productions. The real problem is when they overlap instead of merging: if this occurs a program parse could explode.


Answer by 可是我不能没有你, 2024-10-15 03:16:30

I recently had a similar problem with toplevel phrases: some of them need a terminating ;; on the previous phrase, and others (those beginning with a phrase-introducing keyword) do not. I solved it by splitting the syntactic category of phrases in two and writing fine-grained rules for phrase sequences that express this behaviour. The cost is some duplication in the split grammar.

In your case it would be something like:

sequence:
  | (s00 | s10) sequence_closed
  | (s01 | s11) sequence_open
  | ε

sequence_closed:
  | s10 sequence_closed
  | s11 sequence_open
  | ';' sequence_open
  | ε

sequence_open:
  | (s00 | s10) sequence_closed
  | (s01 | s11) sequence_open
  | ε

It's a bit more complicated if you want to allow superfluous delimiters (and you most probably want to), but that's the idea.
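As a rough sketch of how the two-state split behaves, here is a small recognizer over a stream of statement kinds and ";" tokens. It is not Dypgen code and not the answerer's implementation. The name `accepts` and the flag `allow_superfluous` are hypothetical, and it assumes statements arrive pre-classified. The boolean `closed` plays the role of `sequence_closed` (the previous statement was unterminated); otherwise the recognizer is in the open state, where, following the rule stated in the question, any of the four kinds may come next.

```python
def accepts(tokens, allow_superfluous=False):
    """Recognize a list of "sXY" statement kinds and ";" separators.

    closed == True corresponds to sequence_closed: the previous statement
    was unterminated, so only an initiated statement or ';' may follow.
    """
    closed = False  # at the start of the list anything may come first
    for tok in tokens:
        if tok == ";":
            # A ';' in the open state is redundant; reject it unless the
            # caller opted in to superfluous delimiters.
            if not closed and not allow_superfluous:
                return False
            closed = False
            continue
        if tok not in ("s00", "s01", "s10", "s11"):
            raise ValueError(f"unknown token: {tok!r}")
        initiated = tok[1] == "1"
        terminated = tok[2] == "1"
        if closed and not initiated:
            return False  # a ';' was required before this statement
        closed = not terminated
    return True
```

With the default flag this accepts only lists that use ";" where it is strictly needed; passing `allow_superfluous=True` also accepts redundant separators, which in this token-level model costs a single extra branch. In a real grammar the corresponding change means adding ";" alternatives to the open state as well.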
