Treetop grammar infinite loop

Posted 2024-11-09 03:35:55

I have had some ideas for a new programming language floating around in my head, so I thought I'd take a shot at implementing it. A friend suggested I try using Treetop (the Ruby gem) to create a parser. Treetop's documentation is sparse, and I've never done this sort of thing before.

My parser is acting like it has an infinite loop in it, but with no stack traces; it is proving difficult to track down. Can somebody point me in the direction of an entry-level parsing/AST guide? I really need something that lists rules, common usage, etc. for tools like Treetop. My parser grammar is on GitHub, in case someone wishes to help me improve it.

class {
  initialize = lambda (name) {
    receiver.name = name
  }

  greet = lambda {
    IO.puts("Hello, #{receiver.name}!")
  }
}.new(:World).greet()

软甜啾 2024-11-16 03:35:55

I asked Treetop to compile your language's grammar into an .rb file. That gave me something to dig into:

$ tt -o /tmp/rip.rb /tmp/rip.treetop

Then I used this little stub to recreate the loop:

require 'treetop'
load '/tmp/rip.rb'
RipParser.new.parse('')

This hangs. Now, isn't that interesting! An empty string reproduces the behavior just as well as the dozen-or-so-line example in your question.
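
When reproducing a hang like this, it can help to bound the call so the stub itself terminates. Here is a minimal sketch using Ruby's standard Timeout module; the helper name run_with_timeout is my own invention, and it assumes the parser is spinning in pure Ruby code (which is the case for a Treetop-generated parser), since Timeout cannot always interrupt code stuck inside a C extension.

```ruby
require 'timeout'

# Runs the block and reports whether it finished within `seconds`.
# Returns [:ok, result] on success or [:timed_out, nil] on timeout.
# (run_with_timeout is a hypothetical helper, not part of Treetop.)
def run_with_timeout(seconds)
  [:ok, Timeout.timeout(seconds) { yield }]
rescue Timeout::Error
  [:timed_out, nil]
end

# Hypothetical usage with the generated parser from above:
#   require 'treetop'
#   load '/tmp/rip.rb'
#   p run_with_timeout(2) { RipParser.new.parse('') }
```

With a guard like this, a hanging input can be checked from a script or a test without having to kill the process by hand.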

To find out where it's hanging, I used an Emacs keyboard macro to edit rip.rb, adding a debug statement to the entry of each method. For example:

def _nt_root
  p [__LINE__, '_nt_root'] #DEBUG
  start_index = index
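
If you don't use Emacs, the same instrumentation can probably be done with a few lines of Ruby. This is only a sketch: the instrument helper and the output path are my own names, and it assumes every generated method starts with a line of the form `def _nt_name` with nothing else on it, as in the fragment above.

```ruby
# Inserts a debug print as the first statement of every `def _nt_*` method.
# (instrument is a hypothetical helper written for this answer.)
def instrument(source)
  source.gsub(/^([ \t]*)def (_nt_\w+)[ \t]*$/) do
    indent = Regexp.last_match(1)
    name = Regexp.last_match(2)
    "#{indent}def #{name}\n#{indent}  p [__LINE__, '#{name}'] #DEBUG"
  end
end

# Hypothetical usage (the output path is an assumption):
#   File.write('/tmp/rip_debug.rb', instrument(File.read('/tmp/rip.rb')))
```

Note that __LINE__ then reports line numbers in the instrumented file, not in the file that tt originally generated.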

Now we can see the scope of the loop:

[16, "root"]
[21, "_nt_root"]
[57, "_nt_statement"]
...
[3293, "_nt_eol"]
[3335, "_nt_semicolon"]
[3204, "_nt_comment"]
[57, "_nt_statement"]
[57, "_nt_statement"]
[57, "_nt_statement"]
...

Further debugging from there reveals that an integer is allowed to be an empty string:

rule integer
   digit*
end
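
To see why a rule that can match nothing is dangerous inside a repetition, here is a toy model in plain Ruby. It is not Treetop's generated code; the matcher lambdas, zero_or_more, and the max_iterations guard are all made up for illustration. A matcher takes an input and an index and returns the new index on success or nil on failure.

```ruby
# Matches zero or more ASCII digits, so it succeeds even on empty input.
DIGITS_STAR = lambda do |input, index|
  index += 1 while index < input.length && input[index].match?(/[0-9]/)
  index
end

# Matches one or more digits; fails (nil) instead of matching nothing.
DIGITS_PLUS = lambda do |input, index|
  finish = DIGITS_STAR.call(input, index)
  finish > index ? finish : nil
end

# Naive repetition: keeps applying `matcher` until it fails.
# max_iterations is an artificial guard added for this sketch so that the
# demonstration terminates; without it the loop would simply never return.
def zero_or_more(matcher, input, index, max_iterations: 1000)
  iterations = 0
  while (next_index = matcher.call(input, index))
    iterations += 1
    raise "no progress after #{iterations} iterations" if iterations >= max_iterations
    index = next_index
  end
  index
end
```

Calling zero_or_more(DIGITS_STAR, '', 0) trips the guard, because the inner matcher keeps succeeding without consuming anything. With DIGITS_PLUS the inner matcher fails on empty input and the repetition stops.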

Because integer can match the empty string, a statement can indirectly match the empty string too, so the top-level rule statement* consumes empty statements forever. Changing the * to + in the integer rule fixes the loop, but reveals another problem:

/tmp/rip.rb:777:in `_nt_object': stack level too deep (SystemStackError)
        from /tmp/rip.rb:757:in `_nt_compound_object'
        from /tmp/rip.rb:1726:in `_nt_range'
        from /tmp/rip.rb:1671:in `_nt_special_literals'
        from /tmp/rip.rb:825:in `_nt_literal_object'
        from /tmp/rip.rb:787:in `_nt_object'
        from /tmp/rip.rb:757:in `_nt_compound_object'
        from /tmp/rip.rb:1726:in `_nt_range'
        from /tmp/rip.rb:1671:in `_nt_special_literals'
         ... 3283 levels...

Range is left-recursing, indirectly, via special_literals, literal_object, object, and compound_object. Treetop, when faced with left recursion, eats stack until it pukes. I don't have a quick fix for that problem, but at least you've got a stack trace to go from now.
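
For what it's worth, the usual way out of left recursion in a recursive-descent or PEG parser is to parse a non-recursive operand first and then loop over the rest. I haven't worked this through for your grammar, so the following is only a toy sketch in plain Ruby with a made-up two-rule shape (object <- range / integer, range <- object '..' object); all function names are hypothetical.

```ruby
# Returns the index after a run of digits, or nil if there is none.
def parse_integer(input, index)
  finish = index
  finish += 1 while finish < input.length && input[finish].match?(/[0-9]/)
  finish > index ? finish : nil
end

# Left-recursive shape: object tries range first, and range starts by
# parsing an object at the same index, so no input is ever consumed.
def parse_object_naive(input, index)
  parse_range_naive(input, index) || parse_integer(input, index)
end

def parse_range_naive(input, index)
  left_end = parse_object_naive(input, index) # re-enters at the same index
  return nil unless left_end && input[left_end, 2] == '..'
  parse_object_naive(input, left_end + 2)
end

# Rewritten shape: object <- integer ('..' integer)*
# An integer is consumed first, then the '..' tails are handled in a loop.
def parse_object(input, index)
  finish = parse_integer(input, index)
  return nil unless finish
  while input[finish, 2] == '..' && (right_end = parse_integer(input, finish + 2))
    finish = right_end
  end
  finish
end
```

The naive version raises SystemStackError on any input, much like the trace above, while parse_object('1..5', 0) consumes the whole string. The rewrite also accepts chains such as 1..5..9, so a real grammar would need to decide whether that is acceptable.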

Also, this is not your immediate problem, but the definition of digit is odd: it can match either one digit or several. This causes digit* or digit+ to allow the (presumably) illegal integer 1________2.
