树顶语法无限循环

发布于 2024-11-09 03:35:55 字数 520 浏览 0 评论 0原文

我的脑海中浮现出一些关于新编程语言的想法,所以我想尝试一下实现它。一位朋友建议我尝试使用 Treetop(Ruby gem)来创建一个解析器。 Treetop 的文档很少,而且我以前从未做过此类事情。

我的解析器的行为就像它有一个无限循环,但没有堆栈跟踪;事实证明很难追踪。有人能给我指出入门级解析/AST 指南的方向吗?我真的需要一些列出使用 Treetop 等工具的规则、常见用法等的东西。我的解析器语法位于 GitHub 上,以防有人希望提供帮助我改进它。

class {
  initialize = lambda (name) {
    receiver.name = name
  }

  greet = lambda {
    IO.puts("Hello, #{receiver.name}!")
  }
}.new(:World).greet()

I have had some ideas for a new programming language floating around in my head, so I thought I'd take a shot at implementing it. A friend suggested I try using Treetop (the Ruby gem) to create a parser. Treetop's documentation is sparse, and I've never done this sort of thing before.

My parser is acting like it has an infinite loop in it, but with no stack traces; it is proving difficult to track down. Can somebody point me in the direction of an entry-level parsing/AST guide? I really need something that list rules, common usage etc for using tools like Treetop. My parser grammer is on GitHub, in case someone wishes to help me improve it.

class {
  initialize = lambda (name) {
    receiver.name = name
  }

  greet = lambda {
    IO.puts("Hello, #{receiver.name}!")
  }
}.new(:World).greet()

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

软甜啾 2024-11-16 03:35:55

我要求treetop 将您的语言编译成.rb 文件。这给了我一些值得深入研究的东西:

$ tt -o /tmp/rip.rb /tmp/rip.treetop

然后我使用这个小存根重新创建循环:

require 'treetop'
load '/tmp/rip.rb'
RipParser.new.parse('')

这挂起了。现在,是不是很有趣!空字符串会重现该行为,就像您问题中的十几行示例一样。

为了找出它挂在哪里,我使用 Emacs 键盘宏来编辑 rip.rb,在每个方法的条目中添加调试语句。例如:

def _nt_root
  p [__LINE__, '_nt_root'] #DEBUG
  start_index = index

现在我们可以看到循环的范围:

[16, "root"]
[21, "_nt_root"]
[57, "_nt_statement"]
...
[3293, "_nt_eol"]
[3335, "_nt_semicolon"]
[3204, "_nt_comment"]
[57, "_nt_statement"]
[57, "_nt_statement"]
[57, "_nt_statement"]
...

从那里进一步调试可以发现整数可以为空字符串:

rule integer
   digit*
end

这间接允许语句为空字符串,并且顶级规则 语句* 永远消耗空语句。将 * 更改为 + 修复了循环,但揭示了另一个问题:

/tmp/rip.rb:777:in `_nt_object': stack level too deep (SystemStackError)
        from /tmp/rip.rb:757:in `_nt_compound_object'
        from /tmp/rip.rb:1726:in `_nt_range'
        from /tmp/rip.rb:1671:in `_nt_special_literals'
        from /tmp/rip.rb:825:in `_nt_literal_object'
        from /tmp/rip.rb:787:in `_nt_object'
        from /tmp/rip.rb:757:in `_nt_compound_object'
        from /tmp/rip.rb:1726:in `_nt_range'
        from /tmp/rip.rb:1671:in `_nt_special_literals'
         ... 3283 levels...

Range 是通过special_literals、literal_object、object 和compound_object 间接左递归的。 Treetop,当面对左递归时,会吃掉堆栈直到它呕吐。我没有快速解决该问题的方法,但至少从现在开始您可以进行堆栈跟踪。

另外,这不是您眼前的问题,但 digit 的定义很奇怪:它可以是一位数字,也可以是多个数字。这会导致 digit*digit+ 允许(可能)非法整数 1________2

I asked treetop to compile your language into an .rb file. That gave me something to dig into:

$ tt -o /tmp/rip.rb /tmp/rip.treetop

Then I used this little stub to recreate the loop:

require 'treetop'
load '/tmp/rip.rb'
RipParser.new.parse('')

This hangs. Now, isn't that interesting! An empty string reproduces the behavior just as well as the dozen-or-so-line example in your question.

To find out where it's hanging, I used an Emacs keyboard macro to edit rip.rb, adding a debug statement to the entry of each method. For example:

def _nt_root
  p [__LINE__, '_nt_root'] #DEBUG
  start_index = index

Now we can see the scope of the loop:

[16, "root"]
[21, "_nt_root"]
[57, "_nt_statement"]
...
[3293, "_nt_eol"]
[3335, "_nt_semicolon"]
[3204, "_nt_comment"]
[57, "_nt_statement"]
[57, "_nt_statement"]
[57, "_nt_statement"]
...

Further debugging from there reveals that an integer is allowed to be an empty string:

rule integer
   digit*
end

This indirectly allows a statement to be an empty string, and the top-level rule statement* to forever consume empty statements. Changing * to + fixes the loop, but reveals another problem:

/tmp/rip.rb:777:in `_nt_object': stack level too deep (SystemStackError)
        from /tmp/rip.rb:757:in `_nt_compound_object'
        from /tmp/rip.rb:1726:in `_nt_range'
        from /tmp/rip.rb:1671:in `_nt_special_literals'
        from /tmp/rip.rb:825:in `_nt_literal_object'
        from /tmp/rip.rb:787:in `_nt_object'
        from /tmp/rip.rb:757:in `_nt_compound_object'
        from /tmp/rip.rb:1726:in `_nt_range'
        from /tmp/rip.rb:1671:in `_nt_special_literals'
         ... 3283 levels...

Range is left-recursing, indirectly, via special_literals, literal_object, object, and compound_object. Treetop, when faced with left recursion, eats stack until it pukes. I don't have a quick fix for that problem, but at least you've got a stack trace to go from now.

Also, this is not your immediate problem, but the definition of digit is odd: It can either one digit, or multiple. This causes digit* or digit+ to allow the (presumably) illegal integer 1________2.

楠木可依 2024-11-16 03:35:55

我真的很喜欢 Parr 的语言实现模式;由于 Parr 创建了 ANTLR 解析器生成器,因此这是他在整本书中使用的工具,但它应该是足够简单,仍然可以从中学习。

我真正喜欢它的是每个示例都在前一个示例的基础上发展的方式;他并不是从一个巨大的支持 AST 的解析器开始,而是慢慢地引入需要越来越多的“后端智能”来完成工作的问题,因此这本书可以很好地适应需要解析的语言。

我希望它能更深入地介绍人们可以编写的语言类型,并在设计语言时就“应该做的事情”和“不应该做的事情”提供建议。我见过一些语言解析起来非常困难,我想更多地了解可以做出不同的设计决策。

I really enjoyed Language Implementation Patterns by Parr; since Parr created the ANTLR parser generator, it's the tool he uses throughout the book, but it should be simple enough to learn from it all the same.

What I really liked about it was the way each example grew upon the previous one; he doesn't start out with a gigantic AST-capable parser, instead he slowly introduces problems that need more and more 'backend smarts' to do the job, so the book scales well along with the language that needs parsing.

What I wish it covered in a little more depth is the types of languages that one can write and give advice on Do's and Do Not Do's when designing languages. I've seen some languages that are a huge pain to parse and I'd have liked to know more about the design decisions that could have been made differently.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文