Recognizing Ruby code in a Treetop grammar

Posted 2024-09-29 14:56:12

I'm trying to use Treetop to parse an ERB file. I need to be able to handle lines like the following:

<% ruby_code_here %>
<%= other_ruby_code %>

Since Treetop is written in Ruby, and you write Treetop grammars in Ruby, is there already some existing way in Treetop to say "hey, look for Ruby code here, and give me its breakdown" without me having to write out separate rules to handle all parts of the Ruby language? I'm looking for a way, in my .treetop grammar file, to have something like:

rule erb_tag
  "<%" ruby_code "%>" {
    def content
      ...
    end
  }
end

Where ruby_code is handled by some rules that Treetop provides.

Edit: someone else parsed ERB using Ruby-lex, but I got errors trying to reproduce what he did. The rlex program did not produce a full class when it generated the parser class.

Edit: right, so you lot are depressing, but thanks for the info. :) For my Master's project, I'm writing a test case generator that needs to take ERB as input. Fortunately, for my purposes I only need to recognize a few things in the ERB code, such as if statements, other conditionals, and loops. I think I can come up with a Treetop grammar to match those, with the caveat that it won't be complete for Ruby.
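A minimal sketch of the limited recognizer described in the edit above, written in plain Ruby rather than as a Treetop grammar. The tag regex, the keyword table and the name `classify_erb` are all hypothetical. It does not parse Ruby at all: it only inspects the leading keyword of each tag (plus a trailing `do`) to spot conditionals and loops, and it assumes no tag contains `%>` inside a string literal.

```ruby
# Hypothetical helper: extracts <% ... %> and <%= ... %> tags from an ERB
# string and labels the ones that open, continue, or close control flow.
# The lazy regex stops at the FIRST "%>", so a "%>" inside a Ruby string
# literal ends the tag early.
ERB_TAG = /<%(=?)(.*?)%>/m

# Leading keywords that make a tag part of a conditional or loop.
CONTROL_KEYWORDS = {
  /\A(?:if|unless|case|while|until|for)\b/ => :open,
  /\A(?:elsif|else|when)\b/ => :branch,
  /\Aend\b/ => :close
}.freeze

# Matches block-style loops such as `items.each do |item|`.
DO_BLOCK = /\bdo\s*(?:\|[^|]*\|)?\z/

def classify_erb(source)
  source.scan(ERB_TAG).map do |equals, raw|
    code = raw.strip
    # First keyword pattern that matches decides the kind, if any.
    kind = CONTROL_KEYWORDS.find { |pattern, _| pattern.match?(code) }&.last
    kind ||= :open if DO_BLOCK.match?(code)
    { output: equals == "=", code: code, kind: kind || :plain }
  end
end
```

For example, `classify_erb("<% if user %>Hi <%= user.name %><% else %>Guest<% end %>")` labels the four tags `:open`, `:plain`, `:branch` and `:close`. The weakness is the one discussed in the answers: for `<%= "50%>" %>` the regex stops at the first `%>` and yields the truncated code `"50`.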


Comments (4)

涙—继续流 2024-10-06 14:56:12

As far as I know, nobody has yet created a Treetop grammar for Ruby. (In fact, nobody has ever been able to create any grammar for Ruby other than the YACC grammar that ships with MRI and YARV.) I know that the author of Treetop has been working on one for several years, but it's not a trivial undertaking. Getting the ANTLR grammar used in XRuby right took about 5 years, and it is still not fully compliant.

Ruby's syntax is insanely, mindbogglingly complex.

却一份温柔 2024-10-06 14:56:12

No.

I don't think so. Specifying the complex and subtle Ruby grammar in Treetop would be a major accomplishment, but it should be possible.

The actual Ruby grammar is written in yacc. Now, yacc is a legendary tool, but Treetop generates a more powerful class of parsers, so it should be possible, and perhaps someone has done it.

It's not an afternoon project.

絕版丫頭 2024-10-06 14:56:12

Maybe I'm kidding, but if yacc is less complex than Ruby, you could implement yacc in Treetop and then use the Ruby grammar written for yacc.

压抑⊿情绪 2024-10-06 14:56:12

For your purposes, you can probably get away without parsing all of Ruby. What you actually need is a way to detect the %> that closes a Ruby block. If you never want to fail when the Ruby code contains those closing characters, you must detect everywhere they can occur inside the Ruby text, which means you need to detect all forms of literals.

However, for your purposes you can probably get away with recognising the most likely cases where %> would occur in Ruby text and skipping over just those. This assumes, of course, that any remaining failures can be handled by getting your users to write their ERB a little differently.
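One way to handle only "the most likely cases" is a hand-written scanner that searches for the closing `%>` but skips over single- and double-quoted string literals. This is a hypothetical sketch: `find_tag_end` and `split_erb` are illustrative names, not part of Treetop or ERB. It deliberately ignores `%q{}`-style literals, heredocs, regexp literals and comments, so, for instance, an apostrophe inside a `#` comment makes it report an unclosed tag.

```ruby
# Hypothetical helper: returns the index just past the "%>" that closes a tag
# whose Ruby code starts at `start`, or nil if no closing "%>" is found.
# Only '...' and "..." literals (with backslash escapes) are skipped; %q{},
# heredocs, regexp literals and comments are NOT recognised.
def find_tag_end(source, start)
  i = start
  quote = nil # quote character of the literal we are inside, if any
  while i < source.length
    ch = source[i]
    if quote
      if ch == "\\"
        i += 2 # skip the backslash and the character it escapes
        next
      end
      quote = nil if ch == quote # literal closed
    elsif ch == '"' || ch == "'"
      quote = ch # literal opened
    elsif source[i, 2] == "%>"
      return i + 2
    end
    i += 1
  end
  nil
end

# Splits ERB source into [:text, str], [:code, str] and [:output, str] pieces.
def split_erb(source)
  pieces = []
  pos = 0
  while (tag_start = source.index("<%", pos))
    pieces << [:text, source[pos...tag_start]] if tag_start > pos
    code_start = tag_start + 2
    output = source[code_start] == "=" # "<%=" tags produce output
    code_start += 1 if output
    tag_end = find_tag_end(source, code_start)
    raise ArgumentError, "unclosed ERB tag at index #{tag_start}" unless tag_end
    pieces << [output ? :output : :code, source[code_start...(tag_end - 2)]]
    pos = tag_end
  end
  pieces << [:text, source[pos..-1]] if pos < source.length
  pieces
end
```

With this, `split_erb('a<%= "50%>" %>b<% x = 1 %>')` keeps `"50%>"` intact inside the output piece instead of ending the tag at the `%>` in the string. Extending it to another literal form means adding one more "skip until the closing delimiter" state, which is the incremental approach the answer suggests.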

For what it's worth, Treetop itself "parses" Ruby blocks this way; it just counts { and } characters until the closing one is found. So if your block contains a } in a literal string, you're broken (but you can work around by including the matching one in a comment).
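The brace-counting behaviour described above, and its workaround, can be reproduced with a small toy model. This is a hypothetical illustration (the name `naive_block_body` is made up), not Treetop's actual source: it starts at an opening `{` and counts braces without knowing anything about strings or comments.

```ruby
# Hypothetical toy model of the brace counting described above (not Treetop's
# actual source): starting at an opening "{", count "{" and "}" characters,
# ignoring strings and comments entirely, until the depth returns to zero.
# Returns the text between the braces, or nil if they never balance.
def naive_block_body(source, open_index)
  raise ArgumentError, "expected '{' at #{open_index}" unless source[open_index] == "{"

  depth = 0
  (open_index...source.length).each do |i|
    case source[i]
    when "{" then depth += 1
    when "}" then depth -= 1
    end
    return source[(open_index + 1)...i] if depth.zero?
  end
  nil
end

ok     = naive_block_body('{ def x; "a"; end } rest', 0)
broken = naive_block_body('{ def x; "}"; end } rest', 0)
fixed  = naive_block_body("{ # {\n  def x; \"}\"; end\n} rest", 0)
```

Here `ok` is the whole body, `broken` stops at the `}` inside the string literal and returns just `' def x; "'`, and `fixed` is correct again because the `{` in the comment comes before the stray `}` and keeps the count balanced. Note that the balancing brace has to appear earlier in the block than the one in the literal; a `{` placed after it is too late.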
