Treetop basic parsing and regular expression usage

Posted 2024-08-24 03:51:33

I'm developing a script using the Ruby Treetop library and am having trouble with its regex syntax. First off, many regular expressions that work in other settings don't work the same way in Treetop.

This is my grammar: (myline.treetop)

grammar MyLine
    rule line
        string whitespace condition
    end
    rule string
        [\S]*
    end
    rule whitespace
        [\s]*
    end
    rule condition
        "new" / "old" / "used"
    end
end

This is my usage: (usage.rb)

require 'rubygems'
require 'treetop'
require 'polyglot'
require 'myline'

parser = MyLineParser.new
p parser.parse("randomstring new")

This should find the word new for sure, and it does! Now I want to extend it so that it still finds new when the input string becomes "randomstring anotherstring new yetanother andanother", with any number of strings separated by whitespace (tabs included) before and after the text matched by rule condition. In other words, if I pass it any sentence containing the word "new" (or "old" or "used"), it should be able to match it.

So let's say I change my grammar to:

rule line
    string whitespace condition whitespace string
end

Then, it should be able to find a match for:

p parser.parse("randomstring new anotherstring")

So, what do I have to do to allow the string whitespace to be repeated before and after condition? If I try to write this:

rule line
    (string whitespace)* condition (whitespace string)*
end

, it goes into an infinite loop. If I replace the () above with [], it returns nil.
In general, ordinary regexes return a match when I use patterns like the above, but Treetop's don't.
Does anyone have any tips or pointers on how to go about this? Also, since there isn't much documentation for Treetop and the examples are either too trivial or too complex, does anyone know of a more thorough Treetop guide?

染墨丶若流云 2024-08-31 03:51:34

It looks like you don't even need a grammar to do what you're asking. A simple regex is sufficient in this case:

line.match(/(.*)\s(new|old|used)\s(.*)/)

(Example: http://rubular.com/r/Kl8rUifxeu )

You can get an array containing the stuff before and after the condition with:

Regexp.last_match(1).split + Regexp.last_match(3).split

And test the condition with:

return "Sweet, it's new!" if Regexp.last_match(2) == "new"
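Putting the pieces above together, a minimal sketch might look like the following. The constant `CONDITION_RE` and the method `split_on_condition` are hypothetical names chosen for illustration; only the regex itself comes from the answer.

```ruby
# Regex taken from the answer: text, whitespace, a condition word, whitespace, text.
CONDITION_RE = /(.*)\s(new|old|used)\s(.*)/

# Hypothetical helper (not from the original answer): returns the words before
# the condition, the condition itself, and the words after it, or nil when the
# line does not match.
def split_on_condition(line)
  m = line.match(CONDITION_RE)
  return nil unless m

  { before: m[1].split, condition: m[2], after: m[3].split }
end

result = split_on_condition("randomstring anotherstring new yetanother andanother")
p result
```

Note that this pattern requires whitespace on both sides of the condition, so a line such as "randomstring new" (with nothing after the condition) returns nil. If lines may start or end with the condition, the pattern would need to be loosened accordingly.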

牵强ㄟ 2024-08-31 03:51:34

This has nothing to do with Treetop and everything to do with your grammar. Any text that the condition rule matches is also matched by your string rule, so it is ambiguous when to break out of the (string whitespace)* repetition and move on to condition. Clean up your line rule so that the grammar is unambiguous and you'll be fine. You might want to make it so that things/attributes like condition are tagged as such: