Parsing Gmail-style advanced search syntax?
I want to parse a search string similar to that provided by Gmail using Perl. An example input would be "tag:thing by:{user1 user2} {-tag:a by:user3}". I want to put it into a tree structure, such as
{and => [
"tag:thing",
{or => [
"by:user1",
"by:user2",
]},
{or => [
{not => "tag:a"},
"by:user3",
]},
]}
The general rules are:
- Tokens separated by space default to the AND operator.
- Tokens in braces are alternative options (OR). The braces can go before or after the field specifier, i.e. "by:{user1 user2}" and "{by:user1 by:user2}" are equivalent.
- Tokens prefixed with a hyphen are excluded.
These elements can also be combined and nested: e.g. "{by:user5 -{tag:k by:user3}} etc".
I'm thinking of writing a context-free grammar to represent these rules, and then parsing it into the tree. Is this unnecessary? (Is this possible using simple regexps?)
What modules are recommended for parsing context-free grammars?
(Eventually this will be used to generate a database query with DBIx::Class.)
Comments (4)
If your query isn't tree structured, then regexes will do the job for you.
For example:
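A minimal sketch of the flat case, assuming a query with no braces at all, where every token has the form `field:value` with an optional leading hyphen. The function name `parse_flat` and the error message are hypothetical; this only illustrates the idea and is not the answerer's original code.

```perl
use strict;
use warnings;

# Parse a flat (brace-free) query such as "tag:thing -by:user3".
# Assumption: every token is "field:value", optionally prefixed with "-".
sub parse_flat {
    my ($query) = @_;
    my @terms;
    for my $tok (split ' ', $query) {
        # Capture the optional "-", the field name, and the value.
        my ($neg, $field, $value) = $tok =~ /\A(-?)(\w+):(\S+)\z/
            or die "cannot parse token '$tok'\n";
        my $term = join ':', $field, $value;
        push @terms, $neg ? { not => $term } : $term;
    }
    return { and => \@terms };
}
```

Calling `parse_flat('tag:thing -by:user3')` yields a hash with a single `and` key whose list holds the string `tag:thing` and a `{ not => 'by:user3' }` entry.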
Even if your query is tree structured, regexes can make recursive parsing much more pleasant:
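One way this could look is a small recursive-descent parser that uses `\G` and the `/gc` flags to consume the string one regex at a time. This is a sketch under assumptions, not the answerer's original code: the sub names `parse_query`, `parse_terms`, and `parse_term` are hypothetical, and a field prefix such as `by:` in `by:{...}` is applied only to inner tokens that carry no field of their own.

```perl
use strict;
use warnings;

# Parse terms until end of input or a closing brace.
# $src is a reference to the query string; $field is an optional prefix
# inherited from an enclosing "field:{...}" group.
sub parse_terms {
    my ($src, $field) = @_;
    my @terms;
    while (1) {
        $$src =~ /\G\s+/gc;                 # skip whitespace
        last if $$src =~ /\G\z/gc;          # end of input
        last if $$src =~ /\G(?=\})/gc;      # leave the '}' for the caller
        push @terms, parse_term($src, $field);
    }
    return \@terms;
}

sub parse_term {
    my ($src, $field) = @_;
    if ($$src =~ /\G-/gc) {                 # "-term" negates the next term
        return { not => parse_term($src, $field) };
    }
    if ($$src =~ /\G(?:(\w+):)?\{/gc) {     # "{...}" or "field:{...}"
        my $inner = parse_terms($src, defined $1 ? $1 : $field);
        $$src =~ /\G\}/gc
            or die "missing '}' at offset " . pos($$src) . "\n";
        return { or => $inner };
    }
    if ($$src =~ /\G([^\s{}]+)/gc) {        # plain token
        my $tok = $1;
        return defined $field && $tok !~ /:/ ? join(':', $field, $tok) : $tok;
    }
    die "unexpected input at offset " . (pos($$src) // 0) . "\n";
}

sub parse_query {
    my ($query) = @_;
    my $terms = parse_terms(\$query);
    # parse_terms stops early only at a '}' nobody opened.
    die "unbalanced '}' at offset " . pos($query) . "\n"
        if (pos($query) // 0) < length $query;
    return { and => $terms };
}
```

For the question's input, `parse_query('tag:thing by:{user1 user2} {-tag:a by:user3}')` builds the same `and`/`or`/`not` nesting as the tree shown in the question. Each regex only ever matches one token at the current position; the nesting itself is handled by ordinary recursion.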
Regexes are awesome :)!
Parse::RecDescent can generate parsers for this sort of thing. You probably need some experience with parsers to use it effectively though.
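As a rough illustration, a Parse::RecDescent grammar for the question's syntax might look like the sketch below. The rule names (`query`, `expr`, `field`, `word`) and the choice of what counts as a word are assumptions made for this example; note also that Parse::RecDescent skips whitespace before each terminal by default, so this grammar is slightly more permissive than the rules in the question.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: rule names and token patterns are illustrative.
my $grammar = q{
    query : expr(s) /\z/
                { $return = { and => $item[1] } }
    expr  : '-' expr
                { $return = { not => $item[2] } }
          | field ':' '{' word(s) '}'
                { $return = { or => [ map { $item[1] . ':' . $_ } @{ $item[4] } ] } }
          | '{' expr(s) '}'
                { $return = { or => $item[2] } }
          | field ':' word
                { $return = $item[1] . ':' . $item[3] }
          | word
    field : /\w+/
    word  : /[^\s{}:-][^\s{}:]*/
};

my $parser = Parse::RecDescent->new($grammar)
    or die "bad grammar\n";

# Returns the parse tree, or undef if the query does not match the grammar.
sub parse_query_rd {
    my ($query) = @_;
    return $parser->query($query);
}
```

Unlike a hand-written parser, the structure of the language is visible in one place here, which makes it easier to extend later (for example with quoted phrases).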
Regexes don't handle nested constructs (like parentheses) very well. By the time you get your regex counting parentheses and capturing correctly, you could probably have written a decent CFG parser. A CFG can logically guarantee correct parsing, while with a regex solution you're leaving a lot up to magic. I can't recommend any Perl CFG libraries, but coding one sounds very cathartic.
YAPP (Parse::Yapp) might do what you want. You can use it to generate, and then use, an LALR(1) parsing automaton.