Parsing Gmail-style advanced search syntax?

Posted 2024-07-27 20:12:39

I want to parse a search string similar to that provided by Gmail using Perl. An example input would be "tag:thing by:{user1 user2} {-tag:a by:user3}". I want to put it into a tree structure, such as

{and => [
    "tag:thing",
    {or => [
       "by:user1",
       "by:user2",
    ]},
    {or => [
       {not => "tag:a"},
       "by:user3",
    ]},
]}

The general rules are:

  1. Tokens separated by space default to the AND operator.
  2. Tokens in braces are alternative options (OR). The braces can go before or after the field specifier. i.e. "by:{user1 user2}" and "{by:user1 by:user2}" are equivalent.
  3. Tokens prefixed with a hyphen are excluded.

These elements can also be combined and nested: e.g. "{by:user5 -{tag:k by:user3}} etc".

I'm thinking of writing a context-free grammar to represent these rules, and then parsing it into the tree. Is this unnecessary? (Is this possible using simple regexps?)

What modules are recommended for parsing context-free grammars?

(Eventually this will be used to generate a database query with DBIx::Class.)

Comments (4)

冷夜 2024-08-03 20:12:41

If your query isn't tree structured, then regexes will do the job for you.

For example:

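use strict;
use warnings;

my $search = "tag:thing by:{user1 user2} {-tag:a by:user3}";
# Split on whitespace that is not inside a (non-nested) brace group.
my @tokens = split /(?![^{]*\})\s+/, $search;
foreach (@tokens) {
    my $or = s/[{}]//g;                        # true if braces were removed: OR mode
    my ($default_field_specifier) = /(\w+):/;  # first field name found in the token
}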

Even if your query is tree structured, regexes can make recursive parsing much more pleasant:

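$_ = "by:{user1 z:{user2 3} } x {-tag:a by:user3} zz";
pos($_) = 0;
scan_query("");

sub scan_query {
    my $default_specifier = shift;
    # \G continues from the previous match; /c keeps pos() when a match fails.
    while (/\G\s*((?:[-\w:]+)|(?=\{))(\{)?/gc) {
        scan_query($1), next if $2;    # "field:{" or a bare "{": recurse into the group
        my $query_token = $default_specifier . $1;    # e.g. "by:" . "user1"
    }
    /\G\s*\}/gc;    # consume the closing brace of this group, if any
}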

Regexes are awesome :)!

秋日私语 2024-08-03 20:12:41

Parse::RecDescent can generate parsers for this sort of thing. You probably need some experience with parsers to use it effectively, though.
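
As a rough illustration of what such a grammar might look like, here is a minimal sketch for the syntax in the question. The rule names (query, term, atom, value) are made up for this example, and it assumes that field names and values consist of word characters only and that a "field:{...}" group contains plain values rather than further nesting.

```perl
use strict;
use warnings;
use Parse::RecDescent;    # CPAN module, not part of core Perl

# Hypothetical rule names; the tree shape follows the question.
my $grammar = q{
    query : term(s) /\z/
                { $return = { and => $item[1] } }
    term  : '-' atom
                { $return = { not => $item[2] } }
          | atom
    atom  : '{' term(s) '}'
                { $return = { or => $item[2] } }
          | /\w+:/ '{' value(s) '}'
                { $return = { or => [ map { $item[1] . $_ } @{ $item[3] } ] } }
          | /\w+(?::\w+)?/
    value : /\w+/
};

my $parser = Parse::RecDescent->new($grammar) or die "bad grammar\n";
my $tree   = $parser->query('tag:thing by:{user1 user2} {-tag:a by:user3}');
defined $tree or die "query did not parse\n";
```

Each rule becomes a method on the parser object, so calling $parser->query(...) starts at the query rule and returns the value assigned to $return, or undef if the input does not match.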

霊感 2024-08-03 20:12:40

Regexes don't handle nested things (like parentheses) very well. By the time you get your regex counting parentheses and capturing correctly, you could probably have written a decent CFG parser. A CFG can logically guarantee correct parsing, while with a regex solution you're leaving a lot up to magic. I can't recommend any Perl CFG libraries, but coding one sounds very cathartic.
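
For a sense of how small a hand-written parser for this grammar can be, here is a minimal recursive-descent sketch in plain Perl. All of the sub names (tokenize, parse_list, parse_term, parse_query) are hypothetical, and it assumes that fields and values consist of word characters only; any other characters are silently skipped by the tokenizer.

```perl
use strict;
use warnings;

# Split the input into '-', '{', '}', a field prefix such as "by:" that is
# directly followed by '{', or a plain term such as "tag:thing" or "etc".
sub tokenize {
    my ($input) = @_;
    return [ $input =~ /(-|\{|\}|\w+:(?=\{)|\w+(?::\w+)?)/g ];
}

# Parse terms until the end of input or a closing brace (left unconsumed).
sub parse_list {
    my ($tokens, $prefix) = @_;
    my @items;
    push @items, parse_term($tokens, $prefix)
        while @$tokens && $tokens->[0] ne '}';
    return \@items;
}

# Parse one term: a negation, a brace group, or a plain token.
sub parse_term {
    my ($tokens, $prefix) = @_;
    my $tok = shift @$tokens;
    die "unexpected end of query\n" unless defined $tok;
    die "unexpected '}'\n" if $tok eq '}';
    return { not => parse_term($tokens, $prefix) } if $tok eq '-';
    if ($tok eq '{' || $tok =~ /:\z/) {
        if ($tok ne '{') {      # "by:" form: it becomes the default field
            $prefix = $tok;
            shift @$tokens;     # drop the '{' that always follows it
        }
        my $items = parse_list($tokens, $prefix);
        die "missing '}'\n" unless @$tokens && shift(@$tokens) eq '}';
        return { or => $items };
    }
    # A token with its own field keeps it; otherwise apply the default field.
    return $tok =~ /:/ ? $tok : $prefix . $tok;
}

sub parse_query {
    my ($input) = @_;
    my $tokens = tokenize($input);
    my $items  = parse_list($tokens, '');
    die "unexpected '}'\n" if @$tokens;
    return { and => $items };
}

my $tree = parse_query('tag:thing by:{user1 user2} {-tag:a by:user3}');
```

Because the nesting lives in the call stack rather than in a pattern, unbalanced braces are reported as errors instead of being silently mis-split.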

很酷又爱笑 2024-08-03 20:12:40

YAPP (Parse::Yapp) might do what you want. You can use it to generate, and then use, an LALR(1) parsing automaton.
