What is a good Python parser for a Google-like search query?

Posted 2024-08-23 11:25:49

For some search-based code (in Python), I need to write a query syntax parser that would parse a simple google like query syntax. For example:

all of these words "with this phrase"
OR that OR this site:within.site
filetype:ps from:lastweek

As search becomes more and more popular, I expected to be able to easily find a Python library for doing this and thus avoid having to reinvent the wheel. Sadly, searching on Google doesn't yield much.

What would you recommend as a python parsing library for this simple task?

Comments (7)

望喜 2024-08-30 11:25:49

While ply is the more classical approach (a Pythonic variant of lex + yacc) and so may be easier to get started with if you're already familiar with such traditional tools, pyparsing is highly Pythonic and would be my top recommendation, especially for such simple tasks (which are really more like lexing than "full-blown" parsing, at least until you want to allow possibly-nested parentheses, but pyparsing won't really be troubled by those either ;-).
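
To illustrate the point that this task is closer to lexing than to parsing, here is a minimal sketch that tokenizes the example query with nothing but the standard-library `re` module. It is not pyparsing or ply code; the token kinds (`WORD`, `PHRASE`, `OR`, `QUALIFIED`) and the `tokenize` name are assumptions made up for this example.

```python
import re

# One alternative per token kind. Order matters: qualified terms, phrases and
# the OR keyword are tried before the catch-all bare word.
TOKEN_RE = re.compile(r'''
    \s*(?:
        (?P<qualifier>\w+):(?P<qvalue>"[^"]*"|\S+)   # site:within.site
      | "(?P<phrase>[^"]*)"                          # "with this phrase"
      | (?P<or_kw>OR)(?!\S)                          # OR standing on its own
      | (?P<word>\S+)                                # any other bare word
    )''', re.VERBOSE)


def tokenize(query):
    """Return a list of (kind, value) tuples for a Google-like query string."""
    tokens = []
    for match in TOKEN_RE.finditer(query):
        if match.group('qualifier') is not None:
            value = match.group('qvalue').strip('"')
            tokens.append(('QUALIFIED', (match.group('qualifier'), value)))
        elif match.group('phrase') is not None:
            tokens.append(('PHRASE', match.group('phrase')))
        elif match.group('or_kw') is not None:
            tokens.append(('OR', 'OR'))
        else:
            tokens.append(('WORD', match.group('word')))
    return tokens


if __name__ == '__main__':
    for token in tokenize('all of these words "with this phrase" '
                          'OR that OR this site:within.site '
                          'filetype:ps from:lastweek'):
        print(token)
```

A flat token list like this is probably enough as long as the syntax has no nesting; once parentheses are allowed, a real parser (hand-written or built with one of the libraries mentioned in this thread) becomes the more comfortable option.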

━╋う一瞬間旳綻放 2024-08-30 11:25:49

A few good options:

  • Whoosh: the only problem is that they have few parsing examples since the parser might not be its main feature/focus, but it's definitely a good option

  • modgrammar: I didn't try it, but it seems pretty flexible and simple

  • ply

  • pyparsing: highly recommended. There are some good parsing examples online

If you're done with the project, what did you end up choosing?

独闯女儿国 2024-08-30 11:25:49

SORRY - Lepl is no longer being developed.

There's also LEPL - http://www.acooke.org/lepl

Here's a quick solution I wrote during breakfast:

pl6 src: python3                                                      
Python 3.1 (r31:73572, Oct 24 2009, 05:39:09)                         
[GCC 4.4.1 [gcc-4_4-branch revision 150839]] on linux2                
Type "help", "copyright", "credits" or "license" for more information.
>>> from lepl import *                                                
>>>                                                                   
>>> class Alternatives(Node):                                         
...     pass                                                          
...
>>> class Query(Node):
...     pass
...
>>> class Text(Node):
...     pass
...
>>> def compile():
...     qualifier      = Word() & Drop(':')           > 'qualifier'
...     word           = ~Lookahead('OR') & Word()
...     phrase         = String()
...     text           = phrase | word
...     word_or_phrase = (Optional(qualifier) & text) > Text
...     space          = Drop(Space()[1:])
...     query          = word_or_phrase[1:, space]    > Query
...     separator      = Drop(space & 'OR' & space)
...     alternatives   = query[:, separator]          > Alternatives
...     return alternatives.string_parser()
...
>>> parser = compile()
>>>
>>> alternatives = parser('all of these words "with this phrase" '
...                       'OR that OR this site:within.site '
...                       'filetype:ps from:lastweek')[0]
>>>
>>> print(str(alternatives))
Alternatives
 +- Query
 |   +- Text
 |   |   `- 'all'
 |   +- Text
 |   |   `- 'of'
 |   +- Text
 |   |   `- 'these'
 |   +- Text
 |   |   `- 'words'
 |   `- Text
 |       `- 'with this phrase'
 +- Query
 |   `- Text
 |       `- 'that'
 `- Query
     +- Text
     |   `- 'this'
     +- Text
     |   +- qualifier 'site'
     |   `- 'within.site'
     +- Text
     |   +- qualifier 'filetype'
     |   `- 'ps'
     `- Text
         +- qualifier 'from'
         `- 'lastweek'
>>>

I would argue that LEPL isn't a "toy" - although it's recursive descent, it includes memoisation and trampolining, which help avoid some of the limitations of that approach.

However, it is pure Python, so it's not super-fast, and it's in active development (a new release, 4.0, with quite a few fixes and improvements, is coming relatively soon).
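
Since Lepl is no longer developed, readers may want to see the same idea without it. The following is a rough sketch of a hand-written recursive-descent parser that produces the same shape as the tree above (alternatives separated by OR, each a sequence of optionally qualified terms) using only the standard library. It is not Lepl code and has no memoisation or trampolining; the `parse` function, the tuple layout, and the support for parentheses are assumptions added for illustration.

```python
import re

# Tokens: parentheses, (optionally qualified) quoted phrases, or bare words.
TOKEN = re.compile(r'\(|\)|(?:\w+:)?"[^"]*"|[^\s()"]+')


def parse(query):
    """Parse a query into a list of alternatives.

    Each alternative is a list of terms. A term is either a
    (qualifier_or_None, text) tuple or, for a parenthesised sub-query,
    a nested list of alternatives.
    """
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def alternatives():          # alternatives := sequence ('OR' sequence)*
        nonlocal pos
        result = [sequence()]
        while peek() == 'OR':
            pos += 1
            result.append(sequence())
        return result

    def sequence():              # sequence := term+
        terms = []
        while peek() not in (None, 'OR', ')'):
            terms.append(term())
        if not terms:
            raise ValueError('expected a search term')
        return terms

    def term():                  # term := '(' alternatives ')' | [qualifier ':'] text
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == '(':
            group = alternatives()
            if peek() != ')':
                raise ValueError('missing closing parenthesis')
            pos += 1
            return group
        qualifier, text = re.fullmatch(r'(?:(\w+):)?(.+)', token, re.S).groups()
        return (qualifier, text.strip('"'))

    tree = alternatives()
    if pos != len(tokens):
        raise ValueError('unexpected token: %r' % tokens[pos])
    return tree


if __name__ == '__main__':
    for alternative in parse('all of these words "with this phrase" '
                             'OR that OR this site:within.site '
                             'filetype:ps from:lastweek'):
        print(alternative)
```

Note that this sketch treats a quoted "OR" as an ordinary phrase and silently skips an unmatched double quote; a production parser would likely want to report such input instead.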

我是男神闪亮亮 2024-08-30 11:25:49

PyParsing would be the right choice, although it is quite tedious, which is why I developed a query parser inspired by Lucene and Gmail syntax. Its only dependency is PyParsing, and we have used it on several projects. It is fully customizable and extensible, and it shields you from the pyparsing details. You can check it out here:

http://www.github.com/sebastiandev/plyse

It's pretty well documented, so you'll find docs on how to do the querying, configuration, etc.

花想c 2024-08-30 11:25:49

PLY is great. It is based on the Lex/Yacc idiom and thus may already be familiar. It allows you to create arbitrarily complex lexers and parsers for any task, including the one you need.

Using a powerful tool like PLY instead of a simple toy is a good idea, because your needs can become more complex with time and you'd like to stay with the same tool.

涙—继续流 2024-08-30 11:25:49

I know this is an old question, but for future reference: I just uploaded my package searchstringparser to PyPI. It implements a decent query-parsing machinery based on ply and outputs a string suitable for the PostgreSQL function tsquery. You can look at the lexer and parser classes to see if they fit your needs, or modify them accordingly.

Feedback welcome!
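
To give a rough idea of what "a string suitable for tsquery" can look like, here is a small sketch that turns already-parsed alternatives into PostgreSQL text-search syntax. It does not use searchstringparser and is not its API; the function name `to_tsquery_string` and the input layout (a list of alternatives, each a list of strings, where a string containing spaces is a phrase) are assumptions for this example. The `<->` phrase operator requires PostgreSQL 9.6 or later.

```python
import re


def to_tsquery_string(alternatives):
    """Build a tsquery expression from a list of alternatives.

    Terms inside one alternative are ANDed (&), alternatives are ORed (|),
    and the words of a phrase are chained with the "followed by" operator.
    """
    def term(text):
        words = text.split()
        # Only plain word characters are accepted, so tsquery operators
        # cannot sneak in through user input.
        if not words or not all(re.fullmatch(r'\w+', w) for w in words):
            raise ValueError('unsupported search term: %r' % text)
        joined = ' <-> '.join(words)
        return '(%s)' % joined if len(words) > 1 else joined

    parts = []
    for alternative in alternatives:
        if not alternative:
            raise ValueError('empty alternative')
        joined = ' & '.join(term(text) for text in alternative)
        parts.append('(%s)' % joined if len(alternative) > 1 else joined)
    return ' | '.join(parts)


if __name__ == '__main__':
    print(to_tsquery_string([['all', 'words', 'with this phrase'], ['that']]))
```

The resulting string would normally be passed to PostgreSQL as a bound query parameter (for instance as the argument of `to_tsquery`) rather than interpolated into the SQL text.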

自此以后,行同陌路 2024-08-30 11:25:49

Whoosh has a comprehensive search query parser module whoosh.qparser and class QueryParser that should be reasonably easy to adapt to your use case.

See http://pythonhosted.org/Whoosh/parsing.html and https://bitbucket.org/mchaput/whoosh/src/55f9c484047a8306101c8eaa59e9a110f960a1c2/src/whoosh/qparser
