Inexact full-text search in PostgreSQL and Django
I'm new to PostgreSQL, and I'm not sure how to go about doing an inexact full-text search. Not that it matters too much, but I'm using Django. In other words, I'm looking for something like the following:
    q = 'hello world'
    # Match rows whose precomputed tsvector column matches the query text.
    queryset = Entry.objects.extra(
        where=['body_tsv @@ plainto_tsquery(%s)'],
        params=[q])
    for entry in queryset:
        print(entry.title)
where the list of entries should contain either exactly 'hello world' or something similar. The results should then be ordered by how far each value is from the specified string. For instance, I would like the query to include entries containing "Hello World", "hEllo world", "helloworld", "hell world", etc., with some sort of ranking indicating how far each item is from the perfect, unchanged query string.
How would you go about doing this?
Comments (3)
Your best bet is to use Django raw querysets; I use them with MySQL to perform full-text matching. If the data is all in the database and Postgres provides the matching capability, then it makes sense to use it. Postgres also offers some really useful features for full-text queries, such as stemming.
Basically it lets you write the actual query you want yet returns models (as long as you are querying a model table obviously).
The advantage this gives you is that you can first test in Postgres the exact query you will be using. The documentation covers full-text queries pretty well.
The main gotcha with raw querysets at the moment is they don't support count. So if you will be returning lots of data and have memory constraints on your application you might need to do something clever.
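As a rough illustration of the raw-query approach, the sketch below builds a ranked full-text query and its parameters. The helper name `build_fulltext_query` and the table name `blog_entry` are hypothetical; `body_tsv` and `Entry` come from the question. It assumes the table and column names are trusted constants (they are interpolated into the SQL), while the search text stays a bound parameter. The `LIMIT` is one simple way to bound how many rows come back, since a raw queryset cannot be counted cheaply.

```python
def build_fulltext_query(table, tsv_column, search_text, limit=50):
    """Return (sql, params) for a ranked PostgreSQL full-text search.

    Only trusted identifiers are interpolated into the SQL string;
    the search text and the limit are passed as bound parameters.
    """
    sql = (
        "SELECT *, ts_rank({col}, plainto_tsquery(%s)) AS rank "
        "FROM {tbl} "
        "WHERE {col} @@ plainto_tsquery(%s) "
        "ORDER BY rank DESC "
        "LIMIT %s"
    ).format(col=tsv_column, tbl=table)
    return sql, [search_text, search_text, limit]


# "blog_entry" is a hypothetical table name for the Entry model.
sql, params = build_fulltext_query("blog_entry", "body_tsv", "hello world", limit=10)
print(sql)
print(params)

# Inside a Django project the pair could then be handed to a raw queryset:
#     for entry in Entry.objects.raw(sql, params):
#         print(entry.title, entry.rank)
```

Because the query selects every column of the model table, the primary key is included, which a raw queryset needs in order to build model instances; the extra `rank` column shows up as an attribute on each instance.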
"Inexact" matching, however, isn't really part of the full-text search capabilities. Instead, you want the PostgreSQL fuzzystrmatch contrib module. Its use with indexes is described here.
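To make the ranking idea concrete: fuzzystrmatch provides a `levenshtein()` function that returns the edit distance between two strings. The pure-Python sketch below is not the module itself, only an illustration of the same measure applied to the examples from the question; the names `levenshtein` and `rank_by_distance` and the `max_distance` cutoff are choices made here for illustration. It lowercases both sides first, on the assumption that case differences should not count.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def rank_by_distance(query, candidates, max_distance=2):
    """Return (distance, candidate) pairs within max_distance, closest first."""
    q = query.lower()
    scored = [(levenshtein(q, c.lower()), c) for c in candidates]
    return sorted((d, c) for d, c in scored if d <= max_distance)


candidates = ["Hello World", "hEllo world", "helloworld", "hell world", "goodbye"]
for distance, text in rank_by_distance("hello world", candidates):
    print(distance, text)
```

Note that this measure compares whole strings, so it suits short fields such as titles better than long body text.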
The best approach would be to use a search engine for this purpose. Django-haystack supports integration with three different search engines.
As of 2022, Django supports full-text search with PostgreSQL. The full documentation is here: https://docs.djangoproject.com/en/4.0/ref/contrib/postgres/search/