How to do partial field matching using Haystack?

Posted on 2024-10-07 01:09:49

I needed a simple search tool for my Django-powered website, so I went with Haystack and Solr. I have set everything up correctly and get the right results when I type in an exact phrase, but I get no results when I type in a partial phrase.

For example: "John" returns "John Doe" but "Joh" doesn't return anything.

Model:

from django.db import models

class Person(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)

Search Index:

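from haystack import site  # Haystack 1.x registration API
from haystack.indexes import SearchIndex, CharField

class PersonIndex(SearchIndex):
    # Main document field, rendered from a template
    text = CharField(document=True, use_template=True)
    first_name = CharField(model_attr='first_name')
    last_name = CharField(model_attr='last_name')

site.register(Person, PersonIndex)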

I'm guessing there's some setting I'm missing that enables partial field matching. I've seen people talking about EdgeNGramFilterFactory() in some forums, and I've Googled it, but I'm not quite sure how to implement it. Plus, I was hoping there was a Haystack-specific way of doing it in case I ever switch out the search backend.

Comments (5)

安穩 2024-10-14 01:09:50

You can achieve that behavior by making your index's text field an EdgeNgramField:

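from haystack.indexes import SearchIndex, CharField, EdgeNgramField

class PersonIndex(SearchIndex):
    # EdgeNgramField indexes word prefixes, so "Joh" can match "John"
    text = EdgeNgramField(document=True, use_template=True)
    first_name = CharField(model_attr='first_name')
    last_name = CharField(model_attr='last_name')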
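
To see why an edge n-gram field lets "Joh" match "John Doe", it can help to look at the terms such a field roughly produces at index time. The sketch below is plain Python, not Haystack or Solr code. The helper name `edge_ngrams` and the gram sizes (2 to 15) are assumptions chosen for illustration; the real sizes depend on the backend's schema.

```python
def edge_ngrams(text, min_gram=2, max_gram=15):
    """Return the set of lowercased prefixes of each whitespace-separated word.

    Illustrative only: mimics what an edge n-gram filter does at index time.
    min_gram and max_gram are assumed values, not Haystack defaults.
    """
    grams = set()
    for word in text.lower().split():
        longest = min(len(word), max_gram)
        for size in range(min_gram, longest + 1):
            grams.add(word[:size])
    return grams


indexed_terms = edge_ngrams("John Doe")

# A partial query matches when its lowercased form is one of the indexed prefixes.
print("joh" in indexed_terms)   # a prefix of "john"
print("ohn" in indexed_terms)   # not a prefix, so an edge n-gram field would miss it
```

With Haystack itself, a field like this is usually queried through something like `SearchQuerySet().autocomplete(text='Joh')`, and the index most likely has to be rebuilt (and, on Solr, the schema regenerated) after the field type changes.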

∝单色的世界 2024-10-14 01:09:50

In addition to the EdgeNgramField hint that others mentioned on this page (and of course NgramField, if you work with Asian languages), it is worth mentioning that django-haystack lets you run raw queries against Solr as follows:

from haystack.query import SearchQuerySet
from haystack.inputs import Raw

query = 'ABC*'  # example value; any valid query-parser expression works
SearchQuerySet().filter(text=Raw(query))

where text is the field you want to search, and query can be any expression in Lucene's Query Parser Syntax (version 3.6 or 4.6).

In this way you can easily set the query to ABC* or ABC~ or anything else that fits the syntax.
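
Because a raw query is passed to the backend unchanged, user input generally needs escaping before a `*` or `~` is appended. The following is a minimal sketch of that step. The helpers `escape_lucene`, `prefix_query` and `fuzzy_query` are hypothetical names, not part of Haystack, and the lowercasing assumes the index analyzer lowercases terms, which may not hold for every schema.

```python
import re

# Characters with special meaning in the Lucene query parser syntax.
# Assumption: this list follows the Lucene documentation; check it against
# the Lucene version behind your Solr install before relying on it.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_lucene(term):
    """Backslash-escape Lucene special characters in user input."""
    return _LUCENE_SPECIAL.sub(r'\\\1', term)


def prefix_query(user_input):
    """Build a prefix query such as 'joh*' from raw user input (hypothetical helper)."""
    return escape_lucene(user_input.strip().lower()) + '*'


def fuzzy_query(user_input):
    """Build a fuzzy query such as 'john~' from raw user input (hypothetical helper)."""
    return escape_lucene(user_input.strip().lower()) + '~'


print(prefix_query('Joh'))
print(fuzzy_query('John'))
print(prefix_query('a*b'))  # the user's own '*' is escaped, only the trailing one is a wildcard
```

The result would then be passed along the lines of `SearchQuerySet().filter(text=Raw(prefix_query('Joh')))`. Note that this ties the code to the Lucene syntax, so it is less portable across backends than a field-based approach.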

冬天旳寂寞 2024-10-14 01:09:50

I had a similar issue while searching for non-English words, for instance:

ABC
ABCD

If I search for the keyword ABC, I expect both of the results above. I was able to achieve this by converting the keyword to lowercase and using startswith:

keywords = 'ABC'
# results: a SearchQuerySet built earlier; code: the indexed field being searched
results.filter(code__startswith=keywords.lower())
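
One plausible reason the lowercase conversion matters: analyzers often lowercase terms at index time, while prefix-style queries may be compared against those terms without the same analysis. The sketch below simulates that in plain Python. It assumes the index holds lowercased terms, which is not confirmed by the answer, and `prefix_search` is a hypothetical stand-in for the backend's prefix matching.

```python
def prefix_search(indexed_terms, keyword):
    """Return indexed terms starting with keyword, compared exactly (case-sensitive)."""
    return sorted(term for term in indexed_terms if term.startswith(keyword))


# Assumption: the analyzer lowercased the original values "ABC" and "ABCD" at index time.
indexed_terms = {"abc", "abcd", "xyz"}

keywords = "ABC"
print(prefix_search(indexed_terms, keywords))          # uppercase query against lowercased terms
print(prefix_search(indexed_terms, keywords.lower()))  # lowercased query
```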

暖心男生 2024-10-14 01:09:50

I had the same problem, and the only way to get the results I wanted was to modify the Solr configuration file to include n-gram filtering, because the default tokenizer splits on whitespace. So use NGramTokenizer instead. I'd love to know if there is a Haystack way of doing the same thing.

I'm not at my machine right now, but this should do the trick.

<tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15" />
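
To get a feel for what `minGramSize="3"` and `maxGramSize="15"` imply, here is a rough plain-Python imitation of an n-gram tokenizer. It is a sketch, not Solr's implementation: the function name `ngrams` is made up, and the input is lowercased by hand on the assumption that a lowercase filter follows the tokenizer.

```python
def ngrams(text, min_gram=3, max_gram=15):
    """Return every substring of text whose length is between min_gram and max_gram."""
    grams = set()
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break
            grams.add(text[start:end])
    return grams


# Assumption: the text reaches the index already lowercased.
terms = ngrams("john doe")
print(sorted(terms))
print("joh" in terms)  # 3 characters: long enough to be indexed
print("jo" in terms)   # shorter than min_gram, so never indexed
print("n d" in terms)  # the whole string is tokenized, spaces included
```

Two consequences follow from this sketch: queries shorter than the minimum gram size cannot match, and the number of indexed terms grows quickly with field length, so the gram sizes are a trade-off between recall and index size.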

鹿港巷口少年归 2024-10-14 01:09:50

@riz I can't comment yet, or I would. I know it's an old comment, but in case anyone else runs across this: make sure to run manage.py update_index.

@Liarez how did you get this to work? I'm using Haystack with Elasticsearch and wasn't able to get it to work.
