How to do partial field matching using Haystack?
I needed a simple search tool for my Django-powered website, so I went with Haystack and Solr. I have set everything up correctly and get the right search results when I type in the exact phrase, but I can't get any results when typing in a partial phrase.
For example: "John" returns "John Doe" but "Joh" doesn't return anything.
Model:

    class Person(models.Model):
        first_name = models.CharField(max_length=50)
        last_name = models.CharField(max_length=50)

Search Index:

    class PersonIndex(SearchIndex):
        text = CharField(document=True, use_template=True)
        first_name = CharField(model_attr='first_name')
        last_name = CharField(model_attr='last_name')

    site.register(Person, PersonIndex)

I'm guessing there's some setting I'm missing that enables partial field matching. I've seen people talking about EdgeNGramFilterFactory() in some forums, and I've Googled it, but I'm not quite sure of its implementation. Plus, I was hoping there was a Haystack-specific way of doing it in case I ever switch out the search backend.
Comments (5)
You can achieve that behavior by making your index's text field an EdgeNgramField:
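The index definition this answer refers to is not reproduced here; presumably it amounts to declaring the text field as EdgeNgramField(document=True, use_template=True) instead of CharField. To show why that would make "Joh" match, here is a plain-Python sketch of edge n-gram indexing. It is not Haystack or Solr code: the function names, the sample names other than "John Doe", and the gram sizes are assumptions chosen for illustration.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the leading substrings of token, min_gram to max_gram characters long."""
    token = token.lower()
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def build_index(documents, min_gram=2, max_gram=15):
    """Map every edge n-gram of every word to the documents that contain it."""
    index = {}
    for doc in documents:
        for word in doc.split():
            for gram in edge_ngrams(word, min_gram, max_gram):
                index.setdefault(gram, set()).add(doc)
    return index


def search(index, term):
    """Look up the lowercased term as one indexed token (no n-gramming at query time)."""
    return sorted(index.get(term.lower(), set()))


people = ["John Doe", "Johanna Smith", "Jane Roe"]  # hypothetical sample data
index = build_index(people)
print(search(index, "Joh"))   # ['Johanna Smith', 'John Doe']
print(search(index, "John"))  # ['John Doe']
```

Because every prefix of each word is stored as its own token at index time, a partial query such as "Joh" is an exact token match, which is why no wildcard is needed at query time.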
In addition to the EdgeNgramField hint that others mentioned on this page (and of course NgramField, if you work with Asian languages), I think it is worth mentioning that in django-haystack you can run raw queries on Solr, where text is the field you want to search and query can be anything that follows Lucene's Query Parser Syntax (version 3.6 or 4.6). In this way you can easily set the query to ABC* or ABC~ or anything else that fits the syntax.
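The exact command from this answer is not shown on the page. As a hedged sketch, one way Haystack 2.x passes a query string through to the backend untouched is the Raw input type from haystack.inputs. The helper names below are hypothetical, and the escaping list is an assumption based on Lucene's documented special characters; whitespace in the term is not handled.

```python
def lucene_escape(term):
    """Backslash-escape characters that are special in Lucene query syntax."""
    special = set('+-&|!(){}[]^"~*?:\\/')
    return "".join("\\" + ch if ch in special else ch for ch in term)


def prefix_query(term):
    """Build a wildcard query such as ABC* (matches terms starting with ABC)."""
    return lucene_escape(term) + "*"


def fuzzy_query(term):
    """Build a fuzzy query such as ABC~ (matches terms similar to ABC)."""
    return lucene_escape(term) + "~"


def raw_search(query):
    """Run a raw query against the 'text' field.

    The imports are done lazily so the helpers above work without Haystack;
    this function needs a configured Django project with Haystack installed.
    """
    from haystack.inputs import Raw
    from haystack.query import SearchQuerySet

    return SearchQuerySet().filter(text=Raw(query))


print(prefix_query("ABC"))  # ABC*
print(fuzzy_query("ABC"))   # ABC~
```

Inside a configured project, raw_search(prefix_query("Joh")) would then send Joh* to Solr. Note that raw queries tie the code to the backend's query syntax, which works against the goal of staying backend-agnostic.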
I had a similar issue while searching for non-English words, for instance:

If I want to search for the keyword ABC, I expect the above two results. I was able to achieve this by converting the keyword to lowercase and using startswith:
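A minimal sketch of the lowercase-plus-startswith approach, assuming the document field is named text as in the question and that the backend's analyzer lowercases indexed terms. The helper names are hypothetical, and the answer's original snippet may have differed.

```python
def normalize_keyword(keyword):
    """Trim and lowercase the keyword so it lines up with lowercased index terms."""
    return keyword.strip().lower()


def search_by_prefix(keyword):
    """Return results whose 'text' field has a term starting with the keyword.

    Imported lazily: this call needs a configured Django project with Haystack.
    """
    from haystack.query import SearchQuerySet

    return SearchQuerySet().filter(text__startswith=normalize_keyword(keyword))


print(normalize_keyword("  ABC "))  # abc
```

Python's str.lower() is Unicode-aware, so the same normalization applies to non-English keywords that have a lowercase form.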
I had the same problem, and the only way to get the results I wanted was to modify the Solr configuration file to include n-gram filtering, because the default tokenizer splits on whitespace. So use NGramTokenizer instead. I'd love to know if there is a Haystack way of doing the same thing.
I'm not at my machine right now but this should do the trick.
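To illustrate the difference this answer describes, here is a plain-Python sketch comparing whitespace tokens with character n-grams. It only imitates the idea and is not Solr's implementation; the function names and gram sizes are assumptions.

```python
def whitespace_tokens(text):
    """Split on whitespace, so only whole words are indexed."""
    return text.lower().split()


def ngram_tokens(text, min_gram=2, max_gram=4):
    """Index every substring of min_gram to max_gram characters."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }


name = "John Doe"
print("joh" in whitespace_tokens(name))  # False
print("joh" in ngram_tokens(name))       # True
print("ohn" in ngram_tokens(name))       # True
```

Unlike edge n-grams, full n-grams also match fragments from the middle of a word (such as "ohn"), at the cost of a larger index.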
@riz I can't comment yet or I would, and I know it's an old comment, but in case anyone else runs past this: make sure to run manage.py update_index.