Django Haystack: searching for terms with or without accents
I'm adding a search system to my Django project using django-haystack. Some fields in my models contain French accented characters, and I would like a query to match entries whether or not it is typed with the accents.

I think the best approach is to build a SearchIndex that holds both the field with its accents and a copy of the same field without them.

Any ideas or hints on this?

Here is some code. Imagine the following model:

    from django.db import models

    class Cars(models.Model):
        name = models.CharField(max_length=255)

and the following Haystack index:

    from haystack import indexes

    class CarsIndex(indexes.SearchIndex):
        name = indexes.CharField(model_attr='name')
        cleaned_name = indexes.CharField(model_attr='name')

        def prepare_cleaned_name(self, obj):
            # strip_accents is a helper that still has to be written
            return strip_accents(obj.name)

Now, in my index template, I put both fields:

    {{ object.cleaned_name }}
    {{ object.name }}

This is only pseudocode and I don't know whether it works, but if you have any ideas on this, let me know!
Comments (3)
I found a way to index both values of the same field in my model.
First, write a method on your model that returns the ASCII version of the field:
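A minimal sketch of what such a method might look like. The names `strip_accents` and `name_ascii` are assumptions made for illustration and do not come from the original answer, and a plain class stands in for the Django model so the snippet runs on its own:

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed."""
    # NFKD splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class Car:
    """Plain stand-in for the Django model (assumption for this sketch)."""

    def __init__(self, name):
        self.name = name

    def name_ascii(self):
        # Hypothetical method name; a template could read it as object.name_ascii.
        return strip_accents(self.name)


print(Car("Clio Privilège").name_ascii())
```

Characters that have no Unicode decomposition, such as "œ", pass through this helper unchanged, so they would need separate handling if that matters for your data.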
Then, in the template used to generate the index, you can do this:
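As a rough illustration of the effect, assume the template holds `{{ object.name }}` and a hypothetical `{{ object.name_ascii }}`. The Python sketch below builds the same two-line document text and checks that a query matches with or without the accent. The substring test is only a stand-in for the search backend, and every name here is an assumption:

```python
import unicodedata


class Car:
    """Plain stand-in for the model; name_ascii is a hypothetical method."""

    def __init__(self, name):
        self.name = name

    def name_ascii(self):
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", self.name)
        return "".join(c for c in decomposed if not unicodedata.combining(c))


def render_index_document(car):
    """Mimic a template holding {{ object.name }} and {{ object.name_ascii }}."""
    return f"{car.name}\n{car.name_ascii()}\n"


def naive_match(query, document):
    # Stand-in for the search backend: case-insensitive substring test.
    return query.lower() in document.lower()


doc = render_index_document(Car("Mégane Coupé"))
print(naive_match("mégane", doc))  # accented query hits the original line
print(naive_match("megane", doc))  # unaccented query hits the ASCII line
```

Because both spellings end up in the indexed document, the query itself does not need to be normalized.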
After that, you just have to rebuild your indexes!
Yes, you're on the right track here. Sometimes you do want to store a field multiple times, with a different transformation applied to each copy.
In my application, for example, I have two title fields: one for searching, which gets stemmed (the process by which words such as "test" and "tester" are treated as equivalent), and another for sorting, which is left alone because stemming interferes with the sort order. This is an analogous case.
In my schema.xml this is handled by:
The type "string" is responsible for storing the "as-is" version of the title.
By the way, if you're stripping accents just to make words easier to search for, this might be worth looking into:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory
You have to do something like the following:
I don't like this solution. I would like to know of a way to configure my search backend to do this for me. I use Whoosh.