Django Haystack: searching for terms with or without accents

Posted on 2024-08-20 14:58:44

I'm implementing a search system in my Django project using django-haystack. Some fields in my models contain French accents, and I would like to find the entries whose content matches the query whether or not it is typed with accents.

I think the best idea is to create a SearchIndex that holds both the field with its accents and the same field without them.

Any ideas or hints on this?

Here is some code.

Imagine the following model:

from django.db import models

class Cars(models.Model):
    name = models.CharField(max_length=255)  # max_length is required; 255 is a placeholder

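and the following Haystack index:

from haystack import indexes

class Cars(indexes.SearchIndex):
    name = indexes.CharField(model_attr='name')
    # prepare_cleaned_name() below overrides the value read from model_attr.
    cleaned_name = indexes.CharField(model_attr='name')

    def prepare_cleaned_name(self, obj):
        # strip_accents is assumed to be defined elsewhere.
        return strip_accents(obj.name)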

Now, in my index template, I put both fields:

{{ object.cleaned_name }}
{{ object.name }}

This is pseudocode and I don't know whether it works, but if you have any ideas on this, let me know!

优雅的叶子 2024-08-27 14:58:44

I found a way to index both values of the same field in my model.

First, write a method on your model that returns the ASCII version of the field:

from django.db import models

class Car(models.Model):
    name = models.CharField(max_length=255)

    def ascii_name(self):
        # strip_accents is assumed to be defined elsewhere.
        return strip_accents(self.name)

Then, in the template used to generate the index, you can do this:

{{ object.name }}
{{ object.ascii_name }}

After that, you just have to rebuild your indexes!
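
The `strip_accents` helper is used throughout this thread but never defined. A minimal sketch of what it could look like, assuming plain Unicode decomposition from the standard library is good enough for French text (the implementation below is an assumption, not the original poster's code):

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed (e.g. 'é' -> 'e')."""
    # NFKD splits each accented character into a base character
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep every other character.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Note that this only removes marks that Unicode can decompose. A ligature such as "œ" has no decomposition and passes through unchanged, so it would need its own replacement rule if that matters for your data.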

情感失落者 2024-08-27 14:58:44

Yes, you're on the right track here. Sometimes you do want to store a field multiple times, with different transformations applied.

An example of this in my application is that I have two title fields: one for searching, which gets stemmed (the process by which test ~= testing ~= tester), and another for sorting, which is left alone (stemming interferes with the sort order).

This is an analogous case.

In my schema.xml this is handled by:

<field name="title" type="text" indexed="true" stored="true" multiValued="false" />
<field name="title_sort" type="string" indexed="true" stored="true" multiValued="false" />

The "string" type is responsible for storing the "as-is" version of the title.

By the way, if you're stripping accents just to make words easier to search for, this might be worth looking into:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory
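
The idea behind an analyzer-level accent filter is that the same folding is applied to the indexed text and to the query, so the stored value can keep its accents. A toy sketch of that principle in plain Python (this is not Solr or Haystack code, and `fold` and `search` are hypothetical names used only for illustration):

```python
import unicodedata


def fold(text):
    """Lowercase text and remove combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(names, query):
    """Return the names containing the query, ignoring case and accents."""
    folded_query = fold(query)
    # The stored name is returned untouched; only the comparison is folded.
    return [name for name in names if folded_query in fold(name)]


cars = ["Citroën C4", "Renault Mégane", "Peugeot 208"]
print(search(cars, "citroen"))
print(search(cars, "Mégane"))
```

Because both sides go through `fold`, a query typed with or without accents finds the same entries, which is what the question asks for.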

耳钉梦 2024-08-27 14:58:44

You have to do something like the following:

from haystack import indexes

class Cars(indexes.SearchIndex):
    name = indexes.CharField(model_attr='name')

    def prepare(self, obj):
        self.prepared_data = super(Cars, self).prepare(obj)
        # Append an accent-free copy so both spellings are indexed.
        self.prepared_data['name'] += '\n' + strip_accents(self.prepared_data['name'])
        return self.prepared_data

I don't like this solution. I would like to know of a way to configure my search backend to do this for me. I use Whoosh.
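
The string handling inside `prepare` can be pulled out into a small function that is testable without Django. A hedged sketch, where `with_unaccented_variant` is a hypothetical name and `strip_accents` is an assumed implementation based on `unicodedata`; unlike the snippet above, it skips the extra copy when the text has no accents to remove:

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks (assumed implementation)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def with_unaccented_variant(text):
    """Return text, plus an accent-free copy on a new line if it differs."""
    plain = strip_accents(text)
    if plain == text:
        # Nothing was stripped, so avoid indexing the same words twice.
        return text
    return text + "\n" + plain
```

Inside `prepare`, the assignment would then become `self.prepared_data['name'] = with_unaccented_variant(self.prepared_data['name'])`.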
