Django-haystack: sorting results by title

Posted 2024-12-04 16:38:16

I'd like to sort the results of my django-haystack query by title.

from haystack.query import SearchQuerySet
for result in SearchQuerySet().all().order_by('result_title_sort'):
    print(result.result_title_sort)

However, I keep getting this error:

there are more terms than documents in field "result_title_sort", but it's impossible to sort on tokenized fields

This is my haystack field definition:

result_title_sort = CharField(indexed=True, model_attr='title')

How should I define the field, so I can sort on it?

煞人兵器 2024-12-11 16:38:16

Thank you Mark Chackerian, your solution does work for sorting. However, I still felt slightly uncomfortable modifying the output of the auto-generated schema.xml. I found a solution using Solr's <dynamicField> field types. The Django-Haystack docs aren't that clear on how to go about using dynamic fields, but basically, if you include a new key in the dict returned by the SearchIndex's prepare(), a dynamicField will be added to the document at index time.

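Remove your existing attribute from the SearchIndex and set the value in prepare() instead:

# result_title_sort = CharField(indexed=True, model_attr='title')
def prepare(self, obj):
    prepared_data = super().prepare(obj)
    prepared_data['result_title_sort_s'] = obj.title  # notice the "_s"
    return prepared_data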

The above will create a dynamic string field in the document called result_title_sort_s by which you will be able to sort your results.

寻梦旅人 2024-12-11 16:38:16

You need to make sure that your sorting field is non-tokenized in SOLR. It's not very clear from the Haystack documentation how to make it non-tokenized using Haystack. My solution was to change the SOLR schema.xml generated by Haystack so that the field type is "string" instead of "text". So instead of having something like this in your schema.xml:

<field name="result_title_sort" type="text" indexed="true" stored="true" multiValued="false" />

you need to have this:

<field name="result_title_sort" type="string" indexed="true" stored="true" multiValued="false" />

Since you might be regenerating your schema.xml many times, I recommend creating a build script to create the schema file, which will automatically change the schema for you. Something like this:

./manage.py build_solr_schema | sed 's/<field name=\"result_title_sort\" type=\"text\"/<field name=\"result_title_sort\" type=\"string\"/' > schema.xml

(or for Haystack 2.0)

./manage.py build_solr_schema | sed 's/<field name=\"name_sort\" type=\"text_en\"/<field name=\"name_sort\" type=\"string\"/' > schema.xml
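If you would rather not depend on sed, the same substitution can be sketched in Python. This is only an illustration of the idea: the function name make_field_sortable is made up here, and the pattern assumes the generated schema writes the name attribute immediately before the type attribute, as in the lines shown above.

```python
import re


def make_field_sortable(schema_xml, field_name):
    """Rewrite the named <field> so its type is "string" (untokenized).

    Hypothetical helper; assumes name="..." is directly followed by type="...".
    """
    pattern = r'(<field\s+name="%s"\s+type=")[^"]*(")' % re.escape(field_name)
    # \g<1> and \g<2> keep everything around the type value unchanged.
    return re.sub(pattern, r'\g<1>string\g<2>', schema_xml)


sample = (
    '<field name="result_title_sort" type="text" indexed="true" stored="true" multiValued="false" />\n'
    '<field name="body" type="text" indexed="true" stored="true" multiValued="false" />'
)
patched = make_field_sortable(sample, "result_title_sort")
```

Only the field whose name matches is touched, so other text fields stay tokenized and searchable.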

After I did this, my sorting worked in alphabetical order. However, there were still some problems because the sorting was ASCII order, which put lowercase and non-Roman characters at the end. So I created the following method to prepare the text for sorting, which uses the unidecode module to convert non-Roman characters to ASCII. It also removes initial spaces, "the" and "a":

import re
from unidecode import unidecode

def format_text_for_sort(sort_term, remove_articles=False):
    '''Process text for a sorting field:
        * converts non-ASCII characters to ASCII equivalents
        * converts to lowercase
        * (optional) removes a leading a/the
        * strips surrounding whitespace
    '''
    sort_term = unidecode(sort_term).lower().strip()
    if remove_articles:
        sort_term = re.sub(r'^(a\s+|the\s+)', '', sort_term)
    return sort_term

Then you need to add a prepare method in your search_indexes.py to call the formatter, something like:

def prepare_result_title_sort(self, obj):
    return format_text_for_sort(obj.title, remove_articles=True)
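To see why the normalization matters, here is a small sketch that compares plain code-point ordering with ordering by a normalized key. It runs without Solr or Haystack. The helpers ascii_fold and sort_key are made up for this illustration, and ascii_fold uses the standard library's unicodedata as a rough stand-in for unidecode: it strips accents but simply drops characters that have no ASCII decomposition, which unidecode would transliterate.

```python
import re
import unicodedata


def ascii_fold(text):
    """Rough stdlib stand-in for unidecode: strip accents via NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def sort_key(title):
    """Lowercase, trim, and drop a leading a/the, mirroring the formatter above."""
    key = ascii_fold(title).lower().strip()
    return re.sub(r"^(a\s+|the\s+)", "", key)


titles = ["zebra", "Élan", "apple", "Banana", "the Cat"]

# Code-point order: uppercase first, accented characters last.
raw_order = sorted(titles)
# ['Banana', 'apple', 'the Cat', 'zebra', 'Élan']

# Order by the normalized key, which is what the sort field would store.
normalized_order = sorted(titles, key=sort_key)
# ['apple', 'Banana', 'the Cat', 'Élan', 'zebra']
```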

吻泪 2024-12-11 16:38:16

Eventually I found a workaround for this by abusing faceted=True. It causes Haystack to generate a type="string" field for the CharField. It is the only thing that changes in the Solr schema.xml:

result_title_sort = CharField(indexed=True, faceted=True)

def prepare_result_title_sort(self, article):
    return slugify(article.title.lower())

and the result can now be sorted:

results = results.order_by('result_title_sort_exact')

回忆追雨的时光 2024-12-11 16:38:16

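Solr won't let you sort on a string column if it is tokenized (i.e. has spaces). I expect your titles have multiple tokens (words), hence the error.

"String term values can contain any valid String, but should not be tokenized. The values are sorted according to their natural order." From http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Sort.html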
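The "more terms than documents" wording in the error can be illustrated with a toy count. This is only a sketch: a lowercase whitespace split stands in for Solr's real analyzer, and count_terms is a made-up helper, not part of Solr or Haystack.

```python
def count_terms(values, tokenized):
    """Return (distinct_terms, documents) for one field across all documents."""
    terms = set()
    for value in values:
        if tokenized:
            # A "text" field is split into several terms per document.
            terms.update(value.lower().split())
        else:
            # A "string" field keeps the whole value as a single term.
            terms.add(value)
    return len(terms), len(values)


titles = ["Gone with the Wind", "The Old Man and the Sea"]

tokenized_counts = count_terms(titles, tokenized=True)    # (8, 2)
string_counts = count_terms(titles, tokenized=False)      # (2, 2)
```

With the tokenized field there are more terms than documents, so there is no single value per document to sort by; the untokenized field has at most one term per document.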

悲凉≈ 2024-12-11 16:38:16

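Just a follow-up to the accepted answer: I found that simply using FacetCharField instead of CharField for the text I wanted to sort on was enough to have it output as a string in the schema, which makes it sortable.

I'm fairly new to haystack / Solr, so I'm not sure what other implications using FacetCharField has, but this worked for me.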
