Django-haystack: sort results by title

Posted on 2024-12-04 16:38:16

I'd like to sort the results of my django-haystack query by title.

from haystack.query import SearchQuerySet
for result in SearchQuerySet().all().order_by('result_title_sort'):
    print(result.result_title_sort)

However, I keep getting this error:

there are more terms than documents in field "result_title_sort", but it's impossible to sort on tokenized fields

This is my haystack field definition:

result_title_sort = CharField(indexed=True, model_attr='title')

How should I define the field, so I can sort on it?


Comments (5)

煞人兵器 2024-12-11 16:38:16

Thank you, Mark Chackerian; your solution does work for sorting. However, I still felt slightly uncomfortable modifying the output of the auto-generated schema.xml. I found a solution using Solr's <dynamicField> field types. The Django-Haystack docs aren't very clear on how to use dynamic fields, but basically, if you include a new key in the dict returned by the SearchIndex's prepare(), a dynamicField will be added to the document at index time.

Remove your existing attribute from the SearchIndex and add the key in prepare():

# result_title_sort = CharField(indexed=True, model_attr='title')
def prepare(self, obj):
    # Python 3 form; on Python 2 pass the index class and self to super().
    prepared_data = super().prepare(obj)
    prepared_data['result_title_sort_s'] = obj.title  # notice the "_s"
    return prepared_data

The above will create a dynamic string field in the document called result_title_sort_s by which you will be able to sort your results.

寻梦旅人 2024-12-11 16:38:16

You need to make sure that your sorting field is non-tokenized in SOLR. It's not very clear from the Haystack documentation how to make it non-tokenized using Haystack. My solution was to change the SOLR schema.xml generated by Haystack so that the field type is "string" instead of "text". So instead of having something like this in your schema.xml:

<field name="result_title_sort" type="text" indexed="true" stored="true" multiValued="false" />

you need to have this:

<field name="result_title_sort" type="string" indexed="true" stored="true" multiValued="false" />

Since you might be regenerating your schema.xml many times, I recommend creating a build script to create the schema file, which will automatically change the schema for you. Something like this:

./manage.py build_solr_schema | sed 's/<field name=\"result_title_sort\" type=\"text\"/<field name=\"result_title_sort\" type=\"string\"/' > schema.xml

(or for Haystack 2.0, where the generated type is "text_en")

./manage.py build_solr_schema | sed 's/<field name=\"result_title_sort\" type=\"text_en\"/<field name=\"result_title_sort\" type=\"string\"/' > schema.xml
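If sed is not available, the same substitution can be sketched in Python. This is only an illustration, not part of Haystack: the helper name retype_field and the sample schema string are made up here, and the pattern assumes the attribute order (name, then type) shown in the generated lines above. Because it matches any existing type, it covers both the "text" and "text_en" cases.

```python
import re


def retype_field(schema_xml, field_name, new_type="string"):
    """Return schema_xml with the type of the named <field> replaced."""
    # Assumes the attributes appear as: <field name="..." type="..." ...>
    pattern = re.compile(
        r'(<field\s+name="%s"\s+type=")[^"]*(")' % re.escape(field_name)
    )
    return pattern.sub(lambda m: m.group(1) + new_type + m.group(2), schema_xml)


# Hypothetical two-line excerpt of a generated schema.xml.
schema = (
    '<field name="text" type="text_en" indexed="true" stored="true" multiValued="false" />\n'
    '<field name="result_title_sort" type="text_en" indexed="true" stored="true" multiValued="false" />\n'
)

patched = retype_field(schema, "result_title_sort")
print(patched)
```

Only the named field is rewritten; other fields keep their analyzed type, so full-text search on them is unaffected.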

After I did this, my sorting worked in alphabetical order. However, there were still some problems because the sorting was ASCII order, which put lowercase and non-Roman characters at the end. So I created the following method to prepare the text for sorting, which uses the unidecode module to convert non-Roman characters to ASCII. It also removes initial spaces, "the" and "a":

import re

from unidecode import unidecode  # third-party package


def format_text_for_sort(sort_term, remove_articles=False):
    ''' processes text for sorting field:
        * converts non-ASCII characters to ASCII equivalents
        * converts to lowercase
        * (optional) remove leading a/the
        * removes outside spaces
    '''
    sort_term = unidecode(sort_term).lower().strip()
    if remove_articles:
        sort_term = re.sub(r'^(a\s+|the\s+)', '', sort_term)
    return sort_term
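To make the ordering problem concrete, here is a small sketch comparing raw code-point order with order by a normalized key. It uses only the standard library: ascii_fold is a rough stand-in for unidecode (it strips combining accents only, which is an assumption made for this example), and the sample titles are invented.

```python
import re
import unicodedata


def ascii_fold(text):
    """Rough stdlib stand-in for unidecode: drops combining accents only."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_key(title):
    """Lowercase, strip, and drop a leading 'a' or 'the', as in the method above."""
    key = ascii_fold(title).lower().strip()
    return re.sub(r'^(a\s+|the\s+)', '', key)


# Invented sample titles; "\u00c9clair" starts with an accented capital E.
titles = ["banana", "The Apple", "Zebra", "\u00c9clair", "cherry"]

raw_order = sorted(titles)                      # plain code-point order
normalized_order = sorted(titles, key=sort_key)

print(raw_order)
print(normalized_order)
```

In raw order the capitalized titles come first and the accented one last; with the normalized key the list reads The Apple, banana, cherry, Éclair, Zebra.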

Then you need to add a prepare method in your search_indexes.py to call the formatter, something like

def prepare_result_title_sort(self, obj):
    return format_text_for_sort(obj.title, remove_articles=True)

吻泪 2024-12-11 16:38:16

Eventually I found a workaround for this by abusing faceted=True. It causes Haystack to generate a type="string" field for the CharField. It is the only thing that changes in the Solr schema.xml.

result_title_sort = CharField(indexed=True, faceted=True)

def prepare_result_title_sort(self, article):
    return slugify(article.title.lower())

and the results can now be sorted on the "_exact" companion field that Haystack generates for faceted fields:

results = results.order_by('result_title_sort_exact')

回忆追雨的时光 2024-12-11 16:38:16

Solr won't let you sort on a string column if the string is tokenized (i.e. has spaces). I expect your titles have more than one token (word), hence the error.

"String term values can contain any valid String, but should not be tokenized. The values are sorted according to their natural order." From http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Sort.html
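As a rough illustration of the "more terms than documents" condition in the error message, the sketch below counts index terms for a tokenized field versus an untokenized one. It is a simplified model, not Solr's or Lucene's actual code: the whitespace tokenizer, the sample titles, and the looks_sortable helper are all assumptions made for the example.

```python
# Invented sample documents: one title per document.
titles = ["Gone with the Wind", "The Wind Rises", "Up"]

# A tokenized ("text") field is indexed as one term per distinct word.
tokenized_terms = {word for title in titles for word in title.lower().split()}

# An untokenized ("string") field is indexed as one term per document value.
string_terms = set(titles)


def looks_sortable(terms, doc_count):
    """Simplified check: sorting needs at most one term per document."""
    return len(terms) <= doc_count


print(len(tokenized_terms), looks_sortable(tokenized_terms, len(titles)))
print(len(string_terms), looks_sortable(string_terms, len(titles)))
```

With three documents, the tokenized field yields six distinct terms, so there is no single value per document to sort by; the string field yields exactly three.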

悲凉≈ 2024-12-11 16:38:16

Just a follow-up to the accepted answer: I found that simply using a FacetCharField instead of CharField for the text I want to sort by was enough to output it as a string in the schema, and thus make it sortable.

I'm fairly new to haystack / Solr so I'm unsure of the other implications of using FacetCharField but this worked for me.
