django-haystack: choosing a larger SearchIndex over indexing the database
I have a UserProfile model with 35 fields (Char, Int, Bool, Dec, M2M, FK). As part of the search view functionality, one of the fields requires full-text searching, while the remaining 34 fields will be used to provide 'advanced search filtering' (using __gte, __lte, __exact, __in, and __startswith). A 'search' query may use between 5 and 35 fields as the search view criteria.
I'm using Haystack to build a SearchIndex and currently have all 35 fields added to it, but this seems inefficient, since I am bypassing the Django ORM (?).
An answer to "Filter Django Haystack results like QuerySet?" suggests that I could store only the single full-text search field in the SearchIndex and combine the SearchQuerySet with Django's QuerySet for the remaining 34 filter fields. Would I then set db_index=True on some or all of those fields in my Django model? Would this two-stage query-merge approach scale well to thousands of results?
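For concreteness, here is a framework-free sketch of the two-stage merge being asked about. The data, field names, and filter values are all made up for illustration; in real code the first stage would be a Haystack SearchQuerySet against Solr and the second a Django QuerySet against the database:

```python
# Stage 1: the full-text engine returns the pks whose indexed text
# matches the query. In Haystack this would be roughly:
#   pks = [r.pk for r in SearchQuerySet().filter(content=q)]
def fulltext_search(index, query):
    return [pk for pk, text in index.items() if query in text]

# Stage 2: the database narrows those pks on ordinary columns.
# In Django this would be roughly:
#   UserProfile.objects.filter(pk__in=pks, age__gte=18)
def db_filter(rows, pks, min_age):
    pks = set(pks)
    return [r for r in rows if r["pk"] in pks and r["age"] >= min_age]

# Toy in-memory data standing in for the Solr index and the table.
search_index = {1: "python developer", 2: "java developer", 3: "python analyst"}
table = [
    {"pk": 1, "age": 25},
    {"pk": 2, "age": 30},
    {"pk": 3, "age": 17},
]

pks = fulltext_search(search_index, "python")
results = db_filter(table, pks, min_age=18)
```

Note that the pk list produced by stage 1 grows with the corpus: if the full-text query matches 2M entries, stage 2 receives a 2M-item pk__in clause.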
Since my UserProfile model could grow to 300K-2M entries, I am trying to understand how best to index it. Being new to database indexing and searching, I am looking for any insight on how best to optimize my database.
1 Answer
Mixing the two probably won't scale. When you build a QuerySet or a SearchQuerySet, the query doesn't actually run until you ask for the results somewhere, so in that sense they are both lazy.
But if you do something like

results = [r.pk for r in searchqueryset]

it actually executes that query against Haystack/Solr. If you're looking at 2M entries in total, that list can potentially contain 2M items, and you would then be sending that 2M-item list to MySQL (via the ORM) for further filtering. That will obviously never scale.
If you instead stick to Haystack and keep building up your SearchQuerySet, it will only be executed once, when the results are accessed. Also keep in mind to minimize use of {{ result.object }} in your templates, because that also hits the database for each result.
You can also look at load_all(), faceting, etc.
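The lazy, single-stage chaining described above can be sketched without any framework. The LazyQuery class below is a hypothetical stand-in for a SearchQuerySet: chained filter() calls accumulate predicates, and nothing executes until the results are iterated, at which point all filters are applied in one pass (in Haystack this would be roughly SearchQuerySet().filter(content=q).filter(age__gte=18).load_all()):

```python
# Hypothetical lazy query object mimicking how a SearchQuerySet chains
# filters without executing anything until results are accessed.
class LazyQuery:
    def __init__(self, rows, predicates=(), counter=None):
        self._rows = rows
        self._predicates = list(predicates)
        # shared counter so we can observe how often the "engine" is hit
        self.executions = counter if counter is not None else [0]

    def filter(self, predicate):
        # chaining returns a new query object; nothing runs yet
        return LazyQuery(self._rows, self._predicates + [predicate], self.executions)

    def __iter__(self):
        # the query executes only here, once, with all filters applied
        self.executions[0] += 1
        return (r for r in self._rows if all(p(r) for p in self._predicates))

rows = [{"pk": 1, "age": 25, "bio": "python dev"},
        {"pk": 2, "age": 17, "bio": "python fan"},
        {"pk": 3, "age": 40, "bio": "java dev"}]

q = (LazyQuery(rows)
     .filter(lambda r: "python" in r["bio"])
     .filter(lambda r: r["age"] >= 18))
# at this point nothing has executed: q.executions[0] is still 0
results = list(q)  # executes once, with both filters applied together
```

The point of the sketch is the single execution: no intermediate pk list is ever materialized and handed to a second store, which is what makes this shape scale better than the two-stage merge.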