在 Django 中跨列聚合
我试图弄清楚是否有一种方法可以使用 Django 的 ORM 进行有点复杂的聚合,或者我是否必须使用 extra() 来坚持一些原始 SQL。
这是我的对象模型(仅显示要点):
class Submission(Models.model)
favorite_of = models.ManyToManyField(User, related_name="favorite_submissions")
class Response(Models.model)
submission = models.ForeignKey(Submission)
voted_up_by = models.ManyToManyField(User, related_name="voted_up_responses")
我想要做的是对给定提交的所有投票进行求和:即对其任何响应的所有投票,然后还包括人数谁将提交标记为最喜欢的。
我使用以下代码进行第一部分的工作;这将返回每个提交的所有回复的总票数:(
submission_list = Response.objects\
.values('submission')\
.annotate(votes=Count('voted_up_by'))\
.filter(votes__gt=0)\
.order_by('-votes')[:TOP_NUM]
因此,在获得总票数后,我按降序排序并返回前 TOP_NUM 提交,以获得“最佳”列表。)
该部分有效。您是否可以建议将喜欢每个提交内容的人数纳入投票中? (为了可移植性,我宁愿避免 extra() ,但我认为这可能是必要的,并且我愿意使用它。)
编辑:在阅读下面的建议后,我意识到我应该更清楚我的问题的描述。理想的解决方案是允许我按总票数(voted_up_by
和 favorited
的总和)排序,然后仅选择数据库中的前几名。如果这是不可能的,那么我愿意加载每个响应的一些字段并在 Python 中进行处理;但由于我将处理 100,000 多条记录,因此最好避免这种开销。 (另外,对 Adam 和 Dmitry:我很抱歉延迟回复!)
I'm trying to figure out if there's a way to do a somewhat-complex aggregation in Django using its ORM, or if I'm going to have to use extra() to stick in some raw SQL.
Here are my object models (stripped to show just the essentials):
class Submission(Models.model)
favorite_of = models.ManyToManyField(User, related_name="favorite_submissions")
class Response(Models.model)
submission = models.ForeignKey(Submission)
voted_up_by = models.ManyToManyField(User, related_name="voted_up_responses")
What I want to do is sum all the votes for a given submission: that is, all of the votes for any of its responses, and then also including the number of people who marked the submission as a favorite.
I have the first part working using the following code; this returns the total votes for all responses of each submission:
submission_list = Response.objects\
.values('submission')\
.annotate(votes=Count('voted_up_by'))\
.filter(votes__gt=0)\
.order_by('-votes')[:TOP_NUM]
(So after getting the vote total, I sort in descending order and return the top TOP_NUM submissions, to get a "best of" listing.)
That part works. Is there any way you can suggest to include the number of people who have favorited each submission in its votes? (I'd prefer to avoid extra() for portability, but I'm thinking it may be necessary, and I'm willing to use it.)
EDIT: I realized after reading the suggestions below that I should have been clearer in my description of the problem. The ideal solution would be one that allowed me to sort by total votes (the sum of voted_up_by
and favorited
) and then pick just the top few, all within the database. If that's not possible then I'm willing to load a few of the fields of each response and do the processing in Python; but since I'll be dealing with 100,000+ records, it'd be nice to avoid that overhead. (Also, to Adam and Dmitry: I'm sorry for the delay in responding!)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
一种可能性是稍微重新安排当前的查询。如果您尝试如下操作会怎么样:
这将返回 Response 对象的查询集(具有相同提交的对象将被集中在一起)。为了访问相关的提交和/或 favorite_of 列表/计数,您有两个选择:
或者...
基本上第一个选项的优点是仍然是查询集,但您必须确保访问提交对象为了获取有关提交的任何信息(因为查询集中的每个对象在技术上都是一个响应)。第二个选项的优点是成为包含 favorite_of 列表和投票的提交列表,但它不再是查询集(因此请确保您之后不再需要更改查询)。
One possibility would be to re-arrange your current query slightly. What if you tried something like the following:
This will return a queryset of Response objects (objects with the same Submission will be lumped together). In order to access the related submission and/or the favorite_of list/count, you have two options:
or...
Basically the first option has the benefit of still being a queryset, but you have to be sure to access the submission object in order to get any info about the submission (since each object in the queryset is technically a Response). The second option has the benefit of being a list of the submissions with both the favorite_of list as well as the votes, but it is no longer a queryset (so be sure you don't need to alter the query anymore afterwards).
您可以在另一个查询中计算收藏夹数量,例如
在添加两个列表中的值之后:
我在字典中使用 ids,因为提交对象不会相同。如果您需要提交本身,您可以再使用一本字典或存储元组(提交、投票)而不仅仅是投票。
补充:这个解决方案比以前的更好,因为您只有两个数据库请求。
You can count favorites in another query like
After that you add the values from two lists:
I am using ids in the dictionary because Submission objects will not be identical. If you need the Submissions themselves, you may use one more dictionary or store tuple (submission, votes) instead of just votes.
Added: this solution is better than the previous because you have only two DB requests.