Django：在 Django ORM 中使用多列进行计算和分组

发布于 2025-01-18 04:34:29 字数 1526 浏览 1 评论 0原文

美好的一天，

现在我试图提高对Django Orm的了解，但要努力处理以下任务：

但是首先，数据库看起来像这样：

class DeathsWorldwide(models.Model):
    causes_name = models.CharField(max_length=50, null=False, blank=False)
    death_numbers = models.PositiveIntegerField(default=0)
    country = models.CharField(max_length=50, null=True, blank=True)
    year = models.PositiveIntegerField(null=False, blank=False, validators=[MinValueValidator(1990), MaxValueValidator(2019)])


causes_name    |    death_numbers    |    country    |    year
Alcohol dis.   |    25430            |    Germany    |    1998
Poisoning      |    4038             |    Germany    |    1998
...
Maternal dis.  |    9452             |    Germany    |    1998
Alcohol dis.   |    21980            |    Germany    |    1999
Poisoning      |    5117             |    Germany    |    1999
...
Maternal dis.  |    8339             |    Germany    |    1999

总是每年的所有疾病，每个国家等等...从1990年到2019年的几年范围。

我 - 或更确切地说，要实现这项任务，是所有具有计算的死亡人数的国家的清单，例如...

country    |    death_numbers
France     |    78012
Germany    |    70510
Austria    |    38025

...但是还有一个其他功能：在1990 - 1999年之间，每个国家的死亡人数必须从2000年至2019年的死亡人数中减去。因此，完整列表实际上看起来像这样：

country    |    death_numbers    |    19xx    |    2xxx
France     |    78012            |    36913   |    114925
Germany    |    70510            |    ...     |    ...
Austria    |    38025            |    ...     |    ...

只有一个查询就可以实现这样的结果吗？

感谢您的帮助，祝您有美好的一天！

原文

Good day,

right now I'm trying to improve my knowledge about Django ORM but struggling with the task below:

But first, the database looks like this:

class DeathsWorldwide(models.Model):
    causes_name = models.CharField(max_length=50, null=False, blank=False)
    death_numbers = models.PositiveIntegerField(default=0)
    country = models.CharField(max_length=50, null=True, blank=True)
    year = models.PositiveIntegerField(null=False, blank=False, validators=[MinValueValidator(1990), MaxValueValidator(2019)])


causes_name    |    death_numbers    |    country    |    year
Alcohol dis.   |    25430            |    Germany    |    1998
Poisoning      |    4038             |    Germany    |    1998
...
Maternal dis.  |    9452             |    Germany    |    1998
Alcohol dis.   |    21980            |    Germany    |    1999
Poisoning      |    5117             |    Germany    |    1999
...
Maternal dis.  |    8339             |    Germany    |    1999

Always a block of all diseases for each year, every country and so on...The range of years goes from 1990 to 2019.

What I - or rather let's say the task - want to achieve, is a list of all countries with calculated numbers of deaths, like this...

country    |    death_numbers
France     |    78012
Germany    |    70510
Austria    |    38025

...but with one additional feature: the number of deaths for each country between 1990-1999 must be subtracted from those of 2000-2019. So a full list would actually look something like this:

country    |    death_numbers    |    19xx    |    2xxx
France     |    78012            |    36913   |    114925
Germany    |    70510            |    ...     |    ...
Austria    |    38025            |    ...     |    ...

Is it possible to achieve such a result with only one query?

Thanks for your help and have a great day!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一片旧的回忆 2025-01-25 04:34:29

像下面这样的东西应该可以解决问题。

from django.db.models import Sum, Q

t19 = Sum('death_numbers', filter=Q(year__lt=2000))
t20 = Sum('death_numbers', filter=Q(year__gte=2000))
DeathsWorldwide.objects.values('country').annotate(t19=t19, t20=t20, total=t20 - t19)

它生成以下 SQL：

SELECT 
    "base_deathsworldwide"."country", 
    SUM("base_deathsworldwide"."death_numbers") 
        FILTER (WHERE "base_deathsworldwide"."year" < 2000)
        AS "t19", 
    SUM("base_deathsworldwide"."death_numbers") 
        FILTER (WHERE "base_deathsworldwide"."year" >= 2000)
        AS "t20", 
    (
        SUM("base_deathsworldwide"."death_numbers") 
            FILTER (WHERE "base_deathsworldwide"."year" >= 2000) 
        - SUM("base_deathsworldwide"."death_numbers") 
            FILTER (WHERE "base_deathsworldwide"."year" < 2000)
    ) AS "total" 
FROM "base_deathsworldwide" 
GROUP BY "base_deathsworldwide"."country"

这是一个查询，但重复计算。看起来 ORM 不支持它，但我们可以尝试以最小的努力构建原始 SQL：

from django.db import connection  # replace it with connections if using multiple databases
from django.db.models import Sum, Q

t19 = Sum('death_numbers', filter=Q(year__lt=2000))
t20 = Sum('death_numbers', filter=Q(year__gte=2000))
base_query = DeathsWorldwide.objects.values('country').annotate(t19=t19, t20=t20).query
sql, params = base_query.sql_with_params()
template = 'SELECT country, t19, t20, (t20 - t19) AS total FROM ({}) "temp"'

with connection.cursor() as cursor:
    data = cursor.execute(template.format(sql), params).fetchall()

print(data)

我建议测量两个选项的性能并进行比较。如果差异很小（或者后者更小，这也是可能的，因为计划优化可能需要更多时间）或者前者对于您的用例来说足够快，那么最好坚持使用 ORM 解决方案。

Something like the following should do the trick.

from django.db.models import Sum, Q

t19 = Sum('death_numbers', filter=Q(year__lt=2000))
t20 = Sum('death_numbers', filter=Q(year__gte=2000))
DeathsWorldwide.objects.values('country').annotate(t19=t19, t20=t20, total=t20 - t19)

It produces the following SQL:

SELECT 
    "base_deathsworldwide"."country", 
    SUM("base_deathsworldwide"."death_numbers") 
        FILTER (WHERE "base_deathsworldwide"."year" < 2000)
        AS "t19", 
    SUM("base_deathsworldwide"."death_numbers") 
        FILTER (WHERE "base_deathsworldwide"."year" >= 2000)
        AS "t20", 
    (
        SUM("base_deathsworldwide"."death_numbers") 
            FILTER (WHERE "base_deathsworldwide"."year" >= 2000) 
        - SUM("base_deathsworldwide"."death_numbers") 
            FILTER (WHERE "base_deathsworldwide"."year" < 2000)
    ) AS "total" 
FROM "base_deathsworldwide" 
GROUP BY "base_deathsworldwide"."country"

It is one query, but repeats calculation. It looks like ORM doesn't support it, but we can try to build raw SQL with minimum efforts:

from django.db import connection  # replace it with connections if using multiple databases
from django.db.models import Sum, Q

t19 = Sum('death_numbers', filter=Q(year__lt=2000))
t20 = Sum('death_numbers', filter=Q(year__gte=2000))
base_query = DeathsWorldwide.objects.values('country').annotate(t19=t19, t20=t20).query
sql, params = base_query.sql_with_params()
template = 'SELECT country, t19, t20, (t20 - t19) AS total FROM ({}) "temp"'

with connection.cursor() as cursor:
    data = cursor.execute(template.format(sql), params).fetchall()

print(data)

I suggest measuring performance of both options and comparing. If difference is small (or the latter is smaller, which is also possible because plan optimization can take more time) or the former is fast enough for your use case, it's much better to stick with ORM solution.

回复收藏 0 原文

~没有更多了~