处理多个数据库聚合和分页的最佳实践是什么

发布于 2025-01-11 10:40:21 字数 1013 浏览 0 评论 0原文

我有一个 REST 服务（该项目使用 django 和 DRF），它提供来自 MSSQL 数据库的数据。数据库中存储的模型之一是 invcoies，如下所示：

class Invoice(models.Model):
   id = models.IntegerField(primary_key=True)
   customer = models.ForeignKey('project_models.Customer', on_delete=models.CASCADE,related_name='invoices')
   material = models.ForeignKey('project_models.Material', on_delete=models.SET_NULL)
   quantity = models.FloatField()
   revenue = models.FloatField()
   invoice_date = models.DateField()

我想公开每种材料的这些发票的一些聚合数据，例如：

今年的收入总和
过去一年的收入总和
过去一年的收入总和截至今天的日期（例如今天是 2022-03-02，总和将从 2021-01-01 到 2021-03-02）

为了解决这个问题，我可以用最少的时间执行查询聚合（例如每天的收入总和）并迭代结果集以创建包含所有必要信息的条目列表。然而，由于有大量材料，该解决方案可能会导致性能问题。通常可以通过对查询集进行分页来解决此问题。鉴于我仍然获取、计算以及迭代数据库的所有数据，这似乎不是最好的解决方案。

因此我的问题是：在保持分页的同时聚合多个数据的最佳方法是什么。

原文

I have a REST service(the Project uses django and DRF) which serves data from a MSSQL database. One of the models stored inside the DB are invcoies which looks like this:

class Invoice(models.Model):
   id = models.IntegerField(primary_key=True)
   customer = models.ForeignKey('project_models.Customer', on_delete=models.CASCADE,related_name='invoices')
   material = models.ForeignKey('project_models.Material', on_delete=models.SET_NULL)
   quantity = models.FloatField()
   revenue = models.FloatField()
   invoice_date = models.DateField()

I want to expose some aggregated data for these invoices per material like:

Sum of revenue for the current year
Sum of revenue for the past year
Sum of revenue for the past year up to todays date (e.g. today is 2022-03-02, Sum would be from 2021-01-01 to 2021-03-02)

To solve this I could execute a query with minimal aggregation(e.g.revenue sum per day) and iterate over the result set to create a list of entries which contains all necessary informations. However as there are lots of materials this solution can lead to performance issues. This problem would normally be solved by paginating the queryset. Seeing as I still fetch and calculate as well as iterate over all the data of the DB it seems to be not the best solution.

Therefore my question is: what would be a the best approach to aggregate multiple data while still mainting pagination.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

夜巴黎 2025-01-18 10:40:21

如果您关心的是检索发票信息及其关系（客户、材料）时的性能，您可以查看 select_lated 和 prefetch_lated。

两者都旨在通过减少数据库调用次数来提高性能。

select_lated

select_lated 在单个数据库中收集对象信息（在本例中为数量、收入等）及其关系的检索。表连接是在数据库级别进行的（实际上是在 SQL 中）。它仅适用于外键关系。

prefetch_lated

prefetch_lated 对对象（在我们的例子中为 Invoice）进行一次调用，并对每个关系（customer 和 material）进行一次调用代码> 在我们的例子中）。表连接是在 Python 级别进行的。它适用于多对多和多对一关系。

实施

因此，您应该重写查询，并保持计算不变（当年的收入总和，...）。给定一个阈值日期，查询如下所示：

invoices = Invoice.objects.filter(invoice_date__gt=threshold_date).prefetch_related('customer', 'material')

但是，如果您只对计算结果感兴趣而不是检索对象的信息（针对发票、客户或材料），我建议在数据库级别。

If your concern is performance when retrieving the information of an invoice and its relations (customer, material), you could have a look at select_related and prefetch_related.

Both aim at boosting performances by reducing the number of database calls.

select_related

select_related gathers in a single database call the retrieval of the object's information (in this case quantity, revenue, ...) and of its relations. The table join is made at database level (in SQL actually). It works only with foreign-key relationships.

prefetch_related

prefetch_related makes one call for the object (Invoice in our case), and one call per relation (customer and material in our case). The table join is made at Python level. It works with many-to-many and many-to-one relationships.

Implementation

So you should just rewrite your query, and leave the calculations intact (sum of revenue for the current year, ...). Given a threshold_date, the query looks like :

invoices = Invoice.objects.filter(invoice_date__gt=threshold_date).prefetch_related('customer', 'material')

However, if you are only interested in computing the result and not retrieving the objects' information (for invoice, customer or material), I would recommend performing the calculation at database level.

回复收藏 0 原文

~没有更多了~