Are there performance gains from splitting a Django model/table into two models/tables?

Posted on 2024-12-09 14:55:46

In SO question 7531153, I asked the proper way to split a Django model into two—either using Django's Multi-table Inheritance or explicitly defining a OneToOneField.

Based on Luke Sneeringer's comment, I'm curious whether there's a performance gain from splitting the model in two.

I was thinking about splitting the model in two because some fields will always be filled in, while others will typically be empty (until the project is closed).

Are there performance gains from putting typically empty fields, such as actual_completion_date and actual_project_costs, into a separate model/table in Django?

Split into Two Models

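from django.db import models


class Project(models.Model):
    project_number = models.SlugField(max_length=5, blank=False,
            primary_key=True)
    budgeted_costs = models.DecimalField(max_digits=10, decimal_places=2)
    submitted_on = models.DateField(auto_now_add=True)


class ProjectExtendedInformation(models.Model):
    # The post referenced CapExProject here, which is not defined in this
    # snippet; Project (above) is assumed to be the intended target.
    # on_delete is required from Django 2.0 onward; CASCADE is an assumption.
    project = models.OneToOneField(Project, on_delete=models.CASCADE,
            primary_key=True)
    actual_completion_date = models.DateField(blank=True, null=True)
    actual_project_costs = models.DecimalField(max_digits=10, decimal_places=2,
            blank=True, null=True)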

菊凝晚露 2024-12-16 14:55:46

Actually, quite the opposite. Any time multiple tables are involved, a SQL JOIN will be required, which is inherently slower for a database to perform than a simple SELECT query. The fact that the fields are empty is meaningless in terms of performance one way or another.

Depending on the size of the table and the number of columns, it may be faster to only select a subset of fields that you need to interact with, but that's easy enough in Django with the only method:

Project.objects.only('project_number', 'budgeted_costs', 'submitted_on')

Which produces something akin to:

SELECT project_number, budgeted_costs, submitted_on FROM yourapp_project;

Using separate models (and tables) only makes sense for the purposes of modularization -- such that you subclass Project to create a specific kind of project that requires additional fields but still needs all the fields of a generic Project.
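The JOIN point made earlier in this answer can be seen in a minimal sketch that uses Python's built-in sqlite3 module instead of Django, so that it runs without a configured project. The table names (project_single, project, project_extended) and the sample rows are hypothetical and only mirror the models in the question; this is an illustration of the query shapes, not a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Layout A: one table, with the rarely filled column left NULL.
    CREATE TABLE project_single (
        project_number TEXT PRIMARY KEY,
        budgeted_costs NUMERIC NOT NULL,
        actual_project_costs NUMERIC
    );
    -- Layout B: the rarely filled column lives in a one-to-one side table.
    CREATE TABLE project (
        project_number TEXT PRIMARY KEY,
        budgeted_costs NUMERIC NOT NULL
    );
    CREATE TABLE project_extended (
        project_id TEXT PRIMARY KEY REFERENCES project(project_number),
        actual_project_costs NUMERIC
    );
""")

# Hypothetical sample data: P0001 is still open, P0002 has been closed.
rows = [("P0001", 1000, None), ("P0002", 2500, 2400)]
conn.executemany("INSERT INTO project_single VALUES (?, ?, ?)", rows)
conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [(number, budget) for number, budget, _ in rows])
conn.executemany("INSERT INTO project_extended VALUES (?, ?)",
                 [(number, actual) for number, _, actual in rows
                  if actual is not None])

# Layout A needs a plain single-table SELECT.
single = conn.execute("""
    SELECT project_number, budgeted_costs, actual_project_costs
    FROM project_single
    ORDER BY project_number
""").fetchall()

# Layout B needs a JOIN to return the same columns.
split = conn.execute("""
    SELECT p.project_number, p.budgeted_costs, e.actual_project_costs
    FROM project AS p
    LEFT JOIN project_extended AS e ON e.project_id = p.project_number
    ORDER BY p.project_number
""").fetchall()

print(single == split)
```

Both layouts return the same rows; the split layout just has to touch two tables to do it. In Django, the comparable one-query fetch for the split models would presumably go through select_related on the reverse one-to-one relation.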


佼人 2024-12-16 14:55:46

For your case, if some information is available only once the project is closed, I would indeed advise making a separate model.

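Joins aren't bad. In your case especially, the join will be faster if one table holds all the rows and the other holds far fewer. I have worked with databases a lot, and in most cases it is pure guesswork to tell whether a join will be good or bad. In many cases even a full table scan is better than using an index. If performance is a concern, you need to look at the EXPLAIN output and profile the database work where possible (I know Oracle supports this). But until performance becomes an issue, I prefer quicker development.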
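As a rough illustration of looking at a query plan rather than guessing, here is a minimal sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The table names are hypothetical and mirror the question's models; other databases have their own EXPLAIN syntax and output, and the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (
        project_number TEXT PRIMARY KEY,
        budgeted_costs NUMERIC NOT NULL
    );
    CREATE TABLE project_extended (
        project_id TEXT PRIMARY KEY REFERENCES project(project_number),
        actual_project_costs NUMERIC
    );
""")

query = """
    SELECT p.project_number, e.actual_project_costs
    FROM project AS p
    LEFT JOIN project_extended AS e ON e.project_id = p.project_number
"""

# Each plan row has four columns; the last one is a human-readable step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_details = [row[3] for row in plan]

for detail in plan_details:
    print(detail)
```

The printed steps show how the database intends to visit each table, for example whether it scans a table or looks rows up through an index. In Django, QuerySet.explain() (available since Django 2.1) returns the equivalent plan for a queryset on the configured database.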

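We have a table in Django with 5M rows, and we needed a column that would be non-null for only 1K rows. Just altering the table would have taken half a day. Rebuilding it from scratch would also have taken several hours. We chose to make a separate model.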
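The approach described above, leaving the large table untouched and putting the sparse column in a side table, might look like the following sketch. It uses sqlite3 with hypothetical names (big_table, big_table_extra, rare_value) and a scaled-down row count; it shows only the shape of the schema, not the migration cost, which depends on the database in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the existing large table (10,000 rows here instead of 5M).
conn.execute(
    "CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO big_table (id, payload) VALUES (?, ?)",
    ((i, f"row {i}") for i in range(1, 10001)),
)

# Instead of altering big_table, add a side table keyed by the same id.
# It only holds rows for the few records that actually have a value.
conn.execute("""
    CREATE TABLE big_table_extra (
        big_table_id INTEGER PRIMARY KEY REFERENCES big_table(id),
        rare_value TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO big_table_extra VALUES (?, ?)",
    [(i, f"extra {i}") for i in range(1, 10001, 1000)],
)

def rare_value_for(row_id):
    """Return the sparse value for a row, or None if it has none."""
    row = conn.execute(
        """
        SELECT e.rare_value
        FROM big_table AS b
        LEFT JOIN big_table_extra AS e ON e.big_table_id = b.id
        WHERE b.id = ?
        """,
        (row_id,),
    ).fetchone()
    return row[0]

total_rows = conn.execute("SELECT COUNT(*) FROM big_table").fetchone()[0]
extra_rows = conn.execute("SELECT COUNT(*) FROM big_table_extra").fetchone()[0]
print(total_rows, extra_rows, rare_value_for(1), rare_value_for(2))
```

Rows without an entry in the side table simply come back as None through the LEFT JOIN, which corresponds to a nullable column in the single-table layout.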

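I went to a lecture on domain-driven design in which the speaker explained that it is important, especially when developing a new application, to separate models rather than stuff everything into one class.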

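Say you have a CargoAircraft class and a PassengerAircraft class. It's so tempting to put them in one class and work "seamlessly", isn't it? But the interactions with them (scheduling, booking, weight or capacity calculations) are completely different.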

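So by putting everything in one class, you force yourself into a bunch of IF clauses in every method, extra methods in the Manager, harder debugging, and bigger tables in the database. Basically, you make yourself spend more time on development, and for what? For only two things: 1) fewer joins and 2) fewer class names.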

If you separate the classes, things get much easier: