Are there performance advantages in splitting a Django model/table into two models/tables?
In SO question 7531153, I asked about the proper way to split a Django model into two: either using Django's multi-table inheritance or explicitly defining a OneToOneField.
Based on Luke Sneeringer's comment, I'm curious whether there's a performance gain from splitting the model in two.
The reason I was thinking about splitting the model in two is that I have some fields that will always be completed, while other fields will typically be empty (until the project is closed).
Are there performance gains from putting typically empty fields, such as actual_completion_date and actual_project_costs, into a separate model/table in Django?
Split into Two Models

    from django.db import models


    class Project(models.Model):
        project_number = models.SlugField(max_length=5, blank=False,
                                          primary_key=True)
        budgeted_costs = models.DecimalField(max_digits=10, decimal_places=2)
        submitted_on = models.DateField(auto_now_add=True)


    class ProjectExtendedInformation(models.Model):
        # Fields that stay empty until the project is closed.
        # on_delete is required on Django 2.0 and later.
        project = models.OneToOneField(Project, on_delete=models.CASCADE,
                                       primary_key=True)
        actual_completion_date = models.DateField(blank=True, null=True)
        actual_project_costs = models.DecimalField(max_digits=10, decimal_places=2,
                                                   blank=True, null=True)
Comments (2)
Actually, quite the opposite. Any time multiple tables are involved, a SQL JOIN is required, which is inherently more work for a database than a simple single-table SELECT. Whether the fields are empty makes no difference to performance one way or the other.
Depending on the size of the table and the number of columns, it may be faster to select only the subset of fields you need to interact with, but that's easy enough in Django with the only() method, which produces a SELECT limited to the fields you name.
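The column-subset idea can be sketched at the SQL level with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name yourapp_project and the sample row are hypothetical, and the query is roughly what a Django call such as Project.objects.only("project_number", "budgeted_costs") would ask the database for, not the exact SQL Django emits.

```python
import sqlite3

# In-memory database holding every field in a single table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourapp_project ("
    " project_number TEXT PRIMARY KEY,"
    " budgeted_costs NUMERIC NOT NULL,"
    " submitted_on TEXT NOT NULL,"
    " actual_completion_date TEXT,"   # empty until the project is closed
    " actual_project_costs NUMERIC)"  # empty until the project is closed
)
conn.execute(
    "INSERT INTO yourapp_project VALUES ('P0001', 1500.50, '2011-09-23', NULL, NULL)"
)

# Ask only for the columns that are needed; no JOIN is involved.
cursor = conn.execute("SELECT project_number, budgeted_costs FROM yourapp_project")
subset_columns = [description[0] for description in cursor.description]
subset_rows = cursor.fetchall()

print(subset_columns)
print(subset_rows)
```

The typically empty columns stay in the same table and are simply not requested.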
Using separate models (and tables) only makes sense for the purposes of modularization, such that you subclass Project to create a specific kind of project that requires additional fields but still needs all the fields of a generic Project.
For your case, if there's some info that's available only when it's closed, I'd indeed advise making a separate model.
Joins aren't bad. In your case especially, the join will be faster if one table has all the rows and the other has far fewer. I've worked with databases a lot, and in most cases it's pure guesswork to tell whether a join will be better or worse. In many cases even a full table scan is better than using an index. If performance is a concern, you need to look at the EXPLAIN output and, if possible, profile the database work (I know Oracle supports this). But before performance becomes an issue, I prefer quicker development.
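As a rough sketch of what looking at a plan means, the snippet below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The table names, row counts, and values are made up for illustration, and SQLite's planner will not match PostgreSQL, MySQL, or Oracle, so this shows how to inspect a plan rather than what your database will do. In Django, QuerySet.explain() (available since Django 2.1) returns the backend's own plan for a queryset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (
    project_number TEXT PRIMARY KEY,
    budgeted_costs NUMERIC NOT NULL
);
CREATE TABLE project_extended (
    project_id TEXT PRIMARY KEY REFERENCES project(project_number),
    actual_project_costs NUMERIC
);
""")

# Hypothetical data: 1000 projects, of which only 10 have been closed.
conn.executemany(
    "INSERT INTO project VALUES (?, ?)",
    [(f"P{i:04d}", 1000 + i) for i in range(1000)],
)
conn.executemany(
    "INSERT INTO project_extended VALUES (?, ?)",
    [(f"P{i:04d}", 900 + i) for i in range(10)],
)

join_sql = """
SELECT p.project_number, p.budgeted_costs, e.actual_project_costs
FROM project_extended AS e
JOIN project AS p ON p.project_number = e.project_id
"""

# The last column of each plan row is the human-readable description of the step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + join_sql)]
for step in plan:
    print(step)

closed = conn.execute(join_sql).fetchall()
print(len(closed), "closed projects")
```

The plan shows whether each table is scanned or looked up through an index, which is the evidence to check before deciding that a join is a problem.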
We have a table in Django with 5M rows, and we needed a column that would have been non-null for only 1K rows. Just altering the table would have taken half a day. Rebuilding from scratch also takes a few hours. We chose to make a separate model.
I've been to a lecture on Domain Driven Design in which the author explained that it is important, especially in development of a new app, to separate models, to not stuff everything in one class.
Let's say you have a CargoAircraft class and PassengerAircraft. It's so tempting to put them in one class and work "seamlessly", isn't it? But interactions with them (scheduling, booking, weight or capacity calculations) are completely different.
So, by putting everything in one class you force yourself into a bunch of IF clauses in every method, extra methods in the Manager, harder debugging, and bigger tables in the DB. Basically, you make yourself spend more time developing, and for the sake of what? Only two things: 1) fewer joins, 2) fewer class names.
If you separate the classes, things go much easier and, hence, development is faster.
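A minimal sketch of the aircraft example, using plain Python dataclasses rather than Django models. The field names (tail_number, max_payload_kg, seats) and the remaining_capacity method are hypothetical, chosen only to show how each class can carry its own capacity logic without IF clauses on an aircraft type.

```python
from dataclasses import dataclass


@dataclass
class CargoAircraft:
    tail_number: str
    max_payload_kg: float

    def remaining_capacity(self, loaded_kg: float) -> float:
        # Cargo capacity is measured in kilograms of payload.
        return self.max_payload_kg - loaded_kg


@dataclass
class PassengerAircraft:
    tail_number: str
    seats: int

    def remaining_capacity(self, booked_seats: int) -> int:
        # Passenger capacity is measured in seats.
        return self.seats - booked_seats


freighter = CargoAircraft("N100CA", max_payload_kg=20000.0)
airliner = PassengerAircraft("N200PA", seats=180)

print(freighter.remaining_capacity(12500.0))
print(airliner.remaining_capacity(150))
```

A single combined class would need a type flag and a branch in every such method, which is the extra work the answer is warning about.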