Django table with millions of rows

Posted on 2024-08-17 14:16:00

I have a project with two applications (books and reader).

The books application has a table with 4 million rows and these fields:

 book_title = models.CharField(max_length=40)
 book_description = models.CharField(max_length=400)

To avoid querying a table with 4 million rows, I am thinking of splitting it by subject: 20 models with 20 tables of 200,000 rows each (book_horror, book_drammatic, etc.).

In the "reader" application, I am thinking of adding these fields:

 reader_name = models.CharField(max_length=20, blank=True)
 book_subject = models.IntegerField()
 book_id = models.IntegerField()

So instead of a ForeignKey, I am thinking of using an integer "book_subject" (which selects the appropriate table) and "book_id" (which identifies the book within the table selected by "book_subject").

Is this a good way to avoid querying a table with 4 million rows?

Is there an alternative solution?

三五鸿雁 2024-08-24 14:16:00

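As many people have said, splitting the table into smaller tables (horizontal partitioning, or even sharding) is premature. Databases are designed to handle tables of this size, so your performance problem probably lies elsewhere.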

Indexing is the first step, and it sounds like you have already done that. With proper indexes, a database can handle 4 million rows.
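To make the indexing point concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for the project's real database (an assumption, so it runs without a Django settings module). It keeps every book in one table with a hypothetical `subject` column instead of 20 per-subject tables, and compares SQLite's query plan for a lookup by subject before and after adding an index. In Django, declaring a field with `db_index=True` asks for a similar index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table for all books; `subject` is a hypothetical column that replaces
# the 20 per-subject tables proposed in the question.
conn.execute(
    "CREATE TABLE book ("
    " id INTEGER PRIMARY KEY,"
    " book_title VARCHAR(40),"
    " book_description VARCHAR(400),"
    " subject INTEGER)"
)


def plan(sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


query = "SELECT id, book_title FROM book WHERE subject = ?"
before = plan(query, (3,))

# Roughly what Django creates for a field declared with db_index=True.
conn.execute("CREATE INDEX book_subject_idx ON book (subject)")
after = plan(query, (3,))

print(before)  # a full table scan
print(after)   # a search through book_subject_idx
```

The first plan reads every row; the second jumps straight to the matching rows through the index, which is why a well-indexed table of a few million rows is usually not the bottleneck.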

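Second, check how many queries you are running. You can do this with a tool such as Django Debug Toolbar, and you will often be surprised by how many unnecessary queries are executed.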
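Django Debug Toolbar shows the SQL behind each request from inside Django. As a framework-free sketch of the same idea, the example below counts statements with `sqlite3.Connection.set_trace_callback` (sqlite3 standing in for the real database is an assumption) and shows a common source of surprise queries: the N+1 pattern, where loading each reader's book costs one extra query.

```python
import sqlite3

# Autocommit mode, so the only traced statements are the ones written below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, book_title TEXT);
    CREATE TABLE reader (id INTEGER PRIMARY KEY, reader_name TEXT,
                         book_id INTEGER REFERENCES book(id));
""")
conn.executemany("INSERT INTO book VALUES (?, ?)",
                 [(i, f"Book {i}") for i in range(1, 6)])
conn.executemany("INSERT INTO reader VALUES (?, ?, ?)",
                 [(i, f"Reader {i}", i) for i in range(1, 6)])

executed = []
conn.set_trace_callback(executed.append)  # called once per executed statement

# N+1 pattern: one query for the readers, then one more query per reader.
readers = conn.execute("SELECT reader_name, book_id FROM reader").fetchall()
titles = [
    conn.execute("SELECT book_title FROM book WHERE id = ?", (book_id,)).fetchone()[0]
    for _, book_id in readers
]
n_plus_one = len(executed)

# The same data fetched with a single JOIN.
executed.clear()
rows = conn.execute(
    "SELECT r.reader_name, b.book_title "
    "FROM reader r JOIN book b ON b.id = r.book_id"
).fetchall()
joined = len(executed)

print(n_plus_one, joined)  # 6 1
```

In Django, following a ForeignKey inside a loop produces the same pattern, and `select_related()` collapses it into a join. Note that `select_related()` follows ForeignKey relations; with the plain IntegerField pair proposed in the question there is no relation for it to follow.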

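The next step is caching: use memcached for pages, or parts of pages, that don't change for most users. This is where you get the biggest performance gain for the least effort.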

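If you really, really need to split the table, the latest version of Django (1.2 alpha) can handle sharding through its multiple-database support, and you should be able to hand-write a horizontal partitioning solution (Postgres offers an in-database way to do this). Please don't split the table by genre! Choose something that will never change and that you always know at query time. Author, for example: split by the first letter of the surname or something similar. This takes a lot of effort and has many drawbacks for a database that isn't especially large, which is why most people here advise against it!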
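If partitioning does become necessary, the point is to route on a key that is stable and known at query time. Here is a hypothetical routing helper that maps an author's surname to one of four partition tables by its first letter; the table names and letter ranges are invented for illustration and are not part of Django or Postgres.

```python
# Hypothetical layout: four partition tables keyed on contiguous surname initials.
PARTITIONS = {
    "book_a_f": "ABCDEF",
    "book_g_m": "GHIJKLM",
    "book_n_s": "NOPQRS",
    "book_t_z": "TUVWXYZ",
}
FALLBACK = "book_other"  # digits, non-ASCII initials, empty names


def partition_for(author_surname: str) -> str:
    """Return the partition table for an author's books.

    The key (first letter of the surname) never changes and is always
    known when querying, unlike a genre, which can be reassigned.
    """
    initial = author_surname.strip()[:1].upper()
    if not initial:
        return FALLBACK  # guard: "" would match every letter string
    for table, letters in PARTITIONS.items():
        if initial in letters:
            return table
    return FALLBACK


print(partition_for("Tolkien"))  # book_t_z
```

Because the surname's initial does not change when a book is reclassified, a row never has to move between tables, which is exactly what splitting by genre would force.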

[Edit]

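I forgot denormalization! Store frequently used counts, sums, etc. in, for example, the author table to avoid joins in common queries. The downside is that you have to maintain these values yourself (until Django adds a DenormalizedField). I would look at this during development for clear, simple cases, or once caching is no longer enough, but before sharding or horizontal partitioning.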
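Here is a minimal sketch of maintaining a denormalized counter by hand, again with `sqlite3` standing in for the real database (an assumption). The hypothetical `author.book_count` column is updated in the same transaction as each book insert or delete, so reading an author's book count becomes a single-row lookup instead of a join or `COUNT(*)` over the book table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, surname TEXT,
                         book_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE book (id INTEGER PRIMARY KEY, book_title TEXT,
                       author_id INTEGER NOT NULL REFERENCES author(id));
""")


def add_book(author_id, title):
    # Insert the book and bump the counter together; on an error the
    # `with conn:` block rolls both statements back.
    with conn:
        conn.execute("INSERT INTO book (book_title, author_id) VALUES (?, ?)",
                     (title, author_id))
        conn.execute("UPDATE author SET book_count = book_count + 1 WHERE id = ?",
                     (author_id,))


def remove_book(book_id):
    with conn:
        row = conn.execute("SELECT author_id FROM book WHERE id = ?",
                           (book_id,)).fetchone()
        if row is None:
            return
        conn.execute("DELETE FROM book WHERE id = ?", (book_id,))
        conn.execute("UPDATE author SET book_count = book_count - 1 WHERE id = ?",
                     row)


conn.execute("INSERT INTO author (id, surname) VALUES (1, 'Tolkien')")
add_book(1, "The Hobbit")
add_book(1, "The Silmarillion")

# The common query reads one precomputed value instead of counting books.
count = conn.execute("SELECT book_count FROM author WHERE id = 1").fetchone()[0]
print(count)  # 2
```

A database trigger is another way to keep such a counter in sync; the application-side version is shown because it matches the "maintain it yourself" caveat above.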
