SQL tables with similar structure - best practice
Imagine that we have a website where users can read articles, view photos, watch videos, and much more. Every "item" can be commented on, so we need somewhere to store those comments. Let's discuss the storage possibilities for this case.
Distributed solution
We can obviously create a separate comments table for each kind of "item", so that we have tables like:
CREATE TABLE IF NOT EXISTS `article_comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`createdBy` int(11) DEFAULT NULL,
`createdAt` int(11) DEFAULT NULL,
`article` int(11) DEFAULT NULL,
`content` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
and then, likewise, photo_comments, video_comments, and so on. The advantages of this approach are:
- we can define a foreign key to each "item" table,
- the database is divided into logical parts,
- exporting such data is not a problem.
Disadvantages:
- many tables
- probably hard to maintain (adding fields, etc.)
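To make the foreign-key advantage and the maintenance disadvantage concrete, here is a minimal sketch. It assumes a parent table `articles` with an `id` column of type int(11), which is not shown in the question; the constraint name and the `editedAt` column are also made up for illustration.

```sql
-- Assumption: a parent table `articles` (`id` int(11) PRIMARY KEY, InnoDB) exists.
ALTER TABLE `article_comments`
  ADD CONSTRAINT `fk_article_comments_article`
  FOREIGN KEY (`article`) REFERENCES `articles` (`id`)
  ON DELETE CASCADE;

-- Maintenance cost: a schema change (hypothetical `editedAt` column)
-- has to be repeated for every per-item comments table.
ALTER TABLE `article_comments` ADD COLUMN `editedAt` int(11) DEFAULT NULL;
ALTER TABLE `photo_comments`   ADD COLUMN `editedAt` int(11) DEFAULT NULL;
ALTER TABLE `video_comments`   ADD COLUMN `editedAt` int(11) DEFAULT NULL;
```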
Centralized solution
On the other hand, we can merge all those tables into two:
CREATE TABLE IF NOT EXISTS `comment_types` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
and
CREATE TABLE IF NOT EXISTS `comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`createdBy` int(11) DEFAULT NULL,
`createdAt` int(11) DEFAULT NULL,
`type` int(11) DEFAULT NULL,
`content` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
Table comment_types is a dictionary: it contains key-value pairs of the commented item "type" and its name, for example:
1:Articles
2:Photos
3:Videos
Table comments stores the usual data with an additional type field.
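A rough sketch of how the two tables might be used together follows. Note that the `comments` table as written has no column identifying the commented item itself, so the sketch adds a hypothetical `item` column for that. The user id, item id, and the reading of `createdAt` as Unix seconds are assumptions too.

```sql
-- Fill the dictionary with the example values from the question.
INSERT INTO `comment_types` (`id`, `name`) VALUES
  (1, 'Articles'),
  (2, 'Photos'),
  (3, 'Videos');

-- Assumption: hypothetical `item` column holding the id of the
-- commented article / photo / video.
ALTER TABLE `comments` ADD COLUMN `item` int(11) DEFAULT NULL;

-- A comment by user 7 on article 42 (made-up ids),
-- with the timestamp stored as Unix seconds.
INSERT INTO `comments` (`createdBy`, `createdAt`, `type`, `item`, `content`)
VALUES (7, UNIX_TIMESTAMP(), 1, 42, 'Nice article!');

-- All comments on that article, with the type name resolved
-- through the dictionary table.
SELECT c.`id`, t.`name` AS `type_name`, c.`content`
FROM `comments` AS c
JOIN `comment_types` AS t ON t.`id` = c.`type`
WHERE c.`type` = 1 AND c.`item` = 42;
```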
Advantages:
- Easier maintenance (adding / removing fields),
- Adding new comment types "on the fly".
Disadvantages:
- Harder to migrate / export,
- Possible performance drop when querying a large dataset.
Discussion:
- Which storage option will be better in terms of query performance (assume that the dataset IS big enough for that to matter)?
- Again on performance: will adding an INDEX on type remove or drastically reduce that performance drop?
- Which storage option will be better in terms of management and possible migration in the future (distributed will be better, of course, but let's see whether the centralized one is far behind)?
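One way to look into the index question empirically is to add the index and inspect the query plan. This is only a sketch against the `comments` table from the question; both index names are made up, and whether the optimizer actually picks either index depends on the data. With only a handful of distinct types, a single-column index on type is not very selective, so a composite index that also covers the sort column may be worth comparing.

```sql
-- Single-column index on the discriminator, as asked in the question.
CREATE INDEX `idx_comments_type` ON `comments` (`type`);

-- Alternative to compare: a composite index that can serve both the
-- filter on `type` and the ordering by `createdAt`.
CREATE INDEX `idx_comments_type_created` ON `comments` (`type`, `createdAt`);

-- Ask the optimizer how it would run a typical per-type query.
-- The `key` column of the output shows which index (if any) was chosen.
EXPLAIN SELECT `id`, `content`
FROM `comments`
WHERE `type` = 2
ORDER BY `createdAt` DESC
LIMIT 20;
```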
Comments (1)
I'm not sure either of the disadvantages you list for option 2 is serious: data export is easily accomplished with a simple WHERE clause, and I wouldn't worry about performance. Option 2 is properly normalised, and in a modern relational database performance should be excellent (and can be tuned further with appropriate indexes etc. if necessary).
I would only consider the first option if I could prove that it was necessary for performance, scalability or other reasons - but it must be said that seems unlikely.
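As an illustration of the "simple WHERE clause" export, something along these lines should work against the `comments` table from the question. The type id 2 follows the example dictionary (Photos), and `photo_comments_export` is a hypothetical table name.

```sql
-- Select only the photo comments (type 2 in the example dictionary).
SELECT `id`, `createdBy`, `createdAt`, `content`
FROM `comments`
WHERE `type` = 2;

-- Or copy them into a standalone table, for example as a first step
-- towards per-item tables. `photo_comments_export` is a made-up name.
CREATE TABLE `photo_comments_export` AS
SELECT `id`, `createdBy`, `createdAt`, `content`
FROM `comments`
WHERE `type` = 2;
```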