Statistics column or separate summary table?
For example, we have three tables (Post, PostTag and Tag):
Post
=======
id
title
body
PostTag
=======
id
post_id
tag_id
Tag
=======
id
name
normalized_name
We need to count how many posts are tagged "sport". The count query is slow (we have several million rows).
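For reference, the slow query being replaced probably looks something like the sketch below. It assumes SQLite through Python's standard `sqlite3` module, because the question does not name a database. The sample rows and the `count_posts_with_tag` helper are hypothetical. `COUNT(*)` has to visit one `PostTag` entry per tagged post, so its cost grows with the tag's popularity even when `tag_id` is indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Post    (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE Tag     (id INTEGER PRIMARY KEY, name TEXT, normalized_name TEXT UNIQUE);
CREATE TABLE PostTag (id INTEGER PRIMARY KEY,
                      post_id INTEGER NOT NULL REFERENCES Post(id),
                      tag_id  INTEGER NOT NULL REFERENCES Tag(id));
-- Even with this index, COUNT(*) still visits one entry per tagged post.
CREATE INDEX idx_posttag_tag ON PostTag(tag_id);
""")

# Hypothetical sample data: 5 posts, 2 tags.
conn.executemany("INSERT INTO Post (id, title, body) VALUES (?, ?, '')",
                 [(i, f"post {i}") for i in range(1, 6)])
conn.executemany("INSERT INTO Tag (id, name, normalized_name) VALUES (?, ?, ?)",
                 [(1, "Sport", "sport"), (2, "News", "news")])
conn.executemany("INSERT INTO PostTag (post_id, tag_id) VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (3, 2), (4, 2)])

def count_posts_with_tag(conn, normalized_name):
    """Count the posts carrying a tag by scanning PostTag (the slow path)."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM PostTag pt
        INNER JOIN Tag t ON t.id = pt.tag_id
        WHERE t.normalized_name = ?
        """,
        (normalized_name,),
    ).fetchone()
    return row[0]

print(count_posts_with_tag(conn, "sport"))  # 3
```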
Our solution is to store the count in the database: when a tag is added to a post, the count is updated to n + 1, and when it is removed, to n - 1.
Tags are added and removed frequently by the community and by moderators.
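One way to do the n + 1 / n - 1 bookkeeping in application code is to write the `PostTag` row and the counter in the same transaction, so the two cannot drift apart. The sketch below makes the same assumptions as before (SQLite via Python's `sqlite3`) and keeps the count in the `PostTagStat` table proposed below. `add_tag`, `remove_tag` and `tag_frequency` are hypothetical helpers. The `UNIQUE (post_id, tag_id)` constraint is also an added assumption: it stops the same tag from being counted twice on one post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tag     (id INTEGER PRIMARY KEY, name TEXT, normalized_name TEXT UNIQUE);
CREATE TABLE PostTag (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL,
                      tag_id INTEGER NOT NULL, UNIQUE (post_id, tag_id));
CREATE TABLE PostTagStat (tag_id INTEGER PRIMARY KEY REFERENCES Tag(id),
                          frequency INTEGER NOT NULL DEFAULT 0);
INSERT INTO Tag VALUES (1, 'Sport', 'sport');
""")

def add_tag(conn, post_id, tag_id):
    """Tag a post and bump its cached count (n + 1) in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("INSERT INTO PostTag (post_id, tag_id) VALUES (?, ?)",
                     (post_id, tag_id))
        # Make sure a counter row exists, then increment it.
        conn.execute("INSERT OR IGNORE INTO PostTagStat (tag_id, frequency) VALUES (?, 0)",
                     (tag_id,))
        conn.execute("UPDATE PostTagStat SET frequency = frequency + 1 WHERE tag_id = ?",
                     (tag_id,))

def remove_tag(conn, post_id, tag_id):
    """Untag a post and decrement the cached count (n - 1) if a row was deleted."""
    with conn:
        cur = conn.execute("DELETE FROM PostTag WHERE post_id = ? AND tag_id = ?",
                           (post_id, tag_id))
        if cur.rowcount:
            conn.execute("UPDATE PostTagStat SET frequency = frequency - 1 WHERE tag_id = ?",
                         (tag_id,))

def tag_frequency(conn, tag_id):
    """Read the cached count; tags never used have no counter row yet."""
    row = conn.execute("SELECT frequency FROM PostTagStat WHERE tag_id = ?",
                       (tag_id,)).fetchone()
    return row[0] if row else 0

add_tag(conn, post_id=1, tag_id=1)
add_tag(conn, post_id=2, tag_id=1)
remove_tag(conn, post_id=1, tag_id=1)
remove_tag(conn, post_id=9, tag_id=1)  # post 9 was never tagged: count unchanged
print(tag_frequency(conn, 1))  # 1
```

The counter is decremented only when `DELETE` actually removed a row. This keeps it correct when a moderator removes a tag that is already gone.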
My thoughts:
Create our own summary table that stores the number of posts per tag:
PostTagStat
========
tag_id
frequency
The index is created on the frequency column.
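The database itself can maintain the same counter with triggers. These also catch writes that bypass the application. Below is a sketch of the `PostTagStat` design, again assuming SQLite. The trigger and index names and the `tag_count` helper are hypothetical. Looking up a count by tag name still needs the `INNER JOIN` with `Tag`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tag     (id INTEGER PRIMARY KEY, name TEXT, normalized_name TEXT UNIQUE);
CREATE TABLE PostTag (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL, tag_id INTEGER NOT NULL);

CREATE TABLE PostTagStat (
    tag_id    INTEGER PRIMARY KEY REFERENCES Tag(id),
    frequency INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_posttagstat_frequency ON PostTagStat(frequency);

-- n + 1 when a tag is attached to a post
CREATE TRIGGER posttag_ai AFTER INSERT ON PostTag BEGIN
    INSERT OR IGNORE INTO PostTagStat (tag_id, frequency) VALUES (NEW.tag_id, 0);
    UPDATE PostTagStat SET frequency = frequency + 1 WHERE tag_id = NEW.tag_id;
END;

-- n - 1 when it is removed
CREATE TRIGGER posttag_ad AFTER DELETE ON PostTag BEGIN
    UPDATE PostTagStat SET frequency = frequency - 1 WHERE tag_id = OLD.tag_id;
END;
""")

conn.executemany("INSERT INTO Tag VALUES (?, ?, ?)",
                 [(1, "Sport", "sport"), (2, "News", "news")])
conn.executemany("INSERT INTO PostTag (post_id, tag_id) VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (3, 2)])
conn.execute("DELETE FROM PostTag WHERE post_id = 2 AND tag_id = 1")
conn.commit()

def tag_count(conn, normalized_name):
    """Read the cached count by tag name (needs the join with Tag)."""
    row = conn.execute(
        """
        SELECT s.frequency
        FROM Tag t
        INNER JOIN PostTagStat s ON s.tag_id = t.id
        WHERE t.normalized_name = ?
        """,
        (normalized_name,),
    ).fetchone()
    return row[0] if row else 0

# The frequency index can serve "most popular tags" listings.
top = conn.execute(
    "SELECT t.name, s.frequency FROM PostTagStat s JOIN Tag t ON t.id = s.tag_id "
    "ORDER BY s.frequency DESC LIMIT 10"
).fetchall()

print(tag_count(conn, "sport"))  # 2
print(top)  # [('Sport', 2), ('News', 1)]
```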
But would it be better to add a new column to the Tag table instead, to save the INNER JOIN, like this:
Tag
========
id
name
normalized_name
frequency
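For comparison, here is a sketch of the single-table variant under the same assumptions (SQLite, hypothetical trigger names and `tag_count` helper). The triggers now update `Tag.frequency`, and reading the count is a lookup on `Tag` alone. That lookup uses the index SQLite creates for the `UNIQUE` constraint on `normalized_name`; the constraint itself is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tag (
    id              INTEGER PRIMARY KEY,
    name            TEXT,
    normalized_name TEXT UNIQUE,
    frequency       INTEGER NOT NULL DEFAULT 0  -- denormalized post count
);
CREATE INDEX idx_tag_frequency ON Tag(frequency);
CREATE TABLE PostTag (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL, tag_id INTEGER NOT NULL);

-- n + 1 / n - 1 now go straight into the Tag row
CREATE TRIGGER posttag_ai AFTER INSERT ON PostTag BEGIN
    UPDATE Tag SET frequency = frequency + 1 WHERE id = NEW.tag_id;
END;
CREATE TRIGGER posttag_ad AFTER DELETE ON PostTag BEGIN
    UPDATE Tag SET frequency = frequency - 1 WHERE id = OLD.tag_id;
END;
""")

conn.executemany("INSERT INTO Tag (id, name, normalized_name) VALUES (?, ?, ?)",
                 [(1, "Sport", "sport"), (2, "News", "news")])
conn.executemany("INSERT INTO PostTag (post_id, tag_id) VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2)])
conn.execute("DELETE FROM PostTag WHERE post_id = 3")
conn.commit()

def tag_count(conn, normalized_name):
    """No join: the count sits in the same row as the tag name."""
    row = conn.execute("SELECT frequency FROM Tag WHERE normalized_name = ?",
                       (normalized_name,)).fetchone()
    return row[0] if row else 0

print(tag_count(conn, "sport"))  # 2
```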
What do you think: our own table or a new column? And why?
Answer:
For querying data:
The choice between a single table and two tables depends on which kind of query runs significantly more often than the other.
If most queries only read the tags' basic columns, two tables perform better, since the Tag rows stay narrower. A single table is better if most queries need the tag frequency, since that avoids the join.
In practice, however, the two strategies hardly differ in performance, because only one column is added (the "frequency" column, probably 4 bytes).
For modifying data:
The two-table strategy is slightly slower than the one-table strategy, because the database has to log changes for more table data.
But I don't think write performance is why you're adding the column; you're adding it to make reads faster. So write performance shouldn't weigh much in your decision.
Once the system has been running for a while, you can change the strategy based on actual usage.
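If usage later shows that nearly every query needs the frequency, switching from the summary table to the column is a small migration. Below is a hypothetical sketch in SQLite, run as one transaction (SQLite supports transactional DDL). In a real system, any triggers that maintained `PostTagStat` would also have to be dropped and recreated against `Tag`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Starting point: the two-table design, with counts kept in PostTagStat.
conn.executescript("""
CREATE TABLE Tag (id INTEGER PRIMARY KEY, name TEXT, normalized_name TEXT UNIQUE);
CREATE TABLE PostTagStat (tag_id INTEGER PRIMARY KEY, frequency INTEGER NOT NULL DEFAULT 0);
INSERT INTO Tag VALUES (1, 'Sport', 'sport'), (2, 'News', 'news'), (3, 'Misc', 'misc');
INSERT INTO PostTagStat VALUES (1, 2), (2, 1);
""")

# Move the counts into a new Tag.frequency column; tags without a
# PostTagStat row get 0.
conn.executescript("""
BEGIN;
ALTER TABLE Tag ADD COLUMN frequency INTEGER NOT NULL DEFAULT 0;
UPDATE Tag SET frequency = COALESCE(
    (SELECT s.frequency FROM PostTagStat s WHERE s.tag_id = Tag.id), 0);
DROP TABLE PostTagStat;
CREATE INDEX idx_tag_frequency ON Tag(frequency);
COMMIT;
""")

print(conn.execute("SELECT normalized_name, frequency FROM Tag ORDER BY id").fetchall())
# [('sport', 2), ('news', 1), ('misc', 0)]
```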