数据库表原子性相关问题
我正在创建一个论坛页面,为其创建了以下数据库模式:
Forum(questionId, postedByUserId, questionSubject, questionBody, TagIds);
Tags(tagId, tagName);
论坛中的条目将类似于:
(1, 1, 'sample subject', 'sample body', '1 4 2') ...
标签的示例条目将是:
(1, 'C'), (2, 'C++'), (3, 'Java'), (4, 'Data Structure') ...
现在的问题是,第一个范式表示所有字段都应该是原子的,而这不是原子的在这种情况下很满意,但我认为节省了空间,就像我创建一个新的 forum_tag(questionId, tagId); 表格一样,然后我认为这将占用数据库更多的空间,但从概念上讲是正确的。
所以我不知道我应该做什么,是做我现在正在做的事情,还是按照标准化使列原子化。
请解释哪个更好以及为什么,因为我发现这样的问题的情况很多,但我一直不清楚我应该做什么!
所以请帮忙。
提前致谢 :)
I am creating a forum page for which i have created following database schema:
Forum(questionId, postedByUserId, questionSubject, questionBody, TagIds);
Tags(tagId, tagName);
Entries in Forum will be something like:
(1, 1, 'sample subject', 'sample body', '1 4 2') ...
And sample entries of Tags will be:
(1, 'C'), (2, 'C++'), (3, 'Java'), (4, 'Data Structure') ...
Now the problem is that first normal form says that all the fields should be atomic which is not satisfied in this case but i think space is saved as if i were creating a new table of forum_tag(questionId, tagId);
then i think this will take more space on database, but would be correct conceptually.
So i don't know what should i do whether to do what i am doing right now or to make the coloumns atomic as per the normalization.
Please explain which is better and why because there are many cases when i found such problem but all the time i remain ambiguous that what should i do!
So please help.
Thanks in advance :)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
数据库中的空间很便宜。检索时间随空间的不同而变化,因此成本要低得多。
但是,检索时间也会受到键控访问策略是否有效以及由查询优化器选择的影响。效果可能是巨大的。
考虑对您建议的模式进行以下检索:查找相关标签之一为“4”的所有论坛条目。对于大多数 DBMS,此查询需要对整个论坛表进行顺序扫描。根据数据量,这可能是数百万个磁盘 I/O。
现在考虑一个联结表
进一步,假设除了 (ForumId, TagId) 上的自动索引之外,在 TagId 上还有一个索引。
相同的查询将导致在索引之一中查找值“4”的索引,并且需要为仅有十几个磁盘 I/O。
标准化的目标之一是对所有数据进行密钥访问。第一范式就是为了实现这一目标。
我在现实生活中遇到过这样的情况:第一个范式或更好的模式可以与带有嵌入列表的模式进行比较。这些情况下的速度差异约为 50 比 1。
Space is cheap in a database. Retrieval time, which varies with space, is much less cheap.
however, retrieval time can also be affected by whether a keyed access strategy will work, and will be chosen by the query optimizer. The effect can be dramatic.
Consider the following retrieval oon the schema you proposed: find all the forum entries where one of the related tags is "4". For most DBMSes, this query will require a sequntial scan through the entire Forums table. Depending on data volume, this could be millions of disk I/O s.
Now consider a junction table
Further, let's say that there's an index on TagId in addition to the automatic index on (ForumId, TagId)
The same query would result in a index lookup of value "4" in one of the indexes, and require as few as a dozen disk I/Os.
One of the goals of normalization is keyed access to all data. first normal form is pursuant to that goal.
I've had real life situations where a first normal form or better schema can be compared to a schema with embedded lists. The speed difference in those cases was something like 50 to 1.
我会同意让你的字段原子化。大多数时候,您有一个字段将值混杂到一个字段中,当您必须不断地撬开这些数据以进行报告或分析时,您最终会感到头疼。如果您想做一些像获取标签计数这样简单的事情该怎么办?由于非原子数据,您甚至无法快速执行
SELECT COUNT()
。创建交叉引用具有不同标签的论坛帖子的查询时,您还会遇到大问题。假设您想要查询所有标有“编程”的论坛帖子?当您尝试查询或分析数据时,预先使数据原子化可以让您更容易使用。这么说吧,数据在进入数据库之前就开始泛化了,但您总是希望从中获得细节。尝试将数据保存在离散的块中,以便更容易了解细节。
I would go with making your fields atomic. Most times you have a field that jumbles values up into one field, you end up with headaches later when you have to constantly pry that data apart for reporting or analysis. What if you wanted to do something as simple as getting a count of your tags? Because of non-atomic data you wouldn't even be able to do a quick
SELECT COUNT()
. You'd also have big issues creating queries that cross reference forum posts with different tags. Say you wanted a query of all forum posts tagged with "programming"?Making the data atomic up front makes it much easier to work with down the road, when you're trying to query or analyze it. Put it this way, data starts out generalized before it enters your DB, but you're always going to want specifics from it. Try to keep the data in discrete chunks so that it's easier to get at the specifics.
您应该创建第三个表来表示论坛和标签之间的关系:
ForumTags(ftID, Forum, Tag)
这样,您的数据库就可以正确规范化,因此向论坛添加和删除标签变得更加容易。不要担心数据库中可能占用的额外空间,就像 Walter Mitty 所说:空间很便宜,检索则更便宜。一般规则:标准化始终是一个好主意,除非明确证明不是这样
You should make a third table representing the relationship between Forum and Tags:
ForumTags(ftID, Forum, Tag)
This way, your database is properly normalized, so adding and removing tags to forums becomes much easier. Don't worry about the extra space it might take in the database, like Walter Mitty says: Space is cheap, retrieval much less so. As a general rule: Normalization is always a good idea, unless explicitly proven otherwise