像这样设计标签表有什么好处,什么时候最好这样做?
我正在阅读这篇文章:http://sqlserverpedia.com/wiki/Understanding_the_StackOverflow_Database_Schema,作者写了一些特别的东西关于标签和帖子标签表:
标签 - 好的,是时候讲一下要点了。 StackOverflow 将每个问题限制为五个标签(答案不是 tagged),所有五个都存储在该字段中。例如,对于 问题305223,Tags字段是 “
”。由你决定 使这些正常化。 Sam Saffron 的 SoSlow 实用程序自动创建 Tags 和 PostsTags 表来规范化这些。接下来检查内容 仔细检查标签字段。 StackOverflow 允许标签中包含句点, 例如 .NET 标记和 ASP.NET 标记。然而,在数据库中,这些是 存储为“aspûnet”。只是需要注意一些事情。
<有趣><与编程无关>
我想了解的是,通过这种制作标签表并将其绑定到帖子的方式,我们能获得什么真正的好处?
更新:我询问的是以下功能:标签在同一帖子记录中保存为字符串,然后在帖子上循环使用外部工具并将标签复制到标签和帖子标签表中。
I was reading this article: http://sqlserverpedia.com/wiki/Understanding_the_StackOverflow_Database_Schema and the writer wrote something special about the tags and posts tags tables:
Tags - okay, time to blow out of the bullet points for a second.
StackOverflow limits you to five tags per question (answers aren't
tagged), and all five are stored in this field. For example, for
question 305223, the Tags field is
"<offtopic><fun><not-programming-related><jon-skeet>
". It's up to you
to normalize these. Sam Saffron's SoSlow utility automatically creates
Tags and PostsTags tables to normalize these. Next, check the contents
of the Tag field carefully. StackOverflow allows periods in the tag,
like the .NET tag and ASP.NET tag. However, in the database, these are
stored as "aspûnet". Just something to be aware of.
What I want to understand are what are the real benefits we can gain from this way of making the Tags table and binding it to Posts?
Update: I am asking about the feature where the tags saved as string in the same Post record, then an external tool loop on Posts and copy tags to Tags and PostsTags tables.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
他们这样做是为了获得非规范化表和规范化表的好处。
他们将标签保存在带有问题的列中,因此在显示问题时很容易显示它们。它们不需要与 Tags 或 PostTags 表连接。从问题表中获取所有必要的信息。速度。
在添加新问题时,同样的方法也更适合事务处理。使用规范化方法时,您需要将信息保存到如下三个表中。
这比简单地插入 Question 表会花费更多的时间(可能需要锁定 Tags、PostsTags)。您再次获得速度,但插入。用户感知的网站速度增加。做同样事情的后台进程对用户来说是不可见的,因此不会影响他。
另一方面,他们正在解析同一列并将其内容复制到 Tags 和 PostsTags。它带来什么?第一个新标签定期创建,从问题表中找到这个新标签会很慢。第二篇帖子标签将为您提供哪些问题与哪些标签相关的标准化视图。很多人只浏览他们感兴趣的标签。因此,提供此信息至关重要。你又获得了什么速度。
They are doing this to gain benefits of both denormalized table and normalized tables.
They are saving tags in a column with question therefore they are easy to show when a question is displayed. They do not need to join with Tags or PostTags table. Take all necessary information from 1 table that is Question. Speed.
Also same approach is better for transaction processing, when adding new question.When normalized approach used, you need to save information to three tables like below.
This will take more time (may need to lock Tags, PostsTags) than simply inserting Question table. Again you gain speed but for inserting. User perceived web site speed increases. A background process which is doing same thing is invisible to user therefore does not affect him.
On the other hand they are parsing this same column and copying its contents to Tags and PostsTags. What does it bring? First new Tags created periodically, finding this new tags from Question Table would be slow. Second PostsTags will give you normalized view of which questions are related to which tags. A lot of people only browse their interesting tags only. Therefore providing this information is essential. What do you gain again speed.
这称为非规范化表 - 这意味着您不需要加入标签表并且所有标签都可以通过帖子一次性检索。
这样做是出于性能原因 - 由于数据库中的数据量很大,连接成为真正的瓶颈。以这种方式让标签违反 1NF 确保在获取帖子时不需要加入。
This is called denormalizing a table - it means you don't need to join on a tags table and that all the tags can be retrieved in one go with the post.
It is done for performance reasons - with the amount of data SO has in its database, joins become a real bottleneck. Having the tags violate 1NF this way ensure no join is needed when getting a post.