在一张表中对数据进行建模与使用两张表相比的优点
假设您正在使用 MySQL 对问答数据库进行建模,我知道有两种方法可以实现模型架构:
- 使用“typeId”为问题和答案创建一个表
- 创建两个单独的表;一个用于提问,一个用于回答
任何人都可以详细说明这两种方法的优点和缺点,以及为什么您会使用一种方法而不是另一种方法?
我自己的观察:
- 方法 2 更加规范化
- 方法 2 需要两个用于 Q 和 A 的“评论”表,或者一个具有复合 PK 的表; (Q 和 A 可能具有相同的 ID)
- 方法 1 可能因自连接等而变得非常复杂
Assuming you were modeling a Q&A database using MySQL, I am aware of two ways to approach the model architecture:
- Create a single table for questions and answers with a "typeId"
- Create two separate tables; one for questions and one for answers
Can anyone elaborate on the advantages and disadvantages of both approaches, and why you would use one approach over the other?
My own observations:
- Approach 2 is more normalized
- Approach 2 requires two "comments" tables for Q's and A's or a single table with composite PKs; (Q's & A's may identical IDs)
- Approach 1 can become very complicated with self joins and so on
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
设计的具体内容实际上取决于您的要求、您想要实现的目标以及您的数据库有多大。
单表方法:
如果您只为每个问题提供/允许一个答案(常见问题解答),您可能可以使用单个表,此时您只有
id,question,answer
字段,而问题没有添加到数据库直到给出答案,或者在答案可用时更新行。2表方法:
一旦每个问题可能有多个答案/评论。我可以选择一个与 @Spredzy 稍有不同的模型,因为为了简单起见,我只包含“电子邮件”之类的所有内容:
message_id、in_reply_to、timestamp、text
。这种简单性不允许您标记特定的内容(答案 VS 评论,除非只有一个答案并且 in_reply_to 答案变成像 SO 那样的评论)。问题是带有in_reply_to IS NULL
的问题。3/更多表方法:
如果您确实希望通过在主表上设置 FIXED-ROW 长度来提高性能,并且不需要显示问题和答案的摘录,而只想知道数字。您可以将文本、任何附件等分开。或者只是因为您想避免自连接,如 @orangepips 所建议的:“最后,自连接很糟糕,并且是一种降低性能的绝佳方法。” )并为所有内容都有一个单独的表。
The specific of the design would really depend of your requirements and what you want to achieve and how huge your database would be.
1-table approach:
You may be able to use a single table in the case where you only provide/allow one answer per question (à la FAQ), where you would only have
id,question,answer
fields and questions are not added to DB until answer is given, or update the row when answer is available.2-table approch:
As soon as there may be more than one answer/comment per question. I could choose a model a little bit different than @Spredzy's as I would just include everything just like "emails":
message_id, in_reply_to, timestamp, text
for simplicity. This simplicity will not allow you to tag specific (answers VS comments unless only one answer and in_reply_to answer becomes comments like on SO). Questions are those within_reply_to IS NULL
.3/more-table approach:
If you really want performance by having FIXED-ROW length on the main table and don't need to display excerpt of questions and answers, but only want to know numbers. You would separate the text, any attachments, etc. Or just because you would want to avoid self joins as suggested by @orangepips: "Finally, self joins suck and present an excellent way to kill performance.") and have a separate tables for everything.
将其建模为两个表。问题可以有多个答案。为问题和答案创建单独的评论表;我认为最有可能的用例不会看到注释数据混合在单个 DML 语句中。
如果您表示对象模型的继承,则由类型列区分的单个表可能有意义,但这里的情况并非如此。此外,对于任何查看该模式的人来说,该表的意图都是混乱的,因为他们需要知道该类型的枚举可能性;我认为可能是一个查找表,但对于两种可能性 - 没有更多 - 似乎是一种浪费。
最后,自连接很糟糕,并提供了一种扼杀性能的极好方法。
Model this as two tables. Questions can have more than one Answer. Create separate Comment tables for Questions and Answers; most likely use case I imagine does not see the comment data intermingling in a single DML statement.
A single table distinguished by a type column might make sense if you were representing an object model's inheritance, but that's not the case here. In addition, the intent of the table is muddied for anyone who reviews the schema because they'd need to know the enumerated possibilities for the type; could be a lookup table I supposed, but for two possibilities - and no more - seems a waste.
Finally, self joins suck and present an excellent way to kill performance.
我会创建 2 个表:
一个代表问题、答案和评论。如果您仔细观察,它们具有相同的核心数据:user_id、文本、日期,加上 type_id 字段以及您可能需要的所有其他字段。
另一个表将是一个非常简单的表:类型
通过这样做,您的模型将具有高度可扩展性,速度更快,并且不会重复数据(规范化)。
最后,从技术上讲,要获得所有问题或一个问题的所有答案,这只是一个简单的连接。
希望它能有所帮助,
I Would create 2 tables :
One that represents Question, Answer and Comment. IIf you look carefully they have the same core data : user_id, text, date, plus a type_id field and all the other field you might need.
The other table would be a pretty simple table : type
By doing that, your model will be highly scalable, faster with no duplication of data (normalization).
Finally, technically talking to get all the question or all the answer of one question it is just a simple join.
Hope it could help,
每种数据类型一张表。如果问题和答案相同(就像 OOP 中的对象一样),一张表就足够了。如果没有,就不会。
具有复合 PK 的单个评论表是正确的,因为评论仍然属于一种类型的对象:评论。他们可以引用 Q 和 A 的事实并不影响这一点。
One table per type of data. If questions and answers are identical (as if objects in OOP), one table suffices. If not, not.
A single comment table with composite PK's is right because the comments are still of one type of object: Comment. The fact that they can reference both Q's and A's doesn't affect that.