MySQL - Should I use multi-column primary keys on every child table?
Setup:
I was trying to understand the difference between identifying and non-identifying relationships when I found this great article on Stack Exchange: "What's the difference between identifying and non-identifying relationships?"
Reading a few of its comments brought to mind another question, about a problem I keep running into.
Question:
Should I use multi-column primary keys on every child table, and what are the advantages and disadvantages of doing so?
To better illustrate my question I have created an example below. I also included the comments that caused me to ask this question.
Example:
In my situation, I know the building_id and I need to get bed.data.
#1 - My current DB structure:
TABLE { FIELDS }
-----------------------------------------------------------------------
building { id, data }
floor { id, building_id, data }
room { id, floor_id, data }
bed { id, room_id, data }
This type of table structure would require me to use a few joins to get the data I need. Not a big deal but kind of a pain since I run into this situation a lot.
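For reference, the joins in question can be sketched as follows. This is a minimal illustration, not the asker's actual code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows and the helper name bed_data_for_building are made up. Note that only two joins are needed, because floor already carries building_id.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE building (id INTEGER PRIMARY KEY, data TEXT);
CREATE TABLE floor (id INTEGER PRIMARY KEY,
                    building_id INTEGER REFERENCES building(id), data TEXT);
CREATE TABLE room (id INTEGER PRIMARY KEY,
                   floor_id INTEGER REFERENCES floor(id), data TEXT);
CREATE TABLE bed (id INTEGER PRIMARY KEY,
                  room_id INTEGER REFERENCES room(id), data TEXT);
-- Hypothetical sample rows.
INSERT INTO building VALUES (1, 'north wing'), (2, 'south wing');
INSERT INTO floor VALUES (10, 1, 'ground'), (20, 2, 'ground');
INSERT INTO room VALUES (100, 10, 'room A'), (200, 20, 'room B');
INSERT INTO bed VALUES (1000, 100, 'twin'), (1001, 100, 'queen'), (2000, 200, 'king');
""")


def bed_data_for_building(conn, building_id):
    """Return bed.data for every bed in a building under structure #1."""
    rows = conn.execute(
        """
        SELECT bed.data
        FROM bed
        JOIN room ON room.id = bed.room_id
        JOIN floor ON floor.id = room.floor_id
        WHERE floor.building_id = ?
        ORDER BY bed.id
        """,
        (building_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(bed_data_for_building(conn, 1))
```

Wrapping the query in one helper (or a view) is one way to take the pain out of repeating the joins without changing the schema.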
#2 - My interpretation of Bill Karwin's suggested DB structure (see article comments below):
TABLE { FIELDS }
-----------------------------------------------------------------------
building { id, data }
floor { id, building_id, data }
room { id, building_id, floor_id, data }
bed { id, building_id, floor_id, room_id, data }
This table structure seems to eliminate the need for joins in my situation. So what are the disadvantages to this table structure? I really like the idea of not doing so many join statements.
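One disadvantage can be shown with a small sketch. It assumes structure #2 exactly as written above, with id still the only key and the extra building_id and floor_id columns simply copied down; sqlite3 stands in for MySQL and the rows are hypothetical. Under those assumptions nothing keeps the copied columns in agreement with the parent row, so the join-free query and the joined query can disagree.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE room (id INTEGER PRIMARY KEY, building_id INTEGER,
                   floor_id INTEGER, data TEXT);
CREATE TABLE bed (id INTEGER PRIMARY KEY, building_id INTEGER, floor_id INTEGER,
                  room_id INTEGER REFERENCES room(id), data TEXT);
INSERT INTO room VALUES (100, 1, 10, 'room A');
INSERT INTO bed VALUES (1000, 1, 10, 100, 'twin');
-- Nothing stops this row from claiming building 2, although room 100 is in building 1.
INSERT INTO bed VALUES (1001, 2, 10, 100, 'queen');
""")

# Join-free lookup that structure #2 makes possible.
no_join = [row[0] for row in conn.execute(
    "SELECT data FROM bed WHERE building_id = ? ORDER BY id", (1,))]

# The same question answered through the parent table.
via_room = [row[0] for row in conn.execute(
    "SELECT bed.data FROM bed JOIN room ON room.id = bed.room_id "
    "WHERE room.building_id = ? ORDER BY bed.id", (1,))]

print(no_join)   # beds whose own building_id says 1
print(via_room)  # beds whose room says building 1
```

The two lists differ because bed 1001 carries a building_id that contradicts its room. Avoiding that requires either application discipline or constraints that tie the copied columns to the parent's key.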
Comments From Article:
What's the difference between identifying and non-identifying relationships?
@hobodave: It's the "convention over configuration" argument. Some schools of thought are that every table should define its primary key for a single-column pseudokey named id that auto-generates its values. Application frameworks like Rails have popularized this as a default. They treat natural keys and multi-column keys as divergent from their conventions, needed when using "legacy" databases. Many other frameworks have followed this lead. – Bill Karwin Mar 10 '10 at 23:06
It seems like "properly" constructing identifying relationships would lead to obnoxiously huge primary keys. e.g. Building has Floor has Room has Bed. The PK for Bed would be (bed_id, floor_id, room_id, building_id). It seems strange that I've never seen this in practice, nor heard it suggested as a way to do anything. That's a lot of redundant data in the PK. – hobodave Mar 10 '10 at 23:34
@hobodave: I have seen multi-column primary keys that are even larger. But I take your point. Consider that multi-column primary keys convey more information; you can query the Beds table for all beds in a specific building without doing any joins. – Bill Karwin Mar 11 '10 at 1:00
Answers (3):
Structure #1 (Model A) is normalized.
Structure #2 (Model B) is not (bad idea).
If you select only indexed columns, the tables themselves will never be accessed. So why are you duplicating the indexes? MySQL will never need to read the main table anyway.
You cannot use floor_id in table bed other than by using id+building_id+floor_id. This means that you may have to use much more key space than needed in Model A. Either that, or you need to add an extra index (which will drag around a full copy of the PK).
In short, I see absolutely zero benefit and a whole lot of drawbacks in Model B. Never use it!
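The point about floor_id is the leftmost-prefix rule for composite indexes: a B-tree index on (id, building_id, floor_id) helps lookups that start with id, but not a lookup on floor_id alone. A rough way to see it is sketched below. It is an illustration only: it uses SQLite's query planner via Python's sqlite3 rather than MySQL's, and the plan() helper is a made-up name. MySQL's EXPLAIN output looks different, but the same rule applies to its multi-column indexes.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE bed (
    id INTEGER, building_id INTEGER, floor_id INTEGER, room_id INTEGER, data TEXT,
    PRIMARY KEY (id, building_id, floor_id)
)""")


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    # The detail text is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Filtering on the leading key column can use the primary key index.
by_leading_column = plan("SELECT data FROM bed WHERE id = 1")

# Filtering on floor_id alone cannot use the leftmost prefix of that index.
by_floor_only = plan("SELECT data FROM bed WHERE floor_id = 1")

print(by_leading_column)
print(by_floor_only)
```

In SQLite the first plan is reported as an index SEARCH and the second as a full SCAN. The exact wording varies between SQLite versions, so no literal output is shown here.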
I think it's pretty unlikely that your #2 is what Bill Karwin meant. Usually, "id" implies an automatic numeric sequence. I think it's more likely he meant something along these lines. Columns that make up primary keys are between asterisks.
I'm not sure what other columns you might have for "bed", though. Twin, Full, Queen, King? That could make sense. If that's the case, then this table is far from "denormalized". In fact, it's in 5NF.
If you test the performance of these two schemas, you'll probably find that this one runs rings around your #1 most of the time. In the batch of queries I ran, it's about 30 times faster.
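The table listing this answer refers to did not survive on this page, so the following is only a guess at the kind of schema it describes: each child's primary key is its parent's key plus one more column, with no surrogate id. The column names (floor_num, room_num, bed_num, bed_size) are hypothetical, and sqlite3 stands in for MySQL. The sketch makes no attempt to reproduce the timing claim.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE building (building_id INTEGER PRIMARY KEY, data TEXT);
CREATE TABLE floor (
    building_id INTEGER NOT NULL REFERENCES building(building_id),
    floor_num INTEGER NOT NULL,
    PRIMARY KEY (building_id, floor_num));
CREATE TABLE room (
    building_id INTEGER NOT NULL, floor_num INTEGER NOT NULL,
    room_num INTEGER NOT NULL,
    PRIMARY KEY (building_id, floor_num, room_num),
    FOREIGN KEY (building_id, floor_num)
        REFERENCES floor(building_id, floor_num));
CREATE TABLE bed (
    building_id INTEGER NOT NULL, floor_num INTEGER NOT NULL,
    room_num INTEGER NOT NULL, bed_num INTEGER NOT NULL, bed_size TEXT,
    PRIMARY KEY (building_id, floor_num, room_num, bed_num),
    FOREIGN KEY (building_id, floor_num, room_num)
        REFERENCES room(building_id, floor_num, room_num));
-- Hypothetical sample rows.
INSERT INTO building VALUES (1, 'north wing');
INSERT INTO floor VALUES (1, 0);
INSERT INTO room VALUES (1, 0, 101);
INSERT INTO bed VALUES (1, 0, 101, 1, 'twin'), (1, 0, 101, 2, 'queen');
""")

# All beds in a building, with no joins; building_id leads the primary key.
sizes = [row[0] for row in conn.execute(
    "SELECT bed_size FROM bed WHERE building_id = ? "
    "ORDER BY floor_num, room_num, bed_num", (1,))]
print(sizes)

# A bed that names a building/floor/room combination that does not exist
# is rejected by the composite foreign key.
try:
    conn.execute("INSERT INTO bed VALUES (2, 0, 101, 1, 'king')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Because building_id is both the leading key column and part of a composite foreign key, the join-free lookup is backed by the primary key index and the inherited columns cannot drift away from the parent rows.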
The first table structure is the classical, normalized structure. Unfortunately, it is not suitable for big projects: if your building table contains many rows (e.g. millions, depending on which cities or countries you cover), your joins will be very slow.
So real projects use denormalized tables that contain all the aggregated info. You can work with such tables directly, or use a standalone search server like Sphinx to search the data. Regarding the primary key on three fields, I think it is redundant in this case. Because