MySQL - Should I use multi-column primary keys on every child table?

Posted on 2024-12-03 22:23:42

Setup:

I was trying to understand the difference between identifying and non-identifying relationships when I found this great article on Stack Exchange: "What's the difference between identifying and non-identifying relationships?"

Reading a few of the comments brought to mind another question, about a problem I keep running into.


Question:

Should I use multi-column primary keys on every child table and what are the advantages/disadvantages to doing so?

To better illustrate my question I have created an example below. I also included the comments that caused me to ask this question.


Example:

In my situation, I know the building_id and I need to get bed.data.

#1 - My current DB structure:

TABLE { FIELDS }
-----------------------------------------------------------------------
building { id, data }
floor { id, building_id, data }
room { id, floor_id, data }
bed { id, room_id, data }

This type of table structure requires a few joins to get the data I need. Not a big deal, but kind of a pain since I run into this situation a lot.
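To make the join cost concrete, here is a minimal sketch of structure #1 and the lookup described above (known building_id, wanted bed.data). It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows and the function name `bed_data_for_building` are made up for illustration. Note that only two joins are needed, because floor already carries building_id.

```python
import sqlite3

# An in-memory SQLite database stands in for MySQL; the SQL is kept portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE building (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE floor (id INTEGER PRIMARY KEY,
                        building_id INTEGER REFERENCES building(id), data TEXT);
    CREATE TABLE room  (id INTEGER PRIMARY KEY,
                        floor_id INTEGER REFERENCES floor(id), data TEXT);
    CREATE TABLE bed   (id INTEGER PRIMARY KEY,
                        room_id INTEGER REFERENCES room(id), data TEXT);

    -- Hypothetical sample rows: two buildings with one bed each.
    INSERT INTO building VALUES (1, 'north wing'), (2, 'south wing');
    INSERT INTO floor VALUES (10, 1, 'ground'), (20, 2, 'ground');
    INSERT INTO room  VALUES (100, 10, 'room A'), (200, 20, 'room B');
    INSERT INTO bed   VALUES (1000, 100, 'bed in north wing'),
                             (2000, 200, 'bed in south wing');
""")


def bed_data_for_building(building_id):
    """Return bed.data for every bed in the given building (structure #1)."""
    rows = conn.execute(
        """
        SELECT bed.data
        FROM bed
        JOIN room  ON room.id  = bed.room_id
        JOIN floor ON floor.id = room.floor_id
        WHERE floor.building_id = ?
        ORDER BY bed.id
        """,
        (building_id,),
    ).fetchall()
    return [data for (data,) in rows]


print(bed_data_for_building(1))
```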

#2 - My interpretation of Bill Karwin's suggested DB structure (see article comments below):

TABLE { FIELDS }
-----------------------------------------------------------------------
building { id, data }
floor { id, building_id, data }
room { id, building_id, floor_id, data }
bed { id, building_id, floor_id, room_id, data }

This table structure seems to eliminate the need for joins in my situation. So what are the disadvantages to this table structure? I really like the idea of not doing so many join statements.
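For comparison, a minimal sketch of the same lookup against structure #2, again with Python's sqlite3 standing in for MySQL. Only the bed table is created because it is the only one the query touches. The index on building_id is an assumption of this sketch (the structure above does not mention indexes); without one, the join-free query would have to scan the whole table.

```python
import sqlite3

# An in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Structure #2: bed carries a copy of every ancestor id.
    CREATE TABLE bed (
        id          INTEGER PRIMARY KEY,
        building_id INTEGER NOT NULL,
        floor_id    INTEGER NOT NULL,
        room_id     INTEGER NOT NULL,
        data        TEXT
    );
    -- Assumed index so that the lookup below is not a full table scan.
    CREATE INDEX bed_building_idx ON bed (building_id);

    -- Hypothetical sample rows.
    INSERT INTO bed VALUES (1000, 1, 10, 100, 'bed in north wing'),
                           (2000, 2, 20, 200, 'bed in south wing');
""")


def bed_data_for_building(building_id):
    """Return bed.data for every bed in the given building, with no joins."""
    rows = conn.execute(
        "SELECT data FROM bed WHERE building_id = ? ORDER BY id",
        (building_id,),
    ).fetchall()
    return [data for (data,) in rows]


print(bed_data_for_building(2))
```

The query is shorter, but the price is that building_id and floor_id are now stored in bed as well as in room, and nothing in this sketch keeps the copies consistent.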


Comments From Article:

What's the difference between identifying and non-identifying relationships?

@hobodave: It's the "convention over configuration" argument. Some schools of thought are that every table should define its primary key for a single-column pseudokey named id that auto-generates its values. Application frameworks like Rails have popularized this as a default. They treat natural keys and multi-column keys as divergent from their conventions, needed when using "legacy" databases. Many other frameworks have followed this lead. – Bill Karwin Mar 10 '10 at 23:06

It seems like "properly" constructing identifying relationships would lead to obnoxiously huge primary keys. e.g. Building has Floor has Room has Bed. The PK for Bed would be (bed_id, floor_id, room_id, building_id). It seems strange that I've never seen this in practice, nor heard it suggested as a way to do anything. That's a lot of redundant data in the PK. – hobodave Mar 10 '10 at 23:34

@hobodave: I have seen multi-column primary keys that are even larger. But I take your point. Consider that multi-column primary keys convey more information; you can query the Beds table for all beds in a specific building without doing any joins. – Bill Karwin Mar 11 '10 at 1:00

Comments (3)

鱼窥荷 2024-12-10 22:23:42

This structure is normalized (call it model A):

TABLE { FIELDS }
-----------------------------------------------------------------------
building { id, data }
floor { id, building_id, data }
room { id, floor_id, data }
bed { id, room_id, data }

This one is not, and it is a bad idea (call it model B):

TABLE { FIELDS }
-----------------------------------------------------------------------
building { id, data }
floor { id, building_id, data }
room { id, building_id, floor_id, data }
bed { id, building_id, floor_id, room_id, data }

  1. In the first (good) structure you do not have unneeded duplicated data.
  2. Inserts into the first structure should be faster, because every row and every index entry is smaller.
  3. The first tables will fit more easily in memory, speeding up your queries.
  4. InnoDB is optimized with model A in mind, not with model B.
  5. The latter (bad) structure has duplicated data; if the copies get out of sync, you will have a mess. Model A is much harder to get out of sync, because each fact is stored only once.
  6. If I want to combine data from building, floor, room and bed, I need to join all four tables in model A as well as in model B, so where is the time saving?
  7. InnoDB keeps each secondary index in its own B-tree, separate from the row data; if a query touches only indexed columns, the rows themselves are never read. So why are you duplicating the key columns? MySQL will not need to read the main table anyway.
  8. InnoDB stores the PK in each and every secondary index; with a composite and thus long PK, you are slowing down every select that uses a secondary index and ballooning the file size, for no gain whatsoever.
  9. Do you have a serious speed problem? If not, why are you denormalizing your tables?
  10. Don't even think about switching to MyISAM, which suffers less from these issues: it is not optimized for multi-join databases, does not support referential integrity or transactions, and is a poor match for this workload.
  11. With a composite key, a lookup can only use a leftmost prefix of the key, i.e. in table bed you cannot search on floor_id through a key on (id, building_id, floor_id) unless you also supply id and building_id. This means you may use much more key space than model A needs, or else you must add an extra index (which will drag around a full copy of the PK).
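The out-of-sync risk from point 5 can be shown in a few lines. This is a minimal sketch, using Python's sqlite3 as a stand-in for MySQL, with model B's room table and ordinary single-column foreign keys (an assumption, since the structure above does not spell out its constraints). The sample ids and the helper name `inconsistent_rooms` are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
    CREATE TABLE building (id INTEGER PRIMARY KEY);
    CREATE TABLE floor (
        id          INTEGER PRIMARY KEY,
        building_id INTEGER NOT NULL REFERENCES building(id)
    );
    -- Model B: room repeats building_id next to floor_id.
    CREATE TABLE room (
        id          INTEGER PRIMARY KEY,
        building_id INTEGER NOT NULL REFERENCES building(id),
        floor_id    INTEGER NOT NULL REFERENCES floor(id)
    );

    INSERT INTO building VALUES (1), (2);
    INSERT INTO floor VALUES (10, 1);      -- floor 10 belongs to building 1

    -- Every single-column foreign key is satisfied, yet the row is wrong:
    -- it puts a room on floor 10 (building 1) inside building 2.
    INSERT INTO room VALUES (100, 2, 10);
""")


def inconsistent_rooms():
    """Rooms whose copied building_id disagrees with their floor's building."""
    return conn.execute(
        """
        SELECT room.id, room.building_id, floor.building_id
        FROM room
        JOIN floor ON floor.id = room.floor_id
        WHERE room.building_id <> floor.building_id
        """
    ).fetchall()


print(inconsistent_rooms())
```

In model A the same mistake cannot be written down at all, because room has no building_id column to get wrong.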

In short
I see absolutely zero benefit and a whole lot of drawbacks in model B; never use it!

空宴 2024-12-10 22:23:42

I think it's pretty unlikely that your #2 is what Bill Karwin meant. Usually, "id" implies an automatic numeric sequence. I think it's more likely he meant something along these lines. Columns that make up primary keys are between asterisks.

TABLE    { COLUMNS }
-----------------------------------------------------------------------
building { *building_id*, other columns } 
floor    { *building_id, floor_num*, other columns }
room     { *building_id, floor_num, room_num*, other columns }
bed      { *building_id, floor_num, room_num, bed_num* (?), other columns }

I'm not sure what other columns you might have for "bed", though. Twin, Full, Queen, King? That could make sense. If that's the case, then this table

bed      { *building_id, floor_num, room_num, bed_num*, bed_size }

is far from "denormalized". In fact, it's in 5NF.

If you test the performance of these two schemas, you'll probably find that this one runs rings around your #1 most of the time. In the batch of queries I ran, it was about 30 times faster.
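A minimal sketch of what this schema could look like as DDL, using Python's sqlite3 as a stand-in for MySQL. The composite foreign keys are an assumption of this sketch (the table outline above lists only the primary key columns), and the sample rows and helper names are made up. It is not the benchmark mentioned above, only an illustration of the join-free lookup and of the database rejecting a bed that points at a nonexistent room.

```python
import sqlite3

# An in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
    CREATE TABLE building (building_id INTEGER PRIMARY KEY);
    CREATE TABLE floor (
        building_id INTEGER NOT NULL REFERENCES building (building_id),
        floor_num   INTEGER NOT NULL,
        PRIMARY KEY (building_id, floor_num)
    );
    CREATE TABLE room (
        building_id INTEGER NOT NULL,
        floor_num   INTEGER NOT NULL,
        room_num    INTEGER NOT NULL,
        PRIMARY KEY (building_id, floor_num, room_num),
        FOREIGN KEY (building_id, floor_num)
            REFERENCES floor (building_id, floor_num)
    );
    CREATE TABLE bed (
        building_id INTEGER NOT NULL,
        floor_num   INTEGER NOT NULL,
        room_num    INTEGER NOT NULL,
        bed_num     INTEGER NOT NULL,
        bed_size    TEXT,
        PRIMARY KEY (building_id, floor_num, room_num, bed_num),
        FOREIGN KEY (building_id, floor_num, room_num)
            REFERENCES room (building_id, floor_num, room_num)
    );

    -- Hypothetical sample rows.
    INSERT INTO building VALUES (1), (2);
    INSERT INTO floor VALUES (1, 1), (2, 1);
    INSERT INTO room VALUES (1, 1, 101), (2, 1, 101);
    INSERT INTO bed VALUES (1, 1, 101, 1, 'Twin'),
                           (1, 1, 101, 2, 'Queen'),
                           (2, 1, 101, 1, 'King');
""")


def bed_sizes_in_building(building_id):
    """All bed sizes in a building, read from the bed table alone."""
    rows = conn.execute(
        "SELECT bed_size FROM bed WHERE building_id = ? "
        "ORDER BY floor_num, room_num, bed_num",
        (building_id,),
    ).fetchall()
    return [size for (size,) in rows]


def orphan_bed_is_rejected():
    """Try to add a bed to a room that does not exist."""
    try:
        with conn:  # rolls back automatically if the insert fails
            conn.execute("INSERT INTO bed VALUES (2, 1, 999, 1, 'Twin')")
    except sqlite3.IntegrityError:
        return True
    return False


print(bed_sizes_in_building(1))
print(orphan_bed_is_rejected())
```

Because building_id is the leading column of bed's primary key, the lookup by building can use that key directly, and the composite foreign keys keep the repeated columns from drifting apart.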

少女净妖师 2024-12-10 22:23:42

The first table structure is the classical, normalized one. Unfortunately, it does not hold up in a big project: if your building table contains many rows, e.g. millions depending on which cities or countries you cover, your joins will be very slow.
So in real projects, denormalized tables that contain all the aggregated info are used. You can work with such tables directly or use a standalone search server like Sphinx for searching the data. Regarding the primary key on three fields, I think it is redundant in this case, because:

  1. If you use InnoDB, this key will be appended to every secondary index on the table.
  2. If you use an interface for managing beds, it is more convenient to address a specific row by a single id field than by three fields.
  3. If you want to guarantee uniqueness of rows, you can put a UNIQUE KEY on these 3 fields.