为什么连接表上有主键不好?

发布于 2024-07-20 05:40:09 字数 616 浏览 4 评论 0原文

我正在观看一个截屏视频,其中作者说在连接表上有主键不好,但没有解释原因。

示例中的联接表在 Rails 迁移中定义了两列,作者为每列添加了索引,但没有主键。

为什么在这个例子中没有主键是不好的?

create_table :categories_posts, :id => false do |t|
  t.column :category_id, :integer, :null => false
  t.column :post_id, :integer, :null => false
end
add_index :categories_posts, :category_id
add_index :categories_posts, :post_id

编辑:正如我向 Cletus 提到的,我可以理解自动编号字段作为主键的潜在用处,即使对于连接表也是如此。 然而,在上面列出的示例中,作者明确避免在“create table”语句中使用语法“:id => false”创建自动编号字段。 通常,Rails 会自动将自动编号 id 字段添加到在这样的迁移中创建的表中,这将成为主键。 但是对于这个join表,作者特意阻止了。 我不确定他为什么决定采用这种方法。

I was watching a screencast where the author said it is not good to have a primary key on a join table but didn't explain why.

The join table in the example had two columns defined in a Rails migration and the author added an index to each of the columns but no primary key.

Why is it not good to have a primary key in this example?

create_table :categories_posts, :id => false do |t|
  t.column :category_id, :integer, :null => false
  t.column :post_id, :integer, :null => false
end
add_index :categories_posts, :category_id
add_index :categories_posts, :post_id

EDIT: As I mentioned to Cletus, I can understand the potential usefulness of an auto number field as a primary key even for a join table. However in the example I listed above, the author explicitly avoids creating an auto number field with the syntax ":id => false" in the "create table" statement. Normally Rails would automatically add an auto-number id field to a table created in a migration like this and this would become the primary key. But for this join table, the author specifically prevented it. I wasn't sure why he decided to follow this approach.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

苏别ゝ 2024-07-27 05:40:09

一些注意事项:

  1. category_id 和 post_id 的组合本身是唯一的,因此额外的 ID 列是多余且浪费的
  2. 截屏视频中的短语“不好拥有主键”是不正确的。 你仍然有一个主键——它只是由两列组成(例如 CREATE TABLE foo( cid, pid, PRIMARY KEY( cid, pid ) )。对于习惯于在任何地方附加 ID 值的人来说,这似乎是奇怪,但在关系理论中,它是非常正确和自然的;截屏视频作者最好说“有一个名为‘ID’的隐式整数属性作为主键是不好的”,
  3. 因为你有额外的列 。无论如何,都会在category_id和post_id的组合上放置一个唯一索引,以确保不会插入重复的行。
  4. 最后,虽然常见的术语将其称为“复合键”,但这也是多余的。关系理论中的术语“键”实际上是“复合键”。唯一标识行的零个或多个属性的集合,因此可以说主键是category_id、post_id
  5. 将最具选择性的列放在主键声明中的第一个讨论 b(+/*) 的构造。树超出了这个答案的范围(对于一些较低级别的讨论,请参阅:http://www .akadia.com/services/ora_index_selectivity.html ),但就您而言,您可能希望将其放在 post_id、category_id 上,因为 post_id 在表中出现的频率较低,从而使索引更有用。 当然,由于表很小,而且索引本质上就是数据行,所以这不是很重要。 在更广泛的情况下,桌子会更宽。

Some notes:

  1. The combination of category_id and post_id is unique in of itself, so an additional ID column is redundant and wasteful
  2. The phrase "not good to have a primary key" is incorrect in the screencast. You still have a Primary Key -- it is just made up of the two columns (e.g. CREATE TABLE foo( cid, pid, PRIMARY KEY( cid, pid ) ). For people who are used to tacking on ID values everywhere this may seem odd but in relational theory it is quite correct and natural; the screencast author would better have said it is "not good to have an implicit integer attribute called 'ID' as the primary key".
  3. It is redundant to have the extra column because you will place a unique index on the combination of category_id and post_id anyway to ensure no duplicate rows are inserted
  4. Finally, although common nomenclature is to call it a "composite key" this is also redundant. The term "key" in relational theory is actually the set of zero or more attributes that uniquely identify the row, so it is fine to say that the primary key is category_id, post_id
  5. Place the MOST SELECTIVE column FIRST in the primary key declaration. A discussion of the construction of b(+/*) trees is out of the scope of this answer ( for some lower-level discussion see: http://www.akadia.com/services/ora_index_selectivity.html ) but in your case, you'd probably want it on post_id, category_id since post_id will show up less often in the table and thus make the index more useful. Of course, since the table is so small and the index will be, essentially, the data rows, this is not very important. It would be in broader cases where the table is wider.
顾挽 2024-07-27 05:40:09

在任何表上都没有主键是一个坏主意(如果 DBMS 是关系 DBMS - 或 SQL DBMS)。 主键是数据库完整性的重要组成部分。

我想如果你不介意你的数据库不准确并且经常提供不正确的答案,那么你可以不用......但大多数人都希望从他们的 DBMS 中得到准确的答案,对于这样的人来说,主键至关重要。

It is a bad idea not to have a primary key on any table, period (if the DBMS is a relational DBMS - or an SQL DBMS). Primary keys are a crucial part of the integrity of your database.

I suppose if you don't mind your database being inaccurate and providing incorrect answers every so often, then you could do without...but most people want accurate answers from their DBMS and for such people, primary keys are crucial.

好多鱼好多余 2024-07-27 05:40:09

DBA 会告诉您,这种情况下的主键实际上是两个 FK 列的组合。 由于 Rails/ActiveRecord 不能很好地处理复合 PK(至少在默认情况下),这可能就是原因。

A DBA would tell you that the primary key in this case is actually the combination of the two FK columns. Since Rails/ActiveRecord doesn't play nice with composite PKs (by default, at least), that may be the reason.

故事未完 2024-07-27 05:40:09

外键的组合可以为主键(称为复合主键)。 就我个人而言,我更喜欢使用技术主键而不是(自动编号字段、序列等)。 为什么? 嗯,它使得识别记录变得容易容易,如果您要删除它,您可能需要这样做。

想一想:如果您要呈现一个包含所有链接的网页,那么使用主键来标识记录会使事情变得更加容易。

The combination of foreign keys can be a primary key (called a composite primary key). Personally I favour using a technical primary key instead of that (auto number field, sequence, etc). Why? Well, it makes it much easier to identify the record, which you may need to do if you're going to delete it.

Think about it: if you're going to present a Webpage of all the linkages, having a primary key to identify the record makes it much easier.

成熟稳重的好男人 2024-07-27 05:40:09

基本上是因为没有必要。 两个外键字段的组合足以唯一地标识任何行。

但这仅仅说明了为什么它不是一个好主意……但为什么它会是一个坏主意呢?

考虑添加标识列会增加的开销。 该表将多占用 50% 的磁盘空间。 更糟糕的是指数情况。 对于身份字段,您必须维护身份计数以及第二个索引。 您的磁盘空间将增加两倍,每次插入所需执行的工作量将增加两倍。 唯一的优点是 DELETE 命令中的 WHERE 子句稍微短一些。

另一方面,如果复合键字段是整个表,那么索引可以是表。

Basically because there's no need for it. The combination of the two foreign key field adequately uniquely identifies any row.

But that merely says why it's not a Good Idea.... but why would it be a Bad Idea?

Consider the overhead adding a identity column would add. The table would take up 50% more disk space. Worse is the index situation. With a identity field, you have to maintain the identity count, plus a second index. You'll be tripling the disk space and tripling the work the needs to be performed on every insert. With the only advantage being a slightly shorter WHERE clause in a DELETE command.

On the other hand, If the composite key fields are the entire table, then the index can be the table.

电影里的梦 2024-07-27 05:40:09

将最具选择性的列放在前面应该只与 INDEX 声明相关。 在 KEY 声明中,这应该无关紧要(因为,正如已经正确指出的那样,KEY 是一个 SET,并且在集合内部,顺序并不重要 - 集合 {a1,a2} 与 {a2} 是相同的集合,a1})。

如果 DBMS 产品的 KEY 声明中的属性顺序会产生影响,那么该 DBMS 产品就会犯下错误:未能正确地区分数据库的逻辑设计(执行 KEY 声明的部分)和数据库的物理设计。数据库(进行 INDEX 声明的部分)。

Placing the most selective column first should only be relevant in the INDEX declaration. In the KEY declaration, it should not matter (because, as has been correctly pointed out, the KEY is a SET, and inside a set, order doesn't matter - the set {a1,a2} is the same set as {a2,a1}).

If a DBMS product is such that ordering of attributes inside a KEY declaration makes a difference, then that DBMS product is guilty of not properly distinguishing between the logical design of a database (the part where you do the KEY declaration) and the physical design of the database (the part where you do the INDEX declaration).

十六岁半 2024-07-27 05:40:09

我想对以下评论发表评论:“说零个或更多是不正确的”。

我想指出的是,添加此评论的文本根本不包含文本“零或更多”,因此我想评论的评论的作者正在批评其他人没有说过的事情。

我还想评论说,说“零个或更多”是不正确的,这是不正确的。 今天,关系理论在少数仍然费心研究该理论细节的人中广为人知,实际上需要没有属性的密钥的可能性。

但是当我按下“评论”按钮时,系统回复我说评论需要50分(或类似的声誉分数)。

这是一个悲伤的例子,说明世界似乎忘记了科学不是民主,在科学中,真理并不取决于谁碰巧是多数,也不是取决于谁碰巧拥有“足够的声誉”。

I wanted to comment on the following comment : "It is not correct to say zero or more".

I wanted to remark that the text to which this comment was added simply did not contain the text "zero or more", so the author of the comment I wanted to comment on was criticizing someone else for something that hadn't been said.

I also wanted to comment that it is not correct to say that it is not correct say "zero or more". Relational theory as commonly known today among the few people who still bother to study the details of that theory, actually REQUIRES the possibility of a key with no attributes.

But when I pressed the button "comment", the system responded to me that commenting requires a reputation score of 50 (or some such).

A sad illustration of how the world seems to have forgotten that science is not democracy, and that in science, the truth is not determined by whoever happens to be the majority, nor by whoever happens to have "enough reputation".

2024-07-27 05:40:09

拥有单个 PK 的优点

  • 唯一标识具有单个值的行
  • 如果需要,可以轻松地从其他地方引用关系
  • 有些工具希望您拥有单个整数值 pk

使用单个 PK 的缺点

  • 使用更多的磁盘空间
  • 需要 3 个索引而不是 1 个
  • 如果没有唯一约束,您最终可能会得到同一关系的多行

注释

  • 您需要定义如果您想避免重复,则使用唯一约束
  • 在我看来,如果您的表很大,则不要使用单个 pk,否则为了方便需要牺牲一些磁盘空间。 是的,这很浪费,但谁会在乎现实应用程序中磁盘上的几 MB 空间呢。

Pros of having a single PK

  • Uniquely identifies a row with a single value
  • Makes it easy to reference the relationship from elsewhere if needed
  • Some tools want you to have a single integer value pk

Cons of having a single PK

  • Uses more disk space
  • Need 3 indexes rather than 1
  • Without a unique constraint you could end up with multiple rows for the same relationship

Notes

  • You need to define a unique constraint if you want to avoid duplicates
  • In my opinion don't use the single pk if you're table is going to be huge, otherwise trade off some disk space for the convenience. Yes it's wasteful, but who cares about a few MB on disk in real world applications.
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文