What's the best practice for primary keys in tables?

歌入人心 2024-07-16 13:38:35

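In addition to all these good answers, I just want to share a good article I just read, The great primary-key debate.

Just to quote a few points:

The developer must apply a few rules when choosing a primary key for each table:

The primary key must uniquely identify each record.
A record's primary-key value can't be null.
The primary-key value must exist when the record is created.
The primary key must remain stable - you can't change the primary-key field(s).
The primary key must be compact and contain the fewest possible attributes.
The primary-key value can't be changed.

Natural keys (tend to) break the rules. Surrogate keys comply with the rules. (You had better read through that article; it is worth your time!)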

场罚期间 2024-07-16 13:38:35

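There is no problem in making your primary key from several fields; that's a natural key.

You can use an identity column (together with a unique index on the candidate fields) to make a surrogate key.

That's an old discussion. I prefer surrogate keys in most situations.

But there is no excuse for having no key at all.

RE: EDIT

Yeah, there is a lot of controversy about that :D

I don't see any obvious advantage to natural keys, besides the fact that they are the natural choice. You will always think in terms of Name, SocialNumber - or something like that - instead of idPerson.

Surrogate keys are the answer to some of the problems that natural keys have (propagating changes, for example).

Once you get used to surrogates, they seem cleaner and more manageable.

But in the end, you'll find that it's just a matter of taste, or mindset. Some people "think better" with natural keys, and others don't.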

绝對不後悔。 2024-07-16 13:38:35

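A table should always have a primary key. If it doesn't have one, add an auto-increment field.

Sometimes people omit the primary key because they are transferring a lot of data and the key can slow the process down (depending on the database). It should still be added afterwards.

Someone commented about link tables. That is right, they are an exception, but the fields should still be foreign keys to preserve integrity, and in some cases those fields can also be the primary key if duplicate links are not allowed. To keep it simple, though, since exceptions come up often in programming: a primary key should be present to protect the integrity of your data.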

北方。的韩爷 2024-07-16 13:38:35

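Here are my own rules of thumb, settled on after 25+ years of development experience.

All tables should have a single-column primary key that auto-increments.
Include it in any view that is meant to be updatable.
The primary key should not have any meaning in the context of your application. This means that it should not be a SKU, an account number, an employee ID, or any other information that is meaningful to your application. It is merely a unique key associated with an entity.

The primary key is used by the database for optimization purposes and should not be used by your application for anything more than identifying a particular entity or relating to a particular entity.

Always having a single-value primary key makes performing UPSERTs very straightforward.

Prefer multiple indexes on single columns over multi-column indexes.
For example, if you have a two-column key, prefer creating an index on each column over creating one two-column index. If we create a multi-column key on first name + last name, we can't do indexed lookups on last name without also providing first name. Having an index on each column allows the optimizer to perform indexed lookups on either or both columns, regardless of how they are expressed in your WHERE clause.
If your tables are massive, look into partitioning the table into segments based on the most prominent search criteria.
If you have a table that contains a large number of Id fields, consider moving all of them except the primary key into a single table that has an id (PK), an org_id (FK to the original table), and an id_type column. Create indexes for all columns on the new table and relate it to the original table. In this way, you can perform indexed lookups on any number of ids using only a single index.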
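
The UPSERT point in the rules above can be sketched as follows. This is only an illustration, assuming SQLite 3.24 or later (for its `ON CONFLICT ... DO UPDATE` clause) through Python's `sqlite3` module; the `product` table, its columns, and `upsert_product` are hypothetical names, not something from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product (
        id    INTEGER PRIMARY KEY,   -- single-column surrogate key
        sku   TEXT NOT NULL UNIQUE,  -- business identifier, kept out of the PK
        price REAL NOT NULL
    )
    """
)


def upsert_product(product_id, sku, price):
    """Insert the row, or update it in place if the id already exists."""
    conn.execute(
        """
        INSERT INTO product (id, sku, price) VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET sku = excluded.sku, price = excluded.price
        """,
        (product_id, sku, price),
    )


upsert_product(1, "SKU-001", 9.99)   # no row with id 1 yet: inserts
upsert_product(1, "SKU-001", 12.50)  # id 1 exists: updates the same row

rows = conn.execute("SELECT id, sku, price FROM product").fetchall()
print(rows)
```

With a single-column key the conflict target is just `id`; a composite key would have to be repeated in full in every such statement.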

彩虹直至黑白 2024-07-16 13:38:35

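Natural keys, if available, are usually best. So, if datetime/char uniquely identifies the row and both parts are meaningful to the row, that's great.

If just the datetime is meaningful, and the char is only tacked on to make it unique, then you might as well go with an identity field.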

半衾梦 2024-07-16 13:38:35

What is special about the primary key?

What is the purpose of a table in a schema? What is the purpose of a key of a table? What is special about the primary key? The discussions around primary keys seem to miss the point that the primary key is part of a table, and that table is part of a schema. What is best for the table and table relationships should drive the key that is used.

Tables (and table relationships) contain facts about information you wish to record. These facts should be self-contained, meaningful, easily understood, and non-contradictory. From a design perspective, other tables added or removed from a schema should not impact on the table in question. There must be a purpose for storing the data related only to the information itself. Understanding what is stored in a table should not require undergoing a scientific research project. No fact stored for the same purpose should be stored more than once. Keys are a whole or part of the information being recorded which is unique, and the primary key is the specially designated key that is to be the primary access point to the table (i.e. it should be chosen for data consistency and usage, not just insert performance).

ASIDE: The unfortunate side effect of most databases being designed
and developed by application programmers (which I am sometimes) is
that what is best for the application or application framework often
drives the primary key choice for tables. This leads to integer and
GUID keys (as these are simple to use for application frameworks) and
monolithic table designs (as these reduce the number of application
framework objects needed to represent the data in memory). These
application driven database design decisions lead to significant data
consistency problems when used at scale. Application frameworks
designed in this manner naturally lead to table-at-a-time designs.
“Partial records” are created in tables and data filled in over time.
Multi-table interaction is avoided or when used causes inconsistent
data when the application functions improperly. These designs lead
to data that is meaningless (or difficult to understand), data spread
over tables (you have to look at other tables to make sense of the
current table), and duplicated data.

It was said that primary keys should be as small as necessary. I would say that keys should be only as large as necessary. Randomly adding meaningless fields to a table should be avoided. It is even worse to make a key out of a randomly added meaningless field, especially when it destroys the join dependency from another table to the non-primary key. This is only reasonable if there are no good candidate keys in the table, but this occurrence is surely a sign of a poor schema design if used for all tables.

It was also said that primary keys should never change as updating a primary key should always be out of the question. But update is the same as delete followed by insert. By this logic, you should never delete a record from a table with one key and then add another record with a second key. Adding the surrogate primary key does not remove the fact that the other key in the table exists. Updating a non-primary key of a table can destroy the meaning of the data if other tables have a dependency on that meaning through a surrogate key (e.g. a status table with a surrogate key having the status description changed from ‘Processed’ to ‘Cancelled’ would definitely corrupt the data). What should always be out of the question is destroying data meaning.
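
The status-table example can be made concrete with a small sketch. It assumes SQLite via Python's `sqlite3`; the `status` and `orders` tables and the `order_statuses` helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE status (
        status_id   INTEGER PRIMARY KEY,   -- surrogate key
        description TEXT NOT NULL UNIQUE   -- the key that carries the meaning
    );
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        status_id INTEGER NOT NULL REFERENCES status(status_id)
    );
    INSERT INTO status VALUES (1, 'Processed');
    INSERT INTO orders VALUES (100, 1), (101, 1);
    """
)


def order_statuses():
    """Return (order_id, status description) pairs as a report would see them."""
    return conn.execute(
        """
        SELECT o.order_id, s.description
        FROM orders AS o JOIN status AS s USING (status_id)
        ORDER BY o.order_id
        """
    ).fetchall()


before = order_statuses()

# No primary key changes and no constraint is violated...
conn.execute("UPDATE status SET description = 'Cancelled' WHERE status_id = 1")

# ...yet every order that pointed at status 1 now reads differently.
after = order_statuses()
print(before, after)
```

The surrogate key stayed perfectly stable here; what changed was the meaning of every row that depended on it.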

Having said this, I am grateful for the many poorly designed databases that exist in businesses today (meaningless-surrogate-keyed-data-corrupted-1NF behemoths), because that means there is an endless amount of work for people that understand proper database design. But on the sad side, it does sometimes make me feel like Sisyphus, but I bet he had one heck of a 401k (before the crash). Stay away from blogs and websites for important database design questions. If you are designing databases, look up CJ Date. You can also reference Celko for SQL Server, but only if you hold your nose first. On the Oracle side, reference Tom Kyte.

澜川若宁 2024-07-16 13:38:35

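I suspect Steven A. Lowe's rolled-up newspaper therapy is required for the designer of the original data structure.

As an aside, GUIDs as a primary key can be a performance hog. I wouldn't recommend them.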

止于盛夏 2024-07-16 13:38:35

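To me, the natural vs. artificial key question comes down to how much of the business logic you want in your database. The Social Security number (SSN) is a great example.

"Each client in my database will, and must, have an SSN." Bam, done: make it the primary key and be done with it. Just remember that when your business rule changes, you're burned.

I don't like natural keys myself, because of my experience with changing business rules. But if you're sure it won't change, it might save you a few critical joins.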

幻想少年梦 2024-07-16 13:38:35

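I too always use a numeric ID column. In Oracle I use number(18,0), for no real reason over number(12,0) (or whatever is an int rather than a long); maybe I just don't want to ever worry about getting a few billion rows in the database!

I also include created and modified columns (type timestamp) for basic tracking, where it seems useful.

I don't mind setting up unique constraints on other combinations of columns, but I really like my id, created, modified baseline requirements.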

涙—继续流 2024-07-16 13:38:35

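I look for natural primary keys and use them where I can.

If no natural key can be found, I prefer a GUID to an INT++ because SQL Server uses trees, and it is bad to always add keys at the end of a tree.

On tables that are many-to-many couplings, I use a composite primary key made of the foreign keys.

Because I'm lucky enough to use SQL Server, I can study execution plans and statistics with the profiler and query analyzer, and easily find out how my keys are performing.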

带上头具痛哭 2024-07-16 13:38:35

You should use a "composite" or "compound" primary key that comprises multiple fields.

This is a perfectly acceptable solution; go here for more info :)

挽清梦 2024-07-16 13:38:35

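I always use an autonumber or identity field.

I worked for a client who had used SSN as a primary key and then, because of HIPAA regulations, was forced to change to a "MemberID". That caused a ton of problems when updating the foreign keys in related tables. Sticking to a consistent standard of an identity column has helped me avoid a similar problem in all of my projects.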
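
A rough sketch of why that change hurts, assuming SQLite via Python's `sqlite3` with foreign-key enforcement switched on. All table and column names (`member_nat`, `claim_nat`, `member`, `claim`, `identifier`) and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    -- Natural-key design: the SSN itself is copied into every child row.
    CREATE TABLE member_nat (ssn TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE claim_nat (
        claim_id INTEGER PRIMARY KEY,
        ssn      TEXT NOT NULL REFERENCES member_nat(ssn)
    );
    INSERT INTO member_nat VALUES ('123-45-6789', 'Pat');
    INSERT INTO claim_nat VALUES (1, '123-45-6789');

    -- Identity-column design: child rows only ever see the surrogate id.
    CREATE TABLE member (
        id         INTEGER PRIMARY KEY,
        identifier TEXT NOT NULL UNIQUE,
        name       TEXT NOT NULL
    );
    CREATE TABLE claim (
        claim_id  INTEGER PRIMARY KEY,
        member_id INTEGER NOT NULL REFERENCES member(id)
    );
    INSERT INTO member VALUES (1, '123-45-6789', 'Pat');
    INSERT INTO claim VALUES (1, 1);
    """
)

# Replacing the SSN with a member ID in the natural-key design is blocked
# while child rows still reference the old value.
try:
    conn.execute("UPDATE member_nat SET ssn = 'M-0001' WHERE ssn = '123-45-6789'")
    natural_key_update_ok = True
except sqlite3.IntegrityError:
    natural_key_update_ok = False

# In the identity-column design the same change touches exactly one row.
conn.execute("UPDATE member SET identifier = 'M-0001' WHERE id = 1")
claim_owner = conn.execute(
    "SELECT m.identifier FROM claim AS c JOIN member AS m ON m.id = c.member_id"
).fetchone()[0]
print(natural_key_update_ok, claim_owner)
```

Declaring the foreign key with `ON UPDATE CASCADE` would let the database propagate the change, at the cost of rewriting every referencing row in every related table.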

各空 2024-07-16 13:38:35

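GUIDs can be used as a primary key, but you need to create the right type of GUID so that it performs well.

You need to generate COMB GUIDs. A good article about them, with performance statistics, is The Cost of GUIDs as Primary Keys.

Also, some code for building COMB GUIDs in SQL is in Uniqueidentifier vs identity (archive).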
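
To give a feel for the idea, here is a simplified Python sketch. It is not the SQL from the linked articles: it assumes the COMB premise that SQL Server compares the last six bytes of a uniqueidentifier first, and it simply writes a millisecond timestamp into those bytes. `comb_guid` and `sql_server_sort_key` are hypothetical helper names.

```python
import time
import uuid


def comb_guid(timestamp_ms=None):
    """Return a UUID whose last 6 bytes are a millisecond timestamp.

    Simplified illustration of the COMB idea: 10 random bytes followed by
    6 big-endian timestamp bytes.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    random_part = uuid.uuid4().bytes[:10]
    time_part = (timestamp_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


def sql_server_sort_key(guid):
    """Assumed ordering: the trailing 6 bytes are compared first."""
    return guid.bytes[10:] + guid.bytes[:10]


earlier = comb_guid(timestamp_ms=1_700_000_000_000)
later = comb_guid(timestamp_ms=1_700_000_000_001)
print(earlier, later)
```

Under that assumed ordering, newer values sort after older ones, so inserts land near the end of the index instead of at random positions, while the leading bytes stay random.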

小ぇ时光︴ 2024-07-16 13:38:35

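All tables should have a primary key. Otherwise, what you have is a HEAP. In some situations that might be what you want (for example, under heavy insert load where the data is then replicated via a service broker to another database or table).

For lookup tables with a low volume of rows, you can use a 3-character CHAR code as the primary key, as this takes less room than an INT, but the performance difference is negligible. Other than that, I would always use an INT, unless you have a reference table that perhaps has a composite primary key made up of foreign keys from associated tables.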

画▽骨i 2024-07-16 13:38:35

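If you really want to read through all of the back and forth on this age-old debate, search for "natural key" on Stack Overflow. You should get back pages of results.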

你的往事 2024-07-16 13:38:35

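We do a lot of joins, and composite primary keys have just become a performance hog. A simple int or long takes care of many issues, even though you are introducing a second candidate key, and it's a lot easier and more understandable to join on one field than on three.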

野稚 2024-07-16 13:38:35

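I'll be up-front about my preference for natural keys - use them where possible, as they'll make your life of database administration a lot easier. I established a standard in our company that all tables have the following columns:

Row ID (GUID)
Creator (string; defaults to the current user's name (SUSER_SNAME() in T-SQL))
Created (DateTime)
Timestamp

Row ID has a unique key on it per table, is in any case auto-generated per row (and permissions prevent anyone from editing it), and is reasonably guaranteed to be unique across all tables and databases. If any ORM system needs a single ID key, this is the one to use.

Meanwhile, the actual PK is, if possible, a natural key. My internal rules are something like:

People - use a surrogate key, e.g. INT. If it's internal, the Active Directory user GUID is an acceptable choice
Lookup tables (e.g. StatusCodes) - use a short CHAR code; it's easier to remember than an INT, and in many cases the paper forms and users will also use it for brevity (e.g. Status = "E" for "Expired", "A" for "Approved", "NADIS" for "No Asbestos Detected In Sample")
Linking tables - combination of FKs (e.g. EventId, AttendeeId)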
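
The linking-table rule can be sketched like this, assuming SQLite via Python's `sqlite3`. Only EventId and AttendeeId come from the rule itself; the table names, the extra columns, and `try_link` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE Event    (EventId    INTEGER PRIMARY KEY, Title TEXT NOT NULL);
    CREATE TABLE Attendee (AttendeeId INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
    CREATE TABLE EventAttendee (
        EventId    INTEGER NOT NULL REFERENCES Event(EventId),
        AttendeeId INTEGER NOT NULL REFERENCES Attendee(AttendeeId),
        PRIMARY KEY (EventId, AttendeeId)   -- the pair of FKs is the key
    );
    INSERT INTO Event VALUES (1, 'Kickoff');
    INSERT INTO Attendee VALUES (10, 'Sam');
    INSERT INTO EventAttendee VALUES (1, 10);
    """
)


def try_link(event_id, attendee_id):
    """Attempt to link an attendee to an event; report whether it was accepted."""
    try:
        conn.execute(
            "INSERT INTO EventAttendee VALUES (?, ?)", (event_id, attendee_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_link_ok = try_link(1, 10)  # same pair again: rejected by the PK
dangling_link_ok = try_link(1, 99)   # attendee 99 does not exist: rejected by the FK
print(duplicate_link_ok, dangling_link_ok)
```

No extra ID column is needed for the link itself; the composite key both identifies the row and rules out duplicate links.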

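So ideally you end up with a natural, human-readable, and memorable PK, plus an ORM-friendly one-ID-per-table GUID.

Caveat: the databases I maintain tend towards hundreds of thousands of records rather than millions or billions, so if you have experience with larger systems that contradicts my advice, feel free to ignore me!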

我很OK 2024-07-16 13:38:34

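I follow a few rules:

Primary keys should be as small as necessary. Prefer a numeric type, because numeric types are stored in a much more compact format than character types. This matters because most primary keys will be foreign keys in another table as well as being used in multiple indexes. The smaller your key, the smaller the index, and the fewer pages in the cache you will use.
Primary keys should never change. Updating a primary key should always be out of the question. This is because it is most likely used in multiple indexes and used as a foreign key. Updating a single primary key could cause a ripple effect of changes.
Do NOT use "your problem primary key" as your logical model primary key. For example, a passport number, social security number, or employee contract number, as these "natural keys" can change in real-world situations. Make sure to add UNIQUE constraints for these where necessary to enforce consistency.

On surrogate vs. natural keys, I refer to the rules above. If the natural key is small and will never change, it can be used as a primary key. If the natural key is large or likely to change, I use a surrogate key. If there is no primary key, I still create a surrogate key, because experience shows you will always add tables to your schema and wish you had put a primary key in place.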

深海夜未眠 2024-07-16 13:38:34

Natural versus artificial keys is a kind of religious debate among the database community - see this article and others it links to. I'm neither in favour of always having artificial keys, nor of never having them. I would decide on a case-by-case basis, for example:

US States: I'd go for state_code ('TX' for Texas etc.), rather than state_id=1 for Texas
Employees: I'd usually create an artificial employee_id, because it's hard to find anything else that works. SSN or equivalent may work, but there could be issues like a new joiner who hasn't supplied his/her SSN yet.
Employee Salary History: (employee_id, start_date). I would not create an artificial employee_salary_history_id. What point would it serve (other than "foolish consistency")?

Wherever artificial keys are used, you should always also declare unique constraints on the natural keys. For example, use state_id if you must, but then you'd better declare a unique constraint on state_code, otherwise you are sure to eventually end up with:

state_id    state_code   state_name
137         TX           Texas
...         ...          ...
249         TX           Texas
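
The constraint that prevents the duplicate row shown above can be sketched as follows, assuming SQLite via Python's `sqlite3`. The column names follow the listing; the table name `state` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE state (
        state_id   INTEGER PRIMARY KEY,     -- artificial key
        state_code TEXT NOT NULL UNIQUE,    -- natural key, still enforced
        state_name TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO state VALUES (137, 'TX', 'Texas')")

# A second Texas row with a fresh artificial id is rejected by the
# unique constraint on state_code.
try:
    conn.execute("INSERT INTO state VALUES (249, 'TX', 'Texas')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

texas_rows = conn.execute(
    "SELECT COUNT(*) FROM state WHERE state_code = 'TX'"
).fetchone()[0]
print(duplicate_accepted, texas_rows)
```

Without the `UNIQUE` on state_code, the primary key alone would happily accept both rows, since 137 and 249 are distinct.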

装纯掩盖桑 2024-07-16 13:38:34

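I avoid using natural keys for one simple reason - human error. Although natural unique identifiers are often available (SSN, VIN, account number, etc.), they require a human to enter them correctly. If you're using SSNs as a primary key, someone transposes a couple of digits during data entry, and the error isn't discovered immediately, then you're faced with changing your primary key.

My primary keys are all handled by the database program in the background, and the user is never aware of them.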

二智少女猫性小仙女 2024-07-16 13:38:34

Just an extra comment on something that is often overlooked. Sometimes not using a single surrogate key as primary has benefits in the child tables. Let's say we have a design that allows you to run multiple companies within the one database (maybe it's a hosted solution, or whatever).

Let's say we have these tables and columns:

Company:
  CompanyId   (primary key)

CostCentre:
  CompanyId   (primary key, foreign key to Company)
  CostCentre  (primary key)

CostElement:
  CompanyId   (primary key, foreign key to Company)
  CostElement (primary key)

Invoice:
  InvoiceId    (primary key)
  CompanyId    (primary key, in foreign key to CostCentre, in foreign key to CostElement)
  CostCentre   (in foreign key to CostCentre)
  CostElement  (in foreign key to CostElement)

In case that last bit doesn't make sense, Invoice.CompanyId is part of two foreign keys, one to the CostCentre table and one to the CostElement table. The primary key is (InvoiceId, CompanyId).

In this model, it's not possible to screw up and reference a CostElement from one company and a CostCentre from another company. If a single surrogate key were used as the primary key on the CostElement and CostCentre tables, without these foreign key relations in the Invoice table, it would be possible.

The fewer chances to screw up, the better.
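
The model above can be tried out with a short sketch, assuming SQLite via Python's `sqlite3` with foreign-key enforcement enabled. Table and column names follow the listing; the column types and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE Company (CompanyId INTEGER PRIMARY KEY);
    CREATE TABLE CostCentre (
        CompanyId  INTEGER NOT NULL REFERENCES Company(CompanyId),
        CostCentre TEXT NOT NULL,
        PRIMARY KEY (CompanyId, CostCentre)
    );
    CREATE TABLE CostElement (
        CompanyId   INTEGER NOT NULL REFERENCES Company(CompanyId),
        CostElement TEXT NOT NULL,
        PRIMARY KEY (CompanyId, CostElement)
    );
    CREATE TABLE Invoice (
        InvoiceId   INTEGER NOT NULL,
        CompanyId   INTEGER NOT NULL,
        CostCentre  TEXT NOT NULL,
        CostElement TEXT NOT NULL,
        PRIMARY KEY (InvoiceId, CompanyId),
        -- CompanyId takes part in both composite foreign keys.
        FOREIGN KEY (CompanyId, CostCentre)
            REFERENCES CostCentre (CompanyId, CostCentre),
        FOREIGN KEY (CompanyId, CostElement)
            REFERENCES CostElement (CompanyId, CostElement)
    );
    INSERT INTO Company VALUES (1), (2);
    INSERT INTO CostCentre VALUES (1, 'SALES');
    INSERT INTO CostElement VALUES (1, 'TRAVEL'), (2, 'RENT');
    """
)

# Cost centre and cost element from the same company: accepted.
conn.execute("INSERT INTO Invoice VALUES (1, 1, 'SALES', 'TRAVEL')")

# Company 1's cost centre combined with company 2's cost element: rejected,
# because (1, 'RENT') does not exist in CostElement.
try:
    conn.execute("INSERT INTO Invoice VALUES (2, 1, 'SALES', 'RENT')")
    cross_company_accepted = True
except sqlite3.IntegrityError:
    cross_company_accepted = False

invoice_count = conn.execute("SELECT COUNT(*) FROM Invoice").fetchone()[0]
print(cross_company_accepted, invoice_count)
```

Because the invoice row carries a single CompanyId that both foreign keys share, the mix-up cannot even be expressed as a valid row.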
