When would you choose not to use an identity/autonumber field as a primary key?
I don't believe in a Silver Bullet, but I really like to use sequences or autonumber identity columns as my primary key columns for database tables. They're unique, they index well, and I don't have to worry about null values.
On the other hand, in some cases, they seem redundant when there are other unique columns in the table that could serve the same purpose. Say, for example, you're building a table that maps 9-digit ZIP codes to city zones. The ZIP code field could work just as well (provided you can guarantee the data format and no duplication of values).
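To make the ZIP code case concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`zip_zone`, `zip9`, `city_zone`) and the helper `try_insert` are made up for illustration. The `CHECK` constraint stands in for the "guarantee the data format" condition, and the primary key itself rules out duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the 9-digit ZIP code itself is the primary key.
# NOT NULL is spelled out because SQLite otherwise allows NULL in a
# non-integer primary key column of an ordinary table.
conn.execute(
    """
    CREATE TABLE zip_zone (
        zip9      TEXT NOT NULL PRIMARY KEY
                  CHECK (length(zip9) = 9 AND zip9 NOT GLOB '*[^0-9]*'),
        city_zone TEXT NOT NULL
    )
    """
)


def try_insert(zip9, city_zone):
    """Insert one mapping; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO zip_zone (zip9, city_zone) VALUES (?, ?)",
            (zip9, city_zone),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = try_insert("123456789", "North")       # well-formed, new key
duplicate_ok = try_insert("123456789", "South")   # same key again
bad_format_ok = try_insert("12345-678", "East")   # 9 characters, not all digits
```

Whether this beats a surrogate key still depends on how stable the ZIP data really is; the sketch only shows that the natural key can carry the uniqueness and format guarantees on its own.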
To the point: My experience, as it is with any of us, is limited. What other real-world examples have led folks to choose not to use an autonumber column as the primary key for a table, and why?
This is a "broaden your horizons" type of thing for me, and I hope to learn a bit from folks who have worked with a plethora of databases and had compelling reasons to choose otherwise.
IMHO it is crucial to use an identity column, as even the simplest table can become more important in the future.
The only time I wouldn't use one would be where I used a GUID instead, for circumstances where perhaps records were created on disconnected clients that then needed to synchronize with a central system.
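A rough sketch of the disconnected-client case, assuming each client keeps a local SQLite database and later pushes its rows to a central one. The `note` table and the helpers `make_db`, `add_note`, and `sync` are hypothetical names for this example, and a real sync would also have to handle updates and deletes.

```python
import sqlite3
import uuid


def make_db():
    """Create an in-memory database with a GUID-keyed table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE note (id TEXT NOT NULL PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def add_note(conn, body):
    """Generate the key on the client, with no round trip to a central server."""
    note_id = str(uuid.uuid4())
    conn.execute("INSERT INTO note (id, body) VALUES (?, ?)", (note_id, body))
    return note_id


def sync(source, central):
    """Copy rows to the central database, skipping ids it already holds."""
    rows = source.execute("SELECT id, body FROM note").fetchall()
    central.executemany(
        "INSERT OR IGNORE INTO note (id, body) VALUES (?, ?)", rows
    )


client_a, client_b, central = make_db(), make_db(), make_db()
add_note(client_a, "written offline on client A")
add_note(client_b, "written offline on client B")

sync(client_a, central)
sync(client_b, central)
sync(client_a, central)  # syncing the same client twice adds nothing new

central_count = central.execute("SELECT COUNT(*) FROM note").fetchone()[0]
```

With autoincrement keys, both clients would most likely have assigned id 1 to their first row, and the central system would have to renumber one of them.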
My rule of thumb is: "if you're going to add records in normal usage, use an autoincrement PK; if it's a static table, use whatever identifier is more 'natural'."
IOW: users, history records, and assets all get an autoincrement PK. ZIP/city, type/description, and machine ID tables usually get a 'natural' key.
A link table springs to mind as the most obvious choice for a composite key.
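For illustration, a link table with a composite key might look like the sketch below. The `student`, `course`, and `enrollment` tables are invented for the example. The pair of foreign keys is the whole identity of a row, so an extra autonumber column would add nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Hypothetical many-to-many schema: enrollment links students to courses.
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ada');
    INSERT INTO course  VALUES (10, 'Databases');
    INSERT INTO enrollment VALUES (1, 10);
    """
)

# The composite key rejects a second copy of the same pairing.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

enrollment_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
```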
I'm a firm believer in the use of technical primary keys almost without exception so my answer would have to be... never.
I will typically avoid auto_increment columns in situations where frequent data dumps / loads / merges are required and I have foreign key relationships. Attempting to merge data from two table instances of the same schema that use auto-incrementing IDs is a terrifying problem.
This sort of usage doesn't crop up for most, but my line of work involves a lot of batch processing where each batch then gets merged into a master database for later analysis/use.
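A small sketch of why such merges hurt, using an invented two-table schema (`sample` and `reading`) and hypothetical helpers `make_batch` and `merge`. Each batch database numbers its rows from 1, so the IDs collide, and every foreign key has to be remapped on the way into the master database.

```python
import sqlite3

# Hypothetical schema shared by every batch database and the master.
SCHEMA = """
CREATE TABLE sample (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,
    label TEXT NOT NULL
);
CREATE TABLE reading (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    value     REAL NOT NULL
);
"""


def make_batch(label, values):
    """Build one batch database holding a single sample and its readings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    cur = conn.execute("INSERT INTO sample (label) VALUES (?)", (label,))
    conn.executemany(
        "INSERT INTO reading (sample_id, value) VALUES (?, ?)",
        [(cur.lastrowid, v) for v in values],
    )
    return conn


def merge(batch, master):
    """Copy a batch into master, remapping sample ids and the FKs that use them."""
    id_map = {}
    for old_id, label in batch.execute("SELECT id, label FROM sample").fetchall():
        cur = master.execute("INSERT INTO sample (label) VALUES (?)", (label,))
        id_map[old_id] = cur.lastrowid  # old batch id -> new master id
    for sample_id, value in batch.execute(
        "SELECT sample_id, value FROM reading ORDER BY id"
    ).fetchall():
        master.execute(
            "INSERT INTO reading (sample_id, value) VALUES (?, ?)",
            (id_map[sample_id], value),
        )


batch_a = make_batch("batch A", [1.0, 2.0])
batch_b = make_batch("batch B", [3.0])

# Both batches independently handed out the same first id.
batch_ids = [
    b.execute("SELECT id FROM sample").fetchone()[0] for b in (batch_a, batch_b)
]

master = sqlite3.connect(":memory:")
master.executescript(SCHEMA)
merge(batch_a, master)
merge(batch_b, master)

merged = master.execute(
    "SELECT s.label, r.value FROM reading r "
    "JOIN sample s ON s.id = r.sample_id ORDER BY r.id"
).fetchall()
```

With only one foreign key the remapping is manageable. It grows with every table that references `sample`, which is presumably where the "terrifying" part comes from.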
One area where I have not used an autonumber field is when defining a DateDimension table as part of a star schema. In this instance I used an integer representing the date in yyyymmdd format. This allowed for fast joins between the central fact table and the DateDimension (as an autonumber ID column also would). However ...
The DateDimension table contained other date representations (e.g. smalldatetime column, dayOfWeek column, etc). If users only wanted the date in yyyymmdd format, the join wasn't necessary, as the date dimension key in the central fact table already stored this information.
In general I'm not a big fan of keys containing business information. Typically, the assumption you make about the primary key when designing the schema won't hold true over time and you'll come unstuck. In this case I was fairly sure that the date would not!
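As a quick illustration of the yyyymmdd integer key, the two helpers below (hypothetical names `date_key` and `key_to_date`) convert between a date and its key. The point is that the key can be read back without touching the dimension table, and that integer order matches date order.

```python
from datetime import date


def date_key(d: date) -> int:
    """Encode a date as a yyyymmdd integer, e.g. for a DateDimension key."""
    return d.year * 10000 + d.month * 100 + d.day


def key_to_date(key: int) -> date:
    """Decode a yyyymmdd integer key; raises ValueError for impossible dates."""
    year, rest = divmod(key, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


key = date_key(date(2008, 9, 15))   # 20080915
round_trip = key_to_date(key)       # date(2008, 9, 15)

# Sorting the integer keys sorts the underlying dates as well.
keys = [date_key(d) for d in (date(2009, 1, 1), date(2008, 12, 31))]
sorted_keys = sorted(keys)          # [20081231, 20090101]
```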
Really the only time I can think of to use an identity column is when the number of fields required to make a primary key is large, or if the field which is the primary key is really big (like a 20-character string). In all other instances, I prefer not to use them.
The problem no one ever brings up about identities is what happens when something happens to the data. Since the key is based solely on when the record was added, reloading data into a table after a catastrophic event is a real problem. Now the DBMS should help you out and prevent someone from truncating a table, or switching the values of the primary keys... should. Things happen, tables get corrupted, or database updates run into problems. With identity primary keys, all of a sudden you're left with a mess trying to figure out which identity value goes with which row... wait, except you can't, because the identity value has no meaning with regard to the data. With a handful of entries, you might be OK, but when you start having larger tables of maybe a couple million values (ours was a little over 11 million when this happened), that gets to be really problematic in a hurry. Everyone says, "that's a worst-case scenario, it'll never happen." That is, until it does.
One exception to the principle expressed by Iain Hoult, Javier, and TK would be the use of an employee number or "badge number" as the PK of a personnel table. In this case, the PK can be called a "meaningful key" only because we have handed the employee the PK of his personnel record.
-Al.