What are the reasons *not* to use GUIDs as primary keys?

Posted on 2024-09-06 00:05:57

Whenever I design a database, I automatically start with an auto-generating GUID primary key for each of my tables (except for lookup tables).

I know I'll never lose sleep over duplicate keys, merging tables, etc. To me it just makes sense philosophically that any given record should be unique across all domains, and that that uniqueness should be represented in a consistent way from table to table.

I realize it will never be the most performant option, but putting performance aside, I'd like to know if there are philosophical arguments against this practice?

Based on the responses, let me clarify:

I'm talking about consistently using a GUID surrogate key as a primary key, irrespective of whether and how any natural or sequential keys are designed on a table. These are my assumptions:

  1. Data integrity based on natural keys can be designed for, but not assumed.
  2. A primary key's function is referential integrity, irrespective of performance, sequencing, or data.

Comments (8)

千秋岁 2024-09-13 00:05:57

GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue to use it for the PRIMARY KEY of the table.

What I'd strongly recommend not to do is use the GUID column as the clustering key, which SQL Server does by default unless you specifically tell it not to. The main reason for this is indeed performance, which will come and bite you down the road... (it will, trust me - just a matter of time) - plus it wastes resources (disk space and RAM on your SQL Server machine) that really don't need to be wasted.

You really need to keep two issues apart:

1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.

2) the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way! I've personally seen massive performance gains when breaking up the previous GUID-based primary/clustered key into two separate keys - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.
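As a rough illustration of separating the logical key from the physical ordering key, here is a minimal Python sketch. It uses SQLite through the standard `sqlite3` module rather than SQL Server, on the assumption that SQLite's ordinary rowid tables are a fair analog: rows are stored in rowid order, and a `PRIMARY KEY` on a non-INTEGER column is enforced through a separate, automatically created unique index. In SQL Server the corresponding move would be declaring the GUID key `PRIMARY KEY NONCLUSTERED` and putting the clustered index on an `INT IDENTITY(1,1)` column. The `customer` table and its columns are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# Hypothetical table: the GUID is declared as the PRIMARY KEY (logical key).
# Because it is not an INTEGER PRIMARY KEY, SQLite keeps the table ordered by
# its implicit, ever-increasing rowid and enforces the GUID key through a
# separate unique index.
conn.execute("CREATE TABLE customer (guid BLOB PRIMARY KEY, name TEXT NOT NULL)")

for name in ("Ada", "Grace", "Edsger"):
    conn.execute(
        "INSERT INTO customer (guid, name) VALUES (?, ?)",
        (uuid.uuid4().bytes, name),  # random 16-byte GUID, similar to NEWID()
    )

# PRAGMA index_list rows are (seq, name, unique, origin, partial);
# origin 'pk' marks the index that backs the PRIMARY KEY constraint.
pk_indexes = [
    row[1] for row in conn.execute("PRAGMA index_list('customer')") if row[3] == "pk"
]

# Physical order follows the rowid, not the random GUID values.
rowids = [row[0] for row in conn.execute("SELECT rowid FROM customer ORDER BY rowid")]

print(len(pk_indexes))  # 1
print(rowids)           # [1, 2, 3]
```

Foreign keys in other tables can still reference `customer.guid`, so the GUID keeps its role as the identifier while the small, sequential rowid decides where rows physically live.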

As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.
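To see why random keys hurt a clustered index, the toy Python model below inserts keys into fixed-capacity leaf pages. It is a deliberately simplified assumption, not how SQL Server's storage engine is implemented: a full page that receives a key somewhere in the middle is split 50/50, while a key beyond the end of the last page simply starts a new page. Random 128-bit integers stand in for `NEWID()`-style GUIDs, and the page capacity of 50 keys is arbitrary.

```python
import bisect
import random

def simulate_pages(keys, capacity=50):
    """Toy leaf-level model: returns (page_count, page_splits, average_fill)."""
    pages = [[]]  # each page is a sorted list of keys; pages are kept in key order
    splits = 0
    for key in keys:
        # Page i (i >= 1) holds keys >= its first key; page 0 holds the rest.
        lows = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(lows, key)
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # past the end: start a fresh page, no split
        else:
            splits += 1  # full page hit in the middle: split it 50/50
            right = page[capacity // 2:]
            del page[capacity // 2:]
            pages.insert(idx + 1, right)
            bisect.insort(page if key < right[0] else right, key)
    stored = sum(len(page) for page in pages)
    return len(pages), splits, stored / (len(pages) * capacity)

rng = random.Random(42)
sequential = simulate_pages(range(10_000))
random_keys = simulate_pages([rng.getrandbits(128) for _ in range(10_000)])
print(sequential)   # (200, 0, 1.0)
print(random_keys)  # expect: more than 200 pages, many splits, fill well below 1.0
```

Ascending keys fill every page completely without a single split; random keys keep splitting pages and leave them only partly full, which is the extra space and I/O the answer is warning about.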

Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID, just less prominently.

Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT, with room for more than 2 billion rows, should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.

Quick calculation - using INT vs. GUID as Primary and Clustering Key:

  • Base table with 1,000,000 rows (3.8 MB vs. 15.26 MB)
  • 6 nonclustered indexes (22.89 MB vs. 91.55 MB)

TOTAL: about 27 MB vs. 107 MB - and that's just on a single table!
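The back-of-the-envelope figures above can be reproduced with a few lines of Python. The sketch assumes, as the estimate does, that only the key column is counted (4 bytes for `INT`, 16 bytes for `uniqueidentifier`), that each of the six nonclustered indexes carries one copy of the clustering key per row, and that "MB" means 2^20 bytes; row headers, page overhead and the indexes' own key columns are ignored. `key_storage_mib` is a hypothetical helper name.

```python
MIB = 1024 * 1024

def key_storage_mib(key_bytes, rows=1_000_000, nonclustered_indexes=6):
    """Approximate key storage as (base table, nonclustered indexes, total) in MiB.

    Counts only the clustering key bytes: one copy per row in the table plus
    one copy per row in every nonclustered index.
    """
    base = rows * key_bytes / MIB
    indexes = nonclustered_indexes * base
    return round(base, 2), round(indexes, 2), round(base + indexes, 2)

print(key_storage_mib(4))   # INT:              (3.81, 22.89, 26.7)
print(key_storage_mib(16))  # uniqueidentifier: (15.26, 91.55, 106.81)
```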

Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.

Marc

酒绊 2024-09-13 00:05:57

Jeff Atwood talks about this in great detail:
http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html

Guid Pros:
  • Unique across every table, every database, every server
  • Allows easy merging of records from different databases
  • Allows easy distribution of databases across multiple servers
  • You can generate IDs anywhere, instead of having to roundtrip to the database
  • Most replication scenarios require GUID columns anyway
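The merging and "generate IDs anywhere" points can be sketched in Python under a simple assumption: each database hands out IDENTITY-style integers starting at 1 (modelled here with `itertools.count`), while the GUID variant calls `uuid.uuid4()` on the client with no database round trip. The `make_rows` helper and the row contents are hypothetical.

```python
import itertools
import uuid

def make_rows(count, new_id):
    """Hypothetical table contents: {primary key: payload}."""
    return {new_id(): f"row {i}" for i in range(count)}

# IDENTITY-style keys: each database runs its own counter starting at 1.
counter_a, counter_b = itertools.count(1), itertools.count(1)
int_db_a = make_rows(1000, lambda: next(counter_a))
int_db_b = make_rows(1000, lambda: next(counter_b))

# GUID keys generated by the client, no database round trip needed.
guid_db_a = make_rows(1000, uuid.uuid4)
guid_db_b = make_rows(1000, uuid.uuid4)

int_collisions = len(int_db_a.keys() & int_db_b.keys())
guid_collisions = len(guid_db_a.keys() & guid_db_b.keys())
print(int_collisions)   # 1000: every key must be remapped before merging
print(guid_collisions)  # 0: the rows can simply be combined
```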

Guid Cons:
  • It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
  • Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
  • The generated GUIDs should be partially sequential for best performance (e.g., newsequentialid() on SQL Server 2005) and to enable use of clustered indexes

尐偏执 2024-09-13 00:05:57

Adding to ewwwn:

Pros

  • It makes it nearly impossible for developers to "accidentally" expose the surrogate key to users (unlike integers, where it happens almost all the time).
  • Makes merging databases several orders of magnitude simpler than dealing with identity columns.

Cons

  • Fatter. The real problem with it being fatter is that it eats up more space per page and more space in your indexes, making them slower. The additional storage space of GUIDs is frankly irrelevant in today's world.
  • You absolutely must be careful about how new values are created. Truly random values do not index well. You are compelled to use a COMB guid or some variant that adds a sequential element to the guid.
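
A minimal sketch of a time-prefixed GUID in Python, in the spirit of COMB. This is an assumption-labelled illustration, not a standard COMB implementation: it uses a UUIDv7-like layout with a 48-bit millisecond timestamp in the first six bytes, so values sort by creation time when compared as 128-bit big-endian integers. The classic COMB for SQL Server instead puts the timestamp in the last six bytes, because SQL Server compares `uniqueidentifier` values starting from that byte group; the timestamp position has to match whatever ordering your database uses. `comb_uuid` is a hypothetical name.

```python
import os
import time
import uuid

def comb_uuid(ms=None):
    """Hypothetical helper: time-prefixed GUID (UUIDv7-like layout).

    Bytes 0-5 hold a big-endian millisecond timestamp; the other 10 bytes are
    random apart from the version and variant bits.
    """
    if ms is None:
        ms = time.time_ns() // 1_000_000
    raw = bytearray(ms.to_bytes(6, "big") + os.urandom(10))
    raw[6] = (raw[6] & 0x0F) | 0x70  # version nibble = 7
    raw[8] = (raw[8] & 0x3F) | 0x80  # RFC 4122 variant bits
    return uuid.UUID(bytes=bytes(raw))

# IDs created at increasing timestamps sort in creation order.
ids = [comb_uuid(1_700_000_000_000 + step) for step in range(5)]
print(sorted(ids) == ids)  # True
print(ids[0].version)      # 7
```

Within a single millisecond the random tail still decides the order, so the values are only partially sequential, much like the caveat about newsequentialid() earlier in the thread.
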

樱花坊 2024-09-13 00:05:57

You still implement the natural key of each table as well, don't you? GUID keys alone obviously won't prevent duplicate data, redundancy and the consequent loss of data integrity.

Assuming that you do enforce other keys then adding GUIDs to every table without exception is probably just adding unnecessary complexity and overhead. It doesn't really make it easier to merge data in different tables because you still have to modify/de-duplicate the other key(s) of the table anyway. I suggest you should evaluate the use of a GUID surrogate on a case-by-case basis. Having a blanket rule for every table isn't necessary or helpful because every table models a different thing after all.
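The point that a GUID key alone does not stop duplicate data can be checked with a short Python sketch against an in-memory SQLite database (an assumption: SQLite stands in for whatever engine you actually use). The hypothetical `book` table gets a GUID primary key either with or without a `UNIQUE` constraint on its natural key, and the same book is inserted twice. The ISBN value is made up for the example.

```python
import sqlite3
import uuid

def rows_after_inserting_twice(ddl):
    """Create a hypothetical `book` table and insert the same book twice."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    for _ in range(2):
        try:
            conn.execute(
                "INSERT INTO book (id, isbn, title) VALUES (?, ?, ?)",
                (uuid.uuid4().bytes, "978-1-234-56789-7", "Example Title"),
            )
        except sqlite3.IntegrityError:
            pass  # rejected by the UNIQUE constraint on the natural key
    count = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
    conn.close()
    return count

guid_only = "CREATE TABLE book (id BLOB PRIMARY KEY, isbn TEXT NOT NULL, title TEXT)"
with_natural_key = (
    "CREATE TABLE book (id BLOB PRIMARY KEY, isbn TEXT NOT NULL UNIQUE, title TEXT)"
)
print(rows_after_inserting_twice(guid_only))         # 2: duplicate slipped in
print(rows_after_inserting_twice(with_natural_key))  # 1: duplicate rejected
```

Each insert gets a fresh GUID, so the GUID primary key is satisfied both times; only the natural-key constraint catches the duplicate.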

痴情换悲伤 2024-09-13 00:05:57

Simple answer: it's not relational.

The record (as defined by the GUID) may be unique, but none of the associated attributes can be said to be occurring uniquely with that record.

Using a GUID (or any purely surrogate key) is no more relational than declaring a flat file to be relational, on the basis that each record can be identified by its row number.

往昔成烟 2024-09-13 00:05:57

A potentially big reason, but one often not thought of, is if you might have to provide compatibility with an Oracle database in the future.

Since Oracle doesn't have a uniqueidentifier-style column data type, it can lead to a bit of a nightmare when you have two different data types for the same primary key across two different databases, especially when an ORM is involved.
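One concrete source of that nightmare is that the same GUID has more than one binary form. The Python sketch below uses only the standard `uuid` module: `bytes` gives the RFC 4122 big-endian order, while `bytes_le` gives the mixed-endian order (first three fields little-endian) that .NET's `Guid.ToByteArray()` produces. Oracle schemas commonly hold GUIDs in a `RAW(16)` column (the type `SYS_GUID()` returns) or as text, so bytes copied between the two worlds without agreeing on a layout can silently become a different GUID. The sample value is arbitrary.

```python
import uuid

guid = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")  # arbitrary sample value

text_form = str(guid)               # 36 characters, e.g. for a CHAR(36) column
big_endian = guid.bytes.hex()       # RFC 4122 byte order
mixed_endian = guid.bytes_le.hex()  # first three fields little-endian (.NET layout)

print(text_form)     # 00112233-4455-6677-8899-aabbccddeeff
print(big_endian)    # 00112233445566778899aabbccddeeff
print(mixed_endian)  # 33221100554477668899aabbccddeeff

# Reading mixed-endian bytes as if they were big-endian yields a different GUID.
misread = uuid.UUID(bytes=bytes.fromhex(mixed_endian))
print(misread)       # 33221100-5544-7766-8899-aabbccddeeff
print(uuid.UUID(bytes_le=bytes.fromhex(mixed_endian)) == guid)  # True
```

Storing the canonical text form sidesteps the byte-order question at the cost of more than twice the space.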

尐籹人 2024-09-13 00:05:57

I wonder why there's no standard "miniGUID" type? It would seem that performing a decent hash on a GUID should yield a 64-bit number which would have a trivial probability of duplication in any universe which doesn't have a billion or more things in it. Since the universe in which most GUID/miniGUID identifiers are used will never grow beyond a million things, much less a billion, I would think a smaller 8-byte miniGUID would be very useful.

That would not, of course, suggest that it should be used as a clustered index; that would greatly impede performance. Nonetheless, an 8-byte miniGUID would waste only a third of the space a full GUID wastes (when compared to a 4-byte index).
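To put numbers on the miniGUID idea, the Python sketch below derives a 64-bit value by hashing a GUID with BLAKE2b (an arbitrary choice; any decent hash would do) and estimates collision odds with the standard birthday-bound approximation `p ≈ 1 - exp(-n**2 / 2**65)` for n values spread uniformly over 2**64. `mini_guid` and `collision_probability` are hypothetical names, not an existing API.

```python
import hashlib
import math
import uuid

def mini_guid(guid):
    """Hypothetical 64-bit 'miniGUID': an 8-byte BLAKE2b digest of the GUID's bytes."""
    return int.from_bytes(hashlib.blake2b(guid.bytes, digest_size=8).digest(), "big")

def collision_probability(n, bits=64):
    """Birthday approximation: chance that any two of n random `bits`-bit values collide."""
    return -math.expm1(-n * n / 2 ** (bits + 1))  # 1 - exp(-n^2 / 2^(bits+1))

print(mini_guid(uuid.uuid4()) < 2 ** 64)      # True
print(f"{collision_probability(10**6):.1e}")  # 2.7e-08
print(f"{collision_probability(10**9):.3f}")  # 0.027
```

A million values give roughly a 3-in-100-million chance of any collision, while a billion values already give about 2.7%, which fits the intuition that 64 bits are comfortable only well below the billion mark.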

冷血 2024-09-13 00:05:57

I can see the case for a given application's or enterprise's own identifiers to be unique and be represented in a consistent way across all its own domains (i.e. because they may span more than one database), but a GUID is overkill for these purposes. I guess they are popular because they are available out of the box, and designing and implementing an 'enterprise key' takes time and effort. The rule when designing an artificial identifier is to make it as simple as possible but no simpler. IDENTITY is too simple; a GUID isn't simple enough.

Entities that exist outside of the application/enterprise usually have their own identifiers (e.g. a car has a VIN, a book has an ISBN, etc.) maintained by an external trusted source, and in such cases the GUID adds nothing. So I guess the philosophical argument against it that I'm getting at here is that using an artificial identifier on every table is unnecessary.
