INT vs. unique identifier for the ID field in a database

Posted 2024-07-27 15:08:29

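I am creating a new database for a website using SQL Server 2005 (possibly SQL Server 2008 in the near future). As an application developer, I have seen many databases use an integer (or bigint, etc.) for the ID field of a table that is used for relationships. Lately, though, I have also seen databases that use a unique identifier (GUID) for the ID field.

My question is whether one has an advantage over the other. Will integer fields be faster for querying, joining, and so on?

Update: to be clear, this is for the primary key of the tables.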

马蹄踏│碎落叶 2024-08-03 15:08:29

GUIDs are problematic as clustered keys because of their high randomness. Paul Randal addressed this issue in the last TechNet Magazine Q&A column, under the question: "I'd like to use a GUID as the clustered index key, but the others are arguing that it can lead to performance issues with indexes. Is this true and, if so, can you explain why?"

Now bear in mind that the discussion is specifically about clustered indexes. You say you want to use the column as an 'ID', but it is unclear whether you mean it as the clustered key or just the primary key. Typically the two overlap, so I'll assume you want to use it as the clustered index. The reasons why that is a poor choice are explained in the article I mentioned above.

For non-clustered indexes, GUIDs still have some issues, but not nearly as big as when they are the leftmost clustered key of the table. Again, the randomness of GUIDs introduces page splits and fragmentation, albeit at the non-clustered index level only (a much smaller problem).
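To make the page-split argument concrete, here is a minimal Python sketch of a toy index leaf level. It is not how SQL Server is implemented: the page capacity of 100 rows, the 50/50 split rule, and the function name `simulate_inserts` are all assumptions chosen for illustration, and random 128-bit integers stand in for random GUIDs.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical number of rows that fit on one leaf page


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys into a toy sorted leaf level.

    Returns (mid_page_splits, page_count, average_fill).
    """
    pages = [[]]   # leaf pages in key order, each a sorted list of keys
    lows = []      # lows[i] is the smallest key stored on pages[i + 1]
    splits = 0
    for key in keys:
        i = bisect.bisect_right(lows, key)  # page that should hold this key
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
            continue
        if i == len(pages) - 1 and key > page[-1]:
            # Append pattern: start a fresh page at the end, no rows move.
            pages.append([key])
            lows.append(key)
            continue
        # The page is full and the key lands inside it: split 50/50.
        splits += 1
        half = capacity // 2
        right = page[half:]
        del page[half:]
        pages.insert(i + 1, right)
        lows.insert(i, right[0])
        bisect.insort(page if key < right[0] else right, key)
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return splits, len(pages), fill


rng = random.Random(42)
sequential = simulate_inserts(range(10_000))
scattered = simulate_inserts([rng.getrandbits(128) for _ in range(10_000)])
print("sequential keys:", sequential)
print("random keys:    ", scattered)
```

In this toy model, ever-increasing keys only append pages at the end, so the pages stay full and nothing splits. Random keys keep landing in the middle of full pages, which forces splits and leaves the pages partly empty.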

There are many urban legends surrounding GUID usage that condemn GUIDs based on their size (16 bytes) compared to an int (4 bytes) and promise horrible performance doom if they are used. This is slightly exaggerated. A 16-byte key can still be a very performant key on a properly designed data model. While it is true that being 4 times as big as an int results in lower-density non-leaf pages in indexes, this is not a real concern for the vast majority of tables. The b-tree structure is a naturally well-balanced tree and the depth of tree traversal is seldom an issue, so seeking a value based on a GUID key as opposed to an INT key is similar in performance. A leaf-page traversal (i.e. a table scan) does not look at the non-leaf pages, and the impact of GUID size on leaf-page density is typically quite small, as the record itself is significantly larger than the extra 12 bytes introduced by the GUID. So I'd take the hearsay advice based on '16 bytes vs. 4' with a rather large grain of salt. Analyze each case individually and decide whether the size impact makes a real difference: how many other columns are in the table (i.e. how much impact the GUID size has on the leaf pages) and how many references use it (i.e. how many other tables will grow because they need to store a larger foreign key).
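The size argument can be checked with back-of-the-envelope arithmetic. The Python sketch below is only an estimate: the usable page size of 8096 bytes approximates an 8 KB SQL Server page, the 9-byte per-entry overhead is an assumed figure, and `fanout` and `non_leaf_levels` are hypothetical helper names.

```python
import math

PAGE_BYTES = 8096   # approximate usable bytes on an 8 KB page
ROW_OVERHEAD = 9    # assumed per-entry overhead (child pointer, header, slot)


def fanout(key_bytes):
    """Estimated number of index entries per non-leaf page."""
    return PAGE_BYTES // (key_bytes + ROW_OVERHEAD)


def non_leaf_levels(leaf_pages, key_bytes):
    """Estimated number of non-leaf levels above the given leaf pages."""
    entries_per_page = fanout(key_bytes)
    levels = 0
    pages = leaf_pages
    while pages > 1:
        pages = math.ceil(pages / entries_per_page)
        levels += 1
    return levels


ROWS = 1_000_000
for name, key_bytes, row_bytes in [("INT", 4, 200), ("GUID", 16, 212)]:
    rows_per_leaf = PAGE_BYTES // row_bytes
    leaf_pages = math.ceil(ROWS / rows_per_leaf)
    print(name, fanout(key_bytes), leaf_pages,
          non_leaf_levels(leaf_pages, key_bytes))
```

With these assumed numbers, a one-million-row table of roughly 200-byte rows needs two non-leaf levels under either key type, even though the GUID index fits only about half as many entries per non-leaf page. Larger tables or narrower rows can tip the balance, which is why the case-by-case analysis matters.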

I'm calling out all these details in a sort of makeshift defense of GUIDs because they have been getting a lot of bad press lately, and some of it is undeserved. They have their merits and are indispensable in any distributed system (the moment you're talking about data movement, be it via replication, a sync framework, or whatever). I've seen bad decisions made because of the GUIDs' bad reputation, when they were shunned without proper consideration. But it is true that if you have to use a GUID as the clustered key, you should make sure you address the randomness issue: use sequential GUIDs when possible.
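As an illustration of what "sequential" means here, the Python sketch below builds a 128-bit value from a 48-bit millisecond timestamp followed by 80 random bits, so later values sort after earlier ones. This is an assumed layout for illustration only: `time_ordered_guid` is a hypothetical helper, it does not set the standard UUID version bits, and it is not the algorithm behind SQL Server's `NEWSEQUENTIALID()`. SQL Server also orders `uniqueidentifier` values by a different byte significance than Python's `uuid.UUID` comparison, so the layout would need adapting before use as a clustered key there.

```python
import os
import time
import uuid


def time_ordered_guid(now_ms=None):
    """Hypothetical helper: 48-bit millisecond timestamp + 80 random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    timestamp = now_ms & ((1 << 48) - 1)            # keep the low 48 bits
    random_part = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    return uuid.UUID(int=(timestamp << 80) | random_part)


earlier = time_ordered_guid(1_000)
later = time_ordered_guid(2_000)
print(earlier < later)
```

Values generated within the same millisecond still sort in random order relative to each other, so this reduces mid-page inserts rather than eliminating them.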

And finally, to answer your question: if you don't have a specific reason to use GUIDs, use INTs.
