INT vs. uniqueidentifier for ID fields in a database

Posted 2024-07-27 15:08:29

I am creating a new database for a web site using SQL Server 2005 (possibly SQL Server 2008 in the near future). As an application developer, I've seen many databases that use an integer (or bigint, etc.) for the ID field of a table that will be used for relationships. But lately I've also seen databases that use the uniqueidentifier (GUID) type for the ID field.

My question is: does one have an advantage over the other? Will integer fields be faster for querying, joining, etc.?

UPDATE: To make it clear, this is for a primary key in the tables.

Comments (6)

马蹄踏│碎落叶 2024-08-03 15:08:29

GUIDs are problematic as clustered keys because of their high randomness. Paul Randal addressed this issue in the latest TechNet Magazine Q&A column, under the question: "I'd like to use a GUID as the clustered index key, but others are arguing that it can lead to performance issues with indexes. Is this true and, if so, can you explain why?"

Now bear in mind that the discussion is specifically about clustered indexes. You say you want to use the column as the 'ID', but it is unclear whether you mean it as the clustered key or just the primary key. Typically the two overlap (in SQL Server, a PRIMARY KEY constraint creates a clustered index by default when the table has none), so I'll assume you want to use it as the clustered index. The reasons why that is a poor choice are explained in the article mentioned above.

For non-clustered indexes GUIDs still have some issues, but not nearly as big as when they are the leftmost clustered key of the table. The randomness of GUIDs still introduces page splits and fragmentation, but only at the non-clustered index level (a much smaller problem).
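
To make the page-split argument concrete, here is a toy simulation (a sketch, not a model of the SQL Server storage engine): a leaf level of sorted pages with a hypothetical fixed capacity of 100 rows, where an insert into the middle of a full page splits it 50/50, while a key larger than everything on the last page just starts a new page. Ever-increasing keys never split a page; random GUID-like keys split constantly and leave pages partly empty. The row-count capacity and the split policy are simplifying assumptions.

```python
import random
import uuid
from bisect import bisect_right, insort

PAGE_CAPACITY = 100  # assumption: rows per leaf page (real pages have a byte budget)


def simulate(keys, capacity=PAGE_CAPACITY):
    """Insert keys into a toy leaf level; return (page_splits, average_fill).

    A full page receiving a key inside its range is split 50/50. A key larger
    than everything on the last, full page simply starts a new page, which is
    how an ever-increasing key fills the index in order.
    """
    pages = [[]]   # leaf pages, each a sorted list of keys
    lows = []      # lows[j] is the smallest key routed to pages[j + 1]
    splits = 0
    for key in keys:
        i = bisect_right(lows, key)  # page whose key range contains `key`
        page = pages[i]
        if len(page) >= capacity:
            if i == len(pages) - 1 and key > page[-1]:
                pages.append([key])  # append at the end: no rows move
                lows.append(key)
                continue
            splits += 1  # insert into the middle of a full page: split it
            half = len(page) // 2
            new_page = page[half:]
            del page[half:]
            pages.insert(i + 1, new_page)
            lows.insert(i, new_page[0])
            if key > new_page[0]:
                page = new_page
        insort(page, key)
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return splits, fill


if __name__ == "__main__":
    rng = random.Random(42)
    n = 10_000
    print("sequential INT-like keys:", simulate(range(n)))
    print("random GUIDs:", simulate(
        uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(n)))
```

With sequential keys every page ends up completely full and no page is ever split; with random keys the average fill drops well below 100%, which is the fragmentation the answer describes.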

There are many urban legends surrounding GUID usage that condemn GUIDs based on their size (16 bytes) compared to an int (4 bytes) and promise horrible performance doom if they are used. This is somewhat exaggerated. A 16-byte key can still be a very performant key in a properly designed data model. While it is true that being 4 times as big as an int results in lower-density non-leaf pages in indexes, this is not a real concern for the vast majority of tables. The b-tree is a naturally well-balanced structure and the depth of tree traversal is seldom an issue, so seeking a value by a GUID key performs similarly to seeking by an INT key. A leaf-page traversal (i.e., a table scan) does not look at the non-leaf pages, and the impact of the GUID size on leaf-page density is typically quite small, as the record itself is significantly larger than the extra 12 bytes introduced by the GUID. So I'd take hearsay advice based on '16 bytes vs. 4' with a rather large grain of salt. Analyze each case individually and decide whether the size impact makes a real difference: how many other columns are in the table (i.e., how much impact the GUID size has on the leaf pages) and how many references use it (i.e., how much other tables will grow because they need to store a larger foreign key).
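
The depth argument can be checked with back-of-the-envelope arithmetic. The sketch below estimates the number of index levels and leaf pages for a clustered index; the usable bytes per page and the per-entry overhead in non-leaf pages are rough assumptions, not exact SQL Server figures, and the 100-million-row, 200-extra-bytes table is hypothetical.

```python
USABLE_PAGE_BYTES = 8096  # assumption: bytes available for rows on an 8 KB page
NONLEAF_OVERHEAD = 11     # assumption: child-page pointer + per-row/slot overhead


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def index_shape(rows: int, key_bytes: int, other_row_bytes: int):
    """Rough (levels, leaf_pages) of a clustered index on `key_bytes`-wide keys."""
    per_leaf = USABLE_PAGE_BYTES // (key_bytes + other_row_bytes)
    fanout = USABLE_PAGE_BYTES // (key_bytes + NONLEAF_OVERHEAD)
    leaf_pages = ceil_div(rows, per_leaf)
    pages, levels = leaf_pages, 1
    while pages > 1:  # add non-leaf levels until a single root page remains
        pages = ceil_div(pages, fanout)
        levels += 1
    return levels, leaf_pages


if __name__ == "__main__":
    for name, size in (("INT", 4), ("GUID", 16)):
        print(name, index_shape(100_000_000, size, other_row_bytes=200))
```

Under these assumptions both keys give a 4-level index (so a seek touches the same number of pages), and the GUID version needs about 5% more leaf pages (2,702,703 vs. 2,564,103), which is the "depends on how wide the rest of the row is" point made above.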

I'm calling out all these details as a sort of makeshift defense of GUIDs because they have been getting a lot of bad press lately, and some of it is undeserved. They have their merits and are indispensable in any distributed system (the moment you're talking about data movement, be it via replication, Sync Framework, or whatever). I've seen bad decisions made because of the GUID's bad reputation, where GUIDs were shunned without proper consideration. But it is true: if you have to use a GUID as the clustered key, make sure you address the randomness issue and use sequential GUIDs when possible (for example, NEWSEQUENTIALID() in SQL Server 2005 and later).
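
When GUIDs are generated by the application instead of by NEWSEQUENTIALID(), one common workaround is a "COMB" GUID: mostly random bytes plus a timestamp, placed so that newer values sort after older ones. Below is a minimal sketch; `comb_guid` and `sqlserver_sort_key` are hypothetical helpers. It relies on SQL Server treating the last six bytes of a uniqueidentifier as the most significant when ordering; the rest of SQL Server's byte ordering is deliberately simplified.

```python
import os
import time
import uuid
from typing import Optional


def comb_guid(timestamp_ms: Optional[int] = None) -> uuid.UUID:
    """Hypothetical COMB-style GUID: 10 random bytes + a 6-byte timestamp.

    The millisecond timestamp is stored big-endian in the last six bytes,
    the bytes SQL Server treats as most significant when ordering
    uniqueidentifier values. No RFC 4122 version bits are set.
    """
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    return uuid.UUID(bytes=os.urandom(10) + timestamp_ms.to_bytes(6, "big"))


def sqlserver_sort_key(u: uuid.UUID) -> bytes:
    """Partial model of SQL Server ordering: last six bytes first.

    The order of the remaining ten bytes is simplified here; it only acts
    as a tie-breaker in this sketch.
    """
    return u.bytes[10:] + u.bytes[:10]


if __name__ == "__main__":
    ids = [comb_guid(1_700_000_000_000 + i) for i in range(5)]
    print(sorted(reversed(ids), key=sqlserver_sort_key) == ids)  # True
```

GUIDs generated in increasing timestamp order therefore land at the end of the clustered index, much like an IDENTITY value, while keeping ten random bytes for uniqueness across machines.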

And finally, to answer your question: if you don't have a specific reason to use GUIDs, use INTs.

唔猫 2024-08-03 15:08:29

A GUID is going to take up more space and be slower than an int, even if you use the NEWSEQUENTIALID() function. If you are going to do replication or use the Sync Framework, you pretty much have to use a GUID.

栀子花开つ 2024-08-03 15:08:29

INTs are 4 bytes, BIGINTs are 8 bytes, and GUIDs are 16 bytes. The more space required to represent the data, the more resources required to process it: disk space, memory, etc. So (a) they're slower, but (b) this probably only matters if volume is an issue (millions of rows, or thousands of transactions in very, very little time).
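
The byte sizes above compound, because a key is stored more than once. A rough sketch, assuming the ID is also the clustered key (so SQL Server copies it into every nonclustered index row as the row locator) and counting only the key bytes themselves; the table and row counts in the usage line are made up.

```python
KEY_BYTES = {"INT": 4, "BIGINT": 8, "UNIQUEIDENTIFIER": 16}


def key_bytes_total(key_type: str, rows: int, nonclustered_indexes: int,
                    referencing_rows: int) -> int:
    """Bytes spent on the key values alone (row and page overhead ignored).

    The clustered key is stored once per row in the table and, as the row
    locator, once per row in each nonclustered index; `referencing_rows`
    counts foreign-key values stored in other tables.
    """
    size = KEY_BYTES[key_type]
    return size * (rows * (1 + nonclustered_indexes) + referencing_rows)


if __name__ == "__main__":
    # Hypothetical table: 10M rows, 3 nonclustered indexes, 50M child rows.
    for key_type in KEY_BYTES:
        print(key_type, key_bytes_total(key_type, 10_000_000, 3, 50_000_000))
```

For that hypothetical table the key values alone take about 360 MB as INT, 720 MB as BIGINT and 1.44 GB as uniqueidentifier, which is the "only matters if volume is an issue" trade-off in numbers.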

The advantage of GUIDs is that they are (pretty much) globally unique. Generate a GUID using the proper algorithm (and SQL Server xxxx will use the proper algorithm), and no two GUIDs will ever be alike, no matter how many computers you have generating them and no matter how frequently. (This does not apply after 72 years of usage; I forget the details.)
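
"Pretty much" unique can be quantified for random GUIDs. A version-4 UUID (as produced by Python's `uuid.uuid4()`) has 122 random bits, so the birthday bound gives the chance of any duplicate among n values. This sketch assumes a good random source; it makes no claim about how any particular SQL Server version generates NEWID() values.

```python
import math


def uuid4_collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate chance that n random version-4 UUIDs contain a duplicate.

    Birthday bound: p = 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    space = 2 ** random_bits
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    for n in (10**9, 10**12, 10**15):
        print(f"{n:.0e} GUIDs -> p(duplicate) ~ {uuid4_collision_probability(n):.1e}")
```

Even a trillion random GUIDs give a duplicate probability on the order of 1e-13; the odds only become noticeable around 2^61 values (roughly 39% at exactly 2^61).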

If you need unique identifiers generated across multiple servers, GUIDs may be useful. If you need mondo performance and fewer than 2 billion values, ints are probably fine. Lastly, and perhaps most importantly, if your data has natural keys, stick with them and forget the surrogate values.
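
Whether "under 2 billion values" is enough depends on the insert rate. A small calculation, assuming an IDENTITY column that starts at a given seed and increments by 1 (the insert rates below are hypothetical):

```python
INT_MAX = 2**31 - 1      # 2,147,483,647
BIGINT_MAX = 2**63 - 1   # 9,223,372,036,854,775,807
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_exhausted(max_value: int, inserts_per_second: float,
                          seed: int = 1) -> float:
    """Years before an IDENTITY(seed, 1) column runs past max_value."""
    available = max_value - seed + 1
    return available / inserts_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(years_until_exhausted(INT_MAX, 1))                  # about 68 years
    print(years_until_exhausted(INT_MAX, 100))                # under a year
    print(years_until_exhausted(INT_MAX, 1, seed=-2**31))     # about 136 years
    print(years_until_exhausted(BIGINT_MAX, 1_000_000))       # ~292,000 years
```

An INT lasts about 68 years at one insert per second but under a year at 100 per second; starting the seed at the negative minimum doubles the range, and BIGINT effectively removes the concern.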

や三分注定 2024-08-03 15:08:29

If you positively, absolutely have to have a unique ID, then use a GUID. Meaning: if you're ever going to merge, sync, or replicate, you probably should use a GUID.
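
The merge problem is easy to reproduce in miniature. In the sketch below, each "server" is a hypothetical stand-in with its own IDENTITY(1, 1)-style counter; merging two of them collides on every key, while independently generated `uuid.uuid4()` keys do not.

```python
import uuid
from itertools import count
from typing import Callable, Hashable, List, Set


def insert_rows(n: int, new_key: Callable[[], Hashable]) -> List[Hashable]:
    """Keys a single, independent database would assign to n new rows."""
    return [new_key() for _ in range(n)]


def merge_conflicts(*sources: List[Hashable]) -> Set[Hashable]:
    """Keys that appear more than once once the sources are merged."""
    seen: Set[Hashable] = set()
    conflicts: Set[Hashable] = set()
    for keys in sources:
        for key in keys:
            if key in seen:
                conflicts.add(key)
            seen.add(key)
    return conflicts


def int_identity() -> Callable[[], int]:
    """Stand-in for an IDENTITY(1, 1) column local to one server."""
    counter = count(1)
    return lambda: next(counter)


if __name__ == "__main__":
    ints = merge_conflicts(insert_rows(1000, int_identity()),
                           insert_rows(1000, int_identity()))
    guids = merge_conflicts(insert_rows(1000, uuid.uuid4),
                            insert_rows(1000, uuid.uuid4))
    print(len(ints), len(guids))  # every INT key collides; no GUID does
```

With integers you would need per-server ranges or a server-ID column to merge safely; GUIDs avoid that coordination at the cost of size.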

For less robust things, an int should suffice, depending on how large the table will grow.

As in most cases, the proper answer is, it depends.

人海汹涌 2024-08-03 15:08:29

Use them for replication etc., not as primary keys.

Kimberly L. Tripp article

  • Against: Space, not strictly monotonic, page splits, bookmark/RIDs etc
  • For: er...

瞄了个咪的 2024-08-03 15:08:29

Fully agreed with JBrooks.
I want to add that when your table is large and you use SELECTs with JOINs, especially with derived tables, using GUIDs can significantly decrease performance.
