SQL GUID vs. Integer

Posted on 2024-08-31 20:02:35

I have recently started a new job and noticed that all the SQL tables use the GUID data type for the primary key.

In my previous job we used integers (auto-increment) for the primary key, and in my opinion they were a lot easier to work with.

For example, say you had two related tables, Product and ProductType. I could easily cross-check the 'ProductTypeID' column of both tables for a particular row and quickly map the data in my head, because it's easy to remember a number (2, 4, 45 etc.) as opposed to a value like E75B92A3-3299-4407-A913-C5CA196B3CAB.

The extra frustration comes from wanting to understand how the tables are related; sadly, there is no database diagram :(

A lot of people say that GUIDs are better because you can define the unique identifier in your C# code (for example with Guid.NewGuid(), the client-side counterpart of T-SQL's NEWID()) without requiring SQL Server to do it, which also lets you know up front what the ID will be. But I've seen that it is still possible to retrieve the 'next auto-incremented integer' too.

A DBA contractor reported that our queries could be up to 30% faster if we used the integer type instead of GUIDs...

Why does the GUID data type exist, and what advantages does it really provide? Even if it's a choice made by some professionals, there must be good reasons why it's implemented?

Comments (6)

隱形的亼 2024-09-07 20:02:35

GUIDs are good as identity fields in certain cases:

  • When you have multiple instances of SQL (different servers) and you need to combine the different updates later on without affecting referential integrity
  • Disconnected clients that create data - this way they can create data without worrying that the ID field is already taken

GUIDs are generated to be globally unique, which is why they are suited for such scenarios.
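
To make the disconnected-client case concrete, here is a minimal Python sketch (not SQL Server code; the product names and helper functions are made up for illustration). Two clients create rows offline. With locally numbered integers their keys collide on merge, while locally generated random GUIDs do not, barring an astronomically unlikely duplicate.

```python
import uuid


def make_rows_with_int_ids(names):
    """Each disconnected client numbers its own rows 1, 2, 3, ..."""
    return {i: name for i, name in enumerate(names, start=1)}


def make_rows_with_guids(names):
    """Each client assigns a random (version 4) GUID locally."""
    return {uuid.uuid4(): name for name in names}


# Hypothetical rows created independently on two offline clients.
client_a = ["Widget", "Gadget"]
client_b = ["Sprocket", "Gizmo"]

int_a, int_b = make_rows_with_int_ids(client_a), make_rows_with_int_ids(client_b)
guid_a, guid_b = make_rows_with_guids(client_a), make_rows_with_guids(client_b)

# Keys present on both clients would clash when the data sets are combined.
int_collisions = set(int_a) & set(int_b)
guid_collisions = set(guid_a) & set(guid_b)

merged_guid_rows = {**guid_a, **guid_b}

print("integer key collisions:", sorted(int_collisions))
print("GUID key collisions:", len(guid_collisions))
print("rows after GUID merge:", len(merged_guid_rows))
```

With integer keys, the merge would need to renumber rows and fix up every foreign key that points at them; with GUIDs the rows can simply be combined.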

ゞ花落谁相伴 2024-09-07 20:02:35

Contrary to what most folks here seem to preach, I see GUIDs as more of a plague than a blessing. Here's why:

GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue for using one as the PRIMARY KEY of the table. What I'd strongly recommend against is using the GUID column as the clustering key, which SQL Server does by default unless you specifically tell it not to.

You really need to keep two issues apart:

  1. the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.

  2. the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but it doesn't need to be that way! I've personally seen massive performance gains when breaking up a previous GUID-based primary / clustered key into two separate keys - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.

As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.
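
The fragmentation argument can be illustrated with a toy model. The Python sketch below is a deliberate simplification and not how SQL Server's storage engine is implemented: leaf pages hold a hypothetical 100 keys, a full page is split in half when a key lands in the middle of the index, and a key larger than everything stored so far simply starts a new page. GUIDs are modeled as seeded random 128-bit values compared as integers (SQL Server orders uniqueidentifier values differently, which does not matter for random values).

```python
import bisect
import random
import uuid

PAGE_CAPACITY = 100  # hypothetical number of keys per leaf page


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys in arrival order; return (splits, pages, average fill)."""
    pages = [[]]   # leaf pages in key order; each page is a sorted list
    bounds = []    # bounds[i] is the smallest key stored on pages[i + 1]
    splits = 0
    for key in keys:
        i = bisect.bisect_right(bounds, key)  # page whose range covers key
        page = pages[i]
        if len(page) >= capacity:
            if i == len(pages) - 1 and key > page[-1]:
                # Appending past the end: start a fresh page, no rows move.
                pages.append([key])
                bounds.append(key)
                continue
            # Insert into the middle of a full page: move half the rows out.
            half = len(page) // 2
            new_page = page[half:]
            del page[half:]
            pages.insert(i + 1, new_page)
            bounds.insert(i, new_page[0])
            splits += 1
            if key >= new_page[0]:
                page = new_page
        bisect.insort(page, key)
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return splits, len(pages), fill


N = 20_000
rng = random.Random(42)  # seeded so the run is repeatable
sequential_keys = range(1, N + 1)  # stands in for INT IDENTITY(1,1)
random_keys = [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(N)]

seq_splits, seq_pages, seq_fill = simulate_inserts(sequential_keys)
rand_splits, rand_pages, rand_fill = simulate_inserts(random_keys)

print(f"sequential: splits={seq_splits} pages={seq_pages} fill={seq_fill:.0%}")
print(f"random:     splits={rand_splits} pages={rand_pages} fill={rand_fill:.0%}")
```

In this model the ever-increasing keys never split a page and leave every page full, while the random keys split pages repeatedly and leave them only partly filled, so the same rows occupy more pages.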

Yes, I know - there's NEWSEQUENTIALID() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus suffers from the same problems as a random GUID, just a bit less prominently. Plus, you can only use it as a default for a column in your table - you cannot get a new sequential GUID in T-SQL code (like a trigger or something) - another major drawback.

Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with its range of 2+ billion values should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.

Quick calculation - using INT vs. GUID as primary and clustering key, counting only the bytes of the key itself (4 vs. 16 bytes per row):

  • Base table with 1,000,000 rows (3.8 MB vs. 15.26 MB)
  • 6 nonclustered indexes (22.89 MB vs. 91.55 MB)

TOTAL: about 26.7 MB vs. 106.8 MB - and that's just on a single table!
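
The arithmetic behind these figures can be reproduced with a few lines of Python. This is a sketch that assumes only the key bytes are counted (4 for INT, 16 for UNIQUEIDENTIFIER), that 1 MB means 1024 × 1024 bytes, and that each nonclustered index stores one copy of the clustering key per row; the function name is made up.

```python
ROWS = 1_000_000
INT_BYTES = 4            # SQL Server INT
GUID_BYTES = 16          # SQL Server UNIQUEIDENTIFIER
NONCLUSTERED_INDEXES = 6
MB = 1024 * 1024


def key_storage_mb(key_bytes):
    """Return (base table, nonclustered indexes, total) key storage in MB."""
    base = ROWS * key_bytes / MB
    indexes = NONCLUSTERED_INDEXES * base  # one key copy per row per index
    return base, indexes, base + indexes


for name, size in (("INT", INT_BYTES), ("GUID", GUID_BYTES)):
    base, indexes, total = key_storage_mb(size)
    print(f"{name}: base {base:.2f} MB, indexes {indexes:.2f} MB, total {total:.2f} MB")
```

Real indexes also carry row and page overhead, so actual sizes will be larger than these key-only figures.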

Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.

Marc

嗼ふ静 2024-09-07 20:02:35

INT

Advantage:

Numeric values (and specifically integers) are better for performance when used in joins, indexes and conditions.
Numeric values are easier to understand for application users if they are displayed.

Disadvantage:

If your table is very large, an INT identity column can run out of values: the maximum is 2,147,483,647, and once it is reached no further identity values are available unless you move to a larger type such as BIGINT.

GUID

Advantage:

Unique across servers, not just within one table or database.

Disadvantage:

GUID values are not as efficient as integer values when used in joins, indexes and conditions.
More storage space is required than for INT (16 bytes vs. 4 bytes).

Credit goes to: http://blog.sqlauthority.com/2010/04/28/sql-server-guid-vs-int-your-opinion/
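
To put the "running out" concern in perspective, here is a small Python sketch of the ranges involved. The insert rate is a made-up assumption, and the calculation assumes an identity that starts at 1 and never reuses values.

```python
INT_MAX = 2**31 - 1      # largest SQL Server INT value
BIGINT_MAX = 2**63 - 1   # largest SQL Server BIGINT value
SECONDS_PER_DAY = 86_400
RATE = 1_000             # hypothetical sustained inserts per second


def days_until_exhausted(max_value, inserts_per_second):
    """Days until an identity starting at 1 reaches max_value."""
    return max_value / inserts_per_second / SECONDS_PER_DAY


int_days = days_until_exhausted(INT_MAX, RATE)
bigint_years = days_until_exhausted(BIGINT_MAX, RATE) / 365

print(f"INT max value: {INT_MAX:,}")
print(f"INT lasts about {int_days:.1f} days at {RATE:,} inserts/s")
print(f"BIGINT lasts about {bigint_years:.2e} years at the same rate")
```

At that (very high) rate an INT is exhausted in under a month, while a BIGINT at 8 bytes is still half the size of a GUID and effectively never runs out.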

和我恋爱吧 2024-09-07 20:02:35

There are a ton of Google-able articles on using GUIDs as PKs and almost all of them say the same thing your DBA contractor says -- queries are faster without GUIDs as keys.

The primary use I've seen in practice (we've never used them as PKs) is with replication. The MSDN page for uniqueidentifier says about the same.

晨曦÷微暖 2024-09-07 20:02:35

It is globally unique, so each record in your table has a GUID that is shared by no other item of any kind in the world. That is handy if you need this kind of exclusive identification (if you are replicating the database, or combining data from multiple sources). Otherwise, your DBA is correct - GUIDs are much larger and less efficient than integers, and you could speed up your database by switching (30%? maybe...).

和我恋爱吧 2024-09-07 20:02:35

They basically save you from the sometimes more complicated logic of using

set @InsertID = scope_identity() 
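
As a rough illustration of the difference, here is a Python sketch that uses the standard-library sqlite3 module as a stand-in for SQL Server (an assumption made purely so the example runs anywhere). The table names are hypothetical, and cursor.lastrowid plays the role that SCOPE_IDENTITY() plays in T-SQL.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_type_int (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
    CREATE TABLE product_int (id INTEGER PRIMARY KEY AUTOINCREMENT,
                              type_id INTEGER, name TEXT);
    CREATE TABLE product_type_guid (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE product_guid (id TEXT PRIMARY KEY, type_id TEXT, name TEXT);
""")

# Integer keys: insert the parent, then read back the key the database chose
# before the child row can reference it.
cur = conn.execute("INSERT INTO product_type_int (name) VALUES (?)", ("Tools",))
type_id = cur.lastrowid
conn.execute("INSERT INTO product_int (type_id, name) VALUES (?, ?)",
             (type_id, "Hammer"))

# GUID keys: both keys are known before anything is sent to the database,
# so no round trip is needed to learn the parent's key.
type_guid = str(uuid.uuid4())
product_guid = str(uuid.uuid4())
conn.execute("INSERT INTO product_type_guid (id, name) VALUES (?, ?)",
             (type_guid, "Tools"))
conn.execute("INSERT INTO product_guid (id, type_id, name) VALUES (?, ?, ?)",
             (product_guid, type_guid, "Hammer"))

int_row = conn.execute(
    "SELECT p.name, t.name FROM product_int p "
    "JOIN product_type_int t ON t.id = p.type_id").fetchone()
guid_row = conn.execute(
    "SELECT p.name, t.name FROM product_guid p "
    "JOIN product_type_guid t ON t.id = p.type_id").fetchone()
print(int_row, guid_row)
```

Both variants end up with the same related rows; the GUID version just avoids having to ask the database which key it generated.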