主键 - 本机键、序列键或 GUID 键?

发布于 2024-07-29 02:12:06 字数 920 浏览 3 评论 0原文

在阅读,然后阅读 这个 (具有讽刺意味的是,它引用了其他两个)我发现自己想知道这个主题的讨论有多大? 我是一个 SQL Server 人员,所以我倾向于使用以 int 形式自动生成的 Identity。 然而,当我知道我需要在服务器和服务器之间进行某种形式的复制或在客户端和服务器之间进行同步时,我倾向于使用 GUID 作为密钥。

问题:我是否应该始终使用 GUID 作为所有表的主键,以防万一我将来可能需要这种可能的可扩展性? 这是否使我的架构更加灵活,因为它可以随时在平台之间迁移? 这是否可以帮助我通过不嵌入平台特定功能来保持 ORM 的灵活性(无论风格如何)?

回复:

@David Archer:根据您的评论,我更新了我的帖子,不再说“自然钥匙”。 您是正确的,自然键定义为这样。 感谢您的指正。

In reading this and this and then reading this (which references the other two ironically) I find myself wondering just how big the discussion of this topic is? I am a SQL Server guy and so I tend to use an Identity that is auto generated in the form of an int. However, when I know that I will need some form of replication between server and server or synchronization between client and server I tend to use a GUID as my key.

Question: Should I just use a GUID as my primary key across all tables at all times just in case I might need this possible scalability down the road? Does this make my schema more flexible in that it can be migrated between platforms at all times? Does this help me maintain the flexibility of my ORM (regardless of flavor) by not embedding platform specific features?

Responses:

@David Archer: Per your comment I updated my post to not say "Natural Key". Your are correct in that natural key is defined as such. Thanks for the correction.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

帅哥哥的热头脑 2024-08-05 02:12:06

我倾向于更喜欢应用程序生成的主键,通常使用 NHibernate 实现的 lo/hi 算法(当我在项目中使用它时)。 否则,顺序 GUID 也能正常工作。 这不仅仅是我的建议,而是 几个拥有 做这整个开发工作的时间比我自己长得多。

我看到使用数据库生成的主键的问题是,您必须访问数据库才能获取这些标识值,而不是在将其保存到数据库之前设置好所有内容。 由于这个事实,它通常也会破坏 NHibernate 中的工作单元模式。 如果您没有在应用程序中使用 UoW 模式,那么显然这个缺点不适用。

如果您使用 GUID 进行 PK,那么您肯定希望使用顺序 GUID 来消除索引碎片。 这也为您提供了另一张海报提到的“粗略排序顺序”,尽管我通常有一个 DateInserted 列或类似的此类内容。

加入 GUID 列已被证明性能相当低与 4 字节整数相比,开销更大,我敢说,对于非大型数据集,性能差异微不足道。

自然键是魔鬼的产物。 :)

I tend to prefer application-generated primary keys, typically using the lo/hi algorithm as implemented by NHibernate (when I'm using it on a project). Otherwise, sequential GUIDs work just as well. This isn't just my advice but rather of several folks who have been doing this whole development thing a lot longer than myself.

The problem I see with using DB generated primary keys is that you have to hit the database to get those identity values versus having everything set up before you persist it to the database. It typically breaks the Unit of Work pattern in NHibernate as well due to this fact. If you're not using the UoW pattern in your app, then obviously this drawback doesn't apply.

If you are using GUIDs for your PK, you definitely want to use sequential GUIDs to eliminate index fragmentation. This also gives you the "rough sort order" that another poster mentioned although I'd typically have a DateInserted column or similiar for those types of things.

Joining on a GUID column has been shown to have a fairly minimal performance overhead versus your 4-byte integer and I'd venture to say that for non-large datasets, the performance difference is trivial.

Natural keys are the spawn of the devil. :)

青巷忧颜 2024-08-05 02:12:06

您可能不应该使用原始 GUID 作为主键。 这样做会导致数据大量碎片化。 SQL Server 有一个 函数 为您提供“顺序 guid”帮助缓解这个问题。 此处对此主题进行了深入讨论。 对此的另一个精彩讨论是 这里 ...

这显示了碎片量
对于随机指南来说非常重要
(建议“碎片化
以百分比表示”应该接近于零
尽可能)。 使用的页数
通过随机引导提高了 40%,并且
每页使用的空间量是
更少,因此磁盘空间
所需的量将会增加。

You should probably not use raw GUIDs as your primary key. Doing so leads to substantial fragmentation of your data. SQL Server has a function to give you a "sequential guid" to help mitigate this problem. There is a good discussion of this topic in depth here. Another excellent discussion of that is here ...

This shows the amount of fragmentation
for random guids is very significant
(It is recomended that "Fragmentation
in Percent" should be as near to zero
as possible). The number of pages used
by random guid is 40% higher and the
amount of space used on each page is
less, therefore the disc space
required will increase.

帅气称霸 2024-08-05 02:12:06

我会避免使用 GUIDS 作为主键,除非您知道您确实需要它(即用于多系统同步等)。

在 SQL Server 复制领域,复制表中的行会添加一个 guid 以实现唯一性,因此如果您有需要,以后很有可能建立这种设计。

至于碎片,还要考虑磁盘空间的成本。 如果您的行数要低于 10,000 行(在表中),这可能不是一个大问题,但如果您的系统必须支持超过 10,000 行(在表中),您会发现性能和磁盘存储成本(以及索引碎片)使用 Big Ints(大整数)+identity(自动编号)可以更好地满足体积的需求。

我会完全避免使用自然键 - 即使它们周围的逻辑发生变化的风险也会使其风险太大(例如,如果它们突然变得不唯一)。

I'd avoid GUIDS for Primary Keys unless you know you are really going to need it (i.e. for multi-system synchronization, etc).

In the land of SQL Server replication, a guid is added to rows in replicated tables to achieve uniqueness, so it's quite possible to establish this design later if you have the need.

As to fragmentation, also consider the cost to your disk space. If you are going to be under 10,000 rows (in a table) this is probably not a huge problem but if your system has to support above 10,000 rows (in a table) you'll find performance and disk storage cost (and index frangmentation) is better served by the use of Big Ints (large integers) + identity (autonumber) which scale well to volume.

I'd avoid natural keys altogether - even the risk of logic changing around them makes it too risky IMHO (e.g. if they suddenly become non-unique).

情深如许 2024-08-05 02:12:06

我支持大多数其他回答者的说法,即您应该避免将 GUID 作为 SQL Server 中的聚集键 - 如果您确实愿意,您可以将它们用作主键,但不要在其上聚集您的表。

主键是唯一标识每一行的键的逻辑概念 - 在这里,GUID 很有意义,因为它几乎可以保证是唯一的。

但聚集键是一个物理概念,它对表中的行进行物理排序,由于其随机性,GUID 不太适合。 即使您一遍又一遍地重新组织索引(以及表数据),这也会导致大量索引碎片,从而导致性能不佳。

此外,由于聚集索引键被用作查找表中的行的查找值,因此它也将被添加到表中每个非聚集索引的每个条目中,这里的大小GUID(16 字节)与 INT(4 字节)开始发挥作用 - 您可能会浪费大量空间来跟踪查找值。

我所知道的关于主/聚集索引和 GUID 的最佳讨论是 Kim Tripp 的几篇文章,他是 SQL Server 领域的索引女王 - 请查看它们!

她对聚集索引的最终要求是:小、稳定、唯一,并且希望能够不断增加。 GUID 违反了其中两个(较小且不断增加)。 即使是 SQL Server 中的 NEWSEQUENTIALGUID() 函数生成的 GUID 也不是完全、真正连续的 - 所以我也不会使用它们。

马克

I support most of the other answerers in saying you should avoid GUIDs as your clustered key in SQL Server - if you really want to, you could use them as primary key, but don't cluster your table on it.

The primary key is the logical concept of a key to uniquely identify each row - here, a GUID can make sense since it's pretty much guaranteed to be unique.

But the clustered key is a physical concept which physically orders the rows in the table, and here due to their random nature, GUIDs are poorly suited. This will lead to massive index fragmentation and thus to poor performance, even if you keep reorganizing your index (and thus table data) over and over again.

Furthermore, since the clustered index key is being used as the lookup value to find the row in the table, it will be added to each and every entry of each and every non-clustered index on your table, too, and here the size of the GUID (16 bytes) vs. INT (4 bytes) comes into play - you potentially waste a lot of space just for keeping track of the lookup values.

The best discussion of primary / clustered indices and GUIDs I know of is the couple of article by Kim Tripp, the Queen of Indexing in SQL Server land - check them out!

Her ultimate requirements for a clustered index are: small, stable, unique, and hopefully ever-increasing. GUID's violate two of those (small and ever-increasing). Even the GUIDs generated by the NEWSEQUENTIALGUID() function in SQL Server aren't totally and truly sequential - so I wouldn't use those either.

Marc

笔芯 2024-08-05 02:12:06

我已经被“自然键”的改变或重复所困扰过太多次,以至于从未考虑过使用它们。 我决定是否使用序列或 GUID 作为键取决于我是否希望阅读或说出其中之一。

I have been burned too many times by "natural keys" changing or being duplicated to ever consider using them. My decision whether to use sequences or GUIDs for keys is determined by whether I expect to ever read or speak one of them.

丢了幸福的猪 2024-08-05 02:12:06

我对此没有太多经验,但使用 GUID 加入让我感到畏缩。 4 字节与 36 字节相比似乎很糟糕。

不过,我已经开始使用 GUID 作为公共标识符,而不是身份字段本身。 看一下上面的 URL,1156712。如果由于某种原因 SO 必须与另一个类似的应用程序(例如 SU)合并,这些问题 id 会发生冲突,必须更改它的 URL,从而弄乱任何硬编码链接,并且可能谷歌统计也是如此。 然而,如果公开标识每个元素的方式是通过使用 GUID 并且内部联接使用 int 或 bigint 字段,那么您可以两全其美。

使用这种方法仍然可以进行合并。 如果发现冲突,可以动态生成新的内部标识符,而不会中断应用程序的其余部分。

I haven't had much experience with this but using a GUIDs to join makes me cringe. 4 bytes versus 36 seems icky.

However I have begun using GUIDs as public identifiers rather than the identity fields themselves. Take a look at the URL above, 1156712. If for some reason SO had to be merged with another similar application (say SU) these question ids would collide one or the other would have to change it's URL messing up any hard coded links and probably google stats as well. Whereas if the way every element was publicly identified was through the use of a GUID and internal joins used the int or bigint fields you could have the best of both worlds.

Merges are still possible too using this approach. If a conflict is found a new internal identifier can be generated on the fly without disrupting the rest of the application.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文