Guid primary/foreign key dilemma in SQL Server
I am faced with the dilemma of changing my primary keys from int identities to Guid. I'll put my problem straight up. It's a typical retail management app, with POS and back office functionality. It has about 100 tables. The database synchronizes with other databases and receives/sends new data.
Most tables don't have frequent inserts, updates or select statements executing on them. However, some do have frequent inserts and selects, e.g. the products and orders tables.
Some tables have up to 4 foreign keys in them. If I changed my primary keys from 'int' to 'Guid', would there be a performance issue when inserting or querying data from tables that have many foreign keys? I know people have said that indexes will be fragmented and that 16 bytes is an issue.
Space wouldn't be an issue in my case, and apparently index fragmentation can also be taken care of using the 'NEWSEQUENTIALID()' function. Can someone tell me, from their experience, whether Guid will be problematic in tables with many foreign keys?
I'd much appreciate your thoughts on it...
GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue to use it for the PRIMARY KEY of the table. What I'd strongly recommend not to do is use the GUID column as the clustering key, which SQL Server does by default, unless you specifically tell it not to.
You really need to keep two issues apart:
1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.
2) the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.
By default, the primary key on a SQL Server table is also used as the clustering key - but it doesn't need to be that way! I've personally seen massive performance gains when breaking up a previous GUID-based Primary / Clustered Key into two separate keys - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.
As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.
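To make the fragmentation argument concrete, here is a toy simulation rather than a model of SQL Server's actual storage engine. It inserts keys into fixed-size sorted "pages" and counts how often a full page has to be split in the middle. The names `PAGE_CAPACITY`, `ROWS` and `count_page_splits` are hypothetical, and random 128-bit integers stand in for randomly generated GUIDs.

```python
import bisect
import random

PAGE_CAPACITY = 100   # hypothetical: number of keys that fit on one index page
ROWS = 10_000         # hypothetical table size


def count_page_splits(keys, capacity=PAGE_CAPACITY):
    """Insert keys into sorted fixed-size pages; return (mid-page splits, page count)."""
    pages = [[]]   # pages kept in key order; each page is a sorted list of keys
    lows = []      # lows[i] is the smallest key stored on pages[i + 1]
    splits = 0
    for key in keys:
        i = bisect.bisect_right(lows, key)   # index of the page that should hold key
        page = pages[i]
        bisect.insort(page, key)
        if len(page) > capacity:
            if i == len(pages) - 1 and page[-1] == key:
                # Key lands at the very end of the index: open a fresh page,
                # no existing rows have to move.
                page.pop()
                pages.append([key])
                lows.append(key)
            else:
                # Key lands inside a full page: move half of its rows to a new page.
                half = len(page) // 2
                new_page = page[half:]
                del page[half:]
                pages.insert(i + 1, new_page)
                lows.insert(i, new_page[0])
                splits += 1
    return splits, len(pages)


rng = random.Random(42)                       # fixed seed so runs are repeatable
sequential_keys = range(ROWS)                 # stands in for INT IDENTITY values
random_keys = [rng.getrandbits(128) for _ in range(ROWS)]  # stands in for random GUIDs

seq_splits, seq_pages = count_page_splits(sequential_keys)
rand_splits, rand_pages = count_page_splits(random_keys)
print(f"sequential: {seq_splits} splits, {seq_pages} pages")
print(f"random:     {rand_splits} splits, {rand_pages} pages")
```

In this sketch the ever-increasing keys never split a page and fill every page completely, while the random keys cause repeated splits and leave the same rows spread over more, partly empty pages.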
Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID - just a bit less prominently so.
Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.
Quick calculation - using INT vs. GUID as Primary and Clustering Key:
TOTAL: 25 MB vs. 106 MB - and that's just on a single table!
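The inputs behind those totals did not survive in this copy of the post, so the following is only a back-of-the-envelope sketch under assumed figures: a table of 1,000,000 rows with 6 nonclustered indexes (both numbers are assumptions, not taken from the text). It counts only the bytes of the clustering key itself, stored once in the table and once more in every nonclustered index entry.

```python
ROWS = 1_000_000            # assumed row count (not given in the text)
NONCLUSTERED_INDEXES = 6    # assumed number of nonclustered indexes (not given in the text)
INT_BYTES = 4               # size of a SQL Server INT
GUID_BYTES = 16             # size of a SQL Server UNIQUEIDENTIFIER


def key_storage_mb(key_bytes, rows=ROWS, indexes=NONCLUSTERED_INDEXES):
    """MB taken by the clustering key alone: once per row, plus once per index entry."""
    copies = 1 + indexes
    return key_bytes * rows * copies / (1024 * 1024)


int_mb = key_storage_mb(INT_BYTES)
guid_mb = key_storage_mb(GUID_BYTES)
print(f"INT key:  {int_mb:.1f} MB")
print(f"GUID key: {guid_mb:.1f} MB")
```

Under these assumptions the result is roughly 26.7 MB for INT against roughly 106.8 MB for GUID, which is in the same range as the totals quoted above; the factor of four comes straight from 16 bytes versus 4 bytes.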
Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.
So if you really must change your primary keys to GUIDs - try to make sure the primary key isn't the clustering key, and that you still have an INT IDENTITY field on the table that is used as the clustering key. Otherwise, your performance is sure to take a severe hit.
Disadvantages of using GUID over INT:
GUID values, and even more so GUIDs stored as strings, are not as efficient as integer values when used in joins, indexes and conditions. They also need more storage space than an INT.
The generated GUIDs should be partially sequential for best performance (e.g. newsequentialid() on SQL Server 2005) and to make clustered indexes on them workable.
For more detail:
http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html
http://blog.sqlauthority.com/2010/04/28/sql-server-guid-vs-int-your-opinion/
My take is: use an autoincrement int as the PK on the inside, and have a unique Guid column on each primary table that you use to move rows across databases.
Join on this column when you export data, do not export the int, and map the Guid back to an int when you import data.
Especially at large volumes, ints are much smaller and faster.
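A minimal sketch of that export/import mapping, using Python's built-in sqlite3 module as a stand-in for SQL Server. The `products` and `orders` tables, their columns, and the helper functions are all hypothetical names made up for illustration.

```python
import sqlite3
import uuid


def make_db():
    """Create an in-memory database with int keys inside and a unique GUID per row."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE products (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- local key, never exported
            guid TEXT NOT NULL UNIQUE,               -- identity shared across databases
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            guid       TEXT NOT NULL UNIQUE,
            product_id INTEGER NOT NULL REFERENCES products(id)
        );
    """)
    return db


def export_orders(db):
    """Join on the int key locally, but emit only GUIDs."""
    return db.execute("""
        SELECT o.guid, p.guid
        FROM orders AS o JOIN products AS p ON p.id = o.product_id
    """).fetchall()


def import_orders(db, rows):
    """Map each product GUID back to this database's own int key."""
    for order_guid, product_guid in rows:
        (product_id,) = db.execute(
            "SELECT id FROM products WHERE guid = ?", (product_guid,)).fetchone()
        db.execute("INSERT INTO orders (guid, product_id) VALUES (?, ?)",
                   (order_guid, product_id))


widget, gadget = str(uuid.uuid4()), str(uuid.uuid4())
source, target = make_db(), make_db()
source.executemany("INSERT INTO products (guid, name) VALUES (?, ?)",
                   [(widget, "widget"), (gadget, "gadget")])
# Same products, inserted in the opposite order, so the local int ids differ.
target.executemany("INSERT INTO products (guid, name) VALUES (?, ?)",
                   [(gadget, "gadget"), (widget, "widget")])
# In the source database the gadget has local id 2.
source.execute("INSERT INTO orders (guid, product_id) VALUES (?, 2)",
               (str(uuid.uuid4()),))

import_orders(target, export_orders(source))
imported_name = target.execute(
    "SELECT p.name FROM orders AS o JOIN products AS p ON p.id = o.product_id"
).fetchone()[0]
print(imported_name)  # gadget
```

The order still points at the gadget after the import, even though the gadget's int id is 2 in the source database and 1 in the target: only the Guid crosses the database boundary.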
GUIDs do have a performance impact relative to ints, but that impact may be minimal depending on your application, so there's no way to be certain without testing. I once converted an application from ints to GUIDs. It had some very large tables with many foreign keys, and did both very heavy modifications and heavy queries (on the order of hundreds of thousands of records turning over daily). Things were a bit slower when run through a profiler, but there wasn't a noticeable difference from the user's perspective.
So the answer is "it depends." Like all things dealing with performance, you can't really be sure until you try it.
Whether to use a GUID or an int for the PK really depends on the scenario. There will be a performance hit when changing from INT to GUID: a GUID is 16 bytes, four times the size of a 4-byte INT. There is a good article here about the pros and cons of using GUIDs.
Why do you have to change from Integers anyway?
In my opinion, a GUID can be used in cases where we need a unique code. But its effect on performance has to be taken into account.
An identity column performs better when used as a PK and FK.
So, depending on the situation, we can use either a guid or a clustered key.