具有聚集 GUID PK 的 SQL Server 数据库 - 切换聚集索引还是切换到顺序(梳)GUID?
我们有一个数据库,其中所有的 PK 都是 GUID,并且大多数 PK 也是表的聚集索引。我们知道这很糟糕(由于 GUID 的随机性)。因此,这里似乎基本上有两个选择(除了完全将 GUID 作为 PK 丢弃之外,我们不能这样做(至少目前不能))。
- 我们可以将 GUID 生成算法更改为 NHibernate 使用的算法,详细信息请参见 这篇文章,或者
- 我们可以,对于使用最频繁的表,更改为不同的聚集索引,例如 IDENTITY 列,并保留“随机”GUID 作为 PK。
在这种情况下是否可以给出一些一般性建议?
所讨论的应用程序有 500 多个表,目前最大的一个表大约有 150 万行,少数表大约有 500,000 行,其余的表要少得多(其中大多数远低于 10K)。
此外,该应用程序已经安装在多个客户站点,因此我们必须考虑对现有客户可能产生的负面影响。
谢谢!
We have a database in which all the PKs are GUIDs, and most of the PKs are also the clustered index for the table. We know that this is bad (due to the random nature of GUIDs). So, it seems there are basically two options here (short of throwing out GUIDs as PKs altogether, which we cannot do (at least not at this time)).
- We could change the GUID generation algorithm to e.g. the one that NHibernate uses, as detailed in this post, or
- we could, for the tables that are under the heaviest use, change to a different clustered index, e.g. an IDENTITY column, and keep the "random" GUIDs as PKs.
Is it possible to give any general recommendations in such a scenario?
The application in question has 500+ tables, the largest one presently at about 1,5 million rows, a few tables around 500 000 rows, and the rest significantly lower (most of them well below 10K).
Furthermore, the application is installed at several customer sites already, so we have to take any possible negative effects for existing customer into consideration.
Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我的观点很明确:使用 INT IDENTITY 作为集群键。这是迄今为止最好、最优化的集群键,因为它:
顺序 GUID 肯定比常规随机 GUID 好很多,但仍然比 INT 大四倍(16 字节 vs 4 字节) ),如果表中有很多行,并且该表上也有很多非聚集索引,这将是一个因素。聚集键被添加到每个非聚集索引中,因此这显着增加了 16 字节与 4 字节大小的负面影响。更多字节意味着磁盘上和 SQL Server RAM 中的页面更多,从而意味着更多的磁盘 I/O 和 SQL Server 的更多工作。
在适当的情况下,您绝对可以将 GUID 保留为主键 - 但在这种情况下,我强烈建议向该表添加一个单独的 INT IDENTITY 并使该 INT 成为集群键。我自己对许多大型表进行了这样的操作,结果令人惊讶 - 表碎片从 99% 下降到百分之几,而且性能也好得多。
查看 Kimberly Tripp 的精彩系列文章,了解为什么 GUID 在 SQL Server 中作为集群键不好:
Marc
My opinion is clear: use an INT IDENTITY for your clustering key. That's by far the best, most optimal clustering key, because its:
Sequential GUID's are definitely a lot better than regular random GUIDs, but there's still four times larger than an INT (16 vs 4 byte) and this will be a factor if you have lots of rows in your table, and lots of non-clustered indices on that table, too. The clustering key is being added to each and every non-clustered index, so that significantly increases the negative effect of having 16 vs 4 bytes in size. More bytes means more pages on disk and in SQL Server RAM and thus more disk I/O and more work for SQL Server.
You can definitely keep the GUID as the primary key, where appropriate - but in that case, I'd strongly recommend adding a separate INT IDENTITY to that table and make that INT the clustering key. I've done that myself with a number of large tables, and the results are astonishing - the table fragmentation is down from 99 and more percent down to a few percent, and performance is much better.
Check out Kimberly Tripp's excellent series on why GUID's are bad as clustering keys in SQL Server here:
Marc
如果您能够轻松地将 guid 生成更改为顺序 guid 生成,那么这可能是您的快速获胜选择。顺序 guid 将停止表上的碎片,同时保留为聚集索引。然而,顺序引导的主要缺点是它们会变得可猜测,这通常是不希望的,这也是首先使用引导的原因。
如果您沿着聚集主键的身份路径,然后仅在 guid 列上建立索引,那么您的 guid 索引上仍然会出现大量碎片。然而,表将不再碎片化这一事实将是一个巨大的收获。
最后,我知道你说过你现在不能这样做,但是,如果你根本不需要使用 guid 作为索引,那么你就可以消除所有这些问题。
If you are able to change your guid generation to a sequential guid generation easily then that is probably your quick win option. The sequential guid will stop the fragmentation on the table whilst remaining as your clustered index. The major downside with a sequential guid though is that they then become guessable which is often not desired and the reason guids are used in the first place.
If you go down the Identity route for your clustered primary key and then just an index on your guid column then you will still get a lot of fragmentation on your guid index. However the fact that the table will no longer get fragmented will be a massive gain.
Finally though, I know you said you can't do this for now, but, if you don't NEED to use guids as an index at all then you remove all of these problems.