Int PK inner join vs. Guid PK inner join on SQL Server: execution plan
I just did some testing of joins on an Int PK vs. a Guid PK.
The table structure and number of records look like this:
Performance of CRUD operations using EF4 is pretty similar in both cases.
It is a well-known claim that an Int PK performs better than strings when used in joins.
Accordingly, the SQL Server execution plans for the INNER JOINs are completely different.
Here is an execution plan:
As I understand the execution plan above, the Int join performs better because it takes fewer resources for the Clustered Index Scan and it goes in two ways. Am I right?
Could someone explain this execution plan in more detail?
Is this example enough to show that an Int PK has better performance in joins?
Comments (3)
Kimberly Tripp (the Queen of Indexing) has an excellent blog post on the topic:
Disk space is cheap.... that's not the point!
She nicely shows how the argument of "disk space is cheap - using GUID instead of INT doesn't hurt" is totally bogus in many ways.
I'm not totally sure I understand what you are trying to achieve or find out from this test, but here are a few random thoughts that popped into my mind as I read your question...
1) In a real life use case you are probably not going to join two entire tables together, but there will be filters on other columns etc, reducing the records to be joined in one or both tables. This will affect what type of join algorithm is most suitable/most effective.
The plans above are the results of joining two tables together, but if you were to filter one or both of the tables on some other column then the optimizer might go for a completely different join type.
2) Which type of join is best when joining GUID columns depends a lot on how the guids are generated. If you're joining a lot of guids that are completely random (e.g. generated with SQL Server's NewID() or CLR Guid.NewGuid()) then a hash join is probably the best choice. If, however, you're joining a smaller set of sequential guids (newsequentialid() / UuidCreateSequential()), or even identical guids, then a loop join can often be the most efficient choice.
The optimizer uses index statistics to determine what type of join to use, but sometimes for complex queries with many guid joins it can be necessary to force join type with optimizer hints.
In short, if what you're trying to do is decide whether you should use GUID or INT PKs, then a more real-world test is a better choice. Create tables matching your use case, populate them with an ample amount of somewhat realistic sample data, and run some of the types of queries you expect to be running down the line. Joining the entire contents of two dummy tables together doesn't really tell you anything about the I/O impact you could see from using Guid keys, or what the execution plan will look like for other queries involving int vs guid keys.
If using Guid keys, consider the different options for generating them and keep in mind that using sequential guids is often a good way to avoid excessive page reads if you're joining a lot of records...
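To get a feel for why sequential guids are gentler on an ordered index, here is a rough Python sketch. It is only a toy model and not SQL Server's B-tree: a sorted list stands in for the clustered index, random 128-bit integers stand in for NewID()-style guids, and plain increasing integers stand in for newsequentialid()-style guids. The function name count_mid_inserts and the row count are made up for illustration.

```python
import bisect
import random


def count_mid_inserts(keys):
    """Insert keys one by one into a sorted list and count how many
    land somewhere other than the end (a crude stand-in for inserts
    that would hit the middle of an ordered index)."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos != len(ordered):
            mid_inserts += 1
        ordered.insert(pos, key)
    return mid_inserts


n = 10_000                      # hypothetical row count
rng = random.Random(42)         # fixed seed so the run is repeatable

# Assumption: random 128-bit values model fully random guids.
random_keys = [rng.getrandbits(128) for _ in range(n)]
# Assumption: ever-increasing values model sequential guids.
sequential_keys = list(range(n))

random_mid = count_mid_inserts(random_keys)
sequential_mid = count_mid_inserts(sequential_keys)

print("mid-index inserts, random keys:    ", random_mid)
print("mid-index inserts, sequential keys:", sequential_mid)
```

In this model the sequential keys always append at the end, while almost every random key lands in the middle of the existing data. A real test against your own tables is still the only way to see the actual I/O difference.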
If you think about how, internally, a computer compares values, it becomes obvious.
Comparing two Ints is a single operation.
Comparing two Guids takes several instructions (or one lengthy one).
In addition, GUIDs use 4 times as much space, which will produce more paging, poorer cache usage and so forth.
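As a back-of-the-envelope check of the "4 times" figure, here is a small Python sketch. It only counts the raw key bytes (4 for a 32-bit int, 16 for a guid) and ignores row and page overhead; the helper key_storage_bytes, the row count, and the number of index copies are hypothetical values picked for illustration.

```python
import struct
import uuid

INT_KEY_BYTES = struct.calcsize("<i")       # 32-bit integer: 4 bytes
GUID_KEY_BYTES = len(uuid.uuid4().bytes)    # 128-bit guid: 16 bytes


def key_storage_bytes(row_count, key_bytes, copies=1):
    """Bytes spent on key values alone.

    copies is the number of places each key value is stored
    (for example the table itself plus any index that carries the key).
    Row headers, page overhead, and fill factor are ignored.
    """
    return row_count * key_bytes * copies


rows = 1_000_000   # hypothetical table size
copies = 3         # hypothetical: the table plus two indexes carrying the key

int_total = key_storage_bytes(rows, INT_KEY_BYTES, copies)
guid_total = key_storage_bytes(rows, GUID_KEY_BYTES, copies)

print("int keys: ", int_total, "bytes")
print("guid keys:", guid_total, "bytes")
print("ratio:    ", guid_total / int_total)
```

The ratio stays at 4 no matter how many rows or copies you plug in, but the absolute gap grows with both, which is where the extra paging and poorer cache usage come from.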
Kimberly Tripp's post mentioned by Marc proves this.