关于如何读取SQL执行计划的问题
我已执行查询并包含实际执行计划。我对一个哈希匹配感兴趣,因为它的子树使用索引扫描而不是索引查找。当我将鼠标悬停在该哈希匹配上时,会出现一个名为“探测残差”的部分。我曾以为这就是我所加入的价值观。我在这里是正确的还是有更好的解释来解释这意味着什么?
我的第二个问题是关于它使用的索引。在我的示例中,我非常确定这个特定的连接正在连接两列。正在扫描的索引中包含这两列以及连接中未使用的另一列。我的印象是这会导致索引查找而不是扫描。我在这一点上有误吗?
I have executed a query and included the Actual Execution Plan. There is one Hash Match that is of interest to me because it's subtree uses a Index Scan instead of an index seek. When I mouse over this Hash Match there is a section called "Probe Residual". I had assumed that this is whatever values I am joining on. Am I correct here or is there a better explanation of what that means?
The second question I had is regarding the indexes it uses. In my example I am pretty sure this particular join is joining on two columns. The index that it is Scanning has both of these columns in it as well as another column that is not used in the join. I was under the impression that this would result in an Index Seek rather than a Scan. Am I mistaken on this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
哈希连接通常(总是?)使用扫描或至少是范围扫描。哈希联接的工作原理是扫描左联接表和右联接表(或表中的范围)并构建一个内存中哈希表,其中包含扫描“看到”的所有值。
在您的情况中发生的情况是这样的:QO 注意到它可以从恰好包含该列(作为键或包含列)的非聚集索引中获取列 C 的所有值。作为非聚集索引可能相当狭窄,因此扫描整个非聚集索引的 IO 总量并不夸张。 QO 还认为系统有足够的 RAM 来在内存中存储哈希表。将此查询的成本(端到端扫描非聚集索引,例如 10000 页)与使用搜索的嵌套循环的成本(例如,每个 2-3 页进行 5000 个探测)的成本进行比较,扫描因需要较少的 IO 而获胜。当然,这很大程度上是我的猜测,但我试图从 QO 的角度来展示这个案例,并且该计划可能是最优的。
促成这一特定计划选择的因素包括:
对于数量较大的估计对于候选者来说,比哈希连接更好的选择就是合并连接,并且该选择需要对输入进行预排序。如果左侧都可以提供保证连接列上的顺序的访问路径,并且右侧也有类似的可能性,那么您最终可能会得到合并连接,这是最快的连接。
A Hash Join will generally (always?) use a scan or at least a range scan. A hash join works by scanning both left and right join tables (or a range in the tables) and building an in-memory hash table that contains all values 'seen' by the scans.
What happened in your case is this: the QO noticed that it can obtain all the values of a column C from a non-clustered index that happens to contain this column (as a key or as an included column). Being a non-clustered index is probably fairly narrow, so the total amount of IO to scan the entire non-clustered index is not exaggerate. The QO also considered that the system has enough RAM to store a hash table in memory. When compared the cost of this query (a scan of a non-clustered index end-to-end for, say, 10000 pages) with the cost of a nested loop that used seeks (say 5000 probes at 2-3 pages each) the scan won as requiring less IO. Of course, is largely speculation on my part, but I'm trying to present the case from the QO point of view, and the plan is likely optimal.
Factors that contributed to this particular plan choice would be:
For a large estimate of the number of candidates, a better choice than the hash join is only the merge-join, and that one requires the input to be presorted. If both the left side can offer an access path that guarantees an order on the joined column and the right side has a similar possibility then you may end up with the merge join, which is the fastest join.
这篇博文可能会回答您的第一个问题。
至于第二个,优化器可能会在多种情况下选择索引扫描。超出我的想象:
如果索引中的大多数行将由查询选择
如果您在查询的 where 子句中使用函数
对于前两种情况,执行扫描会更有效,因此优化器会选择扫描而不是扫描寻找。对于第三种情况,优化器别无选择。
This blog post will probably answer your first question.
As for your second, index scans might be selected by the optimizer in a number of situations. Off the top of my head:
If most of the rows in the index will be selected by the query
If you are using functions in the where clause of your query
For the first two cases, it's more efficient to do a scan, so the optimizer chooses it over a seek. For the third case, the optimizer has no choice.
1/ 哈希匹配意味着它采用等式联接中使用的列的哈希值,但需要包括联接中涉及的所有其他列(对于 > 等),以便也可以检查它们。这就是剩余列发挥作用的地方。2
/ 如果索引查找可以直接转到您想要的行,则可以完成索引查找。也许您正在对列应用计算并使用它?然后它将使用索引作为数据的较小版本,但仍然需要检查每一行(对每一行应用计算)。
1/ A Hash Match means that it takes a hash of columns used in an equality join, but needs to include all the other columns involved in the join (for >, etc) so that they can be checked too. This is where residual columns come in.
2/ An Index Seek can be done if it can go straight to the rows you want. Perhaps you're applying a calculation to the columns and using that? Then it will use the index as a smaller version of the data, but will still need to check every row (applying the calculation on each one).
查看 simple-talk.com 上有关执行计划的优秀文章:
他们还有一本免费的电子书SQL Server 执行计划供下载。
Check out those excellent articles on execution plans on simple-talk.com:
They also have a free e-book SQL Server execution plans for download.