相似查询的不同执行计划

发布于 2024-09-04 22:41:18 字数 617 浏览 7 评论 0原文

我正在运行两个非常相似的更新查询,但由于我不知道的原因,它们使用完全不同的执行计划。通常这不会成为问题,但它们都更新完全相同数量的行,但其中一个使用的执行计划远不如另一个,4 秒 vs 2 分钟,当扩大规模时,这给我带来了一个大问题。

这两个查询之间的唯一区别是一个使用列 CLI,另一个使用 DLI。这些列的数据类型完全相同,索引也完全相同,但对于 DLI 查询执行计划,不使用索引。

非常感谢任何关于为什么会发生这种情况的帮助。

-- Query 1
UPDATE a
 SET DestKey = ( 
 SELECT TOP 1 b.PrefixKey 
 FROM refPrefixDetail AS b 
 WHERE a.DLI LIKE b.Prefix + '%' 
 ORDER BY len(b.Prefix) DESC )
FROM CallData AS a

-- Query 2
UPDATE a
 SET DestKey = ( 
 SELECT TOP 1 b.PrefixKey 
 FROM refPrefixDetail b 
 WHERE a.CLI LIKE b.Prefix + '%' 
 ORDER BY len(b.Prefix) DESC )
FROM CallData AS a

I am running two very similar update queries but for a reason unknown to me they are using completely different execution plans. Normally this wouldn't be a problem but they are both updating exactly the same amount of rows but one is using an execution plan that is far inferior to the other, 4 secs vs 2 mins, when scaled up this is causing me a massive problem.

The only difference between the two queries is one is using the column CLI and the other DLI. These columns are exactly the same datatype, and are both indexed exactly the same, but for the DLI query execution plan, the index is not used.

Any help as to why this is happening is much appreciated.

-- Query 1
UPDATE a
 SET DestKey = ( 
 SELECT TOP 1 b.PrefixKey 
 FROM refPrefixDetail AS b 
 WHERE a.DLI LIKE b.Prefix + '%' 
 ORDER BY len(b.Prefix) DESC )
FROM CallData AS a

-- Query 2
UPDATE a
 SET DestKey = ( 
 SELECT TOP 1 b.PrefixKey 
 FROM refPrefixDetail b 
 WHERE a.CLI LIKE b.Prefix + '%' 
 ORDER BY len(b.Prefix) DESC )
FROM CallData AS a

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

蓝天 2024-09-11 22:41:18

检查表中这两列的统计信息(列的数据值如何分布在所有行中)。这可能会解释差异...这些列之一可能具有值的分布,这可能会导致查询在处理过程中需要检查比其他查询所需的行数要多得多的行数(该数字或者更新的行由 Top 1 部分控制(记住),那么查询优化器可能会选择不使用索引...更新统计信息将使它们更准确,但如果值的分布是优化器选择的不使用索引,那么你可能会运气不好......

了解索引的工作原理在这里很有用。索引是节点的树形结构,其中每个节点(从根节点开始)包含的信息允许查询处理器根据它正在“搜索”的值来确定下一步要转到树的哪个分支。它类似于二叉树,不同之处在于数据库中的树不是二叉树,每个级别的每个节点下可能有 2 个以上的分支。

因此,对于索引,要从根到叶级别遍历索引,需要处理器为索引层次结构中的每个级别读取一次索引。 (例如,如果索引的深度为 5 级,则它需要为其搜索的每条记录执行 5 次 I/O 操作。

因此,在本例中,假设查询需要检查索引中超过大约 20% 的记录表,(基于您要搜索的列的值分布),那么查询优化器会对自己说,“自己,找到 20% 的记录,每个记录搜索 5 个 I/O,是相等的与读取整个表的 I/O 数量相同。”,因此它只是忽略索引并执行表扫描。

除了在查询中添加附加条件以进一步限制记录数量之外,实际上没有办法避免这种情况查询必须检查才能生成结果......

Examine the statistics on these two columns on the table (How the data values for the columns are distributed among all the rows). This will propbably explain the difference... One of these columns may have a distribution of values that could cause the query, in processsing, to need to examine a substantially higher number of rows than would be required by the other query, (The number or rows updated is controlled by the Top 1 part remember) then it is possible that the query optimizer will choose not to use the index... Updating statistics will make them more accurate, but if the distribution of values is such that the optimizer chooses not to use the index, then you may be out of luck...

Understanding how indices work is useful here. An index is a tree-structure of nodes, where each node (starting with a root node) contains information that allows the query processor to determine which branch of the tree to go to next, based on the value it is "searching" for. It is analogous to a binary-Tree except that in databases the trees are not binary, at each level there may be more than 2 branches below each node.

So, for an index, to traverse the index, from the root to the leaf level, requires that the processor read the index once for each level in the index hiearchy. (if the index is 5 levels deep for example, it needs to do 5 I/O operations for each record it searches for.

So in this example, say, if the query need to examine more than approximately 20% of the records in the table, (based on the value distribution of the column you are searching against), then the query optimizer will say to itself, "self, to find 20% of the records, with five I/O s per each record search, is equal to the same number of I/Os as reading the entire table.", so it just ignores the index and does a Table scan.

There's really no way to avoid this except by adding additonal criteria to your query to furthur restrict the number of records the query must examine to generate it's results....

青朷 2024-09-11 22:41:18

尝试更新您的统计数据。如果这没有帮助,请尝试重建索引。有可能每列数据的基数差异较大,导致选择不同的执行计划。

Try updating your statistics. If that does not help try rebuilding your indexes. It is possible that the cardinality of the data in each column is quite different, causing different execution plans to be selected.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文